A scalable pipeline for COVID-19: the case study of
Germany, Czechia and Poland.
Wildan Abdussalam1,*, Adam Mertel1, Kai Fan1, Lennart Schüler1,2, Weronika Schlechte-Wełnicz1 and Justin M. Calabrese1,3,4
1 Center for Advanced Systems Understanding, Helmholtz-Zentrum Dresden-Rossendorf, Untermarkt 20, 02826 Görlitz, Germany
2 Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research (UFZ), Permoserstraße 15, 04318 Leipzig, Germany
3 Department of Ecological Modelling, Helmholtz Centre for Environmental Research (UFZ), Permoserstraße 15, 04318 Leipzig, Germany
4 Department of Biology, University of Maryland, College Park, MD, USA


Abstract
Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers have relied on forecasting models to determine and implement non-pharmaceutical interventions (NPI). Building such forecasting models requires continuously updated datasets from various stakeholders, including developers, analysts, and testers, to provide precise predictions. Here we report the design of a scalable pipeline, named where2test, which synchronizes data to support inter-country top-down spatiotemporal observations and forecasting models of COVID-19 for Germany, Czechia and Poland. We have built an operational data store (ODS) using PostgreSQL to continuously consolidate datasets from multiple data sources, perform collaborative work, facilitate high performance data analysis, and trace changes. The ODS has been built to store COVID-19 data not only from Germany, Czechia, and Poland but also from other areas. Employing the dimensional fact model, a metadata schema synchronizes the various data structures from those regions and is scalable to the entire world. Next, the ODS is populated using batch Extract, Transform, and Load (ETL) jobs. SQL queries are subsequently created to reduce the need for users to pre-process data. The data can then support not only forecasting using a version-controlled Arima-Holt model and other analyses to support decision making, but also risk calculator and optimisation apps [1, 2]. The data synchronization runs at a daily interval, and the results are displayed at https://www.where2test.de.


Proc. of the First International Workshop on Data Ecosystems (DEco'22), September 5, 2022, Sydney, Australia
* Corresponding author
w.abdussalam@hzdr.de (W. Abdussalam); j.calabrese@hzdr.de (J. M. Calabrese)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

In building forecasting models of COVID-19, many researchers employ training datasets provided by each country's representative institutions, e.g., the Robert Koch Institute (RKI) in Germany. The publicly accessible COVID-19 data, provided in raw textual formats such as CSV, JSON, and XML, are downloaded and analysed by researchers employing either statistical or machine learning approaches. However, the data are poorly structured and require heavy pre-processing as well as ingestion activities before further analysis. This method is inherently inefficient because each researcher manually performs identical pre-processing of the RKI data in parallel (using, e.g., Python or R scripts). This reduces the efficiency of everyone's work, as all have to spend hours or days pre-processing data before coming to modeling and forecasting. Advanced computing infrastructures and novel software pipelines are crucial tools to synchronize data structures that originate from various sources and to greatly reduce heavy pre-processing [3]. They serve as essential prerequisites for realising data surveillance and outbreak response management, which have been implemented in fighting other endemic diseases [4, 5, 6, 7].

To date, data management has been applied in controlling the outbreak of COVID-19 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. Most of these systems provide maps and prevalence data at the following regional levels: (i) national level, e.g., COVID-19 data worldwide [10], for Europe [11, 12, 13], and for Latin America [14]; (ii) state and county levels, e.g., the COVID-19 data warehouse for Italy [15], the COVID-19 dashboard for the UK [17], the COVID-19 dashboard for Maryland [18], and for Germany [19]; (iii) county level, e.g., Dresden, Germany [20]. A more complete version is provided by Johns Hopkins University [21], which serves a dashboard and prevalence data for each regional level in the USA as well as for most countries around the world. Likewise, a similar method with a semi-automatic validation strategy was used to check the data quality of daily updated numbers against governmental/official data sources [22]. However, most dashboards and data warehouses do not provide features that let users perform an inter-country top-down spatiotemporal observation, i.e., observing the inter-country prevalence while simultaneously being able to observe down to the microscopic level (nation → state → county → municipality). Such features could provide insights, for example, for studying COVID-19 border dynamics, which have so far attracted considerable attention [23, 24, 25, 26]. Moreover, they lack forecasting features, which play a key role in predicting the future prevalence as well as in determining non-pharmaceutical interventions (NPI). A tremendous number of forecasting models, e.g., agent-based [27], machine learning [28, 29], combination [30, 31], compartment [32, 33, 34, 35, 36], and time series models [37, 38, 39, 40, 41, 42, 43], have employed government datasets to provide essential inputs for public decisions. However, most datasets used in those studies are limited to a specific time window, so the results are likely to change when the datasets are updated. Establishing a system of forecasts assisted by daily updated datasets, therefore, is an alternative to improve their consistency and precision.
In this paper, we address the aforementioned issues by proposing the design of a scalable pipeline which allows us to perform top-down spatiotemporal observation across Germany, Czechia, and Poland as well as daily forecasts. The method of the pipeline, which consists of the extraction of various data sources and the ODS, is described in Sec. 2.1. More specifically, we describe the dimensional fact database model and the daily migration process which underlie the data synchronization between the various data sources and our database server. We employ the dimensional fact model because it offers more flexibility and versatility in building spatiotemporal aggregation functions than the nanocubes model [44, 45]. Next, in Sec. 2.2 we describe the time-series forecasting models which are supported by the ODS. The automatic system of daily forecasts enabled by the pipeline is also laid out in that subsection. In Sec. 3, we describe the facilities that have been established thanks to the ODS. In order to demonstrate the inter-country top-down spatiotemporal observations, the analysis begins at the macroscopic scale with a study of the virus spread across national borders, described in Sec. 3.1. Herein we consider the borders among Germany, Czechia and Poland as a case study. In Sec. 3.2, we explore a more microscopic level by applying a forecast assisted by daily updated datasets to the prevalence in the state of Saxony, Germany. Last but not least, in Sec. 3.3, the most microscopic level that we demonstrate is a superspreading event at a slaughterhouse in Gütersloh, North Rhine-Westphalia, Germany. As the COVID-19 situation begins to enter an endemic phase, the study of superspreading events will provide essential information for tracing COVID-19 transmission after a mass event.

2. Methods

2.1. Data Pipeline

Fig. 1a shows the workflow of the data pipeline. Hospitals, retirement homes and schools register the daily number of COVID-19 cases and vaccinations with the representative government institutions. In order to consolidate these data, a relational database is built based on the dimensional fact model [46]. Once the relational database is established, a daily automatic extract, transform and load (ETL) step is performed to migrate and integrate the data sources into the PostgreSQL database of CASUS HZDR (see Supplementary materials 7.1). Next, we create SQL query-based views to be analysed by our researchers using forecast and machine learning methods. The tested and completed analysis methods are set in the master stage and the other tested methods are set in the develop stage. Only the forecasting method in the master stage is integrated in the automatic pipeline.

Figure 1: (a) Workflow of the data pipeline. Hospitals, retirement homes, and schools in Germany, Czechia and Poland report data on COVID-19 cases, vaccines and tests to the representative government institutions. A daily automatic ETL step synchronizes the data sources with the central database of CASUS. Daily and weekly automatic forecasts employing, e.g., the Arima-Holt model provide rapid predictions. The predictions and the actual data are shown on the where2test website. (b) The scalable dimensional fact model. Datavalues and datavalue types represent measures, while region types and timeperiod types represent spatial and temporal dimensions, respectively.

The dimensional fact model is shown in Fig. 1b. The model consists of three main concepts: (i) Facts, which refer to a subject of study (e.g., the study of infected, dead, recovered, hospitalised, tested and vaccinated cases due to COVID-19); (ii) Measures, which refer to the quantitative data of concept (i). The measured data are stored in the tables of datavalues, which contain the number of infected, dead, recovered, hospitalised, tested, and vaccinated cases due to COVID-19 at a given time and place. To date, the schema consists of three datavalues tables, i.e., the datavalues of Germany, Czechia and Poland; (iii) Dimensions, which refer to temporal and spatial attributes. As the measured data are provided for a given time and place, tables of time period types and regions are necessary. The former stores the type of time period, which is either day or week; the latter stores the necessary information on regions, which consists of the name, abbreviation, region ID, region type ID, geometry and population. The table of regions depends on the table of region types. The regions are categorised based on their sizes. In ascending order of size, the levels are municipality, county, state and nation. For Germany, the order of region types is Gemeinde, Kreise and Bundesland. Similar to Germany, Poland consists of Gmina, Powiat, and Wojewodztwo. Unlike Germany and Poland, Czechia consists of four levels: Obec, Orp, Okres and Kraj. The spatial and temporal attributes are connected by means of hierarchies to represent a many-to-one relationship between them. The table of mapping_types contains the hierarchical types of the spatial attributes, e.g., for Germany (Gemeinde to Kreise, Kreise to Bundesland), for Czechia (Obec to Orp, Orp to Okres and Okres to Kraj), and for Poland (Gmina to Powiat and Powiat to Wojewodztwo). Next, the many-to-one relationships between the regions of those spatial hierarchies are stored in the table of mapping_regions. Moreover, the table of timeperiod_types contains the hierarchical types of the temporal attributes.

Aggregation functions are applicable to the measures along the temporal and spatial dimensions. For the former dimension, the weekly data are cumulative 7-day data. For example, a 7-day case count reported on 13.03.2022 is the accumulation of the daily cases for 07-13.03.2022. For the latter dimension, county data are cumulative municipality data. Beyond accumulating the data from the municipality to the county level, the mapping_regions table makes it possible to accumulate the data from the county to the state level as well as from the state to the nation level. This allows us to scale the pipeline to other areas, provided that municipality data are available from the sources.

2.2. Forecasts

We employ autoregressive integrated moving average (ARIMA) and Holt's linear trend models to forecast the infection, test, and hospitalisation data of COVID-19 for Saxony (Germany), Czechia, and Poland. The ARIMA model has been successfully employed in predicting other endemic diseases [47, 48, 49, 50]. The model provides suitable predictions based on time series analysis and is capable of providing short-horizon forecasts for most COVID-19 cases around the world [38, 39, 40, 41, 42, 43]. To make the model consistent and avoid overfitting, the order parameter of the ARIMA model is fixed instead of using the auto ARIMA model. The ARIMA forecast is improved by employing Holt's linear trend model [51]. Holt's model uses the exponential smoothing method to compute a weighted average of the past observations [52]. The forecasts from Holt's linear model have a trend, so the damping parameter is switched on to dampen this trend [53, 54, 52]. A self-defined mix function is used to compute the probability parameter m, which combines the forecasts from the two models and minimizes the error. The Box-Cox transformation is used to normalize the input data [55, 52].

Our model initially provided a weekly forecast. In order to better capture the daily variation and provide more real-time forecasts, we have built a daily forecast model. As the daily data have a clear weekly variation, seasonal parameters are added to the model, and seasonal ARIMA (SARIMA) and Holt-Winters' seasonal models are employed for the daily forecasts [56, 51, 57]. Similar to the ARIMA model, the seasonal ARIMA model uses fixed order and seasonal parameters. After comparing the errors from multiple methods, the additive method is selected for the Holt-Winters' seasonal model. The mix function is also used for the daily forecasts to combine the forecasts from the two models and improve the forecasting accuracy. As a case study of the (S)Arima-Holt model, in Sec. 3.2 we provide the number of infections for Saxony, Germany.

In addition to the (S)Arima-Holt model, we employ outlier detection to identify and quantify superspreading events. As suggested in [35], we do so by using outlier detection methods based on time series analysis. The rate of new infections is modeled by an appropriate model, which can range from something as simple as a rolling average to more elaborate ones such as SIR-based models. The residuals of the reported cases are used to identify outliers. At the same time, the residuals can be used to quantify the size of a superspreading event.

3. Results

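The aggregation path described in Sec. 2.1 (municipality to county along the spatial dimension, day to 7-day sum along the temporal dimension) can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for the PostgreSQL ODS; the table layout, column names, region IDs and case counts are all hypothetical and only mirror the roles of the regions, mapping_regions and datavalues tables, not the actual where2test schema.

```python
import sqlite3

# Hypothetical miniature of the dimensional fact schema. Table and column
# names are illustrative assumptions, not the real where2test definitions.
SCHEMA = """
CREATE TABLE region_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE regions (
    id INTEGER PRIMARY KEY,
    name TEXT,
    region_type_id INTEGER REFERENCES region_types(id)
);
-- many-to-one: each child region maps to exactly one parent region
CREATE TABLE mapping_regions (
    child_id INTEGER PRIMARY KEY REFERENCES regions(id),
    parent_id INTEGER REFERENCES regions(id)
);
CREATE TABLE datavalues (
    region_id INTEGER REFERENCES regions(id),
    day TEXT,          -- ISO date of the daily time period
    infected INTEGER   -- measure: newly reported cases
);
"""


def build_demo_db():
    """Create an in-memory database with one county and two municipalities."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO region_types VALUES (?, ?)",
                    [(1, "municipality"), (2, "county")])
    con.executemany("INSERT INTO regions VALUES (?, ?, ?)",
                    [(10, "County A", 2),
                     (101, "Municipality A1", 1),
                     (102, "Municipality A2", 1)])
    con.executemany("INSERT INTO mapping_regions VALUES (?, ?)",
                    [(101, 10), (102, 10)])
    rows = []
    for i in range(7):  # made-up counts for 2022-03-07 .. 2022-03-13
        day = f"2022-03-{7 + i:02d}"
        rows.append((101, day, 1 + i))
        rows.append((102, day, 10))
    con.executemany("INSERT INTO datavalues VALUES (?, ?, ?)", rows)
    return con


def county_daily(con, county_id):
    """Spatial aggregation: sum municipality values up to their county."""
    query = """
        SELECT d.day, SUM(d.infected)
        FROM datavalues d
        JOIN mapping_regions m ON m.child_id = d.region_id
        WHERE m.parent_id = ?
        GROUP BY d.day
        ORDER BY d.day
    """
    return con.execute(query, (county_id,)).fetchall()


def county_7day(con, county_id, end_day):
    """Temporal aggregation: county total over the 7 days ending on end_day."""
    query = """
        SELECT SUM(d.infected)
        FROM datavalues d
        JOIN mapping_regions m ON m.child_id = d.region_id
        WHERE m.parent_id = ?
          AND d.day BETWEEN date(?, '-6 days') AND ?
    """
    return con.execute(query, (county_id, end_day, end_day)).fetchone()[0]


con = build_demo_db()
print(county_daily(con, 10)[0])            # first daily county total
print(county_7day(con, 10, "2022-03-13"))  # 7-day sum for 07-13.03.2022
```

With the made-up counts above, the 7-day county value for 13.03.2022 is 98 (28 from the first municipality plus 70 from the second). Repeating the same join one level up in mapping_regions would give state and nation totals, which is the scaling property the section relies on.
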
The pipeline has allowed us to provide the following facilities: (i) A released data hub for dead and infected cases of all counties and states in Germany [58], which allows collaboration between CASUS research staff and external collaborators. The post-processed data serve as clean data of daily infected and dead cases at the county and state levels. In addition, we have also pre-processed the vaccination and hospitalization data at the county and municipal levels; (ii) The daily updated value of the background risk for the optimisation [1] and risk calculator apps [2], defined as the chance that an average person who lives in the focal area and carries out daily activities will be infected over a one-week period; (iii) Blog posts which report on the current COVID-19 situation in Germany. An interesting example is the post on the relation between the vaccination rate and the 7-day incidence in all states of Germany [59]; (iv) Forecast- and model-based analysis. We explore the case studies mentioned in Sec. 1, and begin by investigating the virus spread across the national borders of Germany, Czechia, and Poland.

3.1. Analysis of the virus spread across the national borders

COVID-19 spreads among people. Therefore, human mobility is one of the most important factors defining the trend of the spatiotemporal spreading of the virus. Understanding human mobility allows us to predict the spatiotemporal character of the spread, evaluate governmental restrictions, and provide effective non-pharmaceutical interventions. Primarily due to the heterogeneity of the sources and the scope of interest of particular research groups and communities, most COVID-19 research stays within the boundaries of one country. While most human mobility happens within one country or region, the mitigating effect of national borders is generally diminishing, notably in Europe. To study the impact of the national border, several research papers [60, 61] applied various methodologies of geostatistics and geospatial modeling. A more thorough quantification of the effect of border presence and international mobility on the epidemic requires a data storage integrating heterogeneous datasets across more countries.

The presented ODS infrastructure offers the possibility to study the spatiotemporal character of the virus spread on more levels, considering the effect of the national border. First, for our case study comprising Germany, Poland, and Czechia, we explored the correlation of new cases in a region with the distance and the border presence. We observed that neighbouring regions tend to have similar incidence values in the absence of a barrier in the form of a national border between them. This step followed the research of McMahon et al. [62], which showed a strong spatial autocorrelation of incidence values in the USA.

Further, we calculated the average time-lagged pair-wise correlations for each region with the regions within a radius of 100 kilometers, (i) inside the same country and (ii) outside this country. The difference of these values can be seen in Fig. 2. A larger difference marks regions where the incidence correlates much better with regions outside the country than with regions within the same country, indicating a strong national border effect on the virus spread.

Figure 2: Difference in the pair-wise correlations for regions within a 100 kilometer radius inside and outside the country. The red color represents the regions with the strongest difference, indicating the spread of the virus across the national borders.

In the next step [63], we quantified the mitigation effect of the national border in more detail. We picked the state of Saxony in Germany and the neighboring regions in Czechia. For both countries, we collected and integrated the incidence data at the level of single municipalities. For each municipality, we constructed a local regression model which estimated the effect of three parameters, (i) border presence, (ii) municipality size, and (iii) temporal distance from other municipalities, on the spread of the virus. Based on this model, we identified very small-scale areas susceptible to a more intensive international spread of COVID-19.

The top-down approach we selected for the study of the national border effect is possible thanks to the scalability of the implemented dimensional fact model. This principle allows the ODS to comprise various administrative levels and combine various relevant topics within the perspective of spacetime.

3.2. Weekly and daily forecast of Arima-Holt and Sarima-Holt

For the case study, we provide a short-term forecast of the 7-day incidence for up to four horizons, performed on 13-04-2022 using the Arima-Holt model for Saxony, Germany. We used the training dataset version of 13-04-2022, which consists of the historical weekly data of Saxony and its counties from 01-03-2020 to 10-04-2022. The weekly data are automatically updated daily data which are aggregated on Sunday (see
Sec. 2.1). Although we update the data daily, in the case of Germany the data of the current and the previous day are unavailable. In addition, the data of the third most recent day are still to be updated by the source. When the forecast was performed on Sunday 10-04-2022, the number of infections for that day was lower than the number for the same day in the following day's version. As a result, this produces an inaccurate forecast (see Fig. 3). As the days elapsed, more cases were automatically added and aggregated into the data of the last Sunday. Consequently, the forecast performed on 13-04-2022 yields a higher exponent than those based on the dataset versions of 10-04-2022 and 11-04-2022. Moreover, the Wednesday dataset is a relatively stable version. Therefore, the forecast is performed every Wednesday, when the source data for the last Sunday are consistent.

Figure 3: 7-day incidence of infected cases from January to 8 May 2022 for Saxony, Germany. The black dots denote the historical data, the blue line is a guide to the eye for the historical data, and the green, orange, and red lines denote the results of the forecast using the Arima-Holt model performed on 10-04-2022, 11-04-2022, and 13-04-2022, respectively. The grey area shows the lower and upper limits of the forecast for 13-04-2022.

In order to check the four-horizon forecast, we compare it to the weekly historical data updated on 11-05-2022. The latter consists of relatively stable data from 17-04-2022 to 08-05-2022. As shown in Fig. 3, the weekly historical data are surprisingly in quantitative agreement with the four-horizon forecast. However, this agreement occurs only occasionally. When the forecast is performed on a different day, a deviation from the actual data for the following four horizons is likely to occur. Additional realisations of the Arima-Holt forecast for Saxony and its counties were therefore performed to improve the statistics. The realisations were performed every Wednesday from 05-01-2022 to 18-05-2022, with the version-controlled datasets employed as training and test datasets. An example is the realisation of the forecast on 05-01-2022. We used the weekly data version of 05-01-2022 as its training dataset and the weekly data versions of the following 1st, 2nd, 3rd and 4th week as its test datasets. For each region, we then recorded the deviation of the forecast result from the historical data and quantified it as the mean absolute percentage error (MAPE). As shown in Fig. 4, the weekly Arima-Holt provides a relatively low MAPE for the first and second horizons. For the third and fourth horizons, however, the range of the MAPE tends to be wider than for the first and second.

Figure 4: Mean absolute percentage error of Arima-Holt (weekly), Sarima-Holt in the presence of the Box-Cox transformation (daily_originT), and Sarima-Holt in the absence of the Box-Cox transformation (daily_originF) for the 1st to 4th horizon.

Therefore, we applied the Sarima-Holt model to improve the forecast performance for the third and fourth horizons. Owing to the daily updated data, the version-controlled daily data are employed for the seasonal parameters. In addition to the daily data, the Sarima-Holt forecast was performed using the same version-controlled weekly data employed for the Arima-Holt model. For the daily data, we removed the data of the current and the two previous days because of zero values for the current day and yesterday and inconsistent data for the third most recent day. We then compared its performance in the presence and the absence of the Box-Cox transformation (BCT) used to normalize the input data. As shown in Fig. 4, the Sarima-Holt model in the absence of the BCT provides a lower MAPE than either the Arima-Holt or the Sarima-Holt in the presence of the BCT, not only for the first and second horizons but also for the third and fourth horizons.

3.3. Superspreading events

Superspreading events play an important role in the dispersion dynamics of COVID-19 [64]. However, one of the most commonly used epidemiological model types, the compartment models, is not able to accurately capture these events [35, 65]. We are currently working on a solution to this problem by using outlier detection methods at the county level. Many different methods exist, and they can produce more robust results when more than one time series is taken into account. A database as presented in this work is very advantageous, as it makes it convenient to query the reported infections from all neighboring counties and use these additional data to identify outliers, which might be superspreading events, more robustly. The largest confirmed superspreading event to date in Germany, with 1766 infections, happened in a meat processing facility in the North Rhine-Westphalian district of Gütersloh in June 2020. The facility's environmental
conditions combined with relatively close physical dis-          The Sarima-Holt model is trained by the daily data, and
tance between workers were likely the main reason for            the variation of the data could make the model more
efficient aerosol transmission [66]. We take this event as       sensitive to the infection change compared to the Arima-
an example to show the result of a Z-score based outlier         Holt model trained by the weekly data. However, the
detection method (Fig. 5).                                       BCT reduces the variation of the daily data, and conse-
                                                                 quently the daily forecasts perform worse than in the
                                                                 absence of the BCT.
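The Z-score based outlier detection applied to the Gütersloh event (Fig. 5) can be sketched as follows. This is a minimal illustration, assuming a global mean and standard deviation over the whole series and a threshold of 3; the text does not state the exact variant or threshold used, and the function name and sample incidence values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(series, threshold=3.0):
    """Return the indices of values whose absolute Z-score exceeds the threshold."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation of the series
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]


# Hypothetical daily incidence per 100,000 inhabitants with a single spike.
incidence = [2.0, 3.0, 2.5, 3.5, 2.0, 3.0, 2.5, 3.0,
             2.0, 3.5, 2.5, 3.0, 2.0, 60.0, 3.0, 2.5]

outlier_days = zscore_outliers(incidence)  # index 13 is the spike
```

A more robust variant could pool the series of neighboring counties, queried from the database, when estimating the mean and standard deviation, so that a single large event does not inflate the baseline it is compared against.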


Figure 5: The officially reported daily COVID-19 incidence per 100,000 inhabitants in the district of Gütersloh. A superspreading event in a meat processing plant in June 2020 is successfully identified by an outlier detection method based on the Z-score (the black dot).

4. Discussions
Our analysis, implementing the pipeline with the dimensional fact model, has allowed us to migrate the data efficiently on a daily basis thanks to the spatiotemporal aggregation functions. To provide the weekly data of counties, states, and nations, we only migrate the daily data of municipalities or counties (depending on the data availability of each nation) to the database server, where they are then aggregated to the higher spatiotemporal levels. This model has advantages over the nanocubes model [44, 45]. For the nanocubes model, the data for each spatial level (municipality, county, state and nation) and each temporal level (daily and weekly) must be migrated to the database server. Consequently, this leads to a longer migration process than the one performed using the dimensional fact model. Moreover, the spatiotemporal mapping of the dimensional fact model enables us to perform efficient table joins among national data, as confirmed by the application in Subsec. 3.1.
   The daily-updated data made available by the pipeline have allowed us to develop the Sarima-Holt model. The model shows more robust predictions for longer horizons than the Arima-Holt model. More specifically, the Sarima-Holt model in the absence of the BCT outperforms the Arima-Holt model for the third and fourth horizons. This performance is due to the contribution of the seasonal parameters to the model. As a result, the forecast tends to predict the third and fourth horizons better. In contrast, the Sarima-Holt model in the presence of the BCT performs worse than in its absence because the training data vary less after the BCT (see Fig. 7).

5. Conclusion
Our work has demonstrated the utility of the data pipeline for top-down spatiotemporal analysis. We first presented the macroscopic analysis, in which the spread of the virus across national borders is investigated. At a more microscopic level, we demonstrated a data-driven approach, enabled by the pipeline, which is applied to the prevalence at the county level. The daily-updated data have improved the precision of the model for longer horizons. These data-driven epidemic models provide more realistic forecast results than either the parsimonious model [34] or the agent-based method with a larger number of parameters [27], owing to the use of daily-updated data. This may contribute to public health policy making, including the work of public health forecasting teams. Last but not least, moving to a lower regional level, we have demonstrated that the outlier model is able to capture the superspreading event which occurred in 2020. Together, these results show that our work is capable of performing top-down analysis as well as rapid and precise forecasts thanks to the pipeline.

6. Data sources
    • COVID-19 data for Germany, Czechia and Poland.
          – Robert Koch Institute
          – Czech Ministry of Health
          – Polish Ministry of Health
          – Age-based hospitalisation at state level for Germany (https://github.com/KITmetricslab/hospitalization-nowcast-hub/blob/main/data-truth/COVID-19/).
          – Age-based and type-based vaccine doses at county level (https://github.com/robert-koch-institut/COVID-19-Impfungen_in_Deutschland/blob/master/Aktuell_Deutschland_Landkreise_COVID-19-Impfungen.csv).
          – COVID-19 infected, recovered, hospitalised and dead cases of Dresden (http://daten.dresden.de/duva2ckan/files/de-sn-dresden-corona_-_covid-19_-_fallzahlen_md1_dresden_2020ff/content).


          – COVID-19 infected, dead, and test cases at municipality level for Czechia (https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/).
          – Age-based and gender-based infected and dead cases at county level for Germany (https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4).
          – COVID-19 cases at municipality level for Saxony, Germany (https://www.coronavirus.sachsen.de/corona-statistics/rest/infectionOverview.jsp).
          – COVID-19 cases at county level for Saxony, Germany (https://media.githubusercontent.com/media/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland/master/Aktuell_Deutschland_SarsCov2_Infektionen.csv).
          – COVID-19 infected, dead, and test cases at county level for Poland (https://wojewodztwa-rcb-gis.hub.arcgis.com/pages/dane-do-pobrania).
          – COVID-19 vaccinations at county level for Poland (https://www.gov.pl/web/szczepimysie/raport-szczepien-przeciwko-covid-19).
          – COVID-19 types in Saxony (https://www.coronavirus.sachsen.de/infektionsfaelle-in-sachsen-4151.html).
    • Dictionaries of regions.
          – Administrative areas in Germany (https://gdz.bkg.bund.de/index.php/default/digitale-geodaten/verwaltungsgebiete.html).
          – Administrative areas in Poland (https://gis-support.pl/baza-wiedzy-2/dane-do-pobrania/granice-administracyjne/).
          – Administrative areas in Czechia (https://geoportal.cuzk.cz/(S(1nhx02lray0vkrhce1y2d53d))/Default.aspx?mode=TextMeta&text=dSady_RUIAN&side=dSady_RUIAN).
          – Population numbers in Czech municipalities (https://www.czso.cz/csu/czso/pocet-obyvatel-v-obcich-k-112021).
          – Postal codes in Germany (https://www.geonames.org/postal-codes/postleitzahlen-deutschland.html).
          – Population numbers in Poland (https://stat.gov.pl/obszary-tematyczne/ludnosc/ludnosc/ludnosc-stan-i-struktura-ludnosci-oraz-ruch-naturalny-w-przekroju-terytorialnym-stan-w-dniu-30-06-2021,6,30.html).

7. Supplementary information

7.1. Data workflow
We use https://www.talend.com/products/talend-open-studio/ to perform data migration. The migration between the data sources and the PostgreSQL database of CASUS HZDR has been performed as follows:

Figure 6: Data workflow of the ETL process (see text for its description).

    1. Data acquisition
       The data are automatically downloaded from the sources listed in Section 6. They are subsequently stored in the repository of the where2test server. The downloaded data serve as the inputs of the migration process.
    2. Dictionaries and data augmentation
       To integrate and further augment data from heterogeneous sources (various forms, schemas, and temporal and spatial extents), we needed to prepare a list of dictionaries. We formed a dictionary for each spatial level in every country to cover all regions in our datasets. Here we included the unique region ID, all alternative names, full names, geometries, and population numbers. This concept can be extended further to other values, such as socioeconomic parameters and other information about the region. In this way we are able to maintain consistency across all datasets and enable their integration. The list of sources used for building the dictionaries can be found in Section 6 (Data sources).
    3. Data cleaning
       We first migrate timeperiod_types, region_types, datavalues_types, and mapping_types. While the data are migrated to those tables, the primary keys are automatically set by a transformator (the script which migrates the data to the PostgreSQL database). Next, the primary keys of those tables serve as the foreign keys of other tables, following the table relations shown in Fig. 1b. An example is the table of regions, which contains the intrinsic IDs set by the respective governments. In order to differentiate the IDs among Germany, Czechia and


       Poland, we add ’DE’, ’CZ’, and ’PL’, respectively, followed by the intrinsic ID. For the table of regions, the primary key of region_types serves as its foreign key. The intrinsic IDs are categorised based on the ID of the region types. A specific example is Dresden, whose intrinsic ID is 14162. After the cleaning process, its ID becomes DE 14162 and it is categorised under the region type Kreise.
       Once the data have been migrated to the aforementioned tables, the table mapping_regions is populated with the spatial-relation data. It contains the foreign key of the mapping type ID. An example is the county of Dresden, which is mapped onto the state of Saxony and categorised under the mapping type Kreis_To_Bundesland. Next, the table of datavalues for nations is populated with the input data. The datavalues table has three foreign keys, which originate from the tables timeperiod_types, regions, and datavalues_types. With these foreign keys in place, a data merging process is feasible, which is described in the following item.
    4. Data merging
       In addition to the aforementioned three foreign keys, the date is set as a fourth attribute, which allows us to perform data merging through an inner join of tables. The inner join is employed to merge cleanly and to avoid duplicated data in the table of datavalues. For instance, the daily infection data of the lowest-level regions for a given period are migrated to the table datavalues_germany. When the data sources are updated, they sometimes revise the cases of past dates. The inner join allows us to automatically replace the value of a past date with the latest value. Moreover, when data with a new latest date arrive from the source, they are automatically added to the table.
    5. Data aggregation
       The daily data of the lowest-level regions allow us to perform both temporal and spatial aggregation. Using functions, the temporal aggregation from daily to weekly periods is feasible. Moreover, as mentioned in Sec. 2, the spatial aggregation from lower to higher region levels is made possible by the mapping_regions table.

7.2. Additional forecasting results

Figure 7: Time series of daily infected cases from Aug. 5, 2020 to Apr. 30, 2022 (a) before and (b) after Box-Cox transformation, respectively.

Acknowledgments
This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament. We thank Jens Steiner for providing us with a virtual server of HZDR.

References
[1] M. Davoodi, A. Batista, A. Senapati, W. Schlechte-Welnicz, B. Wagner, J. M. Calabrese, Modeling COVID-19 optimal testing strategies in long-term care facilities: An optimization-based approach, arXiv (2022). URL: https://arxiv.org/abs/2204.02062. doi:10.48550/ARXIV.2204.02062.
[2] M. Davoodi, A. Senapati, A. Mertel, W. Schlechte-Welnicz, J. M. Calabrese, Optimal Workplace Occupancy Strategies during the COVID-19 Pandemic, arXiv (2022). URL: https://arxiv.org/abs/2204.01444. doi:10.48550/ARXIV.2204.01444.
[3] J. L. Raisaro, et al., SCOR: A secure international informatics infrastructure to investigate COVID-19, Journal of the American Medical Informatics Association 27 (2020) 1721–1726. doi:10.1093/jamia/ocaa172.
[4] A. v. Wangenheim, A. Savaris, A. F. Borgatto, A. d. S. Inácio, Integrating Online Georeferenced Epidemiological Analysis and Visualization into a Telemedicine Infrastructure – First Results, medRxiv (2019). URL: https://www.medrxiv.org/content/10.1101/19000554v1.full.
[5] C. Fähnrich, et al., Surveillance and Outbreak Response Management System (SORMAS) to support the control of the Ebola virus disease outbreak in West Africa, Euro Surveill 20 (2015) 21071. doi:10.2807/1560-7917.es2015.20.12.21071.


[6] R. N. Smith, et al., InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data, Bioinformatics 28 (2012) 3163–3165. doi:10.1093/bioinformatics/bts577.
[7] C. Pfander, B. Anar, F. Schwach, T. D. Otto, M. Brochet, K. Volkmann, M. A. Quail, A. Pain, B. Rosen, W. Skarnes, J. C. Rayner, O. Billker, A scalable pipeline for highly effective genetic modification of a malaria parasite, Nature Methods 8 (2011) 1078–1082. URL: https://doi.org/10.1038/nmeth.1742. doi:10.1038/nmeth.1742.
[8] P. Kostkova, et al., Data and Digital Solutions to Support Surveillance Strategies in the Context of the COVID-19 Pandemic, Frontiers in Digital Health 3 (2021). doi:10.3389/fdgth.2021.707902.
[9] J. Budd, et al., Digital technologies in the public-health response to COVID-19, Nature medicine 26 (2020) 1183–1192. URL: https://www.nature.com/articles/s41591-020-1011-4.
[10] F. A. Binti Hamzah, C. Hau, H. Nazri, D. Ligot, G. Lee, M. Shaib, U. Zaidon, A. Abdullah, M. H. Chung, C. Ong, P. Chew, CoronaTracker: Worldwide COVID-19 Outbreak Data Analysis and Prediction (2020). doi:10.2471/BLT.20.255695.
[11] E. Centre, European Centre for Disease Prevention and Control, 2022. URL: https://qap.ecdc.europa.eu/public/extensions/covid-19/covid-19.html#global-overview-tab.
[12] A. Naqvi, COVID-19 European regional tracker, Scientific Data 8 (2021) 181. URL: https://doi.org/10.1038/s41597-021-00950-7. doi:10.1038/s41597-021-00950-7.
[13] c. eudata, covid19-eu-data, 2020. URL: https://github.com/covid19-eu-zh/covid19-eu-data.
[14] c.-. latinoamerica, Latin America Covid-19 Data Repository by DSRP, 2020. URL: https://github.com/DataScienceResearchPeru/covid-19_latinoamerica.
[15] G. Agapito, C. Zucco, M. Cannataro, COVID-WAREHOUSE: A Data Warehouse of Italian COVID-19, Pollution, and Climate Data, Environmental Research and Public Health 17 (2020). doi:10.3390/ijerph17155596.
[16] R. K. Arora, A. Joseph, J. Van Wyk, S. Rocco, A. Atmaja, E. May, T. Yan, N. Bobrovitz, J. Chevrier, M. P. Cheng, T. Williamson, D. L. Buckeridge, SeroTracker: a global SARS-CoV-2 seroprevalence dashboard, The Lancet Infectious Diseases 21 (2021) e75–e76. URL: https://doi.org/10.1016/S1473-3099(20)30631-9. doi:10.1016/S1473-3099(20)30631-9, publisher: Elsevier.
[17] c. GovUK, Interactive map of cases, 2022. URL: https://coronavirus.data.gov.uk/details/interactive-map/cases.
[18] g. maryland, Coronavirus Disease 2019 (COVID-19) Outbreak, 2022. URL: https://coronavirus.maryland.gov.
[19] c. d. rki, Robert Koch-Institut: COVID-19-Dashboard, 2022. URL: https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4/page/Landkreise/.
[20] c. dresden, Corona-Dashboard Dresden, 2022. URL: https://experience.arcgis.com/experience/d2386f3214c1451c81b242be69bb3d50.
[21] E. Dong, H. Du, L. Gardner, An interactive web-based dashboard to track COVID-19 in real time, The Lancet Infectious Diseases 20 (2020) 533–534. URL: https://doi.org/10.1016/S1473-3099(20)30120-1. doi:10.1016/S1473-3099(20)30120-1, publisher: Elsevier.
[22] D. Sha, Y. Liu, Q. Liu, Y. Li, Y. Tian, F. Beaini, C. Zhong, T. Hu, Z. Wang, H. Lan, Y. Zhou, Z. Zhang, C. Yang, A spatiotemporal data collection of viral cases for COVID-19 rapid response, Big Earth Data 5 (2021) 90–111. URL: https://doi.org/10.1080/20964471.2020.1844934. doi:10.1080/20964471.2020.1844934, publisher: Taylor & Francis.
[23] Han Xiaoyi, Xu Yilan, Fan Linlin, Huang Yi, Xu Minhong, Gao Song, Quantifying COVID-19 importation risk in a dynamic network of domestic cities and international countries, Proceedings of the National Academy of Sciences 118 (2021) e2100201118. URL: https://doi.org/10.1073/pnas.2100201118. doi:10.1073/pnas.2100201118, publisher: Proceedings of the National Academy of Sciences.
[24] D. Laroze, E. Neumayer, T. Plümper, COVID-19 does not stop at open borders: Spatial contagion among local authority districts during England’s first wave, Social Science & Medicine 270 (2021) 113655. URL: https://www.sciencedirect.com/science/article/pii/S0277953620308741. doi:10.1016/j.socscimed.2020.113655.
[25] M. Grimée, M. Bekker-Nielsen Dunbar, F. Hofmann, L. Held, Modelling the effect of a border closure between Switzerland and Italy on the spatiotemporal spread of COVID-19 in Switzerland, Spatial Statistics (2021) 100552. URL: https://www.sciencedirect.com/science/article/pii/S2211675321000622. doi:10.1016/j.spasta.2021.100552.
[26] M. P. Hossain, A. Junus, X. Zhu, P. Jia, T.-H. Wen, D. Pfeiffer, H.-Y. Yuan, The effects of border control and quarantine measures on the spread of COVID-19, Epidemics 32 (2020) 100397. URL: https://www.sciencedirect.com/science/article/pii/S1755436520300244. doi:10.1016/j.epidem.2020.100397.
[27] Q.-H. Liu, et al., Model-based evaluation of alternative reactive class closure strategies against


COVID-19, Nat. Com. 13 (2022). doi:10.1038/s41467-021-27939-5.
[28] H. Bastani, et al., Efficient and targeted COVID-19 border testing via reinforcement learning, Nature 559 (2021). URL: https://www.nature.com/articles/s41586-021-04014-z.
[29] S. Flaxman, et al., Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe, Nature 584 (2020) 257. URL: https://www.nature.com/articles/s41586-020-2405-7.
[30] N. Haug, et al., Ranking the effectiveness of worldwide COVID-19 government interventions, Nature Human Behaviour 4 (2020) 1303–1312. URL: https://www.nature.com/articles/s41562-020-01009-0.
[31] A. Liu, L. Vici, V. Ramos, S. Giannoni, A. Blake, Visitor arrivals forecasts amid COVID-19: A perspective from the Europe team, Annals of Tourism Research 88 (2021) 103182. URL: https://www.sciencedirect.com/science/article/pii/S016073832100044X. doi:10.1016/j.annals.2021.103182.
[32] S. Lai, et al., Effect of non-pharmaceutical interventions to contain COVID-19 in China, Nature 585 (2020) 410. URL: https://www.nature.com/articles/s41586-020-2293-x.
[33] D. Fanelli, F. Piazza, Analysis and forecast of COVID-19 spreading in China, Italy and France, Chaos, Solitons & Fractals 134 (2020) 109761. URL: https://www.sciencedirect.com/science/article/pii/S0960077920301636. doi:10.1016/j.chaos.2020.109761.
[34] Bertozzi Andrea L., Franco Elisa, Mohler George, Short Martin B., Sledge Daniel, The challenges of modeling and forecasting the spread of COVID-19, Proceedings of the National Academy of Sciences 117 (2020) 16732–16738. URL: https://doi.org/10.1073/pnas.2006520117. doi:10.1073/pnas.2006520117, publisher: Proceedings of the National Academy of Sciences.
[35] L. Schüler, J. M. Calabrese, S. Attinger, Data driven high resolution modeling and spatial analyses of the COVID-19 pandemic in Germany, PLOS ONE 16 (2021) e0254660. URL: https://doi.org/10.1371/journal.pone.0254660. doi:10.1371/journal.pone.0254660, publisher: Public Library of Science.
[36] I. Rahimi, F. Chen, A. H. Gandomi, A review on COVID-19 forecasting models, Neural Computing and Applications (2021). URL: https://doi.org/10.1007/s00521-020-05626-8. doi:10.1007/s00521-020-05626-8.
[37] R. Salgotra, M. Gandomi, A. H. Gandomi, Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming, Chaos, Solitons & Fractals 138 (2020) 109945. URL: https://www.sciencedirect.com/science/article/pii/S0960077920303441. doi:10.1016/j.chaos.2020.109945.
[38] S. Roy, G. S. Bhunia, P. K. Shit, Spatial prediction of COVID-19 epidemic using ARIMA techniques in India, Modeling Earth Systems and Environment 7 (2021) 1385–1391. URL: https://doi.org/10.1007/s40808-020-00890-y. doi:10.1007/s40808-020-00890-y.
[39] M.-J. Geng, H.-Y. Zhang, L.-J. Yu, C.-L. Lv, T. Wang, T.-L. Che, Q. Xu, B.-G. Jiang, J.-J. Chen, S. I. Hay, Z.-J. Li, G. F. Gao, L.-P. Wang, Y. Yang, L.-Q. Fang, W. Liu, Changes in notifiable infectious disease incidence in China during the COVID-19 pandemic, Nature Communications 12 (2021) 6923. URL: https://doi.org/10.1038/s41467-021-27292-7. doi:10.1038/s41467-021-27292-7.
[40] Y. Wang, C. Xu, S. Yao, L. Wang, Y. Zhao, J. Ren, Y. Li, Estimating the COVID-19 prevalence and mortality using a novel data-driven hybrid model based on ensemble empirical mode decomposition, Scientific Reports 11 (2021) 21413. URL: https://doi.org/10.1038/s41598-021-00948-6. doi:10.1038/s41598-021-00948-6.
[41] V. K. Sharma, U. Nigam, Modeling and Forecasting of COVID-19 Growth Curve in India, Transactions of the Indian National Academy of Engineering 5 (2020) 697–710. URL: https://doi.org/10.1007/s41403-020-00165-z. doi:10.1007/s41403-020-00165-z.
[42] A. K. Sahai, N. Rath, V. Sood, M. P. Singh, ARIMA modelling & forecasting of COVID-19 in top five affected countries, Diabetes & Metabolic Syndrome: Clinical Research & Reviews 14 (2020) 1419–1427. URL: https://www.sciencedirect.com/science/article/pii/S1871402120302903. doi:10.1016/j.dsx.2020.07.042.
[43] D. Benvenuto, M. Giovanetti, L. Vassallo, S. Angeletti, M. Ciccozzi, Application of the ARIMA model on the COVID-2019 epidemic dataset, Data in Brief 29 (2020) 105340. URL: https://www.sciencedirect.com/science/article/pii/S2352340920302341. doi:10.1016/j.dib.2020.105340.
[44] L. Lins, J. T. Klosowski, C. Scheidegger, Nanocubes for Real-Time Exploration of Spatiotemporal Datasets, IEEE Transactions on Visualization and Computer Graphics 19 (2013) 2456–2465. doi:10.1109/TVCG.2013.179.
[45] A. Bosworth, J. Gray, A. Layman, H. Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Technical Report MSR-TR-95-22, Institute of Electrical and Electronics Engineers, Inc., 1995. URL: https://www.microsoft.com/en-us/research/publication/data-cube-a-relational-aggregation-operator-generalizing-group-by-cross-tab-and-sub-totals/.
[46] M. Golfarelli, D. Mario, S. Rizzi, The dimensional


fact model: a conceptual model for data warehouses, International Journal of Cooperative Information Systems 7 (1998) 215–247. doi:10.1142/S0218843098000118.
[47] E. O. Nsoesie, O. Oladeji, A. S. A. Abah, M. L. Ndeffo-Mbah, Forecasting influenza-like illness trends in Cameroon using Google Search Data, Scientific Reports 11 (2021) 6713. URL: https://doi.org/10.1038/s41598-021-85987-9. doi:10.1038/s41598-021-85987-9.
[48] Y. Chen, Y. Zhang, Z. Xu, X. Wang, J. Lu, W. Hu, Avian Influenza A (H7N9) and related Internet search query data in China, Scientific Reports 9 (2019) 10434. URL: https://doi.org/10.1038/s41598-019-46898-y. doi:10.1038/s41598-019-46898-y.
[49] Z. He, H. Tao, Epidemiology and ARIMA model of positive-rate of influenza viruses among children in Wuhan, China: A nine-year retrospective study, International Journal of Infectious Diseases 74 (2018) 61–70. URL: https://www.sciencedirect.com/science/article/pii/S1201971218344618. doi:10.1016/j.ijid.2018.07.003.
[50] Q. Zeng, D. Li, G. Huang, J. Xia, X. Wang, Y. Zhang, W. Tang, H. Zhou, Time series analysis of temporal trends in the pertussis incidence in Mainland China from 2005 to 2016, Scientific Reports 6 (2016) 32367. URL: https://doi.org/10.1038/srep32367. doi:10.1038/srep32367.
American Statistical Association 77 (1982) 63–70. URL: https://www.tandfonline.com/doi/abs/10.1080/01621459.1982.10477767. doi:10.1080/01621459.1982.10477767, publisher: Taylor & Francis.
[57] P. R. Winters, Forecasting Sales by Exponentially Weighted Moving Averages, Management Science 6 (1960) 324–342. URL: https://doi.org/10.1287/mnsc.6.3.324. doi:10.1287/mnsc.6.3.324, publisher: INFORMS.
[58] W. Abdussalam, Post-processing data of daily dead and infected COVID-19 in Germany (2022). URL: https://zenodo.org/badge/latestdoi/462876343. doi:10.5281/zenodo.6336637.
[59] A. Mertel, M. Laqua, Where2Test visualization highlights strong link between pace of vaccinations and incidences, 2022. URL: https://www.where2test.de/blog#vaccination-maps.
[60] M. Eckardt, K. Kappner, N. Wolf, Covid-19 across european regions: The role of border controls (2020).
[61] M. Grimée, M. B.-N. Dunbar, F. Hofmann, L. Held, et al., Modelling the effect of a border closure between switzerland and italy on the spatiotemporal spread of covid-19 in switzerland, Spatial statistics (2021) 100552.
[62] T. McMahon, A. Chan, S. Havlin, L. K. Gallos, Spatial correlations in geographical spreading of covid-19 in the united states, Scientific Reports 12 (2022) 1–10.
[63] A. Mertel, J. Vyskočil, L. Schüler, W. Schlechte-
[51] C. C. Holt, Forecasting seasonals and trends by                 Wełnicz, J. M. Calabrese, Fine-scale variation in the
     exponentially weighted moving averages, Interna-                effect of national border on covid-19 spread: A case
     tional Journal of Forecasting 20 (2004) 5–10. URL:              study of the saxon-czech border region, medRxiv
     https://www.sciencedirect.com/science/article/pi                (2022).
     i/S0169207003001134. doi:10.1016/j.ijforeca                [64] J. E. Lemieux, K. J. Siddle, B. M. Shaw, C. Loreth, S. F.
     st.2003.09.015.                                                 Schaffner, A. Gladden-Young, G. Adams, T. Fink,
[52] R. J. Hyndman, G. Athanasopoulos, Forecasting:                  C. H. Tomkins-Tinch, L. A. Krasilnikova, K. C.
     Principles and Practice., OTexts, 2018. URL: https:             DeRuff, M. Rudy, M. R. Bauer, K. A. Lagerborg,
     //otexts.com/fpp2/.                                             E. Normandin, S. B. Chapman, S. K. Reilly, M. N.
[53] E. S. Gardner, E. McKenzie, Why the damped trend                Anahtar, A. E. Lin, A. Carter, C. Myhrvold, M. E.
     works, Journal of the Operational Research Society              Kemball, S. Chaluvadi, C. Cusick, K. Flowers,
     62 (2011) 1177–1180. URL: https://doi.org/10.1                  A. Neumann, F. Cerrato, M. Farhat, D. Slater, J. B.
     057/jors.2010.37. doi:10.1057/jors.2010.37,                     Harris, J. Branda, D. Hooper, J. M. Gaeta, T. P.
     publisher: Taylor & Francis.                                    Baggett, J. O’Connell, A. Gnirke, T. D. Lieberman,
[54] E. S. Gardner, E. Mckenzie, Forecasting Trends in               A. Philippakis, M. Burns, C. M. Brown, J. Luban, E. T.
     Time Series, Management Science 31 (1985) 1237–                 Ryan, S. E. Turbett, R. C. LaRocque, W. P. Hanage,
     1246. URL: https://doi.org/10.1287/mnsc.31.10.1                 G. R. Gallagher, L. C. Madoff, S. Smole, V. M. Pierce,
     237. doi:10.1287/mnsc.31.10.1237, publisher:                    E. Rosenberg, P. C. Sabeti, D. J. Park, B. L. Maclnnis,
     INFORMS.                                                        Phylogenetic analysis of SARS-CoV-2 in the Boston
[55] V. M. Guerrero, Time-series analysis supported                  area highlights the role of recurrent importation
     by power transformations, Journal of Forecasting                and superspreading events, preprint, Epidemiology,
     12 (1993) 37–48. URL: https://doi.org/10.1002/fo                2020. doi:10.1101/2020.08.23.20178236.
     r.3980120104. doi:10.1002/for.3980120104,                  [65] G. B. Libotte, L. dos Anjos, R. C. Almeida, S. M. C.
     publisher: John Wiley & Sons, Ltd.                              Malta, R. S. Silva, Framework for enhancing the
[56] S. C. Hillmer, G. C. Tiao, An ARIMA-Model-Based                 estimation of model parameters for data with a high
     Approach to Seasonal Adjustment, Journal of the                 level of uncertainty, preprint, Epidemiology, 2020.


     URL: http://medrxiv.org/lookup/doi/10.1101/2020.12.17.20248389.
     doi:10.1101/2020.12.17.20248389.
[66] T. Günther, M. Czech-Sioli, D. Indenbirken, A. Robitaille,
     P. Tenhaken, M. Exner, M. Ottinger, N. Fischer,
     A. Grundhoff, M. M. Brinkmann, SARS-CoV-2
     outbreak investigation in a German meat processing
     plant, EMBO Molecular Medicine 12 (2020)
     e13296. URL: https://doi.org/10.15252/emmm.202013296.
     doi:10.15252/emmm.202013296, publisher: John Wiley & Sons, Ltd.

