Modeling and Prediction of COVID-19 Using Hybrid Dynamic
Model Based on SEIRD with ARIMA Corrections

Yaroslav Linder, Maksym Veres and Kateryna Kuzminova
Taras Shevchenko National University of Kyiv, Akademika Hlushkova Ave 4d, Kyiv, 03680, Ukraine


                 Abstract
                 While effective methods for predicting the future dynamics of the COVID-19 pandemic can
                 significantly improve outbreak containment, few such models have been built
                 specifically for Ukraine. We combine a compartmental epidemiological model with
                 heuristics and machine learning techniques to create an effective method for
                 modeling and predicting the COVID-19 epidemic in Ukraine. The stages of the
                 proposed method are: building a SEIRD compartmental model with vital dynamics;
                 estimating its parameters; calculating and predicting the difference between the SEIRD
                 model solution and the observed data using an ARIMA model; and adjusting the model
                 prediction using the predicted residuals. The proposed method was tested on data on
                 the epidemic's dynamics in Ukraine obtained from a Ukrainian finance analytics
                 website. The validation results indicate that the method is suitable for real-world use.

                 Keywords
                 COVID-19, SEIRD, ARIMA, Hybrid Dynamic Model

1. Introduction

    As the coronavirus pandemic continues to rattle the world, humanity is looking for ways to
alleviate the situation, if not overcome the crisis entirely. High-quality estimates and predictions of
the future dynamics of the disease spread enable better prevention and more thorough preparation for
exacerbations of the problem (such as expected rises in infection cases after holidays or the lifting of
lockdowns). Rational use of resources may help avoid future breaking points for healthcare and other
systems critical to the delivery of the COVID-19 response.
    While the patterns of the epidemic's dynamics may be similar across countries, each country has
its own demographics, economy, epidemic containment methods, available resources, and cultural
particularities, and should therefore be considered separately by researchers aiming to create models
with potential for practical use. As shown in Figure 1, the World Health Organization reports that
Ukraine has one of the highest daily increases in the number of infected individuals. Multiple models
have been proposed for modeling and predicting the epidemic around the world. In contrast, the
number of papers for Ukraine remains relatively low. Refinement of epidemic modeling techniques
specifically for Ukrainian statistics by independent researchers will accelerate the search for optimal
tools and algorithms and improve model performance. Networking and spreading awareness of novel,
helpful solutions and findings are crucial to this process.
    The SEIR model replicates the “time-history” of any epidemic or pandemic outbreak, and it
presents the model of dynamic interaction between people with four different health conditions or
phases of the pandemic, namely the susceptible (S), exposed (E), infective (I), and recovered (R).

IT&I-2020 Information Technology and Interactions, December 02–03, 2020, KNU Taras Shevchenko, Kyiv, Ukraine
EMAIL: yaroslav.linder@gmail.com (A. 1); mmveres@gmail.com (A. 2); kuzminovakateryna@gmail.com (A. 3)
ORCID: 0000-0003-1076-9211 (A. 1); 0000-0002-8512-5560 (A. 2); 0000-0003-1236-5659 (A. 3)
            ©️ 2020 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)


The SEIRD model, a generalization of the SEIR model, has an additional variable: Deceased
individuals. A “Formal Characterization and Model Comparison Validation” based on the SEIRD
model, using data from Korea and Spain, was proposed by Casas et al. [3]. The proposed
model supports the predicted parameterization with empirical evidence, and a decision support system
(DSS) was implemented to study the nature of the pandemic in Catalonia [3].


Figure 1: Rating of countries by the number of daily new infection and death cases provided by the
World Health Organization

    A data-driven model that uses the SEIRD model to predict the spread of COVID-19 for the
upcoming week was studied and tested on datasets from Italy, India, and Russia [2]. Its parameters
are calculated from the data, and its results are used to plan the future requirement of PPE for
hospital staff and of healthcare devices. The transmission dynamics of COVID-19 were also evaluated
with a SEIRD compartmental modeling approach by Mukaddes et al. [4]; however, external influences
such as weather and herd immunity were not considered in that study. A generalized SEIR model study
on the Italian COVID-19 dataset was carried out by Godio et al. [5], with parameters adjusted via a
Swarm Optimization Algorithm. The authors [5] state that their method aims to enhance the reliability
of predictions. This line of research has been led by studies for Spain and South Korea; however, it
has limitations, such as the conditions of partial infection due to exposure [6] and the classification
of cases into symptomatic and asymptomatic [7] that follows from the nature of the epidemic spread.

2. Materials and Methods
2.1.    Database
   The proposed method was tested on data on the epidemic's dynamics in Ukraine obtained from
a Ukrainian finance analytics website [8]. The dataset includes daily information on the number of
infected, recovered, and deceased individuals. The data is updated daily, enabling researchers to
update model parameters frequently to achieve the highest accuracy possible. The first available
observation dates back to March 3. The dataset consists of the following columns:
          1. Cumulative infected people as of each date (the total number of people diagnosed up to
             each date);
          2. Cumulative recovered people from the start of the outbreak (the total number of people
             who are no longer ill and have gained immunity up to each date);
          3. Deceased people from the start of the outbreak.

Table 1
Statistics of the COVID-19 outbreak in Ukraine as of key dates in government responses to the
COVID-19 pandemic
   Date             Infected    Increase in Infected    Recovered    Deceased
   March 12                1                       0            0           0
   March 23               73                      10            1           3
   April 6              1319                      11           28          38
   May 7               13691                     507         2396         340
   June 1              24012                     664        19548        1173
   July 22             60995                     829        33172        1534
   August 26          110085                    1670        53454        2354
   September 28       201305                    2671        88453        3996

    The key dates in the dynamics of the outbreak are the enactments of stay-at-home advisories and
other government-enforced restrictions (March 12, March 23, April 6, July 22, August 26) and their
lifting (May 7, June 1). The last day of observation used while building this method is September 28.
The observed data as of those dates is reported in Table 1. The data for later dates (up until
October 19) is used for validation of the proposed method. Since the lockdown and other introduced
measures did not significantly reduce the outbreak's spread rate, they are not considered in the
proposed model, and the basic model parameters are taken as fixed. Due to the small number of cases
of re-infection, all recovered individuals are assumed to have absolute immunity against COVID-19.

   [Plot: mortality rate (vertical axis, 0 to 0.9) by date (horizontal axis, weekly ticks from
   26.03.2020 to 24.09.2020); series: Mortality Rate, Expon. Trend Line]

   Figure 2: COVID-19 mortality rate dynamics and its trend line

    As reflected in Figure 2, the COVID-19 mortality rate has decreased and stabilized over time,
which is reflected in the dynamic model. This can be explained by continuous scientific efforts to
treat the disease more efficiently, as well as by the proportion of asymptomatic and undiagnosed cases
that are not reflected in the statistics. The relatively stable mortality rate observed in later months
suggests that the disease is lethal to a small portion of the population; the rate is expected to stay
at this level or slightly decrease. The data instances used while working with the model are expressed
as percentages of the country's population.

2.2.    The Hybrid Dynamic Model Framework
    Upon investigation, we introduce a novel model based on an enhanced SEIRD model and an
ARIMA model. As shown in Figure 3, the stages of the proposed method are building a SEIRD
compartment model with vital dynamics, estimating its parameters, calculating and predicting the
difference between the SEIRD model solution and the observed data using the ARIMA model, and
finally adjusting the model prediction using the predicted residuals.


Figure 3: The workflow of the proposed algorithm

   The method consists of the following stages:
   1. Estimate the SEIRD model parameters using historical data, minimizing the difference between
   the model's output and the observed data. This model is responsible for long-term prediction
   (e.g., 60 days or 100 days).
   2. Calculate the residuals between the observed infected, recovered, and deceased percentages of
   the population and the corresponding solutions of the SEIRD model.
   3. Build three ARIMA models, one on the time series of each of these residuals. The predictions
   of these ARIMA models compensate for the residuals between the SEIRD model and the historical
   data in order to make predictions more accurate.
   4. Validate the prediction of the obtained model using the data on the number of infected,
   recovered, and deceased individuals for the most recent days, which was not used in the previous
   stages.

2.3.    SEIRD Model with Vital Dynamics and Dynamic Mortality Rate
    A basic compartment model in epidemiology is the SIR model [9, 10], which studies the
population's flow between three compartments: Susceptible, Infected, and Recovered. It has already
been applied to the recent COVID-19 pandemic and showed good results [11]. The next level of
complexity is introducing vital dynamics (birth and mortality rates) into the model [12]. Since the
coronavirus disease has quite a long incubation period, it is logical to model the pandemic with
another compartment: Exposed individuals, who are already infected but cannot yet spread the virus
further. Such a model is called an SEIR compartment model. One more compartment, Deceased
individuals, completes our compartment structure.
    A SEIRD model simulates the flow of the population between the Susceptible, Exposed, Infected,
Recovered, and Deceased groups (or compartments). While compartment models are traditionally
built for closed systems, in this method the total population size is not fixed because birth and
mortality rates are introduced. This allows us to model the pandemic more accurately. The COVID-19
mortality rate is represented by an inverse exponential function with two parameters rather than by a
constant. The analysis shown in Figure 2 indicates that modeling the mortality rate as an inverse
exponential function is useful; this is another heuristic of the proposed method, introduced for the
same reason.
    The compartments of the model are as follows:
   •    𝑆(𝑡): Susceptible individuals - stock of healthy people who may be infected; population
   inflow due to births is taken into account.
   •    𝐸(𝑡): Exposed individuals - virus carriers in the latent stage, during which they
   are not virus spreaders. Usually corresponds to an asymptomatic phase of the disease.
   •    𝐼(𝑡): Infectious individuals - virus carriers able to spread the disease to individuals in contact
   with them.
   •    𝑅(𝑡): Recovered individuals - stock of healthy people who are immune to COVID-19.
   •    𝐷(𝑡): Deceased individuals - population loss due to the disease, natural deaths included.
   The model itself comprises a system of differential equations:

\begin{equation}
\begin{aligned}
\frac{dS}{dt} &= \Lambda N - \mu S - \frac{\beta S I}{N}, \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - (\mu + \sigma) E, \\
\frac{dI}{dt} &= \sigma E - (\gamma + \mu) I, \\
\frac{dR}{dt} &= \bigl(1 - \mu_{COVID}(t)\bigr) I - \mu I, \\
\frac{dD}{dt} &= \mu_{COVID}(t)\, I,
\end{aligned}
\tag{1}
\end{equation}

  with initial conditions at time t = 0: S = 𝑆0, E = 𝐸0, I = 𝐼0, R = 𝑅0, D = 𝐷0, and parameters
  •     𝛬 – the population's birth rate;
  •     µ – the population's mortality rate;
  •     𝛽 – rate of virus transmission, which is the probability of transmitting disease between a
  susceptible and an infectious individual;
  •     𝜎 – rate of latent individuals becoming infectious (the average duration of incubation is 1/𝜎);
  •     𝛾 – recovery rate, which can be initially estimated as 𝛾 = 1/𝐷, where 𝐷 is the average
  duration of infection;
  •     µ_COVID(𝑡) – death rate due to COVID-19, which is estimated by the inverse exponential
  formula $\mu_{COVID}(t) = \alpha e^{-\xi t}$.
  The population size 𝑁(𝑡) = 𝑆(𝑡) + 𝐸(𝑡) + 𝐼(𝑡) + 𝑅(𝑡) is not fixed, because the global birth and
mortality rates are taken into account at any given time t.
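
   System (1) can be integrated numerically. The sketch below is a minimal illustration under stated
assumptions, not the authors' implementation: it transcribes system (1) term by term as printed, uses
SciPy's `solve_ivp`, takes the optimized values reported in Table 2 purely as sample inputs, and uses
hypothetical placeholder values for the daily birth and mortality rates (`LAMBDA`, `MU`). The names
`seird_rhs` and `solve_seird` are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical daily birth and natural mortality rates (placeholders, not from the paper).
LAMBDA = 2.2e-5
MU = 4.0e-5

def seird_rhs(t, y, beta, sigma, gamma, alpha, xi):
    """Right-hand side of system (1); y = (S, E, I, R, D) as population fractions."""
    S, E, I, R, D = y
    N = S + E + I + R                     # living population, as defined in the text
    mu_covid = alpha * np.exp(-xi * t)    # time-dependent COVID-19 death rate
    infection = beta * S * I / N
    dS = LAMBDA * N - MU * S - infection
    dE = infection - (MU + sigma) * E
    dI = sigma * E - (gamma + MU) * I     # gamma is the recovery rate
    dR = (1.0 - mu_covid) * I - MU * I    # transcribed as printed in system (1)
    dD = mu_covid * I
    return [dS, dE, dI, dR, dD]

def solve_seird(params, s0, e0, days):
    """Integrate the model for `days` days with I(0) = R(0) = D(0) = 0."""
    t_eval = np.arange(days + 1)
    sol = solve_ivp(seird_rhs, (0, days), [s0, e0, 0.0, 0.0, 0.0],
                    args=tuple(params), t_eval=t_eval, rtol=1e-8, atol=1e-12)
    return sol.t, sol.y

# Sample inputs: beta, sigma, gamma, alpha, xi, S0 and E0 as listed in Table 2.
params = (0.1529, 0.0047, 0.0172, 0.1695, 0.0121)
t, (S, E, I, R, D) = solve_seird(params, s0=0.5541, e0=0.0008, days=210)
print(f"I(210) = {I[-1]:.6f}, R(210) = {R[-1]:.6f}, D(210) = {D[-1]:.6f}")
```

   Keeping the right-hand side as a plain function of `(t, y, *params)` makes it easy to reuse the
same solver call inside a parameter-fitting loop.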

2.4.    Parameter Estimation Using Basin-hopping Algorithm
    To use the model proposed in the previous section, we first need to specify its parameters so
that it fits the historical data. Moreover, we estimate not only the model parameters but also the
initial conditions for the susceptible and exposed compartments of the model. The reason is that we
still do not know the percentage of the population that is insusceptible to the virus (people who will
experience the disease in a mild form and will not infect others). As for the exposed population, we
also do not know the exact number of exposed travelers who arrived in Ukraine at the start of the
COVID-19 outbreak.
    Since the dataset consists of cumulative data, we calculated the number of currently infected
individuals as the difference between the cumulative infected and recovered ones. After this step, the
data was rescaled from absolute numbers to percent of the population.
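
    The preprocessing described above can be illustrated with the Table 1 figures for August 26 and
September 28. This is only a sketch: the population size `POPULATION` is an assumed round value that
is not stated in the paper, and the variable names are illustrative.

```python
import numpy as np

POPULATION = 41_900_000  # assumed population of Ukraine; not stated in the paper

# Cumulative counts for August 26 and September 28 (Table 1).
cum_infected = np.array([110085, 201305])
cum_recovered = np.array([53454, 88453])
cum_deceased = np.array([2354, 3996])

# Currently infected = cumulative infected minus cumulative recovered (as in the text).
active = cum_infected - cum_recovered

# Rescale absolute counts to percent of the population.
to_percent = 100.0 / POPULATION
active_pct = active * to_percent
recovered_pct = cum_recovered * to_percent
deceased_pct = cum_deceased * to_percent

print(active.tolist())  # [56631, 112852]
print(active_pct, recovered_pct, deceased_pct)
```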
    To fit the model parameters and initial conditions, we use the Basin-hopping algorithm [13].
This iterative heuristic algorithm is a generalization of the simulated annealing algorithm, which was
inspired by molecular processes that occur in metalworking. The annealing procedure is used to
achieve the optimal molecular arrangement of metal particles: while cooling, the heated material
settles into a shape with minimal system energy and therefore has fewer or no defects. After choosing
an initial state, the algorithm picks a neighboring state, decides whether to move to it or stay, and
then iterates this process until it finds the global optimum or reaches the iteration limit. As a
generalization of simulated annealing, the Basin-hopping global optimization technique randomly
perturbs the coordinates and proceeds to search for the global optimum in a similar manner.
    One of the key reasons for choosing this instrument is the algorithm's ability to reach the global
optimum even after finding several local ones, as it is not restricted to the best candidates at each
step.
As a measure of discrepancy between the differential equation solution and the historical data, we use
the MAE/mean metric described and investigated in [14]. Thus, as the objective function of the
Basin-hopping algorithm, we select the sum

\[
\frac{MAE(I_A, I)}{\bar{I}_A} + \frac{MAE(R_A, R)}{\bar{R}_A} + \frac{MAE(D_A, D)}{\bar{D}_A},
\]

   where $I_A(t)$ is the actual percentage of the population that is infected on day $t$, $R_A(t)$ is
the actual percentage of the population that has recovered from the disease by day $t$, $D_A(t)$ is the
actual percentage of the population that has died by day $t$, $\bar{I}_A$, $\bar{R}_A$ and $\bar{D}_A$
are the average values of the infected, recovered, and deceased series over the time domain, and
$MAE(\cdot,\cdot)$ is calculated according to equation (2).
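
    The objective above can be passed to a basin-hopping optimizer as follows. This is a hedged
sketch rather than the authors' code: it assumes SciPy's `basinhopping` with a bounded L-BFGS-B local
minimizer, and, to stay short and fast, it replaces the SEIRD solution with a hypothetical two-parameter
stand-in (`toy_model`) fitted to synthetic data. Only the structure of `objective` mirrors the text.

```python
import numpy as np
from scipy.optimize import basinhopping

def mae(actual, predicted):
    """Mean absolute error between two series."""
    return np.mean(np.abs(actual - predicted))

def toy_model(params, t):
    """Stand-in for the SEIRD solution: three proportional exponential curves."""
    a, r = params
    infected = a * np.exp(r * t)
    return infected, 0.5 * infected, 0.02 * infected

def objective(params, t, observed):
    """Sum over I, R and D of MAE divided by the mean of the observed series."""
    simulated = toy_model(params, t)
    return sum(mae(obs, sim) / np.mean(obs) for obs, sim in zip(observed, simulated))

t = np.linspace(0.0, 1.0, 60)
observed = toy_model((1.0, 2.0), t)      # synthetic "observations"
bounds = [(0.1, 5.0), (0.0, 5.0)]        # box constraints, analogous to Table 2
x0 = np.array([2.0, 0.5])                # starting guess

result = basinhopping(
    objective, x0, niter=25, seed=0,
    minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds, "args": (t, observed)},
)
print(result.x, result.fun)
```

    In a real fit, `toy_model` would be replaced by a numerical solution of system (1), and the
bounds would be the minimum and maximum values chosen for each parameter.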

2.5.    ARIMA Models for Residual Estimation
    In this step, the difference between the SEIRD model output and the observed data is estimated
and corrected using the ARIMA (Auto-Regressive Integrated Moving Average) model.
    The structure of this model includes autoregression and moving average as the main components.
The autoregression algorithm uses a certain number of past data instances (also called the number of
lagged observations) to make a prediction about variable value at each new point, exploring trends
and co-dependencies of observations.
    Differencing of the raw data is performed to ensure stationarity of the variable: the value at
time t-1 is subtracted from the value at time t.
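
    A short worked example of differencing, assuming NumPy and an arbitrary illustrative series
(the numbers are not from the dataset):

```python
import numpy as np

series = np.array([3.0, 5.0, 8.0, 12.0, 17.0])

first_diff = np.diff(series)         # y(t) - y(t-1)
second_diff = np.diff(series, n=2)   # differences of the first differences (d = 2)

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0]
```

    Here the first differences still trend upward, while the second differences are constant, which
is the kind of series a degree of differencing of 2 is meant to produce.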
    The third part, moving average, also makes use of dependencies in the data, but this time between
an observation and a residual error from applying the moving average algorithm to a number of
lagged observations.
    To each of these parts corresponds a parameter [15], where each parameter is an integer value:
    p: Lag order, or number of past observations considered by the model;
    d: Degree of differencing, or how many times raw observations are differenced;
    q: Order of moving average, or window size for moving average algorithm.
    In our case, an algorithm that finds the best set of parameters and runs statistical tests of
stationarity and seasonality is used.
    The obtained prediction of residuals is subtracted from data predicted by the compartment model
in order to increase its performance.
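
    The correction step can be sketched as below. The sign convention is an assumption made so that
the predicted residual is subtracted, as stated above: residual = model output minus observed data.
The function names are hypothetical, and `last_value_forecast` is a deliberately naive placeholder
that stands where a fitted ARIMA model would be used.

```python
import numpy as np

def correct_forecast(model_history, observed, model_future, forecast_residuals):
    """Apply a residual forecast to a compartment-model prediction.

    Assumes residual = model - observed, so the predicted residual is subtracted.
    `forecast_residuals` is any callable (history, steps) -> array.
    """
    residuals = model_history - observed
    predicted = forecast_residuals(residuals, len(model_future))
    return model_future - predicted

def last_value_forecast(history, steps):
    """Placeholder forecaster: repeat the last residual (NOT an ARIMA model)."""
    return np.full(steps, history[-1])

model_history = np.array([1.0, 2.0, 3.0, 4.0])   # model output on the training period
observed = np.array([1.0, 1.9, 2.8, 3.7])        # observed data on the same period
model_future = np.array([5.0, 6.0])              # raw model prediction

corrected = correct_forecast(model_history, observed, model_future, last_value_forecast)
print(corrected)
```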

2.6.    Validation
    During the validation stage, we gather new data that was not used in SEIRD model parameter
estimation or ARIMA model fitting. We use the following quality measures:
    1. Mean absolute error, given by
\begin{equation}
\mathrm{MAE}(y, \hat{y}) = \frac{1}{T} \sum_{t=1}^{T} \left| y(t) - \hat{y}(t) \right| \tag{2}
\end{equation}
    2. Mean squared error, given by
\begin{equation}
\mathrm{MSE}(y, \hat{y}) = \frac{1}{T} \sum_{t=1}^{T} \bigl( y(t) - \hat{y}(t) \bigr)^2 \tag{3}
\end{equation}
    3. Mean squared logarithmic error, given by
\begin{equation}
\mathrm{MSLE}(y, \hat{y}) = \frac{1}{T} \sum_{t=1}^{T} \bigl( \log(y(t) + 1) - \log(\hat{y}(t) + 1) \bigr)^2 \tag{4}
\end{equation}
    4. Normalized mean absolute error, given by
\begin{equation}
\mathrm{NormMAE}(y, \hat{y}) = \frac{\mathrm{MAE}(y, \hat{y})}{\bar{y}} \tag{5}
\end{equation}
    5. Normalized mean squared error, given by
\begin{equation}
\mathrm{NormMSE}(y, \hat{y}) = \frac{\mathrm{MSE}(y, \hat{y})}{\bar{y} \cdot \bar{\hat{y}}} \tag{6}
\end{equation}
   where $\bar{x}$ denotes the mean value of time series $x$. Moreover, we calculate the maximum
deviation between the main prediction line and two scenarios (optimistic and adverse) that are
calculated from the ARIMA models using a 95% confidence level. This measure is given by
\begin{equation}
\mathrm{MaxDev}(y, \hat{y}) = \max_{t} \frac{\left| y(t) - \hat{y}(t) \right|}{y(t)} \tag{7}
\end{equation}
3. Results
   In this section, we present the results of evaluating the hybrid model on data from the Ukrainian
finance analytics website [8].

3.1.    SEIRD Model

   In this subsection, we estimate some parameters and initial conditions of the SEIRD model using
the Basin-hopping algorithm and build rough long-term predictions of the pandemic's development.
We optimize only the initial values of the susceptible and exposed fractions of the population, while
the infected, recovered, and deceased initial conditions are set to zero. The global birth and death
rates are also not optimized and are set according to the actual annual birth and death rates in
Ukraine for 2020.

Table 2
Optimized parameters and initial conditions of the SEIRD model
   Parameter   Description                                                    Minimum   Maximum   Optimized
                                                                              value     value     value
   𝜎           Rate of latent individuals becoming infectious                 0         0.1       0.0047
   𝛽           Probability of transmitting disease between a susceptible      0         1         0.1529
               and an infectious individual
   𝛾           Recovery rate, which can be initially estimated as 1/𝐷,        0         0.1       0.0172
               where 𝐷 is the average duration of infection
   𝛼           Starting death rate from COVID                                 0         0.3       0.1695
   𝜉           Decaying speed of death rate due to enhancements in treatment  0         0.1       0.0121
   𝑆0          Initial fraction of susceptible population                     0.4       1         0.5541
   𝐸0          Initial fraction of exposed population                         0         0.05      0.0008

   Table 2 shows the boundaries and optimized values of the SEIRD model parameters and initial
values. As the table shows, the initial fraction of the susceptible population is slightly more than
half, 55%, which is consistent with recent research suggesting that most of the population will
experience the disease in a mild form or even asymptomatically. Interestingly, the recovery rate is
very low, which means that a person who suffers from the disease in a severe form takes a long time
to recover. The rate of becoming infectious is also low, which suggests that the disease needs a long
time after acquiring a new host before it can spread further: the estimated incubation period of
COVID-19 is quite long. While all of the parameters have a real-life interpretation and represent
rates of transition between compartments and initial conditions of the SEIRD model, they were
estimated by mathematical algorithms working with available data that does not entirely reflect
reality. Therefore, the estimated values of some parameters, such as the incubation period and
recovery rate, may differ from the data collected at hospitals and the estimates of other researchers.
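
    The remark about low rates can be made concrete by inverting the fitted values. The sketch below
only performs arithmetic on the optimized values listed in Table 2, using the relations "average
duration = 1/rate" and the inverse exponential death-rate formula from Section 2.3; the variable
names are illustrative.

```python
import math

sigma, gamma = 0.0047, 0.0172   # rate of becoming infectious, recovery rate (Table 2)
alpha, xi = 0.1695, 0.0121      # starting death rate and its decay speed (Table 2)

incubation_days = 1 / sigma     # mean latent period implied by the fit
infection_days = 1 / gamma      # mean duration of infection implied by the fit

def mu_covid(t):
    """COVID-19 death rate at day t: alpha * exp(-xi * t)."""
    return alpha * math.exp(-xi * t)

print(round(incubation_days, 1))   # 212.8
print(round(infection_days, 1))    # 58.1
print(round(mu_covid(0), 4))       # 0.1695
print(mu_covid(200))               # death rate after 200 days
```

    Durations of this size are far longer than clinically reported ones, which is why the fitted
values are best read as effective model parameters rather than direct clinical estimates.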
    The pure SEIRD model can be used for the long-term rough predictions of the pandemic dynamic.


Figure 4: (a) Long-term prediction of the infected and recovered fraction of population (b) Long-term
prediction of the deceased fraction of population


    In Figure 4, long-term predictions for the infected, recovered, and deceased fractions of the
population are displayed. Based on the figures, we can conclude that the number of infected people
will continue to rise until summer 2021 at a relatively stable rate.

Table 3
Quality measures of the fitted SEIRD model
   Category     MAE             MSE              MSLE             Normalized MAE   Normalized MSE
   Infected     2.03 ⋅ 10^−4    4.79 ⋅ 10^−8     4.68 ⋅ 10^−8     1.66 ⋅ 10^−2     3.28 ⋅ 10^−4
   Recovered    8.45 ⋅ 10^−5    7.94 ⋅ 10^−9     7.81 ⋅ 10^−9     1.04 ⋅ 10^−2     1.20 ⋅ 10^−4
   Deceased     8.07 ⋅ 10^−6    9.00 ⋅ 10^−11    8.99 ⋅ 10^−11    1.37 ⋅ 10^−2     2.59 ⋅ 10^−4

   Based on Figure 4 and Table 3, we can conclude that the SEIRD model fits the historical data quite
well. By the normalized measures, the best fit is observed for the recovered compartment, followed by
the deceased and infected ones. The unnormalized measures are smallest for the deceased fraction
because its values are the smallest in magnitude; the infected fraction remains the most informative
time series among the studied ones.

3.2.    ARIMA Models

   At this step, we calculate residuals between the fitted SEIRD model and historical data and train
ARIMA models on the residuals for each category (infected, recovered, deceased). While having its
limitations [16], ARIMA can help capture any non-noisy patterns. To estimate optimal ARIMA
parameters P and Q, we use the Akaike information criterion, and to estimate the optimal D
parameter, we use the Augmented Dickey-Fuller test [17].

Table 4
Parameters of ARIMA models for each category
   Category     Order of the autoregressive   Degree of          Order of the moving-average
                model (P)                     differencing (D)   model (Q)
   Infected     0                             2                  2
   Recovered    0                             2                  0
   Deceased     0                             2                  0

    In Table 4, the estimated parameters of the ARIMA models for each category are presented. It is
worth mentioning that for all three categories the P and D parameters are the same (only Q differs
for the infected category), which is a good sign: it tells us that the residual time series behave
similarly and can be simulated using similar (or even the same) models. After training the ARIMA
models, we evaluate predictions for all three categories 60 days ahead.
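
    For the recovered and deceased categories, Table 4 gives the order (0, 2, 0). Assuming the model
has no constant term, its point forecast sets every future second difference to zero, which amounts to
extending the straight line through the last two observations. The sketch below shows this special
case with plain NumPy; the function name and the sample residuals are hypothetical, and a full ARIMA
implementation would also provide the confidence intervals used later.

```python
import numpy as np

def arima_020_forecast(history, steps):
    """Point forecast of an ARIMA(0, 2, 0) model without a constant term.

    The second difference is predicted to be zero, so each new value is
    2 * y[t-1] - y[t-2]: a straight line through the last two observations.
    """
    values = list(history[-2:])
    for _ in range(steps):
        values.append(2 * values[-1] - values[-2])
    return np.array(values[2:])

residuals = np.array([0.10, 0.12, 0.15, 0.19])   # illustrative residual series
forecast = arima_020_forecast(residuals, 3)
print(forecast)
```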
    The analysis of modeling and prediction of the number of infected individuals (Figure 5) shows
that the number of observed cases of the disease grew steadily during the first half of the outbreak
(until mid-July) and is very accurately modeled with our method.
    The deviation of the predicted number of infected individuals from the observed data in the second
half of July and in August is most likely caused by the insufficient number of COVID-19 tests
performed during this period.
    The inconsistency in testing and the changing levels of quarantine severity explain further
deviations of the observed data from the output of the SEIRD model. The prediction, corrected by
ARIMA residual estimation, steadily increases, with the optimistic and pessimistic scenarios (lower
and upper bounds of the grey area, respectively) deviating by less than 0.1%.


    As shown in Figure 6, until early August, the losses from COVID-19 are quite accurately modeled.
It is safe to assume that some people who passed away due to the disease were undiagnosed or
misdiagnosed.


Figure 5: The observed number of infected individuals (blue), number of infected individuals
modeled with SEIRD model (yellow), and predicted number of infected individuals (green) by SEIRD
model and corrected by ARIMA residual prediction with 95% confidence interval (grey)


Figure 6: The observed number of deceased individuals (blue), number of deceased individuals
modeled with the SEIRD model (yellow), and predicted number of deceased individuals (green) by
SEIRD model and corrected by ARIMA residual prediction with 95% confidence interval (grey)


    Therefore the data on those cases was not taken into account in the COVID-19 statistics, which
explains why the observed number of deceased people is slightly lower. In later months we observe a
gradual rise: the medical system is not well prepared for the pressure of the pandemic and struggles
to cope with the growing inflow of patients.
    Hopefully, there will be a decline in the COVID-19 death rate due to the development and spread
of treatment protocols and medical research that allow selecting the most effective medicine. In the
meantime, despite all the measures of previous months, the predicted number of deceased
individuals rises quite sharply.


Figure 7: The observed number of recovered individuals (blue), number of recovered individuals
modeled with SEIRD model (yellow), and predicted number of recovered individuals (green) by
SEIRD model and corrected by ARIMA residual prediction with 95% confidence interval (grey)

   The proposed method describes the observed number of recovered individuals very accurately
(Figure 7) with some minor deviations, while in the future stages of the outbreak, the number of
people recovered is expected to be lower than the SEIRD model suggests. It can be explained by a
lack of techniques and materials to treat the patients and the already beginning congestion of the
medical system of the country.

3.3.   Validation

   Validation is an essential step that helps to understand how the final model will perform on
new, previously unseen data. The method was validated on the most recent data: the last three
weeks of the pandemic (from 29.09.2020 to 19.10.2020). The validation dataset was taken from the
same source and therefore has the same structure.
   As shown in Table 5, all error measures of the prediction for the infected, recovered, and
deceased fractions of the population are low. Normalized MAE values show that:
   1. Average difference between the actual number of infected individuals and predicted one is
   only 3.6%;


   2. Average difference between the actual number of recovered individuals and predicted one is
   only 11%;
   3. Average difference between the actual number of deceased individuals and predicted one is
   only 8.4%.
   Based on the maximum deviation column, we can conclude that for the next 60 days starting from
the last day of model training:
   1. Maximum deviation between the predicted and actual number of infected individuals will not
   exceed 8.6% with the probability of 95%;
   2. Maximum deviation between the predicted and actual number of recovered individuals will
   not exceed 15.4% with the probability of 95%;
   3. Maximum deviation between the predicted and actual number of deceased individuals will
   not exceed 15.5% with the probability of 95%.

Table 5
Quality measures of the fitted model on the validation set
             MAE            MSE             MSLE            Normalized     Normalized     Max.
                                                            MAE            MSE            deviation
Infected     1.13 ⋅ 10^-4   2.51 ⋅ 10^-8    2.50 ⋅ 10^-8    3.59 ⋅ 10^-2   2.62 ⋅ 10^-3   8.6%
Recovered    2.76 ⋅ 10^-4   9.25 ⋅ 10^-8    9.21 ⋅ 10^-8    1.1 ⋅ 10^-1    1.66 ⋅ 10^-2   15.4%
Deceased     9.28 ⋅ 10^-6   1.24 ⋅ 10^-10   1.24 ⋅ 10^-10   8.41 ⋅ 10^-2   1.11 ⋅ 10^-2   15.5%
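
   The quality measures in Table 5 can be illustrated with a short Python sketch. The exact
normalization used for the table is not restated in this section, so the definitions below are
assumptions: MAE is divided by the mean observed value, MSE by the squared mean observed value,
and the maximum deviation is taken as the largest relative error. The function name
quality_measures and the input values are hypothetical and are not the Ukrainian data.

```python
import numpy as np

def quality_measures(actual, predicted):
    """Error measures between two equal-length series of population fractions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    # MSLE compares log(1 + x), so it stays defined when a fraction is zero.
    msle = np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)
    mean_actual = np.mean(actual)
    return {
        "mae": mae,
        "mse": mse,
        "msle": msle,
        # Assumption: MAE normalized by the mean observed value.
        "normalized_mae": mae / mean_actual,
        # Assumption: MSE normalized by the squared mean observed value.
        "normalized_mse": mse / mean_actual ** 2,
        # Assumption: maximum deviation is the largest relative error.
        "max_deviation": np.max(np.abs(err) / actual),
    }

# Made-up illustrative fractions of the population (not real data).
actual = [0.0030, 0.0031, 0.0032, 0.0033]
predicted = [0.0031, 0.0032, 0.0031, 0.0034]
measures = quality_measures(actual, predicted)
for name, value in measures.items():
    print(f"{name}: {value:.3e}")
```

   Because the series are small fractions of the population, log(1 + x) is close to x, which is
why the MSE and MSLE columns of Table 5 are nearly identical.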

4. Discussion and Conclusions

    The proposed hybrid model consists of a dynamic SEIRD model with vital dynamics and a decaying
COVID mortality rate, and three ARIMA models that compensate for the residuals of the dynamic
model and improve prediction quality. The model was tested on Ukrainian COVID statistics. The
validation results allow us to conclude that the proposed hybrid model has good predictive
ability and decent performance. The long-term predictions reflect the general dynamics of the
outbreak and are especially useful for healthcare workers and government officials. The
short-term predictions allow us not only to forecast the future number of infected, recovered,
and deceased patients but also to estimate the forecast error under adverse or optimistic
circumstances.
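
    The residual-correction idea can be sketched for a single series as follows. This is a
minimal illustration under stated assumptions, not the implementation used in the study: a
hand-written ARIMA(1,1,0) without a constant, fitted by least squares in NumPy, stands in for a
library ARIMA model with automatically selected order, and no confidence interval is computed.
All function names and numbers are hypothetical.

```python
import numpy as np

def fit_ar1_on_differences(residuals):
    """Fit d_t = phi * d_(t-1) by least squares, where d is the first difference
    of the residuals (an ARIMA(1,1,0) model without a constant)."""
    d = np.diff(np.asarray(residuals, dtype=float))
    x, y = d[:-1], d[1:]
    denom = np.dot(x, x)
    return float(np.dot(x, y) / denom) if denom > 0 else 0.0

def forecast_residuals(residuals, phi, steps):
    """Iterate the fitted model forward from the last observed residual."""
    residuals = np.asarray(residuals, dtype=float)
    level = residuals[-1]
    diff = residuals[-1] - residuals[-2]
    forecast = []
    for _ in range(steps):
        diff = phi * diff          # predicted next first difference
        level = level + diff       # integrate back to the residual level
        forecast.append(level)
    return np.array(forecast)

def hybrid_forecast(observed, model_fit, model_forecast):
    """Add the predicted residuals to the compartment-model forecast."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(model_fit, dtype=float)
    phi = fit_ar1_on_differences(residuals)
    correction = forecast_residuals(residuals, phi, len(model_forecast))
    return np.asarray(model_forecast, dtype=float) + correction

# Made-up numbers: the dynamic model underestimates the observations more and more slowly.
model_fit = [100, 110, 120, 130, 140]       # model values on the training window
observed = [100, 118, 132, 144, 155]        # observed values on the same window
model_forecast = [150, 160, 170]            # model values beyond the training window
phi = fit_ar1_on_differences(np.array(observed) - np.array(model_fit))
corrected = hybrid_forecast(observed, model_fit, model_forecast)
print(phi, corrected)
```

    In the hybrid model described above, a correction of this kind would be applied separately
to each modeled series, which is consistent with the use of three ARIMA models.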
    The key distinguishing features of the method are:
    1. Using the basin-hopping algorithm to fit the parameters and initial conditions of the model
    for this specific disease.
    2. Including an exponentially decaying mortality rate in the SEIRD model, which reflects the
    historical dynamics over 2020.
    3. Correcting the model residuals using the ARIMA model with automatically selected
    parameters.
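
    The first two features can be sketched together in Python with SciPy. The system below is
one common form of a SEIRD model with vital dynamics, written in population fractions; it is an
assumption and may differ from the equations used in the study. The fixed rates, bounds,
synthetic data, and all names are hypothetical, and only three parameters are fitted here,
whereas the method described above also fits the initial conditions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import basinhopping

# Assumed fixed daily rates: incubation, recovery, natural birth/death.
SIGMA, GAMMA, MU = 1 / 5.0, 1 / 14.0, 1 / (72 * 365.0)

def seird_rhs(t, y, beta, delta0, decay):
    """Right-hand side of an assumed SEIRD system in population fractions."""
    s, e, i, r, d = y
    delta = delta0 * np.exp(-decay * t)   # exponentially decaying COVID mortality rate
    births = MU * (s + e + i + r)         # vital dynamics: births offset natural deaths
    return [
        births - beta * s * i - MU * s,
        beta * s * i - (SIGMA + MU) * e,
        SIGMA * e - (GAMMA + delta + MU) * i,
        GAMMA * i - MU * r,
        delta * i,
    ]

def simulate(params, y0, days):
    """Integrate the model; returns an array of shape (5, days), one column per day."""
    beta, delta0, decay = params
    t_eval = np.arange(days, dtype=float)
    sol = solve_ivp(lambda t, y: seird_rhs(t, y, beta, delta0, decay),
                    (0.0, float(days - 1)), y0, t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.y

def loss(params, y0, observed):
    """Mean squared error on I, R, D, each scaled by its observed mean."""
    sim = simulate(params, y0, observed.shape[1])[2:]
    scale = observed.mean(axis=1, keepdims=True)
    return float(np.mean(((sim - observed) / scale) ** 2))

bounds = [(0.05, 1.0), (1e-4, 0.05), (1e-3, 0.2)]   # beta, delta0, decay (assumed ranges)
rng = np.random.default_rng(0)

def take_step(x):
    """Random multiplicative perturbation, clipped so each restart stays within the bounds."""
    low, high = np.array(bounds).T
    return np.clip(x * rng.uniform(0.5, 1.5, size=x.shape), low, high)

y0 = [0.998, 1e-3, 1e-3, 0.0, 0.0]                  # S, E, I, R, D at day 0
true_params = np.array([0.30, 0.010, 0.030])        # synthetic "ground truth"
full_solution = simulate(true_params, y0, 40)
observed = full_solution[2:]                        # synthetic I, R, D series
x0 = np.array([0.20, 0.005, 0.010])                 # deliberately wrong initial guess
result = basinhopping(loss, x0, niter=3, take_step=take_step,
                      minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds,
                                        "args": (y0, observed)})
print("fitted parameters:", result.x, "loss:", result.fun)
```

    Basin-hopping alternates random perturbations with local minimization, which helps when the
loss surface of a compartment model has several local minima. A real fit would use more
iterations than this sketch.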
    Promising directions for further development of the proposed method are:
    1. Estimating the parameters with different algorithms and boundaries;
    2. Testing the method on COVID statistics from other countries;
    3. Developing alternative methods for residual prediction.
    Further improvement of the proposed hybrid model depends on in-depth research findings about
COVID-19. Monitoring recent research in the field and promptly adjusting the model to new data is
therefore crucial.
    In conclusion, the proposed method has demonstrated its predictive capability and can be used
as an effective tool for predicting and analyzing the numbers of infected, recovered, and deceased
individuals during the COVID-19 pandemic in Ukraine. The predicted optimistic and pessimistic
scenarios of the infection spread for the near future are very similar, so we can draw conclusions
with sufficient confidence. Unfortunately, these conclusions give reason to believe that the most
difficult times are still ahead of us. Such results are extremely important for planning disease
containment measures at all levels, from governmental to personal. The analysis of the obtained
data indicates an approaching crisis, most importantly in the medical and economic spheres, and
suggests that all possible rational preemptive actions should be taken immediately.

5. References

[1] Hethcote, H.W., 1989. Three basic epidemiological models. In Applied mathematical ecology
     (pp. 119-144). Springer, Berlin, Heidelberg.
[2] Rapolu, T., Nutakki, B., Rani, T.S., and Bhavani, S.D., 2020. A Time-Dependent SEIRD Model
     for Forecasting the COVID-19 Transmission Dynamics. medRxiv.
[3] Fonseca i Casas, P., García Carrasco, V. and Subirana, J., 2020. SEIRD COVID-19 Formal
     Characterization and Model Comparison Validation. Applied Sciences, 10(15), p.5162.
[4] Mukaddes, A.M.M., Sannyal, M., Ali, Q. and Kuhel, M.T., 2020. Transmission Dynamics of
     COVID-19 in Bangladesh-A Compartmental Modeling Approach. Available at SSRN 3644855.
[5] Godio, A., Pace, F. and Vergnano, A., 2020. SEIR Modeling of the Italian Epidemic of SARS-
     CoV-2 Using Computational Swarm Intelligence. International Journal of Environmental
     Research and Public Health, 17(10), p.3535.
[6] Shi, P., Cao, S. and Feng, P., 2020. SEIR Transmission dynamics model of 2019 nCoV
     coronavirus with considering the weak infectious ability and changes in latency duration.
     MedRxiv.
[7] Shaikh, A.S., Shaikh, I.N. and Nisar, K.S., 2020. A mathematical model of covid-19 using
     fractional derivative: Outbreak in India with dynamics of transmission and control.
[8] Ukrainian             finance            analytics         website,          2013.       URL:
     https://index.minfin.com.ua/ua/reference/coronavirus/ukraine/.
[9] T. Harko, F. Lobo, and M. K. Mak, “Exact analytical solutions of the Susceptible-Infected-
     Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates,” Appl.
     Math. Comput., vol. 236, pp. 184–194, 2014, doi: 10.1016/j.amc.2014.03.030.
[10] R. Beckley, C. Weatherspoon, M. Alexander, M. Chandler, A. Johnson, and G. S. Bhatt,
     “Modeling epidemics with differential equations,” 2013.
[11] W. Yang, D. Zhang, P. Liangrong, C. Zhuge, and L. Hong, “Rational evaluation of various
     epidemic models based on the COVID-19 data of China.” 2020, doi:
     10.1101/2020.03.12.20034595.
[12] P. Shi, S. Cao, and P. Feng, “SEIR Transmission dynamics model of 2019 nCoV coronavirus
     with considering the weak infectious ability and changes in latency duration,” medRxiv, 2020,
     doi: 10.1101/2020.02.16.20023655.
[13] D. Wales and J. Doye, “Global Optimization by Basin-Hopping and the Lowest Energy
     Structures of Lennard-Jones Clusters Containing up to 110 Atoms,” J. Phys. Chem. A, vol. 101,
     1998, doi: 10.1021/jp970984n.
[14] S. Kolassa and W. Schütz, “Advantages of the MAD/mean ratio over the MAPE”, Foresight Int.
     J. Appl. Forecast., vol. 6, pp. 40–43, Jan. 2007.
[15] A.      Pankratz,     “Notation     and      the   Interpretation    of    ARIMA    Models”,
     https://doi.org/10.1002/9780470316566.ch5, 6 August 1983
[16] S. Wang, C. Li, A. Lim, “Why Are the ARIMA and SARIMA not Sufficient”, April 2019,
     arXiv:1904.07632.
[17] D. Dickey, “192-30: Stationarity Issues in Time Series Models,” 2005.


                                                                                                  216