Risks of Data Inconsistency in Information Systems Used for Predicting the Pandemics Development Volodymyr Bakhrushin1[0000-0003-3771-5256], Anna Bakurova2[0000-0001-6986-3769], Mariia Pasichnyk3[0000-0002-5179-4272] and Elina Tereschenko4[0000-0001-6207-8071] 1,2,3,4National University «Zaporizhzhia Polytechnic», Zaporizhzhia, Ukraine 1vladimir.bakhrushin@gmail.com,2abaka111060@gmail.com, 3mary.pasechnik@gmail.com, 4elina_vt@ukr.net Abstract. Predicting the pandemics development is based on mathematical mod- els and empirical data. Prediction errors can lead to ineffective decisions, both in terms of protecting human health and in terms of the economy. In this regard, it is important to prevent the risks associated with the irrelevance and inaccuracy of data contained in the information systems used for forecasting. Experience in predicting the development of COVID-19 pandemic shows that primary data are not always suitable for direct application in mathematical models. One of the problems is the reliability of data on cases and deaths. Different countries have different approaches to their detection and registration, which may also change over time. Another problem is the deviation of real dynamics from the assump- tions of the basic models, in particular, due to spatial heterogeneity, changes in quarantine measures and different practices of their observance, and so on. This can result in significant errors in predicting the number of new cases, the number of deaths, the probability and expected parameters of the "second wave", and so on. In this regard, some indicators of pandemic development and possible ap- proaches to eliminate the risks caused with the specifics of the relevant data con- tained in information systems were analyzed. The proposed system of measures to identify and prevent the risks of data incon- sistencies in information systems used to predict the development of pandemics that could be useful in the development of The Risk-Informed Systems Analysis (RISA). Keywords. RISA, Information, COVID-19, prediction, risk, data, reliability, ac- curacy, sources of errors. 1 Introduction The COVID-19 pandemic was one of the most critical events of 2020, resulting in nu- merous casualties, significant economic downturn in most countries and other negative consequences. The choice of effective solutions for the response of public authorities to the challenges of a pandemic requires reliable assessments of various risks, which requires relevant models and data. Therefore, to reduce the risks of ineffective health Copyright © 2020 for this paper by its authors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). care solutions, the economy needs to implement evidence-based policies. This requires reliable information on decision-making issues. For COVID-19, such data were almost non-existent at the initial stage. But researches are gradually emerging that could sig- nificantly change the prognosis of the pandemic and its aftermath in different scenarios and different strategies to prevent pandemic. One key issue whose solution is needed to develop effective measures to counteract the spread of the COVID-19 pandemic is to make a substantial increase in the accuracy of forecasts for future morbidity, mortality, social and economic consequences. At the beginning of the pandemic, such predictions were based mainly on relatively simple mathematical models and limited data sets. But over time, new pandemic data were emerging in different countries that could be used to improve models, increase forecast accuracy, and make better decisions. The difference between this year's pandemic and previous ones is large-scale re- search on the new SARS-CoV-2 coronavirus, testing of people who can be infected, studying the impact of SARS-CoV-2 on various systems of the human body, as well as social, economic and other consequences. One of the results of this research was the creation of large-scale data collection systems (which also include RISA), which are used to make strategic and operational decisions and recommendations at the level of governments and international organizations. However, over time it becomes clear that individual data are not reliable and relevant, similar data from different countries are not always comparable, available data are not always suitable for direct application in mathematical models used to build pandemic forecasts, and so on. This creates the risk of making big mistakes in forecasts and decisions based on them. Analysis and imple- mentation of measures to prevent risks arising in the current conditions of collection, presentation and application of primary data on morbidity and mortality will signifi- cantly increase the effectiveness of strategic and operational decisions to limit the spread of the pandemic. As part of the system risk analysis, The Risk-Informed Systems Analysis (RISA) helps support decision-making in a pandemic related to economics, reliability and se- curity, provides the use of RISA-tools to quantify projected differences by region, re- duce costs by reducing risks. 2 Related Works Prognostic mathematical models are the basis for understanding the development of the pandemic and making effective decisions to prevent its spread [1]. The first solution to the development of the COVID-19 pandemic was robbed based on the results of pre- dictions based on simple SIR, SEIR models and their modifications [2], which may have been prompted by the supervisors of the system and the extraordinary differential equations. Thus, SEIR models include such groups of people: S - Susceptible (number of people who has not been infected and has no immunity); E - Exposed (number of people who are currently infected, but are not contagious); I - Infected (number of peo- ple who are currently infected and are contagious); R - Recovered (number of recovered people who have immunity). According to these models, the dynamics of daily cases of the disease is described by a symmetrical or asymmetric peak, and the dynamics of the total number of cases has the shape of an S-shaped curve. Such models are still used for forecasting in many countries. It was on their basis that strict quarantine measures were introduced at the beginning of the pandemic [3]. However, they are oversimplified and are only suitable for qualitative analysis under certain conditions. In particular, they do not take into account the heterogeneity of the distribution of the active population, the impact of the demographic structure of the population on key indicators and so on. In addition, such models use a set of constants - basic reproduction number, effective contact rate, recov- ery delay, as well as empirical data on the number of people who can be infected, the number of infected and the number of those who recovered or died. However, these constants can be estimated only by indirect methods and are in fact quantities that change in time and space, and some empirical indicators are determined with large er- rors. From the point of view of strategic decision-making, an important disadvantage of these models is that they describe only the "first wave" condition, which is the only one according to such models. But minimizing morbidity and mortality through strict quarantine during the first wave does not answer the question of what will happen after the quarantine is relaxed. And whether the decisions made will remain optimal, given the longer period of time, as well as additional mortality due to stress, limited access to health care and diagnosis, deteriorating quality of life, and so on. Recently, more complex models [1, 4-6] have been increasingly used to predict the development of a pandemic, which, in particular, can take into account a larger number of parameters and their temporal and spatial changes in computer implementation. However, the parameters of such models are determined by the quality of approxima- tion of empirical data, which increases the impact of errors in these data on modeling results and forecasts. Therefore, even for relatively short-term forecasts, they can pro- vide a scatter of results in 1-2 orders of magnitude higher [7]. One of the main problems is the significant underestimation of real data in infor- mation systems on the number of infected. Sample studies for the presence of IgG and memory T cells conducted in different countries, show that the total number of people who were infected and have antibodies to SARS-CoV-2 may be 1 - 2 orders of magni- tude higher than the number officially registered cases: [10-13]. This problem is less critical in terms of forecasting the dynamics, unless there is a significant change in policy or scope of testing. But it is becoming very critical in choosing strategies to counter the pandemic and assess the likelihood and scale of new outbreaks. The latter significantly depend on the proportion of the population that is immune to infection. For its realistic assessment it is necessary to know the proportion of people who have already fallen ill and have immunity. It is also important for strategy selection to assess and compare risks, in particular expected mortality from COVID-19, mortality from other diseases, including side effects from pandemics and quarantine measures, and expected social and economic consequences from different solutions. From this point of view, the indicators of Case fatality rate (CFR) and Infection fatality rate (IFR) are important. The first indicator provides estimates of infection mortality based on primary data on the number of reported infections and deaths. Due to these problems, underes- timation of the real number of infected CFR mortality estimates is significantly over- estimated and for most countries is in the range of 1 - 15%. IFR estimates are more realistic. They are usually obtained on the basis of model parameters identification and sample surveys. According to the latest data, the most probable IFR values are in the range of 0.1 - 1%, and according to some data, this value may be less than 0.1% [14- 16]. The Working Group on Mathematical Modeling of Problems Related to the SARS- CoV-2 Coronavirus Epidemic in Ukraine of the National Academy of Sciences of Ukraine, the National Academy of Medical Sciences of Ukraine and the Taras Shevchenko National University of Kyiv has developed its own mathematical model to the class of deterministic SEIR-models. It allows to take into account the presence of asymptomatic infected persons, takes into account three levels of complexity of the disease for patients with symptoms and allows for short-term prognosis [17]. However, as with most other similar models, attempts to forecast for a longer period of time (more than 1-2 weeks) lead to a significant increase in the range of uncertain-ty. The group of researchers from the Operations Research Center of the Massachusetts Institute of Technology [18] is also based on SEIR and takes into account the possibility of incom- plete detection of infected people, the number of people in con-tact with the infected person during the day, and possible government’s and societies actions. As in other similar models, the forecast is based on official data, which makes it sensitive to the relevance and reliability of this data. Another area of research is the general risk analysis of information systems associ- ated with the COVID-19 pandemic. In particular, [19] states that the main component of risk is uncertainty. The lack and unreliability of information, in particular, has led to inadequate risk assessment and uneffective solutions in China, resulting in the rapid spread of the COVID-19 worldwide [20]. The authors [19] believe that solutions for overcoming the pandemic are too complex and cannot be formalized in the form of certain algorithms. Therefore, they propose to apply adaptive management approaches. Many countries have started to develop their strategies "from scratch", not based on existing knowledge about the development of pandemics, relevant models and experi- ence of countries that have encountered a pandemic in the past [21]. The INFORM collaboration [22] developed the INFORM COVID-19 Risk Index to support decision-making on the allocation of global and regional resources. It assesses the risks of the COVID-19's impact on health and the humanitarian situation that may lead to the need for international assistance. For a country-wide risk assessment, RIKA India has proposed a four-factor model (health, behavior, impact, social policy) [23]. The analysis of the recent studies results shows that to assess the risks of decision- making in a pandemic, the problem of the lack of common protocols for collecting primary information about the global pandemic remains. Different strategies and ap- proaches of different countries to testing, collection, registration of data in information systems make it impossible to directly use primary information according to common models for all countries. There are problems in assessing the effectiveness of quarantine measures in different countries, including due to differences in approaches to collecting primary information. Thus, the inaccuracy and irrelevance of available pandemic data collected in infor- mation systems is a significant risk factor for pandemic fore-casting and decision-mak- ing. Therefore, the presented study is devoted to identifying the risks associated with the available data, as well as developing approaches to reduce their impact on forecast- ing results. 3 Proposed methodology The research methodology provided that forecasting the dynamics of pandemic devel- opment in individual countries/regions can be based not only on the use of classical or modified models, but also on statistical analysis of data available in information sys- tems (total and daily reported cases of infection and death, PCR and ELISA testing, etc.) in those countries/regions where the outbreak started earlier. Despite the differ- ences in the dynamics of indicators due to differences in testing and registration of morbidity and mortality, demographic, social and other differences, etc., statistical analysis of indicators allows to determine the expected range of values at a certain stage of a country's pandemic data relating to countries or regions where these stages oc- curred earlier. Such analysis can also establish the link between the factors influencing the development of the pandemic and the dynamics of the studied indicators. For the COVID-19 pandemic, the maximum of the first pandemic wave in China was in Janu- ary, in South Korea in early March, in Spain, Italy, Luxembourg, New Zealand, Norway and a number of other countries in the second half of March. In Ukraine, on the other hand, the maximum of the first wave was in early May, and in many countries of Asia, Africa and Latin America (Brazil, India, South Africa) it was reached in the first half of July or even not reached yet. The short-term forecast of the time and height of the maximum daily incidence for the first wave of the pandemic in Ukraine made in [24], despite the limited range of reference countries available at that time, agreed well with the actual data and forecast of NASU based on mathematical model [25]. To build such forecasts, it is necessary to use large arrays of data. Data on the COVID-19 pandemic are now available in many databases and information systems. In this study, the source data were taken from [26], where the European Center for Disease Prevention and Control (registered cases of infection and death), Our World in Data (official test reports), and the United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government (other data). Official government resources of Euro- pean countries and the United States, in particular www.cdc.gov, www.cebm.net, www.epicentro.iss.it and others, were used as additional sources. Analysis of the daily morbidity and mortality dynamics shows that it usually be- longs to several typical patterns: symmetrical or asymmetrical isolated peak, peak with a "wide flat top" (plateau), a mixture of several normal peaks or peaks from the plateau. Accordingly, the dynamics of total morbidity and mortality can usually be described as a single S-shaped curve, or the sum of such curves. Based on this, an isolated peak can be considered as the main element of the dynamics of daily cases (Fig. 1). This corre- sponds to the basic SIR and SEIR models. As the main characteristics for the peak description can be taken as its date (tm), height (h), half-width (t2 - t1) - the time interval between the dates when the daily incidence was h/2 and asymmetry (t2 - tm)/(tm - t1). Fig. 1. Daily morbidity peak (Ireland) 31 countries were taken for analysis, where as of July 20, 2020, clear peaks in daily morbidity and mortality were identified. In many countries, these figures are signifi- cantly weekly. Therefore, to determine the characteristics of the maxima, smoothing by the moving average method with a 7-day smoothing interval was used. Data on the dynamics of weekly indicators obtained by grouping daily primary data were also used to clarify the maximum position. To ensure data’s comparability, all morbidity and mortality rates were used per 1 million inhabitants. However, even with such an adjust- ment, the use of data for forecasting needs further analysis, as the available indicators relate to the country as a whole and do not take into account the regional distribution. The importance of taking this into account is illustrated by the results for the US states. Here, after the first maximum, which was reached in early April, and the two-month plateau, the second peak of morbidity began. According to the analysis, this trend is due to the non-simultaneous spread of infection in different states. In March-April, the main contribution to the overall incidence was made by New York, New Jersey, Mas- sachusetts and several other states, but in June-July the number of new cases here de- creased by 5 - 15 times. Instead, the main contributors are California, Texas, and Flor- ida, where the daily number of new cases has increased by more than an order of mag- nitude since March. Data on the regional distribution of key indicators were analyzed in Ukraine in the context of risk analysis associated with the further development of the pandemic. 4 Results and Discussions As noted, one of the key problems in forecasting the development of pandemics is the incorrect data on the total number of infected and lethal. The case fatality rate (CFR) commonly used for decision making is obtained by di- viding the number of mortality by the number of registered patients, or by dividing the number of mortality by the sum of mortality and the number of patients who have re- covered (respectively, lower and upper grades). For countries where the number of ac- tive cases is a small percentage of the total number of reported cases, these CFR esti- mates are close to each other. For example, for China, where the share of active cases on 25.07.2020 is 0.31% of the total, they are equal to 5.53% and 5.55%, respectively. As of February 15, when the share of active cases was 83.8%, and the daily number of new cases was close to the maximum, they differed significantly and were equal to 2.43% and 13.1%, respectively. For the United States, where the share of active cases on April 26, 2020 was 82.1%, these CFR estimates were 5.65% and 31.5%, respec- tively, and as of July 25, 2020, when the share of active cases decreased to 48.8%, they are, respectively, 3.50% and 6.82%. Both estimates are significantly different for dif- ferent countries due to differences in testing policies and different stages of develop- ment of the COVID-19. Therefore, these CFR estimates can be used to short-term pre- dict the development of a pandemic in a particular country in the absence of changes in testing policies, or to compare countries with the same testing policies. But they are unsuitable for decision-making based on estimates of the true proportion of fatal and severe cases. Table 1 shows the data on the share of infected people from the total population, obtained from sample surveys of the population. Table 1. The share of infected people from the total population according to sample surveys Country The part of infected by the The share of The share of results of sample surveys those informed ac- infected accord- [27], % cording to official ing to official data at the end of the data on respective period, 18.07.2020, cal- the calculation ac- culated accord- cording to ing to github.com, % github.com, % Austria 4,7 (18 week) 0,18 0,22 Belgian 2,9 – 6% (mid-April) 0,29 0,55 Bulgaria 4,8 (13 – 17 weeks) 0,019 0,12 Spain 5,0 – 5,47 (17 – 19 weeks) 0,57 0,66 Luxembourg 1,97 (17 – 19 weeks) 0,62 0,88 Finland 1,0 – 4,3 (16 – 23 weeks) 0,13 0,13 The Czech Re- 0,0 – 4,0 (18 week) 0,073 0,13 public As can be seen from the above data, the number of people with antibodies to SARS- CoV-2 coronavirus is 3-54 times higher than the number of officially registered cases of infection. According to the latest data on the study of memory T cells [8-10], the actual number of infected may be 2-3 times higher. However, even with such an adjust- ment, only in some regions the share of the population with immunity to COVID-19 today is approaching 50%. In most cases, it does not exceed 5-10%, which makes prob- able new waves of disease. This assumption is confirmed by a significant increase in morbidity in Bulgaria, Luxembourg, the Czech Republic and a number of other Euro- pean countries in June-February. Another approach to estimating the actual number of infected is based on IFR esti- mates. According to the above data, taking the range of the most probable values of 0.3 - 0.6%, you can get lower and upper estimates of the actual number of cases of infection on 20.07.2020, which are shown in Table 2. Table 2. Lower and upper estimates of the share of infected in the total population, calculated by IFR,% Country Lower esti- Upper esti- Relation to the share of in- mate mate fected, calculated by the number of registered cases Lower esti- Upper esti- mate mate USA 5,4 21,7 4,6 18,4 Brazil 4,7 18,7 4,7 18,9 India 0,3 1,0 3,1 12,3 Spain 7,6 30,4 11,6 46,2 UK 8,3 33,4 19,2 76,8 Italy 7,3 29,0 17,9 71,7 Germany 1,4 5,5 5,6 22,5 France 5,8 23,1 21,6 86,3 Sweden 7,0 27,8 9,1 36,3 Belgium 10,6 42,3 19,2 76,7 Ukraine 0,4 1,7 3,2 12,6 Netherlands 4,5 17,9 14,8 59,3 Poland 0,5 2,2 5,1 20,3 Armenia 2,7 11,0 2,3 9,3 Switzerland 2,8 11,4 7,3 29,3 Moldova 2,1 8,5 4,1 16,3 Serbia 0,7 2,7 2,8 11,3 Austria 1,0 4,0 4,5 18,1 Czechia 0,4 1,7 3,3 13,1 Denmark 1,3 5,3 5,8 23,1 Bulgaria 0,5 2,2 4,3 17,1 Finland 0,7 3,0 5,6 22,3 Luxembourg 2,2 8,9 2,5 9,9 Hungary 0,8 3,1 17,3 69,0 The given data generally correspond to the estimates given in Table 1 according to the data of sample surveys. They also confirm the above conclusions about the significant underestimation of official data on cases of infection, even during applying the lower estimates. At the same time, even using the upper estimates, it can be concluded that there is a high risk of new outbreaks in most of these countries. However, such a con- clusion can be significantly adjusted in view of the following circumstances. Firstly, according to [8], immunity to COVID-19 can have not only individuals who have relapsed into the COVID-19 infection. It can also be found in people who have previously had SARS or other coronavirus infections. Secondly, from the data [28-29], it follows that collective immunity can be formed at significantly lower than 50-70% of the infected population due to the heterogeneity of the system. Thirdly, the analysis of the available data in the information systems shows that in recent months the CFR for the COVID-19 has decreased significantly. This may have various explanations, some of which are related to the decrease in IFR. Also, as noted above, available IFR estimates may be significantly overestimated. Then, even the above estimates obtained using IFR, in some cases may be significantly lower than the actual level of morbidity. The IFR estimates, that mentioned above, are based on sample studies or identifica- tion of pandemic spread patterns. Another approach to estimating IFR can be based on the analysis of the distribution of CFR values. It can be assumed that the lowest CFR values will be observed in the countries with the highest proportion of infected persons. Therefore, they will be closest to the real IFR values. Analysis of the available data shows that for different countries, the CFR can vary from a few hundredths of a percent to more than 10%. Data for some countries are shown in Table 3. Table 3. Indicators characterizing the development of the COVID-19 pandemic for some coun- tries (as of 25.07.2020) Country Number of Number of CFR Number of cases per 1 deaths per 1 tests per 1 mil- million inhab- million inhab- lion inhabit- itants itants ants USA 12830 448 3,50 158610 Great Britain 4387 673 15,3 210452 Italy 4062 581 14,3 106994 Qatar 38691 58 0,15 165494 Sweden 7819 564 7,21 74353 Oman 14430 70 0,49 56851 Ukraine 1437 36 2,51 21437 Singapore 8435 5 0,06 199896 Iceland 5399 29 0,54 353657 As can be seen, there is a large variation in CFR values, due primarily to different ap- proaches to identifying infected individuals. In countries whose strategies provide for the most complete identification of such individuals (Iceland, Singapore, etc.), the CFR is in the range of 0.06 - 0.5%, which can be taken as an empirical upper estimate of IFR. This estimate is also obviously somewhat inflated, as in no country does the test cover all patients, but it confirms the conclusion of other studies that the IFR does not exceed a few tenths of a percent. It should be noted that actual IFR values may change over time due to improvements in treatment protocols, and may vary significantly be- tween countries due to different demographics and different capabilities of health sys- tems. These factors must also be taken into account when forecasting the development of a pandemic, in particular, when applying data from one country to another. In fig. 2 presents data on the distribution by Ukraine and United States regions of officially registered cases per 1 million people. Fig. 2. Distribution by Ukraine and USA regions of the number of officially registered cases per 1 million people in relation to the average levels These data indicate a significant heterogeneity in the regional distribution of morbidity in both countries. For example, on July 18, 2020, the maximum value for Ukraine ex- ceeds the minimum value by 119 times, and the third quartile exceeds the first by 6.2 times. For the United States, similar ratios are 94 and 2.2. This indicates the incorrect use of averages to predict the further development of the pandemic based on models such as SIR, SEIR. In addition, the average level of officially registered morbidity in Ukraine as of July 18, 2020 is about 1.4, and in the United States - about 11.6 people per 1 million inhabitants. Even taking into account the fact that the number of tests per 1 million inhabitants is 7.2 times higher than in Ukraine, this gives grounds to conclude that there is a risk of a significant increase in morbidity in Ukraine, which may mainly occur at the expense of regions where today the incidence rate is the lowest. As noted above, the main element that can be used to describe an outbreak of a pan- demic are the peaks in daily morbidity and mortality. Analysis of github.com data al- lowed us to identify 31 countries where clear peaks can be identified. This work did not take into account countries, in particular France, where data were repeatedly corrected by formally attributing large numbers of unaccounted cases (or subtracting erroneously credited cases) to certain dates, as well as some other countries for which there are doubts about the reliability or the reliability of statistical data, in particular, due to the low incidence rate as of 20.07.2020. Of these 31 countries, only cases were considered for Liechtenstein, as all data on mortality in information systems are formally assigned to one date. The analysis shows that the data on peak heights and the total number of infected and dead at the dates tm and t2 per 1 million inhabitants have a significant (within 3 orders of magnitude) variance. This is due to different testing policies and sometimes the introduction of quarantine measures. In some countries, such as South Korea and Thailand, small domestic sources of infection have been rapidly tracked and isolated. Therefore, even on 26.07.2020 in these countries the number of officially registered cases is about 0.028% and 0.0047%. In Qatar, on the other hand, the number of reported cases exceeded 2% of the total population during the first outbreak. However, the available data analysis in the information systems indicates a signifi- cant correlation between the individual parameters of the peaks. For example, for reg- istered cases, the coefficients of determination for the linear model are equal to: 0.99 for the relationship between the total number of cases on the date t2 and tm, 0.76 for the relationship between the daily and the total number of cases on the date t2, 0,44 for the relationship between the time of outbreak (the number of days between the date when the total number of cases was 30 people per 1 million inhabitants, and the date tm) and the half-width of the peak. For fatalities, the first two figures are 0.86 and 0.73, respectively. Instead, the link between similar peaks in morbidity and mortality is much weaker. In particular, for the heights of the corresponding maxima, it is equal to 0.16, for the total number of cases on the dates of the corresponding maxima - <0.01. This may be due to the fact that the absolute data on the number of registered cases deviate signifi- cantly from the actual number of infected. However, their understatement is signifi- cantly different in different countries due to different approaches to testing and case registration. Therefore, mortality data are more reliable for predicting the dynamics of pandemic outbreaks than data from reported cases. To estimate the total number of in- fected, which is important for estimating the likelihood and extent of new outbreaks, more reliable estimates can be obtained using mortality data and IFR estimates based on sample data than on the number of reported cases. Table 4 shows the statistical characteristics of some more stable indicators of mor- bidity and mortality. Table 4. Distribution quarters of separate parameters of peaks characterizing daily numbers of new cases and dead Quarter Growth time Half-width Asymmetry The ratio of the total number of cases on the date t2 and tm New cases min -3 13 0,5 1,37 0,25 14 20 1,0 1,65 0,50 18 24 1,42 1,82 0,75 26 31 2,0 2,23 max 82 62 4,8 3,20 Dead Number of Half-width Asymmetry The ratio of the days be- total number of tween peaks cases on the date of mortality t2 and tm and infec- tion min -6 7 0,27 1,23 0,25 4 18 1,06 1,70 0,50 9 28,5 1,5 2,00 0,75 13 35 2,3 2,34 max 28 52 6,3 5,83 As can be seen from these data, the variance between the values of the given indicators of peak morbidity and mortality is much (1-2 orders of magnitude) smaller than the variance between the absolute values of the indicators. This makes it possible to use them to more accurately predict the development of a pandemic. 5 Conclusions The results indicate that there are significant risks associated with data contained in information systems used to predict the development and decision-making of the spread of the COVID-19 pandemic. This risk reasons are: 1. Systematic errors in the primary data concerning the registration of cases of infection and mortality from corona-viral infection. Data on cases of infection are significantly underestimated, which affects the risk assessments of new outbreaks of the pan- demic. Mortality data can be both underestimated and overestimated. This affects IFR estimates, but in any case they are more reliable than infection data. Because of this, estimates of general morbidity obtained by indirect methods may be more rele- vant. 2. Significant regional heterogeneity of cases of infection, which affects the possibility of their direct application as parameters of mathematical models of pandemic devel- opment, which leads to an increase in the risk of significant modeling errors. To reduce this risk, it is necessary to use models that take into account the available heterogeneities and empirical data, for individual regions, rather than the whole country. 3. Wrong strategic and operational decisions that can either increase mortality from coronavirus infection due to insufficient countermeasures, or increase the risk of negative social and economic consequences, including increased mortality due to pandemic and quarantine stress, complications of chronic diseases, limited access to medical care, etc. To reduce such risks, it is necessary to develop special optimiza- tion models that use more powerful information systems that contain verified data not only on epidemiological indicators, but also other data needed to correctly assess the socio-economic consequences. To reduce these risks, it is necessary to adjust the data used in predictive models, in particular through the use of more reliable data on lethal and severe cases to estimate the number of infected, as well as estimates of the number of infected on the basis of sample studies of immunity in the population. The second method to improve forecasts and improve the efficiency of decisions made on their basis is the using of statistical estimates based on the use of information systems data on similar indicators of coun- tries where the pandemic is similar, but significantly ahead of the country for which it is made forecast. It is important to increase the accuracy of forecasts to take into account in mathematical models the heterogeneity of pandemic development by region and the use of regional data in modified models. The work was made by non-governmental organization "system research". References 1. Giulia Giordano, Franco Blanchini, Raffaele Bruno, Patrizio Colaneri, Alessandro Di Fil- ippo, Angela Di Matteo & Marta Colaneri (2020) Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. https://www.nature.com/arti- cles/s41591-020-0883-7 [Accessed 16 August 2020]. 2. José M. Carcione, Juan E. Santos, Claudio Bagaini and Jing Ba (2020) A simulation of a COVID-19 epidemic based on a deterministic SEIR model. https://www.frontiersin.org/ar- ticles/10.3389/fpubh.2020.00230/full [Accessed 16 August 2020]. 3. Christina Atchison, Leigh Bowman, Jeffrey Weaton, Natsuko Imai, Rozlyn Redd, Philippa Pristera, Charlotte Vrinten, Helen Ward (2020) Report 10: Public Response to UK Govern- ment Recommendations on COVID-19: Population Survey, 17-18 March 2020 https://www.imperial.ac.uk/media/imperial-college/medicine/mrc-gida/2020-03-20- COVID19-Report-10.pdf [Accessed 16 August 2020]. 4. Kaihao Liang (2020) Mathematical model of infection kinetics and its analysis for COVID- 19, SARS and MERS, Infection, Genetics and Evolution, 82, doi: 10.1016/j.mee- gid.2020.104306. 5. Stelios Bekiros, Dimitra Kouloumpou (2020) SBDiEM: A new mathematical model of in- fectious disease dynamics, Chaos, Solitons & Fractals, 136, doi: 10.1016/j.chaos.2020.109828. 6. Muhammad Altaf Khan, Abdon Atangana, Modeling the dynamics of novel coronavirus (2019-nCov) with fractional derivative, Alexandria Engineering Journal, 2020, doi: 10.1016/j.aej.2020.02.033. 7. Forecast of the COVID-19 epidemic in Ukraine in the period April 20-27, 2020 (in Ukr) http://files.nas.gov.ua/PublicMessages/ Documents/0/2020/04/200422180841445-9406.pdf Accessed 16 August 2020. 8. Le Bert, N., Tan, A.T., Kunasegaran, K. et al. SARS-CoV-2-specific T-cell immunity in cases of COVID-19 and SARS, and uninfected controls. Nature (2020). doi: 10.1038/s41586-020-2550-z 9. Li, J., Wang, J., Kang, A.S. et al. Mapping the T cell response to COVID-19. Sig Transduct Target Ther 5, 112 (2020). doi: 10.1038/s41392-020-00228-1 10. Yang, L. T., Peng, H., Zhu, Z. L., Li, G., Huang, Z. T., Zhao, Z. X., Koup, R. A., Bailer, R. T., & Wu, C. Y. (2006). Long-lived effector/central memory T-cell responses to severe acute respira-tory syndrome coronavirus (SARS-CoV) S antigen in recovered SARS patients. Clinical immunology (Orlando, Fla.), 120(2), 171–178. doi: 10.1016/j.clim.2006.05.002 11. COVID-19 Antibody Seroprevalence in Santa Clara County, https://www.medrxiv.org/con- tent/10.1101/2020.04.14.20062463v1.full.pdf [Accessed 16 August 2020]. 12. Up to 2.7 million in New York may have been infected, antibody study finds, https://www.nbcnewyork.com/news/local/new-york-virus-deaths-top-15k-cuomo-ex- pected-to-detail-plan-to-fight-nursing-home-outbreaks/2386556/?fbclid=IwAR0i- J3TQ3idewt47akwVCQWkQU-AE4SOH0AExtM2koOYh3iLjS3W199MPg [Accessed 16 August 2020]. 13. Spread of SARS-CoV-2 in Austria, https://www.sora.at/uploads/media/Austria_COVID- 19_Prevalence_BMBWF_SORA_20200410_EN_Version.pdf [Accessed 16 August 2020]. 14. How deadly is the coronavirus? Scientists are close to an answer, https://www.na- ture.com/articles/d41586-020-01738-2 [Accessed 16 August 2020]. 15. Gideon Meyerowitz-Katz, Lea Merone (2020) A systematic review and meta-analysis of published research data on COVID-19 infection-fatality rates, doi: 10.1101/2020.05.03.20089854 16. Global Covid-19 Case Fatality Rates, https://www.cebm.net/covid-19/global-covid-19- case-fatality-rates [Accessed 16 August 2020]. 17. Counteraction COVID-19, http://www.nas.gov.ua/EN/Activity/covid/Pages/wg.aspx [Ac- cessed 16 August 2020]. 18. Michael Lingzhi Li, Hamza Tazi Bouardi and other (2020) Overview of DELPHI Model V3 – COVIDAnalytics https://www.covidanalytics.io/DELPHI_documentation_pdf [Accessed 16 August 2020]. 19. David Adam, Special report: The simulations driving the world’s response to COVID-19. Nature 580, 316-318 (2020), doi: 10.1038/d41586-020-01003-6 20. Noah C Peeri , Nistha Shrestha, Md Siddikur Rahman, Rafdzah Zaki, Zhengqi Tan, Saana Bibi, Mahdi Baghbanzadeh, Nasrin Aghamohammadi, Wenyi Zhang and Ubydul Haque. The SARS, MERS and novel coronavirus (COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned? International Journal of Epidemiology, 2020, 1–10 doi: 10.1093/ije/dyaa033 21. Yuri Bruinen de Bruina, Anne-Sophie Lequarrea, Josephine McCourta, Peter Clevestigb, Filippo Pigazzanic, Maryam Zare Jeddid, Claudio Colosioe, Margarida Goularta (2020) In- itial impacts of global risk mitigation measures taken during the combatting of the COVID- 19 pandemic, doi: 10.1016/j.ssci.2020.104773 22. Inform COVID-19 Risk Index Version 0.1.2 - Results and Analysis (17 April 2020). https://reliefweb.int/report/world/inform-covid-19-risk-index-version-012-results-and- analysis-17-april-2020 [Accessed 16 August 2020]. 23. Ranit Chatterjee, Sukhreet Bajwa, Disha Dwivedi, Repaul Kanji, Moniruddin Ahammed, Rajib Shaw (2020) COVID-19 Risk Assessment Tool: Dual application of risk communica- tion and risk governance, doi: 10.1016/j.pdisas.2020.100109 24. Bakurova, A., Pasichnyk, M., Tereschenko, E. & Bakhrushin, V. (2020) Data Analysis and Predicting of COVID-19 in Ukraine, doi: 10.13140/RG.2.2.35305.11369. 25. Forecast of the COVID-19 epidemic in Ukraine in the period April 20-27, 2020, (in Ukr), http://files.nas.gov.ua/PublicMessages/Documents/0/2020/04/200422180841445-9406.pdf [Accessed 16 August 2020]. 26. COVID-19. Public data. https://github.com/owid/covid-19-data/tree/master/public/data [Accessed 16 August 2020]. 27. Immune responses and immunity to SARS-CoV-2, https://www.ecdc.europa.eu/en/covid- 19/latest-evidence/immune-responses [Accessed 16 August 2020]. 28. Weitz, J.S., Beckett, S.J., Coenen, A.R. et al. Modeling shield immunity to reduce COVID- 19 epidemic spread. Nat Med 26, 849–854 (2020). doi: 10.1038/s41591-020-0895-3 29. Tom Britton, Frank Ball, Pieter Trapman. A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2. Science 369, 846-849 (2020). doi: 10.1126/science.abc6810