=Paper=
{{Paper
|id=Vol-2753/paper2
|storemode=property
|title=Identifying Explosive Epidemiological Cases with Unsupervised Machine Learning
|pdfUrl=https://ceur-ws.org/Vol-2753/paper2.pdf
|volume=Vol-2753
|authors=Serge Dolgikh
|dblpUrl=https://dblp.org/rec/conf/iddm/Dolgikh20
}}
==Identifying Explosive Epidemiological Cases with Unsupervised Machine Learning==
Serge Dolgikh (a, b)

(a) Solana Networks, 301 Moodie Dr., Ottawa, K2H9C4, Canada

(b) National Aviation University, 1 Liubomyra Huzara Ave, 1, Kyiv, 03058, Ukraine

Abstract

An analysis of a combined dataset of epidemiological statistics of national and subnational jurisdictions, aligned at approximately two months after the first local exposure to Covid-19, with unsupervised machine learning methods such as Principal Component Analysis and deep autoencoder dimensionality reduction makes it possible to clearly separate milder background cases from those with a more rapid and aggressive onset of the epidemic. The analysis and findings of this study can be used in the evaluation of possible epidemiological scenarios and as an effective modeling approach to identify possible negative epidemiological scenarios and design corrective and preventative measures to avoid developments with potentially heavy impact.

Keywords

Infectious diseases, epidemiology, Covid-19, machine learning, unsupervised learning

1. Introduction

An analysis of the factors that can influence the course of an epidemic in a given jurisdiction is both a challenging and an interesting undertaking, given the number of potential factors and their interaction. For example, a possible link between the effects of the Covid-19 pandemic and a number of epidemiological factors, including universal immunization programs against tuberculosis with the BCG vaccine, was proposed in Miller et al. [1] and further investigated in [2-4]. Other factors, such as gender and ethnicity, age demographics, and social habits such as smoking, were investigated in a number of studies, including [5,6]. However, given the large number of factors that may influence the outcome of the epidemic in each case, identifying the most influential ones can be challenging due to the number, complexity and interaction of the contributing factors.
In this work we analyze a combined dataset of national and subnational reporting jurisdictions, adjusted and aligned at the same time point of approximately two months after the first local exposure to the Covid-19 epidemic, with the methods of unsupervised machine learning. Unsupervised dimensionality and redundancy reduction methods such as Principal Component Analysis (PCA) [7] and unsupervised deep artificial neural network models such as autoencoders (AE) [8] make it possible to analyze the distribution of case data points in the informative parameter spaces identified by these methods and, in many instances, to identify characteristic regions associated with the variable of interest, which in this work is the severity of the epidemiological scenario in the jurisdiction. Establishing the combinations of latent and observable parameters that identify such regions can be used to evaluate and predict the risk of heavier epidemiological impact in a jurisdiction proactively, with the opportunity to make the necessary corrections before an explosive onset of the epidemic causes heavy costs to society.

IDDM’2020: 3rd International Conference on Informatics & Data-Driven Medicine, November 19–21, 2020, Växjö, Sweden. EMAIL: sdolgikh@nau.edu.ua (S. Dolgikh). ORCID: 0000-0001-5929-8954 (S. Dolgikh). ©️ 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Methodology

As the experience of the pandemic to date shows, timing can be a critical factor both in the development of the epidemic and in an accurate analysis of the corresponding statistical data. To ensure the correctness of the analysis, the study used two approaches: 1) data aligned with respect to the duration of the exposure in the reporting jurisdiction, i.e., a dataset composed mainly of cases that have the same or a similar time of exposure.
Where this is not the case, 2) a time-based adjustment of the data is performed so that the statistical records are taken at the same or a similar time of local exposure. To simplify the timing analysis, the global zero time of the start of the Covid-19 pandemic was defined in [2] as TZ = 31.12.2019. Exposure times in the study are given in the format TZ + y months, relative to this time point.

A number of factors expected to have a strong influence on the course of the epidemic in each case were identified in [1-3] and other studies, including: the time of the local exposure; demographics; social factors, traditions and lifestyle; the level of economic and social development; the quality and efficiency of the healthcare system; and, not least, the quality of public health policy making and execution.

The methodology is based on processing the input data, expressed as a set of observable parameters identified and described in the study, with unsupervised machine learning methods to identify and extract a smaller set of informative features. In many cases, evaluating the distribution of data in the representation of informative components, such as principal components in PCA or the reduced representation of a neural network autoencoder model, makes it possible to identify and separate characteristic classes of cases in the observable data by essential latent parameters that can be linked to the observed outcome.

2.1. Data

The time of local exposure to the epidemic is evidently one of the critical parameters of the impact, so the case data was adjusted and aligned at a similar phase in the development of the epidemic, chosen based on the availability of data as approximately local Time Zero (LTZ) + two months, i.e. approximately two months after the first local exposure to the infection. In the study this translates to the beginning of April 2020 for Wave 1 cases (LTZ in January 2020) and the beginning of May for Wave 2 cases (LTZ from the end of February to early March 2020).
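The alignment rule described above can be illustrated with a short sketch. It is not the author's procedure, only one possible reading of it: the helper names `add_months` and `sampling_date` and the two local Time Zero dates are hypothetical, and only TZ = 31.12.2019 and the two-month offset come from the text.

```python
import calendar
from datetime import date

# Global zero time of the pandemic as defined in the text: TZ = 31.12.2019.
TZ = date(2019, 12, 31)

def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def sampling_date(local_time_zero, offset_months=2):
    """Date at which a case record is taken: LTZ + offset (two months in the study)."""
    return add_months(local_time_zero, offset_months)

# Hypothetical local Time Zero values, chosen only for illustration.
wave1_ltz = date(2020, 1, 25)
wave2_ltz = date(2020, 3, 1)

print("TZ + 2 months:", add_months(TZ, 2))
print("Wave 1 sample:", sampling_date(wave1_ltz))
print("Wave 2 sample:", sampling_date(wave2_ltz))
```

Taking every record at the same offset from its own local Time Zero is what keeps jurisdictions with different exposure dates comparable.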
A combined dataset of approximately forty cases was thus constructed based on the conditions outlined in [2], essentially bringing together cases with similar social and economic parameters to minimize the number of potentially influencing factors, along with the expectation of a certain minimal level of exposure to the epidemic and of reliability of the reported data. The dataset was constructed from the publicly available current data on the impact of the epidemic per case, i.e., per reporting jurisdiction. It comprises the current value of the epidemic impact recorded in the jurisdiction (case), measured in mortality per capita m(t) (M.p.c.) per million of population, and a number of observable parameters, selected as described further in this section, under the hypothesis of a certain level of correlation between the observable parameter set and the severity of the outcome.

On the relative scale of impact by jurisdiction, the “explosive” cases were normally identified as those with relative M.p.c. (i.e., relative to the maximum among all reporting jurisdictions worldwide) of around 0.5 and above. This subgroup included all commonly reported cases of high epidemic impact at the time of writing. In the evaluation of the distribution in the coordinates of principal components, two higher-impact clusters of cases were identified by relative impact. The explosive group, with relative M.p.c. above 0.8, included the well-known first-wave cases of Italy, Spain and New York, with the highest impact worldwide observed to date. The second group comprised six somewhat milder-impact cases, namely the United Kingdom, France, Belgium, the Netherlands, Ireland and Quebec (Canada), with relative M.p.c. in the range from 0.6 to 0.8.

The impact parameter was not used in the training of the unsupervised learning models (i.e. it was excluded from the training dataset) but only for identification of the regions of interest (i.e.
higher epidemiological impact) in the latent representations produced by the models as a result of training.

2.2. Observable Parameters

Examples of factors of influence include, among others: genetic differences; population density; social traditions and cultural practices; past widespread public policy such as immunization; smoking habits; and, of course, the epidemiological policy of the jurisdiction aimed at controlling the spread of the disease. In addition to common measurable factors such as population density, age demographics and smoking prevalence, a number of additional factors with a potential impact on the severity of the epidemic pattern were considered in this study, as described in this section. A common remark for some of them is that, due to limitations of time and resources, a rating scale approach was chosen for those factors that cannot be measured directly or would be challenging to measure. Understandably, such an approach can be influenced by subjective perceptions; however, we believe that more robust and objective techniques can be developed over time, improving the quality of the analysis and the resulting conclusions.

Connectivity: intended to measure the intensity of international and regional connections in the jurisdiction of the case, for example international, inter-regional and intra-regional travel and migration, tourism, seasonal and work-related migration and so on. More intensive connection hubs can be expected to have a higher exposure to the pandemic, increasing the probability of a heavier impact.

Social proximity: intended to reflect the closeness of inter-personal connections in the case, again in multiple spheres and domains, for example: family connections; socializing practices and traditions; the intensity of business connections; lifestyle practices; social events and others.
Again, as commented previously, modeling such a complex factor as a single-value parameter may expose the analysis to subjectivity; yet we believed that it could be important for the analysis, and improvements that make its evaluation by case more objective and accurate are possible in future studies.

We also used three rating parameters intended to measure the policy of the jurisdiction as it relates to the response to the pandemic. They are: 1) the epidemiological preparedness of the public healthcare system for an intensive and rapid development of an epidemic; 2) the effectiveness of the policy response; and 3) the timeliness of the public health epidemiological response.

Epidemiological preparedness: intended to measure the preparedness of the health care system to handle a rapid onset of a large-scale epidemic. This parameter is intended to be specific to the epidemiological situation rather than to the general state of the health care system (its technological level, funding and so on).

Effectiveness of policy response: intended to indicate the quality of the public health policy in controlling the epidemic based on the scientific data available at the time, including its clarity and how easily the general population can understand and follow it, which facilitates the population's preparedness to participate. While there may be concerns that the rating of this factor is influenced by post-impact considerations and is thus correlated with the outcome after the fact, we believe that with an accurate approach these risks can be minimized. For example, it is evident that an unclear or misleading policy message could be highly detrimental to the intended effect, and one does not need the outcome to judge such policy parameters objectively at the time the decision is made and before the outcome is recorded.

Timeliness: measures the timing of the introduction of the epidemiological policy relative to the local exposure and the development of the epidemic.
Universal BCG immunization record: indicates the record of a current or previous immunization program according to the classification introduced in [9]. The detailed definition is provided in the Appendix.

Epidemiological impact: measured as Covid-19 caused mortality per 1 million capita, relative to the world’s maximum value at the time of the analysis. Due to the large spread of the impact of the epidemic within the dataset, a logarithmic scale was also used in the evaluation of the impact, represented by the Measured Value parameter (MV), which is the logarithm of mortality per capita (in cases per 1M of population in the jurisdiction):

<math>MV(\text{locality}, t) = \log\left(\frac{\text{Mortality, cases}}{\text{Population, million}}\right)</math> (1)

It should be noted that in the framework of unsupervised analysis the epidemiological impact is not known a priori, and for that reason it was not used in the evaluation of data with the selected methods. It was used, however, to analyze the distributions obtained with the models and to identify regions of potential interest, such as combinations of observable parameters associated with the areas of higher epidemiological impact.

The resulting dataset of 40 national and subnational cases with the identified observable parameters and the recorded epidemiological impact at the time of preparation is presented in Table 1, Appendix.

Reservations and qualifications:

1. Consistency and reliability of data reported by the national, regional and local health administrations.

2. Alignment in the time of reporting may not be consistent between all jurisdictions due to possible differences in reporting practices.

Sources: World BCG atlas [10]; Google coronavirus map [11]; world statistical data [12,13]; national and subnational jurisdictions Covid-19 information [14-16] and others.

2.3.
Unsupervised Machine Learning

To evaluate the hypothesis of a correlation between the identified parameters and the epidemiological outcome of the cases in the dataset, several common machine learning methods were used:

1. Linear regression.

2. Principal Component Analysis and identification of principal informative factors.

3. Unsupervised deep neural network-based dimensionality reduction and selection of dominant informative factors.

The first method produces a best-fit linear approximation of the resulting effect series with a minimum total deviation from the trend [17]. Principal Component Analysis [7] produces a linear transformation of the data to the coordinates with the highest variation. The method is based on the characteristics of the data and does not require prior knowledge of the recorded outcome. A deep neural network autoencoder (method 3) performs a non-linear dimensionality reduction of the observable data to a lower-dimensional representation with identified informative features. The architecture of the unsupervised autoencoder model is shown in Figure 1.

Figure 1: Redundancy reduction with deep neural network autoencoder model

The structure of the deep neural network model used in this work is described in detail in [18]. In the unsupervised training phase, the model is trained to reproduce the input data with good accuracy and thus does not require labels marked with the outcome; the same applies to PCA. An improvement in the accuracy of reproduction of the input data, which can be measured by a number of training metrics, indicates that the model has learned some essential characteristics of the initial distribution. The aim of unsupervised learning is thus to minimize the deviation of the original training sample from its regeneration created by the model.

3. Results

In this section we present the results of the analysis of the dataset with the methods outlined in the previous section, with a brief discussion.

3.1.
Linear Regression

Linear regression with 9 identified observable parameters produced a trend with a strong correlation to the recorded impact, with a score of 0.9 out of a maximum of 1.0. The factors with the highest influence on the regression trend, measured in logarithmic M.p.c., are shown in Table 1.

{| class="wikitable"
|+ Table 1 Linear Regression analysis
! Factor !! Linear regression score !! Correlation
|-
| Policy, timing || 0.534 || 0.906
|-
| Connection || 0.196 || 0.697
|-
| Policy, effectiveness || 0.094 || 0.856
|-
| BCG immunization || 0.092 || 0.686
|-
| Social proximity || 0.078 || 0.794
|}

Policy factors were expected to have a strong influence on the outcome of the case, which is confirmed by the results of the linear regression analysis. In addition, the importance of other factors such as connection intensity, social proximity culture, BCG immunization and smoking was observed.

3.2. Principal Component Analysis

Principal component analysis identified three principal components with overall influence above 95%, as shown in Table 2. The highest-influence factors in the PCA analysis were mostly aligned with the results of the linear regression analysis: policy timing, connection hub, social proximity, BCG and smoking prevalence.

The PCA transformation is an inherently unsupervised method of learning: prior known outcome labels are required neither to learn the principal components nor to represent the input data in the coordinates of the identified principal component eigenvectors. By plotting the data in the coordinates of the identified principal component vectors and marking the cases with the highest recorded impact of the epidemic, interesting results can be obtained.
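The projection step described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses only six rows copied from Table 1 in the Appendix, the helper `pca_fit` is a hypothetical name, and the 0.8 threshold is the relative M.p.c. bound quoted for the explosive group. The Impact column is kept out of the fit and used only to tag the projected points.

```python
import numpy as np

# Observable parameters only (p-prep, p-qlty, p-tme, Conn, Bcg, Smo, Den, Soc, Age),
# six rows copied from Table 1 in the Appendix; a small subset for illustration.
cases = ["Taiwan", "Germany", "UK", "Spain", "Italy", "New York (USA)"]
X = np.array([
    [0.0, 0.0, 0.0, 0.1, 0.0, 0.340, 0.3, 0.2, 0.3],
    [0.3, 0.2, 0.2, 0.5, 0.2, 0.608, 0.2, 0.4, 0.5],
    [0.5, 0.7, 0.7, 0.7, 0.8, 0.398, 0.2, 0.5, 0.0],
    [0.8, 0.7, 0.8, 0.5, 0.8, 0.584, 0.2, 0.8, 0.5],
    [0.8, 0.8, 0.9, 0.7, 1.0, 0.566, 0.2, 0.8, 0.5],
    [0.8, 0.8, 0.9, 1.0, 1.0, 0.250, 0.5, 0.8, -0.5],
])
# Recorded relative impact: never passed to the fit, used only for labelling.
impact = np.array([0.001, 0.052, 0.248, 0.965, 0.969, 1.000])

def pca_fit(X, n_components):
    """PCA via SVD of the centred data; returns mean, components, variance ratios."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    return mean, Vt[:n_components], ratio[:n_components]

mean, components, ratio = pca_fit(X, n_components=3)
Z = (X - mean) @ components.T          # latent coordinates, one row per case

# Linear map from latent coordinates back to the observable parameter space.
X_back = Z @ components + mean

print("explained variance ratio:", np.round(ratio, 3))
for name, z, m in zip(cases, Z, impact):
    tag = "explosive" if m > 0.8 else "background"
    print(f"{name:15s} PC1={z[0]:+.2f} PC2={z[1]:+.2f} PC3={z[2]:+.2f}  {tag}")
```

Because the labels enter only at the printing stage, any separation seen in the latent coordinates comes from the observable parameters alone.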
{| class="wikitable"
|+ Table 2 Covid-19 Principal component analysis
! Eigenvector !! Observable parameters !! Weight
|-
| Axis 1 || Policy-time, BCG || 0.570
|-
| Axis 2 || BCG, smoking || 0.166
|-
| Axis 3 || Connection hub, social proximity || 0.127
|}

Figure 2 shows visualizations of the distribution of the dataset of epidemiological cases in the coordinates of the three principal components with the highest variation identified by the PCA analysis. The cases and the approximate region of the highest-impact cluster are shown in blue, defining the region of principal coordinate values with the highest recorded impact of the epidemic; similarly, the medium-impact cluster (6 cases) is shown in magenta. A clear separation of the higher-impact case clusters from the general background cases can be observed in the diagrams. It made it possible to identify the region where the cases with potentially higher impact, including the “explosive” pattern, are distributed in the latent coordinates of the principal component representation.

Figure 2: Higher impact cluster identification with PCA

A straightforward linear transformation then yields the corresponding region of interest in the initial, observable parameter space, making it possible to identify the combinations of observable parameters that can be linked to outcomes with higher epidemiological impact.

3.3. Analysis with Unsupervised Autoencoder

A similar approach can be demonstrated with an unsupervised neural network autoencoder model that reduces the number of parameters by compressing the observable data space into a lower-dimensional representation in an unsupervised training process aimed at improving the accuracy of regeneration from the compressed representation. Models of a similar type were used to create structured unsupervised representations of different data types via unsupervised autoencoder training with minimization of the generative error [8].
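The compress-and-regenerate training described above can be sketched as follows. This is a deliberately small stand-in and not the model of [18]: it assumes a single tanh encoding layer with a linear decoder, a three-unit code, plain full-batch gradient descent, and synthetic random data in place of the real dataset. All names, sizes and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 40 cases x 9 observable parameters in [0, 1].
X = rng.uniform(0.0, 1.0, size=(40, 9))

n_in, n_code = X.shape[1], 3           # three-unit central encoding layer (assumed)
W1 = rng.normal(0.0, 0.1, size=(n_in, n_code))
b1 = np.zeros(n_code)
W2 = rng.normal(0.0, 0.1, size=(n_code, n_in))
b2 = np.zeros(n_in)

def encode(X):
    """Encoder: observable parameters -> latent code."""
    return np.tanh(X @ W1 + b1)

def decode(Z):
    """Decoder (generative part): latent code -> observable parameters."""
    return Z @ W2 + b2

def loss(X):
    """Mean squared reconstruction error; no outcome labels are involved."""
    return float(np.mean((decode(encode(X)) - X) ** 2))

initial_loss = loss(X)
lr = 0.05
for epoch in range(3000):
    Z = encode(X)
    err = (decode(Z) - X) / X.shape[0]  # loss gradient w.r.t. output, up to a constant factor
    grad_W2 = Z.T @ err
    grad_b2 = err.sum(axis=0)
    d_pre = (err @ W2.T) * (1.0 - Z**2)  # backpropagate through tanh
    grad_W1 = X.T @ d_pre
    grad_b1 = d_pre.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
final_loss = loss(X)

print(f"reconstruction error: {initial_loss:.4f} -> {final_loss:.4f}")

# A hypothetical latent point pushed through the decoder gives the
# corresponding point in the observable parameter space.
latent_point = np.array([[0.5, -0.2, 0.1]])
print("decoded observable parameters:", np.round(decode(latent_point), 2))
```

The encoder output plays the role of the latent coordinates that are plotted for the real model, and the decoder is the part used to map a latent point back to observable parameters.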
The dimensionality of the unsupervised representation for the models in the study, defined by the size of the central encoding layer, was chosen based on the results of the Principal Component Analysis in the previous section, which indicated three most informative components. Figure 3 presents direct visualizations of the distribution of data in the unsupervised representation created by a trained autoencoder model.

Figure 3: Higher impact cluster identification with deep autoencoder

The highest-impact cluster of three cases is shown in green and the medium-impact one (6 cases) in orange. Again, a similar pattern of clear separation of higher-impact cases from the general background can be observed with these models, in full agreement with the results of the PCA analysis in the previous section.

It is worth noting that, as with PCA, autoencoder models, though essentially non-linear, also make it possible to identify the higher-impact regions in the coordinates of the observable parameters. This can be achieved by forward-propagating the identified region of interest, defined by a set of characteristic points in the latent representation, through the generative part of the model, which yields the corresponding region in the observable parameters. The combinations of observable parameters that produce the effect of interest can thus be identified proactively and used in the development of an effective preventative or mitigating epidemiological policy.

4. Conclusion

The methods of unsupervised machine learning can be effective in identifying and separating the informative features in complex general data. In this work, two different methods of unsupervised learning, applied independently, consistently demonstrated good separation of cases with higher Covid-19 epidemiological impact from the general background.
The analysis and the findings of the study can be used to evaluate possible epidemiological scenarios in a jurisdiction based on the factors identified and discussed in this work, as well as those that can be added in subsequent studies. Further research and development in the identified direction has the potential to produce effective modeling tools to identify areas of potential epidemiological risk in public healthcare policy and to design corrective and/or preventative measures to avoid the heavier-impact scenarios. Further studies can focus on improving the accuracy of measurement of the identified observable parameters as well as on introducing additional ones, leading to higher accuracy and confidence of the evaluation.

5. References

[1] Miller A., Reandelar M-J., Fasciglione K., Roumenova V., Li Y., Otazu G.H., Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study, medRxiv, doi: 10.1101/2020.03.24.20042937 (2020).

[2] Dolgikh S., Further evidence of a possible correlation between the severity of Covid-19 and BCG immunization, preprint medRxiv, doi: 10.1101/2020.04.07.20056994v2 (2020).

[3] Sharma A., Sharma S.K., Shi Y., et al., BCG vaccination policy and preventive chloroquine usage: do they have an impact on COVID-19 pandemic? Cell Death & Disease, 11(7), 1-10 (2020).

[4] Yitbarek K., Abraham G., Girma T., et al., The effect of Bacillus Calmette–Guérin (BCG) vaccination in preventing severe infectious respiratory diseases other than TB: implications for the COVID-19 pandemic. Vaccine, 38(41), 6374–6380 (2020).

[5] Ebina-Shibuya R., Horita N., Namkoong H., Kaneko T., National policies for paediatric universal BCG vaccination were associated with decreased mortality due to COVID-19. Respirology (Carlton, Vic.), https://europepmc.org/article/pmc/pmc7323121 (2020).

[6] Dayal D., Gupta S., Connecting BCG vaccination and COVID-19: additional data.
medRxiv, doi: 10.1101/2020.04.07.20053272 (2020).

[7] Jolliffe I.T., Principal Component Analysis, Springer Series in Statistics, 2nd edition, Springer, NY (2002).

[8] Bengio Y., Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2(1), 1–127 (2009).

[9] Zwerling A., Behr M.A., Verma A., Brewer T.F., Menzies D., Pai M., The BCG World Atlas: a database of global BCG vaccination policies and practices. PLOS Medicine, doi: 10.1371/journal.pmed.1001012 (2011).

[10] BCG World Atlas online, URL: http://www.bcgatlas.org/

[11] Coronavirus data and map, URL: https://www.google.com/covid19-map/ (4.04.2020).

[12] Our World in Data: World smoking prevalence, URL: https://ourworldindata.org/smoking (4.04.2020).

[13] Worldometers: Population data, URL: https://www.worldometers.info/world-population/ (4.04.2020).

[14] Canada Covid-19 Situation Update, URL: https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html?topic=tilelink (4.04.2020).

[15] CDC Covid-19 Advice, URL: https://www.cdc.gov/coronavirus/2019-ncov/index.html (2020).

[16] NHS Covid-19 Advice, URL: https://www.nhs.uk/conditions/coronavirus-covid-19/ (2020).

[17] Freedman D., Statistical Models: Theory and Practice. Cambridge University Press (2005).

[18] Prystavka P., Cholyshkina O., Dolgikh S., Karpenko D., Automated object recognition system based on aerial photography. In: 10th International Conference on Advanced Computer Information Technologies ACIT-2020, Deggendorf, Germany (2020).
Appendix

Time-adjusted Dataset of Epidemiological Cases

{| class="wikitable"
|+ Table 1 Epidemiological Case Dataset adjusted at LTZ + 3 months
! rowspan="2" | Case !! colspan="3" | Policy !! rowspan="2" | Conn !! rowspan="2" | Bcg !! rowspan="2" | Smo !! rowspan="2" | Den !! rowspan="2" | Soc !! rowspan="2" | Age !! rowspan="2" | Impact
|-
! p-prep !! p-qlty !! p-tme
|-
| Taiwan || 0 || 0 || 0 || 0.1 || 0 || 0.34 || 0.3 || 0.2 || 0.3 || 0.001
|-
| Japan || 0.1 || 0.1 || 0 || 0.6 || 0 || 0.674 || 0.3 || 0.2 || 0.5 || 0.002
|-
| Singapore || 0 || 0 || 0 || 0.4 || 0 || 0.33 || 0.5 || 0.3 || 0.25 || 0.004
|-
| Australia || 0.2 || 0.2 || 0 || 0.2 || 0.3 || 0.298 || -0.5 || 0.3 || -0.4 || 0.005
|-
| South Korea || 0.1 || 0.2 || 0 || 0.2 || 0 || 0.996 || 0.3 || 0.2 || 0 || 0.013
|-
| Finland || 0.3 || 0.2 || 0.1 || 0.1 || 0.3 || 0.418 || -0.2 || 0.2 || 0.3 || 0.017
|-
| Canada || 0.4 || 0.2 || 0.2 || 0.3 || 0.8 || 0.354 || -0.5 || 0.4 || 0 || 0.023
|-
| Ontario (Canada) || 0.4 || 0.2 || 0.25 || 0.3 || 0.8 || 0.258 || -0.2 || 0.4 || 0 || 0.025
|-
| Germany || 0.3 || 0.2 || 0.2 || 0.5 || 0.2 || 0.608 || 0.2 || 0.4 || 0.5 || 0.052
|-
| Sweden || 0.3 || 0.3 || 0.3 || 0.1 || 0.6 || 0.412 || 0.0 || 0.3 || 0 || 0.148
|-
| UK || 0.5 || 0.7 || 0.7 || 0.7 || 0.8 || 0.398 || 0.2 || 0.5 || 0 || 0.248
|-
| France || 0.5 || 0.5 || 0.6 || 0.7 || 0.6 || 0.596 || 0.2 || 0.7 || -0.2 || 0.371
|-
| Belgium || 0.5 || 0.4 || 0.5 || 0.7 || 1 || 0.53 || 0.2 || 0.5 || 0 || 0.429
|-
| Spain || 0.8 || 0.7 || 0.8 || 0.5 || 0.8 || 0.584 || 0.2 || 0.8 || 0.5 || 0.965
|-
| Italy || 0.8 || 0.8 || 0.9 || 0.7 || 1 || 0.566 || 0.2 || 0.8 || 0.5 || 0.969
|-
| USA || 0.5 || 0.5 || 0.5 || 0.3 || 1 || 0.39 || -0.2 || 0.4 || -0.4 || 0.095
|-
| New York (USA) || 0.8 || 0.8 || 0.9 || 1 || 1 || 0.25 || 0.5 || 0.8 || -0.5 || 1.000
|-
| California (USA) || 0.5 || 0.3 || 0.2 || 0.5 || 1 || 0.226 || 0.1 || 0.4 || -0.5 || 0.040
|-
| Slovakia || 0.2 || 0.2 || 0.2 || 0 || 0 || 0.794 || 0.2 || 0.2 || -0.1 || 0.016
|-
| Argentina || 0.4 || 0.3 || 0.3 || 0 || 0 || 0.478 || -0.2 || 0.3 || -0.5 || 0.019
|-
| Chile || 0.2 || 0.2 || 0.1 || 0 || 0 || 0.76 || 0.1 || 0.2 || -0.5 || 0.050
|-
| Ukraine || 0.6 || 0.4 || 0.3 || 0 || 0 || 0.94 || 0.2 || 0.4 || 0.1 || 0.027
|-
| Poland || 0.3 || 0.2 || 0.1 || 0.2 || 0 || 0.648 || 0.2 || 0.3 || -0.1 || 0.066
|-
| Moldova || 0.6 || 0.4 || 0.3 || 0 || 0 || 0.56 || 0.2 || 0.4 || -0.4 || 0.125
|-
| Czechia || 0.3 || 0.2 || 0.1 || 0.1 || 0 || 0.766 || 0.2 || 0.25 || 0 || 0.082
|-
| Croatia || 0.3 || 0.2 || 0.1 || 0 || 0 || 0.74 || 0.2 || 0.25 || 0.5 || 0.068
|-
| Albania || 0.3 || 0.2 || 0.1 || 0 || 0 || 0.8 || 0.2 || 0.25 || -0.5 || 0.038
|-
| Greece || 0.2 || 0.1 || 0 || 0.4 || 0 || 1 || 0.2 || 0.5 || 0.5 || 0.049
|-
| Israel || 0.1 || 0.1 || 0.1 || 0.4 || 0.3 || 0.382 || 0.2 || 0.2 || -0.5 || 0.094
|-
| Prairies (1) (Canada) || 0.3 || 0.2 || 0.2 || 0 || 0.6 || 0.292 || -0.3 || 0.2 || -0.3 || 0.016
|-
| Quebec (Canada) || 0.6 || 0.4 || 0.5 || 0.3 || 0.8 || 0.304 || -0.2 || 0.5 || 0.3 || 0.912
|-
| Norway || 0.2 || 0.2 || 0.2 || 0.2 || 0.2 || 0.452 || -0.2 || 0.25 || -0.1 || 0.138
|-
| Denmark || 0.2 || 0.2 || 0.2 || 0.1 || 0.3 || 0.352 || 0.2 || 0.25 || 0.1 || 0.303
|-
| Switzerland || 0.2 || 0.2 || 0.2 || 0.2 || 0.3 || 0.51 || 0.2 || 0.25 || 0.25 || 0.603
|-
| Austria || 0.2 || 0.2 || 0.2 || 0.2 || 0.2 || 0.704 || 0.2 || 0.25 || 0.4 || 0.238
|-
| Portugal || 0.3 || 0.3 || 0.3 || 0.3 || 0 || 0.63 || 0.2 || 0.5 || 0.5 || 0.355
|-
| Ireland (2) || 0.4 || 0.3 || 0.5 || 0.4 || 0.2 || 0.444 || 0.2 || 0.6 || -0.4 || 0.653
|-
| Netherlands || 0.3 || 0.4 || 0.4 || 0.5 || 1 || 0.524 || 0.3 || 0.25 || 0.4 || 0.774
|}

(1) Manitoba and Saskatchewan provinces, Canada

(2) Inconsistencies in implementation of universal BCG policy, [19]

Observable factors

Policy

p-prep: health care preparedness, range 0 .. 1, lower to higher preparedness

p-qlty: response measures, range 0 .. 1, lower to higher epidemiological policy quality

p-tme: response timing, range 0 .. 1, timely to delayed

Conn: connection intensity, range 0 .. 1, lower to higher connection intensity

Bcg: BCG immunization record, range 0 .. 1. The value of 0 indicates a current or very recent universal immunization policy; the value of 1 indicates no effective immunization policy and equivalent cases [2]. A value between 0 and 1 indicates a previous universal immunization policy, relative to the time after cessation.

Smo: smoking prevalence in the population. In cases with a large disparity between genders and so on, the higher of the values was taken.

Den: population density. Due to significant variability in population density between the cases in the dataset, a logarithmic band scale was used; additionally, in cases with very large territory, a negative offset was added to account for the non-homogeneousness of the distribution of individual cases and the delay in propagation of the epidemic due to geographical distance. A higher-granularity analysis of national jurisdictions with very high geographical spread can be attempted in a future study.

Age: age demographics, median age, logarithmic band of the deviation from the dataset mean, range -0.5 .. 0.5.

Outcome parameter

Impact: the epidemiological impact in the jurisdiction at the time of analysis, measured as relative mortality per 1 million capita (R.mpc), relative to the world’s highest at the time.
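The two impact measures used in the paper, the logarithmic Measured Value of equation (1) and the relative mortality per million, can be computed as in the sketch below. The death counts and populations are invented for illustration, the function names are hypothetical, and the base-10 logarithm is an assumption because the paper does not state the base.

```python
import math

def mortality_per_million(deaths, population):
    """Mortality per 1 million capita (M.p.c.)."""
    return deaths / (population / 1_000_000)

def measured_value(deaths, population):
    """MV of equation (1): logarithm of M.p.c. (base 10 is an assumption)."""
    return math.log10(mortality_per_million(deaths, population))

def relative_impact(mpc_by_case):
    """Scale M.p.c. by the largest value supplied, standing in for the world maximum."""
    top = max(mpc_by_case.values())
    return {case: mpc / top for case, mpc in mpc_by_case.items()}

# Hypothetical jurisdictions: (deaths, population); not taken from the dataset.
records = {"A": (5_000, 10_000_000), "B": (50, 5_000_000)}

mpc = {case: mortality_per_million(d, p) for case, (d, p) in records.items()}
rel = relative_impact(mpc)

for case, (d, p) in records.items():
    print(case, "M.p.c.:", mpc[case], "MV:", round(measured_value(d, p), 3),
          "relative:", round(rel[case], 3))
```

The logarithm compresses the spread between mild and severe cases, while the relative value corresponds to the Impact column of Table 1.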