<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Information Technology and Interactions, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Epidemiological Factor Analysis: Identifying Principal Factors with Machine</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Serge Dolgikh</string-name>
          <email>sdolgikh@nau.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Mulesa</string-name>
          <email>Oksana.mulesa@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aviation University</institution>
          ,
          <addr-line>1 Liubomyra Huzara Ave, 1, Kyiv, 03058</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Solana Networks</institution>
          ,
          <addr-line>301 Moodie Dr., Ottawa, K2H9C4</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Uzhghorod National University</institution>
          ,
          <addr-line>Narodna sq., 3, Uzhhorod, 88000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>0</volume>
      <fpage>2</fpage>
      <lpage>03</lpage>
      <abstract>
        <p>Based on a set of Covid-19 statistical data from national and subnational jurisdictions, taken approximately two months after the local onset of the pandemic (early April 2020), the factors with a strong influence on the reported local outcomes were analyzed with several different statistical methods. The analysis of the available statistical data consistently confirms epidemiological policy and management as the dominant factors in the outcome. Among the other factors considered, those with a significant influence on the development of epidemiological scenarios were a current or recent universal Bacille Calmette-Guérin (BCG) immunization record and the prevalence of smoking in the population. The methods proposed in the study can be used to evaluate principal factors at a number of future time points to reach a confident conclusion.</p>
        <p>Keywords: infectious diseases, epidemiology, Covid-19, machine learning, statistical analysis</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A possible link between the effects of the Covid-19 pandemic, such as the rate of incidence and the
severity of cases, on the one hand, and a universal immunization program against tuberculosis with the
Bacille Calmette-Guérin (BCG) vaccine (universal BCG immunization program, or UBIP, hereinafter)
on the other was suggested in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and further investigated in a number of works, offering a novel and
interesting perspective on possible relations between certain characteristics of jurisdictions and the
development of the epidemic. A number of factors with potential influence on the epidemiological
outcome have been discussed at length, such as population density, age demographics and others.
Identifying the factors that are significant for the development of an epidemic, and methods that allow
such identification, can provide important input to the development of effective policy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Statement</title>
      <p>A common challenge in the analysis of statistical data related to a developing situation, such as the
epidemiological scenario studied in this work, which involves a dangerous infection with a potentially
high impact on the health and safety of the population, the economy and society as a whole, is the evaluation of
methods and models in order to identify the approaches that describe the studied process most
effectively. Such a choice may itself depend on the problem and the
data: for one time series the best approach can be an autoregression model, for another the Brown model
or the Winters model, and so on.</p>
      <p>To avoid or reduce the possible ambiguity related to the selection of the method of analysis of
statistical data, in this work we used several common methods of statistical analysis, specifically
evaluation and ranking of factor influence, with the expectation that consistency between the results
of different methods, if achieved, would enhance confidence in the result. Such confidence can be
essential for the development of reliable and effective policies based on the conclusions of factor
analysis.</p>
      <p>2020 Copyright for this paper by its authors.</p>
      <p>With a variety of statistical methods and techniques available to evaluate the correlation hypothesis as
discussed above, we set out to analyze the principal factors influencing the development of
the epidemic in national and subnational jurisdictions, based on the available data for the first
group of countries that were exposed to the Covid-19 pandemic in late January to early February
2020. This objective is approached by applying several commonly used methods of factor analysis
and ranking and looking for consistency of results between the methods. Such consistency
would improve confidence in the findings, providing a grounded and
reliable statement of the influence of the factors on the epidemiological outcome and an
informative input to situation analysis and policy development.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature Review</title>
      <p>
        Miller et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provided one of the first indications of a possible link between BCG
immunization and a milder course of the epidemic in a national jurisdiction. This link was further
investigated in a number of works that consistently, though with varying levels of confidence, concluded that
the correlation hypothesis is significant. In [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ], a strong correlation between the BCG
immunization record and Covid-19 mortality in a number of culturally and socially similar European
countries was observed (R² = 0.88; P = 8 × 10⁻⁷), indicating that every 10% increase in the BCG
index was associated with a 10.4% reduction in Covid-19 mortality. The results imposed strong
constraints on the null hypothesis (that is, no correlation between a current or previous UBIP in the
jurisdiction and Covid-19 impact), suggesting that BCG may have a certain broad protective effect
resulting in a milder epidemiological scenario.
      </p>
      <p>
        A similar conclusion is supported by the results in [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ], which establish a strong correlation between a
current or previous record of a consistent UBIP and lower values of Covid-19 outcomes in the
reporting jurisdictions, measured by infection incidence and the resulting mortality: “The results …
show that countries without a universal BCG policy (such as Belgium, Italy, the United States, and
the Netherlands) have increased incidence of COVID-19 (2810.9 ± 497.1 (mean ± SEM) per million)
compared with countries with ongoing national BCG policy (570.9 ± 155.6 (mean ± SEM) per
million)” (Sharma et al., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] qualitative and quantitative analysis of distribution of Covid-19 impacts among national and
subnational jurisdictions in Europe, North America and Middle East was performed with a number of
observations consistently pointing to a possibility of a correlation between UBIP and a milder type of
the epidemiological scenario, while in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] a quantitative statistical analysis of the significance of the
correlation at two time points imposed strong constraints on the null hypothesis, which was excluded with a
P-value below 0.0001.
      </p>
      <p>
        The task of modeling and forecasting time-series processes of different nature is essential and
arises in different fields such as planning [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the study of the dynamics of climate change [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and,
importantly in the current situation, health science and epidemiology. It involves the stages of
identifying parameters that can be measured, collecting representative sets of data, and
applying methods of data analysis that identify the factors with the highest influence
on the observed outcome.
      </p>
      <p>
        Known models and methods of factor analysis are based on using integrated information about the
background of the predicted processes [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ]. Among the tasks of forecasting an important place is
occupied by the methods of factor estimation and time-series analysis that includes a variety of
methods and approaches including fuzzy sets [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], expert models and methods [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], genetic and
neural network methods [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] and others.
      </p>
      <p>When a wide array of methods of factor correlation, significance and ranking is applied,
consistency between the observed results is of primary importance, as it
makes it possible to distinguish spurious effects or artifacts of a particular
method or dataset from a genuine effect representing a reliable relation between a set of influencing factors
and the outcome of interest.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        The development of the Covid-19 epidemic in national and subnational jurisdictions up to the
present point clearly shows that timing can play a major role in the epidemiological
scenario observed in any given case and, for that reason, can be crucial for an accurate analysis of the
corresponding statistical data. To ensure the validity of the analysis from this perspective, two
approaches were used in this work: 1) the data was synchronized, or aligned, with respect to the duration of the
development of the epidemic in a given jurisdiction, that is, the cases in the dataset were selected on
the basis of having a similar time of exposure to the pandemic; and 2) where this was not
possible, the statistical data of the case was resynchronized with respect to the reporting time point
to the same or a similar time of development in the local jurisdiction. To simplify the synchronization,
the starting time point (time zero, TZ) of the global Covid-19 pandemic was defined in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as December
31, 2019 (31.12.2019). The period of local exposure to the epidemic is given in the format TZ + y
months, relative to the start of the pandemic.
      </p>
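      <p>The TZ + y months convention can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the function names and the conversion with an average month length of 30.44 days are assumptions introduced here, and only the time-zero date is taken from the text.</p>
      <preformat>
```python
from datetime import date

# Time zero (TZ) of the global pandemic, as defined in the text.
TIME_ZERO = date(2019, 12, 31)

# Assumption: an average month of 30.44 days is used to convert days to months.
DAYS_PER_MONTH = 30.44


def months_since_time_zero(day: date) -> float:
    """Return the offset y in 'TZ + y months' for a calendar date."""
    return (day - TIME_ZERO).days / DAYS_PER_MONTH


def local_exposure_months(local_onset: date, reporting_day: date) -> float:
    """Return the duration of local exposure at the reporting date, in months."""
    return months_since_time_zero(reporting_day) - months_since_time_zero(local_onset)


if __name__ == "__main__":
    onset = date(2020, 1, 31)     # hypothetical local onset
    reported = date(2020, 4, 1)   # hypothetical reporting date
    print(round(months_since_time_zero(onset), 1))
    print(round(months_since_time_zero(reported), 1))
    print(round(local_exposure_months(onset, reported), 1))
```
      </preformat>
      <p>Under these assumptions, a local onset at the end of January 2020 corresponds to roughly TZ + 1 month, and data reported in early April 2020 to roughly TZ + 3 months, that is, about two months of local exposure.</p>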
      <p>
        The analysis of scientific publications, in particular [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ], showed that factors with a
strong influence on the development of the epidemic include, but are not limited to, the following: the
time of the local development of the epidemic; traditions, social and lifestyle factors; demographics,
including gender and age; the level of economic and social development; the quality standard and
epidemiological efficiency of the public healthcare system; and, not least, the quality of public
health policy making and execution.
      </p>
      <p>In order to reduce the number of factors, national and subnational statistics from countries and
regions with similar social and economic situations were selected. The aim of the study was to
develop methods of reliable factor analysis and ranking by influence and verify that they can be
effective in identifying principal factors in the development of epidemiological scenarios. The data
and the methods are described in detail in this section.</p>
      <p>Finally, it should be noted that the intent of the work at this stage in the development of the
situation was not to offer definitive answers as to the importance and ranking of certain factors of
influence, but rather to establish an approach and a platform for repeated and continuous analysis at
different points in the time series of the cases as the situation develops. Such repeated analysis would allow a
confident conclusion about the epidemiological and social factors with a strong influence on the course
of the epidemic.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1. Data</title>
      <p>In the first stage of the analysis we used only the cases of the first wave of the
pandemic, with local arrival at approximately TZ + 1 month (i.e., end of January 2020). Those
cases had sufficient time to develop by the time the statistical data was collected for the analysis. To
ensure consistency of the data and reduce the number of potential factors of influence,
a subset of the identified Wave 1 national and subnational cases satisfying the
following consistency criteria was selected:
1. The countries in the dataset were at a similar level of development, thus excluding the
influence of factors such as the level of prosperity and development.
2. A sufficient level of confidence in the timeliness and accuracy of the statistical data
provided by the reporting jurisdictions.
3. A certain minimal level of local exposure to the developing epidemic, identified by a
minimum threshold number of cases.</p>
      <p>Based on these selection criteria and publicly available epidemiological information from a
number of trusted sources, as indicated below, a dataset of 18 cases (Table 1) was constructed. The
data included one provincial jurisdiction in Canada (Ontario), and one state (California) and one
municipal jurisdiction (New York City) in the USA. Given the high geographical variation of the
impacts, data with a more detailed geographical breakdown is expected in future studies. The
data was collected at TZ + 3 months, i.e. after approximately two months of local
development of the epidemic in the selected group of jurisdictions.</p>
      <p>The selection of national and subnational cases in the dataset made it possible to exclude several
common factors from consideration. Among them were the time of arrival of the epidemic in
the jurisdiction and local exposure; the level of prosperity and development; and, to a considerable degree,
demographics (although one related factor, the median age, was used in the analysis). This helped to
narrow down the number of potential factors with a higher influence on the epidemiological scenario
developing in the jurisdiction of the case.</p>
      <p>In the preliminary analysis of the potential factors we found no obvious way to eliminate
the influence of policy management, including the quality of policy making and the execution of
epidemic control policies and decisions. This in turn includes a number of subfactors, such as
general preparedness, effective deployment and management plans, sufficient resources, informed and
trained personnel, effective and evidence-based policy making and execution, and others. Due to time
and resource constraints at the time of preparation of the analysis, the only available solution was
to model these parameters with a combined rating-type factor intended to reflect the overall
efficiency of the public health policy.</p>
      <p>The value of the factor was assigned manually based on the available information. An essential
caveat is that such an assignment could implicitly include some level of
correlation with the observed outcomes; however, given the short time available for the preparation of this analysis, it
was the only option. We expect that future works will develop more precise
approaches and methods for the evaluation of policy effectiveness.</p>
    </sec>
    <sec id="sec-6">
      <title>4.1.1. Influencing Factors</title>
      <p>The following set of factors was considered in the analysis that follows:
1. Policy: a ranking parameter measuring the effectiveness of the epidemiological policy in the
jurisdiction, range: 0 – 0.5, from most to least effective. The factors in the evaluation of this parameter
were the timeliness of the response; the clarity and consistency of the policy; and the epidemiological preparedness
of the public healthcare system to handle the onset of the epidemic. Given the challenges described
earlier, an objective evaluation of this parameter will require further work.</p>
      <p>
        2. UBIP level: defined in the range 0 – 0.5, with 0 representing band A [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (i.e. a current
universal BCG immunization program) and 0.5 representing no UBIP (band C). The values in between were
assigned in proportion to the time lag between the cessation of UBIP and the time of the analysis. Some
corrections were made for the cases where immunization was administered at an older age or only
within a short period, for example Spain (16 years).
      </p>
      <p>3. Smoking prevalence: range 0 – 0.5, defined as the rate of smoking, in percent, in the
population. Where a significant gender difference existed in the population with respect to this factor,
the higher value was taken, as it is expected to have a greater influence on the outcome.</p>
      <p>4. Population density: the total population of the jurisdiction per 1 sq. km of the total area, divided
by 100. We recognize that in some cases, such as jurisdictions with a very large area, averaging the population over the area
may lead to less consistent results; a more detailed analysis with more precisely defined geographic
boundaries of the cases is intended for a future study.</p>
      <p>5. Age demographics: the median age in the reporting national or subnational jurisdiction, divided
by 100.</p>
      <p>Epidemiological Outcome</p>
      <p>Given considerable differences in testing practices between the reporting jurisdictions, particularly
in the early phase of the epidemic, mortality per capita was chosen as a more stable and reliable
indicator of the impact of the epidemic in each case. Given the large spread in the range of
epidemiological outcomes between the cases in the dataset, a logarithmic scale was used in the
evaluation of the impact of the epidemic, represented by the Measured Value parameter (MV), defined as the
logarithm of mortality per capita (in cases per 1M of population in the jurisdiction):
$$ MV = \log\left(m_{pc}\right) \qquad (1) $$
where $m_{pc}$ denotes the mortality per 1M of population in the jurisdiction.</p>
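      <p>As a hedged illustration of the outcome measure, the logarithm of mortality per 1M of population can be computed as follows. The function name, the choice of the base-10 logarithm and the rejection of zero-mortality cases are assumptions made for this sketch; the text does not specify them.</p>
      <preformat>
```python
import math


def measured_value(deaths: int, population: int) -> float:
    """Return MV, the logarithm of mortality per 1M of population.

    Assumption: base-10 logarithm; the text does not state the base.
    """
    if not (deaths > 0 and population > 0):
        # The logarithm is undefined for zero mortality. How such cases
        # were treated in the study is not stated, so they are rejected here.
        raise ValueError("deaths and population must be positive")
    deaths_per_million = deaths * 1_000_000 / population
    return math.log10(deaths_per_million)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the study.
    print(measured_value(deaths=250, population=5_000_000))
```
      </preformat>
      <p>On a logarithmic scale, cases whose mortality differs by orders of magnitude become comparable in a linear model, which is the motivation given in the text.</p>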
      <sec id="sec-6-14">
        <title>Table 1. Cases in the dataset</title>
        <p>Taiwan; Japan; Singapore; Australia; South Korea; Finland; Canada; Ontario (Can.); Sweden; Germany; UK; France; Italy; Belgium; California; NYC (USA); USA.</p>
        <p>Sources:</p>
        <p>
          Epidemiological outcome (incidence and mortality) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
World BCG atlas [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
World data: smoking [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], world population data [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
National and subnational jurisdictions Covid-19 information [
          <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19-22</xref>
          ].
        </p>
        <p>Reservations and qualifications:
1. Consistency and reliability of data: the statistics on the current epidemiological outcomes
reported by the national, regional and local health administrations can be affected by specific
practices and policies of reporting jurisdictions.</p>
        <p>2. An exact alignment in the time of reported data could not be confidently ascertained due to
differences in the reporting practices between the jurisdictions.</p>
        <p>Finally, it is essential to note that the analysis that follows provides a statement for a single point
in the time series and that the dataset would be updated in the future at a number of points in the
course of development of the epidemic. Repeating the analysis at a number of time points in the
series should provide a more confident statement about the influence of specific factors on
the development of the epidemiological scenario.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4.2. Factor Analysis Methods</title>
      <p>Several statistical methods were used to evaluate the influence of the selected factors and to measure
the consistency of the obtained results.</p>
      <p>
        1. Calculation of correlation between the resulting effect (MV) and specific factor;
2. Linear regression by single factor and a combination of factors [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
3. Evaluation of factor importance with Random Forest regression [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
4. Evaluation of factor influence or rank with SelectKBest, a feature ranking method in sklearn
machine learning and data analysis library [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>Method 1 calculates the correlation coefficient between the outcome variable (MV) and the factor
of interest. An absolute value closer to 1 indicates stronger correlation between the resulting effect
and the factor.</p>
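      <p>Method 1 can be sketched as below, assuming that the Pearson coefficient is the correlation measure meant in the text (the text does not name it). The function name and the sample values are hypothetical and serve only as an illustration.</p>
      <preformat>
```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance; the division below then fails.
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    factor = [0.0, 0.1, 0.25, 0.4, 0.5]
    mv = [0.2, 0.6, 1.1, 1.9, 2.3]
    print(round(pearson_r(factor, mv), 3))
```
      </preformat>
      <p>The sign of the coefficient gives the direction of the relation, and its absolute value the strength used for comparing factors.</p>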
    </sec>
    <sec id="sec-8">
      <title>5. Results</title>
    </sec>
    <sec id="sec-9">
      <title>5.1. Single Factor Analysis</title>
      <p>Method 2 produces the best fit linear approximation of the selected factors on the series of the
recorded outcome (MV) and the total deviation from the trend. Comparing the error for different
combinations of influencing factors can show which of the factors were most effective in
approximating the resulting outcome.</p>
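      <p>A minimal sketch of Method 2 for a single predictor is given below. It assumes ordinary least squares and uses the root-mean-square residual as the measure of total deviation from the trend; both the error measure and the function names are assumptions of this sketch. For a combination of factors, the predictor can be taken as their combined value.</p>
      <preformat>
```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rms_error(x: Sequence[float], y: Sequence[float],
              slope: float, intercept: float) -> float:
    """Root-mean-square deviation of the data from the fitted line."""
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


if __name__ == "__main__":
    # Hypothetical factor values and outcomes, for illustration only.
    factor = [0.0, 0.1, 0.25, 0.4, 0.5]
    mv = [0.2, 0.6, 1.1, 1.9, 2.3]
    slope, intercept = fit_line(factor, mv)
    print(slope, intercept, rms_error(factor, mv, slope, intercept))
```
      </preformat>
      <p>Comparing the error obtained with different predictors indicates which factor, or combination of factors, approximates the outcome more closely.</p>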
      <p>Methods 3 and 4 produce ranking of factors with the highest influence on the value of the outcome
variable.</p>
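      <p>The ranking idea behind Method 4 can be illustrated without external libraries. The sketch below scores each factor with the univariate F-statistic of a one-factor linear regression, which is the score that scikit-learn's f_regression function computes and that SelectKBest can use for ranking. Whether this score function was the one used in the study is not stated, so this choice, the function names and the sample data are assumptions. Random Forest importance (Method 3) is not reproduced here.</p>
      <preformat>
```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def f_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Univariate F-statistic: r^2 / (1 - r^2) * (n - 2)."""
    r2 = pearson_r(x, y) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect linear relation
    return r2 / (1.0 - r2) * (len(y) - 2)


def rank_factors(factors: Dict[str, Sequence[float]],
                 outcome: Sequence[float]) -> List[str]:
    """Return factor names ordered from highest to lowest F-score."""
    scores = {name: f_score(values, outcome) for name, values in factors.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    mv = [1.0, 2.0, 3.0, 4.0, 5.0]
    factors = {
        "weak": [0.3, 0.1, 0.4, 0.1, 0.5],
        "strong": [0.1, 0.2, 0.3, 0.4, 0.6],
    }
    print(rank_factors(factors, mv))
```
      </preformat>
      <p>Because the F-statistic is a monotonic function of the squared correlation, this ranking agrees with a ranking by absolute correlation, which is one reason consistent results between Methods 1 and 4 can be expected.</p>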
      <p>In this section we present the results of individual and multi-factor analysis as well as a brief
discussion of the findings.</p>
      <p>The influence of selected individual factors as defined in Section 4.1.1 is shown in Table 2:</p>
      <sec id="sec-9-6">
        <title>Table 2. Influence of individual factors on the outcome (MV)</title>
        <p>Columns: Factor; Correlation, MV. Rows: Policy; UBIP; Smoking; Age demographics.</p>
        <p>As can be seen from the results in the table, all methods produced consistent results with the same
ranking of the evaluated factors. Apart from the policy factor, for which, as already discussed, a strong
correlation can be expected, the factors with the strongest influence in the analyzed data were universal
BCG immunization (UBIP), with a strong positive correlation value of 0.81, and smoking
prevalence, at 0.32.</p>
        <p>
          The latter can be expected to be a factor of significance in the epidemics due to already established
link with a number of conditions, including respiratory [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]; as a standalone factor it did not show a
strong influence on the recorded outcome; however, it can have a noticeable influence as a secondary
factor, as discussed in the next section.
        </p>
        <p>
          In the light of the information about generally less severe outcomes for younger population [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], a
stronger negative correlation of the epidemiological outcome with age demographics could have
been hypothesized and expected. However, the results of the single-factor analysis with linear
regression can be explained by a competition of factors: 1) the higher susceptibility of the older
population group, favoring the negative correlation of the recorded outcome with the median age in
the jurisdiction of the case, versus 2) the higher social contact and mobility of the younger population, which
can stimulate the spread of the epidemic, as was shown in a number of cases, thus driving
the trend in the opposite direction.
        </p>
        <p>The opposing trends would be likely to produce a less pronounced overall influence of
age demographics on the epidemiological outcome in the jurisdiction and, correspondingly, a lower
than expected significance for this factor. A more detailed and specific study will be
needed to investigate the interaction of these factors in sufficient detail.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>5.2. Multiple Factor Analysis</title>
      <p>In this section the cumulative effect of the combination of the factors with the highest significance of
correlation with the measured epidemiological outcome, as established in the previous section, was
evaluated with multiple-factor linear regression. These factors are the epidemiological policy in the
jurisdiction, the record of universal immunization (UBIP) and smoking prevalence. The outcome was
measured, as discussed previously, by the logarithmic mortality per capita of the overall population
in the jurisdiction.</p>
      <p>The combination of factors was calculated as a weighted sum of factor values. In the first iteration
of the analysis the weights of the factors were assigned uniformly, due to insufficient historical data
for a more precise evaluation of the weights.</p>
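      <p>The weighted sum with uniform weights can be written as a short helper. The function name, the factor keys and the choice of 1.0 as the uniform weight are assumptions for illustration; a common rescaling of all weights does not affect the correlation with the outcome.</p>
      <preformat>
```python
from typing import Dict, Optional


def combined_factor(values: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the factor values of one case.

    With weights=None every factor gets weight 1.0 (uniform weighting).
    """
    if weights is None:
        weights = {name: 1.0 for name in values}
    return sum(weights[name] * value for name, value in values.items())


if __name__ == "__main__":
    # Hypothetical factor values of a single case, for illustration only.
    case = {"policy": 0.1, "ubip": 0.5, "smoking": 0.2}
    print(combined_factor(case))
```
      </preformat>
      <p>Passing an explicit weights dictionary would allow non-uniform weighting once more historical data is available.</p>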
      <p>Notes</p>
      <p>In the cases with a very large geographic area and, correspondingly, a low population density, a
correction offset was added to account for a slower rate of development of the epidemic, as follows:
Canada, Australia: 0.2; Finland, Ontario, USA: 0.1. Adding this correction did not essentially change the outcome
of the analysis.</p>
      <p>The results of the multi-factor analysis are presented in Table 3.</p>
      <p>As can be observed immediately from the results, the combination of three factors with the highest
single-factor influence: the policy, BCG immunization and smoking prevalence had the highest
correlation, and the lowest linear regression error with the recorded epidemiological outcome.</p>
      <p>The results also confirm UBIP as the second most influential factor among those considered, with
the data available at the time. Indeed, the largest decrease in the correlation coefficient after
removing a factor from the cumulative sum was seen for the policy (11.6%), confirming it as the most
influential factor among those considered, and the smallest for smoking prevalence (1.6%). Removing UBIP
from the cumulative sum of the factors resulted in a correlation decrease of 8.1%, noticeably higher
than for the other secondary factors considered.</p>
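      <p>The factor-removal comparison can be sketched as follows. It assumes that the reported percentages are relative decreases of the Pearson correlation between the uniformly weighted sum of factors and the outcome; this interpretation, the function names and the sample data are assumptions, not details given in the text.</p>
      <preformat>
```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_drop(factors: Dict[str, Sequence[float]],
                     outcome: Sequence[float]) -> Dict[str, float]:
    """Relative decrease (in percent) of the correlation between the
    cumulative sum of factors and the outcome when one factor is removed."""
    names = list(factors)
    n_cases = len(outcome)

    def combined(selected: List[str]) -> List[float]:
        # Uniformly weighted sum of the selected factors for every case.
        return [sum(factors[name][i] for name in selected) for i in range(n_cases)]

    r_full = pearson_r(combined(names), outcome)
    drops = {}
    for removed in names:
        kept = [name for name in names if name != removed]
        r_partial = pearson_r(combined(kept), outcome)
        drops[removed] = (r_full - r_partial) / r_full * 100.0
    return drops


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    mv = [1.0, 2.0, 3.0, 4.0]
    factors = {"a": [0.1, 0.2, 0.3, 0.4], "b": [0.2, 0.1, 0.2, 0.1]}
    print(correlation_drop(factors, mv))
```
      </preformat>
      <p>A larger drop marks a factor whose removal degrades the fit more, which is how the policy and UBIP factors are ranked above smoking prevalence in the text. The sketch requires at least two factors so that the remaining sum is not empty.</p>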
      <p>The findings of this analysis can be illustrated by plotting the dependency of the epidemiological
outcome (Y-axis) on the cumulative value of the dominant factors identified in the single-factor
analysis (X-axis).</p>
      <p>The diagram on the left shows the relationship between the epidemiological outcome, in
mortality per capita, for the cases in the studied dataset and the weighted sum of the principal factors
identified in the single-factor analysis, whereas the one on the right shows the dependency of the
logarithmic impact value (1) on the combined value of the principal factors. A clear exponential trend can
be seen in the left-side diagram versus a linear one on the right, confirming the conclusions of Sections
5.1 and 5.2 on the significance of the principal factors of influence identified with the
selected methods.</p>
      <p>A number of outlier cases with an impact higher than the trend can also be seen clearly in the diagram on
the right; these can be attributed to tertiary and other factors, as well as to
statistical fluctuations, which appear to be a common occurrence with Covid-19. A detailed analysis of the
other potential factors of influence will require further study.</p>
    </sec>
    <sec id="sec-11">
      <title>5.3. Specific Cases</title>
      <p>Some observations on the influence of specific influencing factors such as UBIP and smoking can
be derived from comparison of specific cases in the dataset. While at the time of writing these cases
were anecdotal and may not be sufficient for a statistically confident conclusion, they can provide
some directions and rationale for further studies.</p>
      <p>
        For example, a comparison of the incidence and the recorded epidemiological outcome between
countries of Northern Europe with similar development, culture, demographics and some of the
other identified factors shows a strong correlation with the period after cessation of UBIP in the country
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A similar pattern can be seen by comparing national cases with differences in immunization
programs in Southern Europe.
In another example, all countries in the Asia group have similar values of most known factors,
including the alignment in the time of the epidemic onset, the policy and BCG immunization (all
countries are in the UBIP group A). The case data shows that countries with
higher smoking prevalence (South Korea and Japan) recorded a higher impact of the epidemic
than those with lower smoking rates (Taiwan, Singapore).
      </p>
      <p>A similar pattern can be seen with some cases in South America. Neighboring countries, with
similar values of the other considered factors but with significantly different smoking rates such as:
Ecuador – Peru, Chile – Argentina also show significant difference in Covid-19 impact.</p>
      <p>Statistical fluctuations are certainly possible with the relatively small dataset used in
the study, and a confident conclusion can be reached only with monitoring and repeated analysis of this
trend over an extended period, at a number of different time points.</p>
      <p>These observations may point to the significance of some of the secondary factors, such
as smoking prevalence in the population in the earlier example, for the overall epidemiological
outcome; however, a confident conclusion would require an analysis with more data and will be
attempted in future work.</p>
    </sec>
    <sec id="sec-12">
      <title>6. Conclusion</title>
      <p>The approaches to epidemiological factor analysis demonstrated in this work, applied to an early
Covid-19 epidemiological dataset of selected national and subnational jurisdictions and based on a number
of well-known methods of data and factor analysis, can be used to identify the factors with a
strong influence on the development of the epidemic. This information can be instrumental in
developing effective responses and policies in the public health care system to minimize the impact
of the epidemic and protect the population.</p>
      <p>
        The findings confirm the importance of clear, timely and evidence-based epidemiological policy
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] as the factor with the highest influence on the development of the epidemiological scenario.
This finding is consistently produced by all methods of analysis used in the study.
      </p>
      <p>
        The results reported in this work offer additional arguments in support of the hypothesis of some
form of general population-wide protection against Covid-19 as an effect of a previous universal
immunization program with the Bacillus Calmette–Guérin (BCG) vaccine, which has been reported in a
number of earlier results [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. They add to the rationale for further studies of the possible
correlation and the mechanisms of such general protection, with potential benefits that may extend
beyond the Covid-19 pandemic.
      </p>
      <p>Additionally, the analysis pointed to the significance of secondary factors such as smoking prevalence,
consistently confirmed by several independent methods. The findings of this study can be
instrumental in developing epidemiological models, forecasting epidemiological scenarios and
informing effective policy to control and contain the spread of the infection, with
potential applications beyond Covid-19.</p>
      <p>In conclusion, the authors would like to emphasize that the results reported in this study should not
be taken as a definitive statement of a correlation between the investigated factors and the resulting
effect, as they relate to a single point in the time series of epidemiological scenarios in the considered
cases. Rather, they are relevant as an evaluation of methods and a demonstration of an approach that
can be applied repeatedly over a time series of epidemiological data, so that confident
conclusions can be reached by establishing and analyzing the trend over an extended period of time.</p>
    </sec>
    <sec id="sec-13">
      <title>7. Acknowledgements</title>
      <p>The authors are grateful to the colleagues at the Information Technology Department, National
Aviation University and Uzhhorod National University for valuable discussion of the methods and
findings of this study.</p>
      <p>This work received no specific funding.</p>
    </sec>
    <sec id="sec-14">
      <title>8. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M-J.</given-names>
            <surname>Reandelar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fasciglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Roumenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.H.</given-names>
            <surname>Otazu</surname>
          </string-name>
          .
          <article-title>Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study</article-title>
          ,
          <source>medRxiv</source>
          2020.03.24.20042937,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dolgikh</surname>
          </string-name>
          .
          <article-title>Further evidence of a Possible Correlation Between the Severity of Covid-19 and BCG Immunization</article-title>
          ,
          <source>medRxiv</source>
          , doi: 10.1101/2020.04.07.20056994v1, April
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Escobar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Molina-Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barillas-Mury</surname>
          </string-name>
          .
          <article-title>BCG vaccine protection from severe coronavirus disease 2019 (COVID-19)</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>117</volume>
          (
          <issue>30</issue>
          ),
          <year>2020</year>
          ,
          <fpage>17720</fpage>
          -
          <lpage>17726</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bucci</surname>
          </string-name>
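          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Carafoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Melino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Das</surname>
          </string-name>
          .
          <article-title>BCG vaccination policy and preventive chloroquine usage: do they have an impact on COVID-19 pandemic?</article-title>
          <source>Cell Death &amp; Disease</source>
          ,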
          <volume>11</volume>
          (
          <issue>7</issue>
          ),
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yitbarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Girma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tilahun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woldie</surname>
          </string-name>
          .
          <article-title>The effect of Bacillus Calmette-Guérin (BCG) vaccination in preventing severe infectious respiratory diseases other than TB: implications for the COVID-19 pandemic</article-title>
          .
          <source>Vaccine</source>
          ,
          <volume>38</volume>
          (
          <issue>41</issue>
          ),
          <year>2020</year>
          ,
          <fpage>6374</fpage>
          -
          <lpage>6380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dolgikh</surname>
          </string-name>
          .
          <article-title>Covid-19 vs BCG: Statistical Significance Analysis</article-title>
          ,
          <source>medRxiv</source>
          , doi: 10.1101/2020.06.08.20125542v2,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kuharev</surname>
            ,
            <given-names>V.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sally</surname>
            ,
            <given-names>V.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erpert</surname>
            <given-names>A.M.</given-names>
          </string-name>
          <article-title>Economic-mathematical methods and models in the planning and management</article-title>
          . Kiev: Vishcha School,
          <volume>328</volume>
          (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kozadaev</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arzamasians</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <article-title>Prediction of time series with the apparatus of artificial neural networks. The short-term forecast of air temperature</article-title>
          .
          <source>Bulletin of the University of Tambov. Series: Natural and Technical Sciences, №3, is 11</source>
          ,
          <fpage>299</fpage>
          -
          <lpage>304</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Snytiuk</surname>
            ,
            <given-names>V. Ye.</given-names>
          </string-name>
          <source>Forecasting. Models. Methods. Algorithms: Tutorial</source>
          . K. Maklaut,
          <volume>364</volume>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mulesa</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>Information Technology for time series forecasting with considering fuzzy expert evaluations</article-title>
          .
          <source>12th International Scientific and Technical Conference “Computer Science and Information Technologies - CSIT 2017”</source>
          (Lviv, Ukraine),
          <fpage>105</fpage>
          -
          <lpage>108</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Mendel</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          <article-title>Method counterparts in predicting short time series: expert-statistical approach</article-title>
          .
          <source>Machine Telemechanics</source>
          ,
          <volume>4</volume>
          ,
          <fpage>143</fpage>
          -
          <lpage>152</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Zaichenko</surname>
            ,
            <given-names>Y.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>Shapovalenko N.V.</given-names>
          </string-name>
          <article-title>Fuzzy neural networks and genetic algorithms in problems of macroeconomic forecasting</article-title>
          .
          <source>Scientific news</source>
          ,
          <volume>4</volume>
          ,
          <fpage>20</fpage>
          -
          <lpage>30</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kasabov</surname>
            ,
            <given-names>N. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <article-title>DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction</article-title>
          .
          <source>IEEE Transactions on Fuzzy Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ),
          <fpage>144</fpage>
          -
          <lpage>154</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Zwerling</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Behr</surname>
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brewer</surname>
            <given-names>T.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menzies</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pai</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>The BCG World Atlas: a database of global BCG vaccination policies and practices</article-title>
          .
          <source>PLOS Medicine</source>
          , doi: 10.1371/journal.pmed.1001012
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <source>BCG World Atlas</source>
          , URL: http://www.bcgatlas.org/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <article-title>Coronavirus data and map</article-title>
          , URL: https://www.google.com/covid19-map/ (4.04.
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] Our World in Data: World smoking prevalence, URL: https://ourworldindata.org/smoking (4.04.
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <source>Worldometers</source>
          :
          <article-title>Population data</article-title>
          , URL: https://www.worldometers.info/world-population/ (4.04.
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <source>Canada Covid-19 Situation Update</source>
          , URL: https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html?topic=tilelink (4.04.
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <source>Taiwan Center for Disease Control Covid-19 information</source>
          , URL: https://www.cdc.gov.tw/En/Category/ListContent/bg0g_VU_Ysrgkes_KRUDgQ (30.03.
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <source>CDC Covid-19 Advice</source>
          , URL: https://www.cdc.gov/coronavirus/2019-ncov/index.html (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <source>NHS Covid-19 Advice</source>
          , URL: https://www.nhs.uk/conditions/coronavirus-covid-19/ (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Freedman</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <source>Statistical Models: Theory and Practice</source>
          . Cambridge University Press (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <article-title>Random Forest regression, sklearn-kit</article-title>
          , URL: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html?highlight=random%20forest#sklearn.ensemble.RandomForestRegressor
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <article-title>SelectKBest feature ranking and selection, sklearn-kit</article-title>
          , URL: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <source>Johns Hopkins Medicine</source>
          :
          <article-title>Smoking and respiratory diseases</article-title>
          , URL: https://www.hopkinsmedicine.org/health/conditions-and-diseases/smoking-and-respiratory-diseases
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Levin</surname>
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cochran</surname>
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walsh</surname>
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <article-title>Assessing the age specificity of infection fatality rates for COVID-19: meta-analysis &amp; public policy implications</article-title>
          ,
          <source>National Bureau of Economic Research</source>
          , working paper No.
          <volume>27597</volume>
          ,
          <year>July 2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Zeka</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tobias</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leonardi</surname>
            <given-names>G.</given-names>
          </string-name>
          , et al.
          <article-title>Responding to COVID-19 requires strong epidemiological evidence of environmental and societal determining factors</article-title>
          .
          <source>The Lancet</source>
          ,
          <volume>4</volume>
          (
          <issue>9</issue>
          ),
          <fpage>375</fpage>
          -
          <lpage>376</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>