<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Statistical Approach for COVID-19 Pandemic Data Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sunday Adeola Ajagbe</string-name>
          <email>saajagbe@pgschool.lautech.edu.ng</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ismaila Muritala</string-name>
          <email>MuritalaI@unizulu.ac.za</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pragasen Mudali</string-name>
          <email>MudaliP@unizulu.ac.za</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew O Adigun</string-name>
          <email>adigunm@unizulu.ac.za</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Abiola Ajimobi Technical University</institution>
          ,
          <addr-line>Ibadan</addr-line>
          ,
          <country country="NG">Nigeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Osun State University</institution>
          ,
          <addr-line>Osogbo</addr-line>
          ,
          <country country="NG">Nigeria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zululand</institution>
          ,
          <addr-line>Kwadlangezwa</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <fpage>165</fpage>
      <lpage>174</lpage>
      <abstract>
        <p>This study investigated the statistical insight of big data analytics for effective pandemic analysis. The COVID-19 pandemic dataset, containing 5010 negatives and 4990 positives, was obtained from the Kaggle repository. The dataset was categorized into four classes (COVID-19, Normal, Pneumonia, and Tuberculosis) using diverse statistical metrics of image characteristics: mean intensity of a color channel, contrast, homogeneity, dissimilarity, correlation, and density. The objective of the statistical analysis was to validate the use of big data analysis for pandemic preparedness, with a working hypothesis that the intensity of the color channel images would be different (p &lt; 0.05) between COVID-19 patients and patients with other health conditions. The Statistical Package for the Social Sciences (SPSS) version 20 was used for the analysis. The outcomes show that the mean intensity of the color channels (Blue, Green, Red) in COVID-19 and Tuberculosis cases was greater in the blue and green channels than in Normal and Pneumonia cases. Pneumonia and Normal cases exhibited comparable and reduced mean intensities across all three channels. The Normal condition exhibited the largest mean contrast, whereas Pneumonia displayed the lowest. The mean dissimilarity and homogeneity indicate that Tuberculosis demonstrated the highest dissimilarity, signifying greater diversity in pixel intensity. COVID-19 and Pneumonia exhibited comparable homogeneity; however, Tuberculosis demonstrated greater homogeneity. The mean correlation indicates that Normal images exhibited the highest correlation, whilst Tuberculosis demonstrated the lowest. The mean density analysis indicates that Normal cases demonstrated the highest mean density, whereas Tuberculosis exhibited the lowest. In conclusion, our findings establish the advantages of statistics for modeling and analyzing big data in the pandemic domain and cover the state of the art for practical analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Big data analytics</kwd>
        <kwd>Pandemic analysis</kwd>
        <kwd>Statistical analysis</kwd>
        <kwd>Machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Big Data Analytics has become common in many areas, from research to practice, and even in everyday
life. As an aspect of the fourth industrial revolution (4IR), it is a prominent tool in pandemic
preparedness and control [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. COVID-19 was first identified in Wuhan, China, in December 2019 and has since spread
worldwide. Chinese and global epidemiological studies show person-to-person transmission [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. The virus spreads through sneezing
and coughing, and when people touch contaminated surfaces or hands
and then their lips, nose, or eyes. Statisticians must demonstrate that their outcomes are not due to chance [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Failure
prevention and planning are often driven by statistics [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Failures in the control of pandemics and infectious diseases
can lead to economic loss and loss of life.
      </p>
      <sec id="sec-1-1">
        <p>
          Recently, Mwamnyange et al. [8] proposed a big data analytics platform for childhood infectious
illness surveillance and response. The system helps healthcare professionals track, monitor, and analyze
infectious illness reports via social media to prevent and control child-related diseases. The proposed
technique was validated using use-case scenarios and performance comparisons [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. However, the statistical
insights that data analytics can provide for effective infectious disease detection were not
considered. To provide some concrete examples, artificial intelligence (AI) was used to analyze incident
cases on the website. The AI Incident Database (2021) compiles news articles from several sources.
        </p>
      </sec>
      <sec id="sec-1-2">
        <p>
          We discovered that 72 of the 126 occurrences that have been reported thus far may be connected
to dependability problems. Of those 72 reliability-related incidents, 29 resulted in fatalities or serious injuries,
demonstrating that dependability problems may cause significant damage [
          <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
          ]. From a different
angle, public trust is necessary for the widespread use of AI technology.
        </p>
      </sec>
      <sec id="sec-1-3">
        <p>Big data analytics is needed to manage infectious diseases. Massive data sets require statistics to
comprehend, analyze, and predict infectious disease patterns. AI technology is still growing and faces
difficulties, including dependability, safety, durability, and trustworthiness. Dependability is
important because AI systems must be trusted to work as intended. Resource allocation
and public health crisis response depend on these criteria, so reliability, strategy, and failures must be examined.</p>
      </sec>
      <sec id="sec-1-4">
        <p>
          A recently developed statistical analysis of AI dependability, aimed especially at statisticians, highlights statistical
research concerns. The authors discussed out-of-distribution detection, training set
influence, adversarial attacks, model correctness, and uncertainty quantification in AI dependability, with a few examples showing
study options. A recent study explored AI reliability evaluation, data gathering, and test preparation,
and how to improve system designs for AI dependability, since data is vital for reliability assessment
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Data on infectious diseases are initially analyzed using descriptive statistics and AI applications.
        </p>
      </sec>
      <sec id="sec-1-5">
        <p>The key variables, such as the mean intensity of a color channel, standard error of the mean (SEM), mean contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density, are crucial to pandemic analysis. Statisticians can map some variables to the geographic and demographic spread of pandemic diseases using search patterns from big data sources.</p>
      </sec>
      <sec id="sec-1-6">
        <p>The objective of this study is to investigate the feasibility of effective pandemic investigation using statistical analysis. Specifically, the Statistical Package for the Social Sciences (SPSS) was used to analyze the mean intensity of a color channel, SEM, mean contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density.</p>
      </sec>
      <sec id="sec-1-7">
        <p>This study is structured as follows. Section 2 provides a literature review to emphasize the necessity
of the present investigation. Section 3 delineates the approach, attributes of the dataset, and the research
instrument. Section 4 presents a comprehensive analysis of statistical insights on big data analytics
for the effective detection of infectious diseases. Section 5 concludes the research and proposes future
work on implementing effective pandemic control using statistical tools.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Big Data Analytics is crucial for advancing the Fourth Industrial Revolution and tackling the challenges
presented by the pandemic. The Fourth Industrial Revolution is a significant transformation
characterized by the convergence of physical, digital, and biological technologies. Currently, Big Data propels
substantial progress in automation and robotics, enabling these systems to analyze operational data for
enhanced efficiency and flexibility. It also enables the IoT, where interconnected devices across diverse
platforms generate vast data that, when analyzed, can improve operational efficiencies and predictive
maintenance capabilities. Moreover, in smart manufacturing, Big Data enhances the optimization of
production processes, minimizes downtime, and enables real-time customization of output.</p>
      <sec id="sec-2-1">
        <p>
          During the COVID-19 pandemic, Big Data has proven its essential significance across multiple crucial sectors. It
has enabled effective epidemiological surveillance by analyzing data from several sources, including
travel records, social media, and government reports, to track the virus’s spread [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <p>In pandemic management, analytics have enhanced hospital resource allocation, forecasted patient
outcomes, and tailored treatment options. Furthermore, the rapid development and distribution of
vaccines have been expedited by Big Data, which has enabled the analysis of massive clinical trial data. It
has also impacted public policy, aiding governments in developing specific lockdown measures and
informed reopening strategies based on real-time data concerning infection rates and public sentiment.</p>
      </sec>
      <sec id="sec-2-3">
        <p>The use of Big Data in addressing COVID-19 and advancing Fourth Industrial Revolution technologies
has accelerated the adoption of digital solutions, including telemedicine, remote work, and online
education, which are vital to the current digital transformation.</p>
      </sec>
      <sec id="sec-2-4">
        <p>This integration highlights the essential role of Big Data in crisis management and as a core component of digital and industrial advancement in the Fourth Industrial Revolution, demonstrating its capacity for comprehensive, real-time decision-making across public health, economic, and industrial sectors [4, 11].</p>
      </sec>
      <sec id="sec-2-5">
        <p>Hong et al. [11] explored statistical methodologies for assessing AI reliability, particularly for
autonomous systems. The authors introduce the "SMART" statistical framework to evaluate the robustness
of AI models used in various domains, including healthcare. The study highlights how AI reliability
decreases in out-of-distribution settings and suggests that statistical models outperform deep learning in
long-term failure prediction. However, it lacks real-world infectious disease applications and validation
on epidemiological datasets.</p>
      </sec>
      <sec id="sec-2-6">
        <p>Another study highlights AI’s role in infectious disease monitoring and projection, emphasizing data-driven approaches for disease surveillance. It demonstrates high sensitivity (91%) in predicting disease outbreaks through AI models but suffers from a lack of statistical validation, poor interpretability, and limited disease-specific analysis [12].</p>
      </sec>
      <sec id="sec-2-7">
        <p>Fei et al. [13] also provided a broad overview of big data analytics applied to healthcare, particularly
during COVID-19. The overview discusses statistical inference challenges in high-dimensional data, highlighting
the effectiveness of Bayesian models in long-term trend prediction. However, it has a limited focus on
disease classification and weak integration of statistical methods in real-time applications.</p>
        <p>
          Mwamnyange et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] developed a big data analytics framework for tracking childhood infectious
diseases using a modified MapReduce algorithm. While it enhances early outbreak detection through
social media analytics and reduces processing time for large epidemiological datasets, it lacks statistical
validation of detection accuracy and focuses more on data processing efficiency than disease
classification. Olaboye et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] presented a policy and technical framework for using big data analytics
in epidemic forecasting. It improves forecasting accuracy with multi-source data integration, but has
limited classification models for specific diseases and underutilizes machine learning for outbreak
detection.
        </p>
      </sec>
      <sec id="sec-2-8">
        <p>Panah et al. [14] investigated the integration of AI and big data analysis with public health
infrastructure for early detection of infectious disease outbreaks. The authors propose an AI-driven data
integration framework leveraging social media, wearable technology, and traditional clinical databases.</p>
      </sec>
      <sec id="sec-2-9">
        <p>While the paper emphasizes AI’s potential in public health surveillance, it lacks a detailed statistical validation framework and real-time adaptive learning methodologies.</p>
      </sec>
      <sec id="sec-2-10">
        <p>Adegoke et al. [15] provided a comprehensive review of data analytics models
used for disease outbreak prediction. It evaluates traditional statistical methods such as time-series
forecasting and Bayesian inference, alongside AI-based models. The study highlights the importance
of integrating epidemiological data with environmental and social media data for robust predictions.</p>
      </sec>
      <sec id="sec-2-11">
        <p>However, it does not propose a standardized statistical framework for evaluating model performance across different data sources.</p>
      </sec>
      <sec id="sec-2-12">
        <p>Piontti et al. [16] explored computational models for infectious disease forecasting, focusing on data
science methodologies such as agent-based modeling and network theory. The work provides insights into
how big data can enhance disease spread predictions, but lacks emphasis on statistical uncertainty
quantification and model generalizability across varying outbreak conditions.</p>
      </sec>
      <sec id="sec-2-13">
        <p>Zhou et al. [17] investigated an integrated health big data system in China for infectious disease prevention and control to detect dengue fever, tuberculosis (TB), and vaccination gaps in migrant children. The system outperformed traditional surveillance methods by identifying more cases of TB and dengue and more children with incomplete vaccinations.</p>
      </sec>
      <sec id="sec-2-14">
        <p>Michael and Krishnan [18] studied how big data analytics affects healthcare prediction insights. The
study refines predictive models for disease progression, risk assessment, and individualized treatment
using EHRs, medical imaging, genomic data, and wearable sensors. Case studies show it reduces hospital
readmissions and optimizes healthcare resource management. AI-driven healthcare analytics is lauded,
but real-time flexible models and scalability across healthcare systems are not addressed.</p>
        <p>Table 1 contains a summary of the review of the related studies on big data analytics for pandemic
classification.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Summary of the review of the related studies on big data analytics for pandemic classification.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Methodology</th>
                <th>Dataset Type</th>
                <th>Results</th>
                <th>Limitations</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>SMART framework, survival analysis, hazard functions</td>
                <td>Simulated AI reliability datasets</td>
                <td>AI reliability decreases in out-of-distribution; statistical models perform better in failure prediction</td>
                <td>No epidemiological validation, not focused on disease detection</td>
              </tr>
              <tr>
                <td>AI-based ML (SVM, RF, Neural Networks), social media and genomic data fusion</td>
                <td>Syndromic surveillance, search trends, clinical data</td>
                <td>91% sensitivity in outbreak prediction</td>
                <td>Lacks statistical validation, poor model interpretability</td>
              </tr>
              <tr>
                <td>High-dimensional statistical models (LASSO, Bayesian Inference), computational epidemiology</td>
                <td>COVID-19 epidemiological data, EHRs</td>
                <td>Bayesian models outperform AI in long-term trend prediction</td>
                <td>Limited disease classification focus, weak integration of statistical methods</td>
              </tr>
              <tr>
                <td>MapReduce, Hadoop-based data integration</td>
                <td>Clinical records, social media, surveillance data</td>
                <td>Faster epidemiological data processing, early outbreak detection</td>
                <td>Lacks statistical validation, prioritizes data processing over classification</td>
              </tr>
              <tr>
                <td>AI-driven data integration, public health surveillance, wearable data fusion</td>
                <td>Social media, wearable tech, clinical databases</td>
                <td>Enhanced real-time disease tracking</td>
                <td>Lacks statistical validation, no adaptive learning methodology</td>
              </tr>
              <tr>
                <td>Investigation of statistical and AI models (Bayesian inference, time-series, ML)</td>
                <td>Epidemiological, environmental, social media data</td>
                <td>Identifies gaps in predictive analytics models</td>
                <td>No standardized framework for model comparison</td>
              </tr>
              <tr>
                <td>Agent-based modelling, network theory, computational epidemiology</td>
                <td>Global epidemiological and mobility data</td>
                <td>Insights into disease spread via big data</td>
                <td>Lacks statistical uncertainty quantification, limited model generalizability</td>
              </tr>
              <tr>
                <td>Investigates an integrated health big data system for detecting and preventing infectious diseases.</td>
                <td>Empirical study with comparative analysis.</td>
                <td>Health data from clinics, hospitals, and government records in China.</td>
                <td>AI-based screening, big data analytics, disease detection algorithms.</td>
              </tr>
              <tr>
                <td>Investigates the role of data analytics in predictive healthcare, focused on risk assessment and treatment personalization.</td>
                <td>Case study analysis with machine learning models applied to patient data.</td>
                <td>EHRs, medical imaging, genomic sequencing, wearable sensor data from City Hospital.</td>
                <td>ML predictive modeling, statistical analysis, Apache Hadoop, Spark, Python, Tableau.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and Methods</title>
      <p>
        The decision-making process is significantly enhanced by the application of statistical methods, even in
instances involving extensive data sets, sometimes referred to as big data [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ]. To implement data
processing techniques in certain sectors, it is essential to delineate the characteristics of the data, encompassing
volume, variety, and velocity. Distinct types of data may be produced from multiple sources, and
it is essential to create systems capable of managing these characteristics. Figure 1 illustrates
the established architecture for big data analytics aimed at the efficient classification of infectious
diseases by statistical methodologies. The developed architecture depicts a complete procedure for classifying and
detecting the presence of COVID-19 using a synthetic dataset. It consists of three phases:
(i) pandemic dataset acquisition and source, (ii) data preparation, and (iii) statistical
analysis of COVID-19 pandemic data diagnosis.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Pandemic Dataset Acquisition and Source</title>
        <p>The COVID-19 pandemic dataset utilized in this investigation was obtained from a prominent
source known as the Kaggle database
(https://www.kaggle.com/datasets/rishanmascarenhas/covid19temperatureoxygenpulse-rate). The sample of the pandemic dataset used in this analysis
had values ranging from 0 to 9. The details include the ID, Oxygen, Pulse Rate, Temperature, and results
(+/−). In total, the dataset contains 5010 negative and 4990 positive examples that support replicability,
making it large enough to be classified as big data. Figure 2 shows the distribution of the dataset by
class and evidence of balanced classes obtained through exploratory data analysis. There is no ethical concern
regarding this dataset because it was obtained from an open source and has been anonymized to protect
privacy and security.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Preparation</title>
        <p>
          Framing the research as a classification problem: let <inline-formula><tex-math>D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_d, y_d)\}</tex-math></inline-formula> constitute
the collection of training cases of size d, and let <inline-formula><tex-math>Y = \{1, 2, \ldots, k\}</tex-math></inline-formula> be the set of labels, where <inline-formula><tex-math>x_i</tex-math></inline-formula> is a feature
with corresponding label <inline-formula><tex-math>y_i</tex-math></inline-formula>, and <inline-formula><tex-math>X</tex-math></inline-formula> is the set of features according to the classification task definition. The
preliminary phase of this data analysis and categorization commenced with data preparation, because
the dataset suffers from missing and duplicated data. This was followed by the crucial step of
eliminating non-relevant data from the converted raw dataset. Duplication and missing values were
addressed using an imputation technique, wherein the mean of the respective column was calculated
and used to replace the missing values. The issue was resolved through the identification of essential
aspects, conducted in accordance with [
          <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
          ].
        </p>
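        <p>The preparation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the lower-case field names, and the helper functions drop_duplicates and impute_column_means are all hypothetical, and dropping exact duplicate rows is an assumed way of handling duplication.</p>
        <preformat>
```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
# Field names loosely follow the columns listed for the dataset.
records = [
    {"id": 1, "oxygen": 98.0, "pulse": 72.0, "temperature": 36.6},
    {"id": 2, "oxygen": None, "pulse": 90.0, "temperature": 38.1},
    {"id": 2, "oxygen": None, "pulse": 90.0, "temperature": 38.1},  # exact duplicate
    {"id": 3, "oxygen": 92.0, "pulse": None, "temperature": 37.4},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_column_means(rows, columns):
    """Replace missing values (None) with the mean of the observed values in that column."""
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in filled if row[col] is not None]
        col_mean = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


clean = impute_column_means(drop_duplicates(records), ["oxygen", "pulse", "temperature"])
```
        </preformat>
        <p>In this sketch the column mean is computed after duplicates are dropped, so a repeated record does not weight the imputed value twice.</p>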
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Statistical analysis of pandemic data diagnosis</title>
        <sec id="sec-3-3-1">
          <p>The statistical data analysis was performed using one-way analysis of variance (ANOVA). Means were separated using Tukey's test.</p>
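          <p>For readers unfamiliar with one-way ANOVA, the F statistic it relies on can be sketched in a few lines. This is an illustrative sketch only: the study used SPSS, the function name one_way_anova_f is hypothetical, and the three toy groups are invented values, not data from the study. The p-value, which requires the F distribution, is not computed here.</p>
          <preformat>
```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one list per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented toy intensities for three hypothetical groups.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
```
          </preformat>
          <p>A large F value indicates that the group means differ by more than the within-group scatter would suggest; a post hoc procedure such as Tukey's test is then used to identify which pairs of groups differ.</p>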
        </sec>
        <sec id="sec-3-3-2">
          <p>Normality was assessed using the Shapiro-Wilk test, and homogeneity of variances was verified
with Levene’s test implemented in SPSS v 21.0 (IBM Corp., Armonk, NY, USA). The analysis was run
on an Intel(R) Core(TM) i7-4600U CPU (2.10 GHz to 2.70 GHz) with 8 GB RAM under Windows 10 Professional.
The statistical analysis was intended to validate the use of big data analysis for pandemic
preparedness, with a working hypothesis that the intensity of the color channel images would be
different (p &lt; 0.05) between COVID-19 patients and patients with other health conditions.</p>
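          <p>The idea behind Levene's test can be illustrated with a short sketch. It assumes the mean-centred form of the statistic, which is equivalent to a one-way ANOVA F statistic computed on absolute deviations from each group mean; the function name levene_w and the toy groups are hypothetical, and the source does not state which variant SPSS applied.</p>
          <preformat>
```python
from statistics import mean


def levene_w(groups):
    """Levene's W: one-way ANOVA F statistic on absolute deviations from each group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(z) for z in deviations)
    grand = mean(v for z in deviations for v in z)
    between = sum(len(z) * (mean(z) - grand) ** 2 for z in deviations)
    within = sum((v - mean(z)) ** 2 for z in deviations for v in z)
    return ((n_total - k) / (k - 1)) * between / within


# Invented toy groups: the first pair has equal spread, the second pair does not.
w_equal_spread = levene_w([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
w_unequal_spread = levene_w([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
```
          </preformat>
          <p>A W value near zero is consistent with equal variances across groups, which is the assumption one-way ANOVA relies on.</p>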
          <p>In statistics, statistical significance denotes the likelihood that the outcomes of a study or experiment
are attributable to factors other than random chance. Various metrics and tests are typically employed
to ascertain the statistical significance of a result. This research utilized primary criteria for statistical
significance contingent upon the specific type of pandemic dataset examined. Statistical significance is
a crucial concept; nonetheless, it is essential to recognize that statistical significance does not equate
to practical relevance. The context of the findings and additional metrics, such as
effect size, should always be considered when evaluating results. The equations below describe the metrics used and express
them statistically: mean intensity of a color channel, SEM, mean contrast, mean homogeneity,
mean dissimilarity, mean correlation, and mean density.</p>
        </sec>
        <sec id="sec-3-3-3">
          <p>Equation 1 shows the average intensity of a color channel in the pandemic classification.</p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\bar{I} = \frac{1}{N} \sum_{i=1}^{N} I_i</tex-math>
          </disp-formula>
          <p>where <inline-formula><tex-math>I_i</tex-math></inline-formula> is the intensity of the pixel in the color channel, and N is the total number of pixels.
Equation 2 shows the Standard Error of the Mean (SEM) for pandemic classification.</p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>\mathrm{SEM} = \frac{\sigma}{\sqrt{n}}</tex-math>
          </disp-formula>
          <p>where <inline-formula><tex-math>\sigma</tex-math></inline-formula> is the standard deviation of pixel intensities and <inline-formula><tex-math>n</tex-math></inline-formula> is the number of pixels for pandemic classification.</p>
        </sec>
        <sec id="sec-3-3-4">
          <p>Equation 3 shows the Mean Contrast derived from the Gray Level Co-occurrence Matrix (GLCM), a statistical method for examining texture that considers the spatial relationship of pixels.</p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>\mathrm{Contrast} = \sum_{i,j} P(i,j)\,(i-j)^{2}</tex-math>
          </disp-formula>
          <p>where <inline-formula><tex-math>P(i,j)</tex-math></inline-formula> is the probability of intensity pairs in the GLCM for pandemic classification.</p>
        </sec>
        <sec id="sec-3-3-5">
          <p>Equation 4 shows Mean Homogeneity.</p>
          <disp-formula>
            <label>(4)</label>
            <tex-math>\mathrm{Homogeneity} = \sum_{i,j} \frac{P(i,j)}{1+|i-j|}</tex-math>
          </disp-formula>
        </sec>
        <sec id="sec-3-3-6">
          <p>Equation 5 shows Dissimilarity.</p>
          <disp-formula>
            <label>(5)</label>
            <tex-math>\mathrm{Dissimilarity} = \sum_{i,j} P(i,j)\,|i-j|</tex-math>
          </disp-formula>
          <p>Equation 6 shows Correlation.</p>
          <disp-formula>
            <label>(6)</label>
            <tex-math>\mathrm{Correlation} = \frac{\sum_{i,j} (i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i \sigma_j}</tex-math>
          </disp-formula>
          <p>where <inline-formula><tex-math>\mu_i, \mu_j</tex-math></inline-formula> and <inline-formula><tex-math>\sigma_i, \sigma_j</tex-math></inline-formula> are the means and standard deviations of intensity levels for pandemic classification.</p>
        </sec>
        <sec id="sec-3-3-7">
          <p>Equation 7 shows Mean Density (assuming it refers to pixel density in a binary image).</p>
          <disp-formula>
            <label>(7)</label>
            <tex-math>\mathrm{Density} = \frac{\sum_{i=1}^{N} x_i}{N}</tex-math>
          </disp-formula>
          <p>where <inline-formula><tex-math>x_i</tex-math></inline-formula> represents pixel values in a binary image for pandemic analysis.</p>
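          <p>The metrics in Equations 1 to 7 can be sketched on a toy image as follows. This is an illustrative sketch under several assumptions that the source does not specify: the GLCM is built from horizontally adjacent pixel pairs only, the standard deviation in the SEM is the sample standard deviation, and the density divides by the total number of pixels. The function names and the 4 x 4 two-level image are hypothetical.</p>
          <preformat>
```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(pixels):
    """Eq. 1 and Eq. 2: mean intensity and SEM = sigma / sqrt(n) (sample sigma assumed)."""
    return mean(pixels), stdev(pixels) / sqrt(len(pixels))


def glcm(image, levels):
    """Normalized co-occurrence matrix P(i, j) for horizontally adjacent pixel pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def glcm_features(p):
    """Eq. 3 to Eq. 6 evaluated on a normalized GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sigma_i = sqrt(sum(v * (i - mu_i) ** 2 for i, j, v in cells))
    sigma_j = sqrt(sum(v * (j - mu_j) ** 2 for i, j, v in cells))
    covariance = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "correlation": covariance / (sigma_i * sigma_j),
    }


def density(binary_pixels):
    """Eq. 7: fraction of foreground pixels in a binary image."""
    return sum(binary_pixels) / len(binary_pixels)


# Invented 4 x 4 image with two gray levels (0 and 1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
flat = [px for row in image for px in row]
mean_intensity, sem = mean_and_sem(flat)
features = glcm_features(glcm(image, levels=2))
pixel_density = density(flat)
```
          </preformat>
          <p>In practice these per-image values would be averaged over all images in a class to obtain the class means that the analysis compares.</p>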
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Result and Analysis</title>
      <sec id="sec-4-1">
        <title>4.1. Results</title>
        <sec id="sec-4-1-1">
          <p>This section presents the results attained from the statistical analysis of the pandemic dataset. The
task is conceptualized around the statistical tools required for a detailed analysis of the COVID-19
pandemic dataset. A statistical analysis was conducted to categorize infectious diseases, including COVID-19.</p>
        </sec>
        <sec id="sec-4-1-2">
          <p>A model was developed utilizing data from Kaggle, encompassing steps of data gathering,
preparation, analysis via SPSS, and essential statistical measures (intensity, SEM, contrast, homogeneity,
dissimilarity, correlation, and density). The findings are articulated through statistical
metrics, including the mean intensity of a color channel, the standard error of the mean (SEM), mean
contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.2. Analysis of big data analytics for pandemic classification</title>
          <p>Statistics on the pandemic dataset in this study are discussed in this subsection, which delineates the
statistical analysis of four pandemic data conditions: COVID-19, Normal,
Pneumonia, and Tuberculosis. For each condition, it includes statistical measurements of image
attributes, reported as mean values with standard errors. Table 2 compares the blue channel in images across different health
conditions, and Table 3 compares the green channel in the images obtained for different
health conditions. Table 4 compares the red channel in the images obtained for different health
conditions, and Table 5 compares the contrast of the images obtained for different health conditions.
Table 6 compares the dissimilarity of the images obtained for different health conditions,
and Table 7 compares the homogeneity of the images obtained for different health conditions.
Table 8 compares the correlation of the images obtained for different health conditions, and finally,
Table 9 compares the density of the images obtained for different health conditions. In COVID-19
and TB cases, the blue and green channels have a higher mean intensity than in Normal and Pneumonia
cases. All three channels show similar and reduced mean intensities for Pneumonia and Normal
patients.</p>
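          <table-wrap>
            <label>Table 2</label>
            <caption>
              <p>Comparison of the blue channel in images across different health conditions.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Case</th>
                  <th>Sample size</th>
                  <th>Mean intensity ± SEM</th>
                  <th>p-value</th>
                  <th>95% CI</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>COVID-19</td>
                  <td>554</td>
                  <td>134.87 ± 0.94</td>
                  <td rowspan="4">&lt; 0.001</td>
                  <td rowspan="4">124.29-125.24</td>
                </tr>
                <tr>
                  <td>Normal</td>
                  <td>1779</td>
                  <td>122.15 ± 0.33</td>
                </tr>
                <tr>
                  <td>Pneumonia</td>
                  <td>4061</td>
                  <td>122.91 ± 1.05</td>
                </tr>
                <tr>
                  <td>Tuberculosis</td>
                  <td>729</td>
                  <td>133.77 ± 1.05</td>
                </tr>
              </tbody>
            </table>
            <table-wrap-foot>
              <p>a,b: indicate significant difference (p &lt; 0.05); CI: confidence interval.</p>
            </table-wrap-foot>
          </table-wrap>
          <table-wrap>
            <label>Table 3</label>
            <caption>
              <p>Comparison of the green channel in images across different health conditions.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Case</th>
                  <th>Sample size</th>
                  <th>Mean intensity ± SEM</th>
                  <th>p-value</th>
                  <th>95% CI</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>COVID-19</td>
                  <td>554</td>
                  <td>134.59 ± 0.94</td>
                  <td rowspan="4">&lt; 0.001</td>
                  <td rowspan="4">123.80-124.74</td>
                </tr>
                <tr>
                  <td>Normal</td>
                  <td>1779</td>
                  <td>122.15 ± 0.33</td>
                </tr>
                <tr>
                  <td>Pneumonia</td>
                  <td>4061</td>
                  <td>122.91 ± 0.31</td>
                </tr>
                <tr>
                  <td>Tuberculosis</td>
                  <td>729</td>
                  <td>129.22 ± 1.02</td>
                </tr>
              </tbody>
            </table>
            <table-wrap-foot>
              <p>a,b,c: indicate significant difference (p &lt; 0.05); CI: confidence interval.</p>
            </table-wrap-foot>
          </table-wrap>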
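          <p>To show how the tabulated mean ± SEM values relate to a confidence interval, the sketch below computes a per-group 95% interval and a sample-size-weighted overall mean from the first of the two tables above. It is a rough illustration, not the authors' SPSS procedure: the critical value 1.96 assumes a normal approximation, and the names first_table, ci95, intervals, and pooled_mean are hypothetical.</p>
          <preformat>
```python
# Summary statistics copied from the first table above: (sample size, mean, SEM).
first_table = {
    "COVID-19": (554, 134.87, 0.94),
    "Normal": (1779, 122.15, 0.33),
    "Pneumonia": (4061, 122.91, 1.05),
    "Tuberculosis": (729, 133.77, 1.05),
}

Z_95 = 1.96  # normal-approximation critical value (assumption; a t quantile may be used instead)


def ci95(mean_value, sem):
    """Return the (lower, upper) bounds of mean +/- 1.96 * SEM."""
    half_width = Z_95 * sem
    return mean_value - half_width, mean_value + half_width


# Per-group 95% intervals under the normal approximation.
intervals = {case: ci95(m, s) for case, (n, m, s) in first_table.items()}

# Overall mean of all images, weighting each group mean by its sample size.
total_n = sum(n for n, m, s in first_table.values())
pooled_mean = sum(n * m for n, m, s in first_table.values()) / total_n
```
          </preformat>
          <p>The weighted overall mean (about 124.76) falls inside the reported 124.29-125.24 interval, which is consistent with reading that interval as one for the overall mean, although the source does not say so explicitly.</p>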
        </sec>
        <sec id="sec-4-1-4">
          <p>The blue, green, and red channel images from COVID-19 patients showed significantly (p &lt; 0.001)
the highest intensity compared to patients with different health conditions, as indicated in Tables
2, 3, and 4, respectively. The mean contrast of the images from COVID-19 patients (134.36 ± 4.37) was
significantly lower (p &lt; 0.001) than the mean contrast of the images from the Normal patients
(163.81 ± 1.08) in Table 5.</p>
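          <p>As a rough plausibility check on the contrast comparison quoted above, the difference between two reported means can be divided by the standard error of that difference. This sketch is not Tukey's procedure used in the study; it assumes independent groups and a large-sample normal approximation, and the function name z_from_summary is hypothetical.</p>
          <preformat>
```python
from math import sqrt


def z_from_summary(mean_a, sem_a, mean_b, sem_b):
    """Difference of two independent means divided by the standard error of the difference."""
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)


# Mean contrast and SEM quoted in the text for Normal and COVID-19 images.
z = z_from_summary(163.81, 1.08, 134.36, 4.37)
```
          </preformat>
          <p>The resulting z of roughly 6.5 is well above 3.29, the two-sided normal threshold corresponding to p = 0.001, which agrees with the significance level reported for this comparison.</p>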
        </sec>
        <sec id="sec-4-1-5">
          <p>Additionally, the mean dissimilarity and homogeneity indicate that Pneumonia and Tuberculosis
demonstrate the highest dissimilarity (0.98 ± 0.01) and homogeneity (0.38 ± 0.00), respectively,
signifying greater diversity in pixel intensity, as shown in Tables 6 and 7. COVID-19 and Pneumonia
exhibit comparable homogeneity (0.26 − 0.27; Table 7). The mean correlation indicates that Normal
images exhibit the highest correlation (12.06 ± 0.01), as shown in Table 8. The mean density analysis
indicates that Normal cases demonstrate the highest mean density (19.20 ± 0.12; Table 9).</p>
        </sec>
        <sec id="sec-4-1-6">
          <p>Meanwhile, the mean contrast, dissimilarity, homogeneity, correlation, and density were lower (p &lt; 0.05) for COVID-19 patients than for Normal patients.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Recommendation</title>
      <sec id="sec-5-1">
        <p>Statistical data analysis has changed infectious and pandemic illness research and management.
Nevertheless, statistical approaches have been underutilized in pattern identification and analysis of infectious and pandemic illness epidemics,
and studies of global pandemic data often lack statistical analysis. For pandemic analysis,
statistical measures such as the mean intensity of a color channel, SEM, mean contrast,
homogeneity, dissimilarity, correlation, and density provide depth of finding for informed
decision-making. This research suggests that pandemics can be controlled more readily, and mortality reduced, when
statistical methods are used to analyze the pandemic dataset to advise doctors, policymakers, and other
stakeholders on disease spread. This study examines statistics, data science, machine learning, and AI
in relation to environmental science, natural sciences, medicine, and technology. It shows how statistics
may model and analyze huge data in pandemics, covers the state of the art for practical analysis, and
shows how to implement the methodologies. The paper thereby contributes to the
field of pandemic analytics. Future research may compare results using theoretical machine learning
techniques, ML modeling, and validation to confirm the results of this investigation. Multiple data
sources enable a more complete picture of disease spread. Maps of COVID-19 hotspots were created
using social media, contact tracing apps, and real-time hospital data.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <p>The authors have not employed any Generative AI tools.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ajagbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mudali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Adigun</surname>
          </string-name>
          ,
          <article-title>Assessing data-driven of discriminative deep learning models in classification task using synthetic pandemic dataset</article-title>
          ,
          <source>in: Southern African Conference for Artificial Intelligence Research</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Akinlade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Vakaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dridi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ortiz-Rodriguez</surname>
          </string-name>
          ,
          <article-title>Semantic segmentation of the lung to examine the effect of covid-19 using unet model</article-title>
          ,
          <source>in: International Conference on Applied Machine Learning and Data Analytics</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ugbomeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Yiye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ibeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Ezenkwu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alkhayyat</surname>
          </string-name>
          ,
          <article-title>Machine learning algorithms for stroke risk prediction leveraging on explainable artificial intelligence techniques (xai)</article-title>
          ,
          <source>in: 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Yiye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ugbomeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Ezenkwu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ibeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alkhayyat</surname>
          </string-name>
          ,
          <article-title>Investigating key contributors to hospital appointment no-shows using explainable ai</article-title>
          ,
          <source>in: 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ajagbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mudali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Adigun</surname>
          </string-name>
          ,
          <article-title>An empirical assessment of discriminative deep learning models for multiclassification of covid-19 x-rays</article-title>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. B.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <article-title>Applied statistics in the era of artificial intelligence: A review and vision</article-title>
          , arXiv preprint arXiv:2412.10331 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Florez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Online dashboard and data analysis approach for assessing covid-19 case and death data</article-title>
          ,
          <source>F1000Research</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <fpage>570</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mwamnyange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Luhanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Thodge</surname>
          </string-name>
          ,
          <article-title>Big data analytics framework for childhood infectious disease surveillance and response system using modified mapreduce algorithm</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications (IJACSA)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. T.</given-names>
            <surname>Igwama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Olaboye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Maha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ajegbile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abdul</surname>
          </string-name>
          ,
          <article-title>Big data analytics for epidemic forecasting: Policy frameworks and technical approaches</article-title>
          ,
          <source>International Journal of Applied Research in Social Sciences</source>
          <volume>6</volume>
          (
          <year>2024</year>
          )
          <fpage>1449</fpage>
          -
          <lpage>1460</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Sadiku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ajayi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Sadiku</surname>
          </string-name>
          ,
          <article-title>Predictive analytics for supply chain</article-title>
          ,
          <source>Int J Trend Res Dev (IJTRD)</source>
          <volume>12</volume>
          (
          <year>2025</year>
          )
          <fpage>112</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Statistical perspectives on reliability of artificial intelligence systems</article-title>
          ,
          <source>Quality Engineering</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>56</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for infectious disease big data analytics</article-title>
          ,
          <source>Infection, Disease &amp; Health</source>
          <volume>24</volume>
          (
          <year>2019</year>
          )
          <fpage>44</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ryeznik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sverdlov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. K.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <article-title>An overview of healthcare data analytics with applications to the covid-19 pandemic</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>1463</fpage>
          -
          <lpage>1480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rasouli Panah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Integration of ai and big data analysis with public health systems for infectious disease outbreak detection</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. O.</given-names>
            <surname>Adegoke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Odugbose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Adeyemi</surname>
          </string-name>
          ,
          <article-title>Data analytics for predicting disease outbreaks: A review of models and tools</article-title>
          ,
          <source>International journal of life science research updates [online]</source>
          <volume>2</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pastore y Piontti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Perra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Samay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vespignani</surname>
          </string-name>
          ,
          <article-title>Infectious disease spreading: From data to models</article-title>
          ,
          <source>in: Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. W. J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Infectious diseases prevention and control using an integrated health big data system in china</article-title>
          ,
          <source>BMC Infectious Diseases</source>
          <volume>22</volume>
          (
          <year>2022</year>
          )
          <fpage>344</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <article-title>Big data analytics for big outcomes in healthcare</article-title>
          ,
          <source>in: 2019 ASEE Midwest Section Conference</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>