
A Statistical Approach for COVID-19 Pandemic Data Analysis

Sunday Adeola Ajagbe

saajagbe@pgschool.lautech.edu.ng 0 2

Ismaila Muritala

MuritalaI@unizulu.ac.za 1 2

Pragasen Mudali

MudaliP@unizulu.ac.za 2

Matthew O Adigun

adigunm@unizulu.ac.za 2

0 Abiola Ajimobi Technical University, Ibadan, Nigeria
1 Osun State University, Osogbo, Nigeria
2 University of Zululand, Kwadlangezwa, South Africa


This study investigated the statistical insight of big data analytics for effective pandemic analysis. The COVID-19 pandemic dataset, containing 5010 negatives and 4990 positives, was obtained from the Kaggle repository. The dataset was categorized into four classes (COVID-19, Normal, Pneumonia, and Tuberculosis) using diverse statistical metrics of image characteristics: mean intensity of a color channel, homogeneity, dissimilarity, correlation, and density. The objective of the statistical analysis was to validate the use of big data analysis for pandemic preparedness, with a working hypothesis that the intensity of the color channel images would be different (p < 0.05) between COVID-19 patients and patients with other health conditions. The Statistical Package for the Social Sciences (SPSS) version 20 was used for the analysis. The outcomes show that the mean intensity in the blue and green channels was greater in COVID-19 and Tuberculosis cases than in Normal and Pneumonia cases. Pneumonia and Normal cases exhibited comparable and reduced mean intensities across all three channels (blue, green, red). The Normal condition exhibited the largest mean contrast, whereas Pneumonia displayed the lowest. The mean dissimilarity and homogeneity indicate that Tuberculosis demonstrated the highest dissimilarity, signifying greater diversity in pixel intensity. COVID-19 and Pneumonia exhibited comparable homogeneity; however, Tuberculosis demonstrated greater homogeneity. The mean correlation indicates that Normal images exhibited the highest correlation, whilst Tuberculosis demonstrated the lowest. The mean density analysis indicates that Normal cases demonstrated the highest mean density, whereas Tuberculosis exhibited the lowest. In conclusion, our findings establish the advantages of statistics for modeling and analyzing big data in the pandemic domain and cover the state of the art for practical analysis.

Keywords: Big data analytics, Pandemic analysis, Statistical analysis, Machine learning

1. Introduction

Big data analytics has become common in many areas, from research to practice, and even in everyday life. It is a prominent tool in pandemic preparedness and control as an aspect of the fourth industrial revolution (4IR) [1]. COVID-19 was first identified in Wuhan, China, in December 2019 and has since spread worldwide. Chinese and global epidemiological studies show person-to-person transmission [2, 3]. The virus spreads through sneezing and coughing, and through touching contaminated surfaces or hands and then touching the lips, nose, or eyes. Statisticians must demonstrate that their outcomes are not due to chance [4, 5]. Failure prevention and planning are often driven by statistics [6, 7]. Failures in the control of pandemics and infectious diseases can lead to economic loss and loss of life.

Recently, Mwamnyange et al. [8] proposed a big data analytics platform for childhood infectious illness surveillance and response. The system helps healthcare professionals track, monitor, and analyze infectious illness reports via social media to prevent and control child-related diseases. The proposed technique was validated using use-case scenarios and performance comparisons [9]. However, the statistical insights that data analytics could provide for effective infectious disease detection were not considered. To provide some concrete examples, artificial intelligence (AI) incident cases reported on the website were analyzed. The AI Incident database (2021) compiles news articles from several sources.

We found that 72 of the 126 incidents reported thus far may be connected to dependability problems. Of those 72 reliability-related incidents, 29 resulted in fatalities or serious injuries, demonstrating that dependability problems may cause significant damage [3, 10]. From a different angle, public trust is necessary for the widespread use of AI technology.

Big data analytics is needed to manage infectious diseases. Massive data sets require statistics to comprehend, analyze, and predict infectious disease patterns. AI technology is still growing and faces difficulties, including dependability, safety, durability, and trustworthiness. Dependability is important because AI systems must be trusted to work as intended. Resource allocation and public health crisis response depend on these criteria, so reliability, strategy, and failures need to be examined.

Recently developed statistical analysis of AI dependability, aimed especially at statisticians, highlights statistical research concerns. The authors discussed out-of-distribution detection, training set influence, adversarial attacks, model correctness, and uncertainty quantification in relation to AI dependability, and a few examples show study options. A recent study explored AI reliability evaluation, data gathering, and test preparation, and how to improve system designs for AI dependability, since data is vital for reliability assessment [10]. Data on infectious diseases are initially analyzed using descriptive statistics and AI applications.

The key variables, namely mean intensity of a color channel, standard error of the mean (SEM), mean contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density, are crucial to pandemic analysis. Statisticians can map some variables to the geographic and demographic spread of pandemic diseases using search patterns from big data sources. The objective of this study is to investigate the feasibility of effective pandemic investigation using statistical analysis. Specifically, the Statistical Package for the Social Sciences (SPSS) was used to analyze these variables.

This study is structured as follows. Section 2 provides a literature review to emphasize the necessity of the present investigation. Section 3 delineates the approach, attributes of the dataset, and the research instrument. Section 4 presents a comprehensive analysis of statistical insights on big data analytics for the effective detection of infectious diseases. Section 5 concludes the research and proposes future endeavors for the implementation of effective pandemic control utilizing statistical tools.

2. Literature Review

Big Data Analytics is crucial for advancing the Fourth Industrial Revolution and tackling the challenges presented by the pandemic. The Fourth Industrial Revolution is a significant transformation characterized by the convergence of physical, digital, and biological technologies. Currently, Big Data propels substantial progress in automation and robotics, enabling these systems to analyze operational data for enhanced efficiency and flexibility. It also enables the IoT, where interconnected devices across diverse platforms generate vast data that, when analyzed, can improve operational efficiencies and predictive maintenance capabilities. Moreover, in smart manufacturing, Big Data enhances the optimization of production processes, minimizes downtime, and enables real-time customization of output.

During the COVID-19 pandemic, Big Data has proven its essential significance across multiple crucial sectors. It has enabled effective epidemiological surveillance by analyzing data from several sources, including travel records, social media, and government reports, to track the virus’s spread [11].

In pandemic management, analytics have enhanced hospital resource allocation, forecasted patient outcomes, and tailored treatment options. Furthermore, the rapid development and distribution of vaccines have been expedited by Big Data, which has enabled the analysis of massive clinical trial data. It has also impacted public policy, aiding governments in developing specific lockdown measures and informed reopening strategies based on real-time data concerning infection rates and public sentiment.

The use of Big Data in addressing COVID-19 and advancing Fourth Industrial Revolution technologies has accelerated the adoption of digital solutions, including telemedicine, remote work, and online education, which are vital to the current digital transformation. This integration highlights the essential role of Big Data in crisis management and as a core component of digital and industrial advancement in the Fourth Industrial Revolution, demonstrating its capacity for comprehensive, real-time decision-making across public health, economic, and industrial sectors [4, 11].

Hong et al. [11] explored statistical methodologies for assessing AI reliability, particularly for autonomous systems. The authors introduce the "SMART" statistical framework to evaluate the robustness of AI models used in various domains, including healthcare. The study highlights how AI reliability decreases in out-of-distribution settings and suggests that statistical models outperform deep learning in long-term failure prediction. However, it lacks real-world infectious disease applications and validation on epidemiological datasets.

Another study highlights AI’s role in infectious disease monitoring and projection, emphasizing data-driven approaches for disease surveillance. It demonstrates high sensitivity (91%) in predicting disease outbreaks through AI models but suffers from a lack of statistical validation, poor interpretability, and limited disease-specific analysis [12].

Fei et al. [13] also provided a broad overview of big data analytics applied to healthcare, particularly during COVID-19. The overview discusses statistical inference challenges in high-dimensional data, highlighting the effectiveness of Bayesian models in long-term trend prediction. However, it has a limited focus on disease classification and weak integration of statistical methods in real-time applications.

Mwamnyange et al. [8] developed a big data analytics framework for tracking childhood infectious diseases using a modified MapReduce algorithm. While it enhances early outbreak detection through social media analytics and reduces processing time for large epidemiological datasets, it lacks statistical validation of detection accuracy and focuses more on data processing efficiency than disease classification. Olaboye et al. [9] presented a policy and technical framework for using big data analytics in epidemic forecasting. It improves forecasting accuracy with multi-source data integration, but has limited classification models for specific diseases and underutilizes machine learning for outbreak detection.

Panah et al. [14] investigated the integration of AI and big data analysis with public health infrastructure for early detection of infectious disease outbreaks. The authors propose an AI-driven data integration framework leveraging social media, wearable technology, and traditional clinical databases. While the paper emphasizes AI’s potential in public health surveillance, it lacks a detailed statistical validation framework and real-time adaptive learning methodologies.

Adegoke et al. [15] presented a comprehensive review of data analytics models used for disease outbreak prediction. It evaluates traditional statistical methods such as time-series forecasting and Bayesian inference, alongside AI-based models. The study highlights the importance of integrating epidemiological data with environmental and social media data for robust predictions. However, it does not propose a standardized statistical framework for evaluating model performance across different data sources.

Piontti et al. [16] explored computational models for infectious disease forecasting, focusing on data science methodologies such as agent-based modeling and network theory. The work provides insights into how big data can enhance disease spread predictions, but lacks emphasis on statistical uncertainty quantification and model generalizability across varying outbreak conditions.

Zhou et al. [17] investigated an integrated health big data system in China for infectious disease prevention and control to detect dengue fever, tuberculosis (TB), and vaccination gaps in migrant children. The system outperformed traditional surveillance methods by identifying more cases of TB and dengue and more children with incomplete vaccinations.

Michael and Krishnan [18] studied how big data analytics affects healthcare prediction insights. The study refines predictive models for disease progression, risk assessment, and individualized treatment using EHRs, medical imaging, genomic data, and wearable sensors. Case studies show it reduces hospital readmissions and optimizes healthcare resource management. AI-driven healthcare analytics is lauded, but real-time adaptive models and scalability across healthcare systems are not addressed.

Table 1 contains a summary of the review of the related studies on big data analytics for pandemic classification.

Table 1: Summary of the review of the related studies on big data analytics for pandemic classification.

| Methodology | Dataset Type | Results | Limitations |
|---|---|---|---|
| SMART framework, survival analysis, hazard functions | Simulated AI reliability datasets | AI reliability decreases in out-of-distribution; statistical models perform better in failure prediction | No epidemiological validation, not focused on disease detection |
| AI-based ML (SVM, RF, Neural Networks), social media and genomic data fusion | Syndromic surveillance, search trends, clinical data | 91% sensitivity in outbreak prediction | Lacks statistical validation, poor model interpretability |
| High-dimensional statistical models (LASSO, Bayesian Inference), computational epidemiology | COVID-19 epidemiological data, EHRs | Bayesian models outperform AI in long-term trend prediction | Limited disease classification focus, weak integration of statistical methods |
| MapReduce, Hadoop-based data integration | Clinical records, social media, surveillance data | Faster epidemiological data processing, early outbreak detection | Lacks statistical validation, prioritizes data processing over classification |
| AI-driven data integration, public health surveillance, wearable data fusion | Social media, wearable tech, clinical databases | Enhanced real-time disease tracking | Lacks statistical validation, no adaptive learning methodology |
| Investigation of statistical and AI models (Bayesian inference, time-series, ML) | Epidemiological, environmental, social media data | Identifies gaps in predictive analytics models | No standardized framework for model comparison |
| Agent-based modelling, network theory, computational epidemiology | Global epidemiological and mobility data | Insights into disease spread via big data | Lacks statistical uncertainty quantification, limited model generalizability |
| Investigates an integrated health big data system for detecting and preventing infectious diseases. | Empirical study with comparative analysis. | Health data from clinics, hospitals, and government records in China. | AI-based screening, big data analytics, disease detection algorithms. |
| Investigates the role of data analytics in predictive healthcare, focused on risk assessment and treatment personalization. | Case study analysis with machine learning models applied to patient data. | EHRs, medical imaging, genomic sequencing, wearable sensor data from City Hospital. | ML predictive modeling, statistical analysis, Apache Hadoop, Spark, Python, Tableau. |

3. Materials and Methods

The decision-making process is significantly enhanced by the application of statistical methods, even in instances involving extensive data sets, sometimes referred to as big data [1, 10]. To implement data processing techniques in certain sectors, it is essential to delineate the characteristics of the data, encompassing volume, variety, and velocity. Distinct types of data may be produced from multiple sources, and it is essential to create systems capable of managing these characteristics. Figure 1 illustrates the established architecture for big data analytics aimed at the efficient classification of infectious diseases by statistical methodologies. The developed architecture depicts a complete procedure for classifying and detecting the presence of COVID-19 using a synthetic dataset. It consists of three phases: (i) pandemic dataset acquisition and source, (ii) data preparation, and (iii) statistical analysis of COVID-19 pandemic data diagnosis.

3.1. Pandemic Dataset Acquisition and Source

The COVID-19 pandemic dataset utilized in this investigation was obtained from the Kaggle database (https://www.kaggle.com/datasets/rishanmascarenhas/covid19temperatureoxygenpulse-rate). The sample of the pandemic dataset used in this analysis had values ranging from 0 to 9. The details include the ID, Oxygen, Pulse Rate, Temperature, and results (+/−). In total, the dataset contains 5010 negative and 4990 positive examples, which supports replicability and makes it large enough to be classified as big data. Figure 2 shows the distribution of the dataset by class and evidence of balanced classes obtained through exploratory data analysis. There is no ethical concern regarding this dataset because it was obtained from an open source and has been anonymized to protect privacy and security.
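The class balance stated above can be checked with a few lines of arithmetic. The sketch below uses only the two counts reported in the text; the variable names are illustrative and not part of the original study.

```python
# Class counts as reported in the text (5010 negative, 4990 positive).
counts = {"negative": 5010, "positive": 4990}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Imbalance ratio: majority count divided by minority count
# (1.0 would mean perfectly balanced classes).
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in proportions.items():
    print(f"{label}: {share:.3f}")
print(f"imbalance ratio: {imbalance_ratio:.3f}")
```

A ratio this close to 1 is consistent with the description of the classes as balanced.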

3.2. Data Preparation

Framing the research as a classification problem: let $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_d, y_d)\}$ constitute the collection of training cases of size $d$, and let $Y = \{y_1, y_2, \ldots, y_d\}$ be the set of labels, where $x_i$ is a feature with corresponding label $y_i$, and $X$ is the set of features according to the classification task definition. The preliminary phase of this data analysis and categorization commenced with data preparation, because the dataset suffers from missing and duplicated data. This was followed by the crucial step of eliminating non-relevant data from the converted raw dataset. Duplicated and missing values were addressed using an imputation technique, wherein the mean of the respective column was calculated and used to replace the missing values. The issue was resolved through the identification of essential aspects, conducted in accordance with [1, 3].
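The preparation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`oxygen`, `pulse_rate`, `temperature`, `result`) and the toy records are hypothetical, and dropping exact duplicate rows before imputing is one plausible reading of how duplication was handled.

```python
from statistics import mean

def prepare(records, numeric_fields):
    """Drop exact duplicate rows, then fill missing numeric values
    (None) with the mean of the observed values in that column."""
    # 1) Remove exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # 2) Mean imputation, column by column.
    for field in numeric_fields:
        observed = [r[field] for r in unique if r[field] is not None]
        fill = mean(observed)
        for r in unique:
            if r[field] is None:
                r[field] = fill
    return unique

# Hypothetical toy records, for illustration only.
rows = [
    {"id": 1, "oxygen": 98, "pulse_rate": 72, "temperature": 36.6, "result": "-"},
    {"id": 1, "oxygen": 98, "pulse_rate": 72, "temperature": 36.6, "result": "-"},
    {"id": 2, "oxygen": None, "pulse_rate": 90, "temperature": 38.2, "result": "+"},
    {"id": 3, "oxygen": 92, "pulse_rate": 88, "temperature": 37.9, "result": "+"},
]
clean = prepare(rows, ["oxygen", "pulse_rate", "temperature"])
```

Computing the fill value after deduplication keeps repeated rows from being counted twice in the column mean.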

3.3. Statistical analysis of pandemic data diagnosis

The statistical data analysis was performed using one-way analysis of variance (ANOVA). Means were separated using Tukey’s test. Normality was assessed using the Shapiro-Wilk test, and homogeneity of variances was verified with Levene’s test, implemented in SPSS v 21.0 (IBM Corp., Armonk, NY, USA). The analysis was run on an Intel(R) Core(TM) i7-4600U CPU running at 2.10GHz to 2.70GHz with 8GB RAM under Windows 10 Professional. The statistical analysis was intended to validate the use of big data analysis for pandemic preparedness, with a working hypothesis that the intensity of the color channel images would be different (p < 0.05) between COVID-19 patients and patients with other health conditions.
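To make the one-way ANOVA step concrete, the sketch below computes the F statistic from first principles using only the Python standard library. It is an illustration rather than the SPSS procedure used in the study; the function name and the toy groups are assumptions. A p-value would additionally require the F distribution, which a statistics library such as SciPy provides (for example `scipy.stats.f_oneway`, with `scipy.stats.shapiro` and `scipy.stats.levene` for the assumption checks).

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data: three small groups with clearly different means.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the between-group variation is large relative to the within-group variation, which is the condition under which a post hoc test such as Tukey's is used to separate the means.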

In statistics, statistical significance denotes the likelihood that the outcomes of a study or experiment are attributable to factors other than random chance. Various metrics and tests are typically employed to ascertain the statistical significance of a result. This research utilized primary criteria for statistical significance contingent upon the specific type of pandemic dataset examined. Statistical significance is a crucial concept; nonetheless, it is essential to recognize that statistical significance does not equate to practical relevance. Always consider the context of the findings and additional metrics, such as effect size, when evaluating results. The equations shown below describe these metrics and express them statistically, using mean intensity of a color channel, SEM, mean contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density. The detailed description of the statistical metrics used, with accompanying formulas, is presented in this section.

Equation 1 shows the average intensity of a color channel in the pandemic classification.

$$\bar{I} = \frac{1}{N}\sum_{i=1}^{N} I_i \tag{1}$$

where $I_i$ is the intensity of the $i$-th pixel in the color channel, and $N$ is the total number of pixels.

Equation 2 shows the Standard Error of the Mean (SEM) for pandemic classification.

$$\mathrm{SEM} = \frac{\sigma}{\sqrt{N}} \tag{2}$$

where $\sigma$ is the standard deviation of pixel intensities and $N$ is the number of pixels for pandemic classification.

Equation 3 shows Mean Contrast from the Gray Level Co-occurrence Matrix (GLCM), a statistical method for examining texture that considers the spatial relationship of pixels.

$$\mathrm{Contrast} = \sum_{i,j} P(i,j)\,(i-j)^2 \tag{3}$$

where $P(i,j)$ is the probability of intensity pairs in the GLCM for pandemic classification.

Equation 4 shows Mean Homogeneity, Equation 5 shows Dissimilarity, and Equation 6 shows Correlation.

$$\mathrm{Homogeneity} = \sum_{i,j} \frac{P(i,j)}{1+|i-j|} \tag{4}$$

$$\mathrm{Dissimilarity} = \sum_{i,j} P(i,j)\,|i-j| \tag{5}$$

$$\mathrm{Correlation} = \frac{\sum_{i,j} (i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i \sigma_j} \tag{6}$$

where $\mu_i, \mu_j$ and $\sigma_i, \sigma_j$ are the means and standard deviations of intensity levels for pandemic classification.

Equation 7 shows Mean Density (assuming it refers to pixel density in a binary image).

$$\mathrm{Density} = \frac{\sum_{i=1}^{N} b_i}{N} \tag{7}$$

where $b_i$ represents pixel values in a binary image for pandemic analysis.
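The texture metrics in Equations 3 to 7 can be illustrated on a tiny image. The sketch below is a standard-library illustration under assumptions, not the pipeline used in the study: the pixel offset (one step to the right), the symmetric counting of pairs, the number of gray levels, and all function names are choices made here for the example.

```python
from math import sqrt

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix P(i, j)
    for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, homogeneity, dissimilarity, and correlation of a GLCM.
    Correlation is undefined (division by zero) for a constant image."""
    levels = range(len(p))
    cells = [(i, j, p[i][j]) for i in levels for j in levels]
    mu_i = sum(i * pij for i, j, pij in cells)
    mu_j = sum(j * pij for i, j, pij in cells)
    sd_i = sqrt(sum(pij * (i - mu_i) ** 2 for i, j, pij in cells))
    sd_j = sqrt(sum(pij * (j - mu_j) ** 2 for i, j, pij in cells))
    return {
        "contrast": sum(pij * (i - j) ** 2 for i, j, pij in cells),
        "homogeneity": sum(pij / (1 + abs(i - j)) for i, j, pij in cells),
        "dissimilarity": sum(pij * abs(i - j) for i, j, pij in cells),
        "correlation": sum(
            pij * (i - mu_i) * (j - mu_j) for i, j, pij in cells
        ) / (sd_i * sd_j),
    }

def mean_density(binary_image):
    """Fraction of pixels equal to 1 in a binary image (Equation 7)."""
    pixels = [b for row in binary_image for b in row]
    return sum(pixels) / len(pixels)

# A 4x4 toy image with four gray levels (0 to 3).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_image, levels=4)
features = texture_features(p)
density = mean_density([[1, 0], [1, 1]])
```

On real images the gray levels would typically be quantized first, since a 256-level GLCM computed this way is slow in pure Python.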

4. Result and Analysis

4.1. Results

This section presents the results obtained from the statistical analysis of the pandemic dataset. The task is conceptualized around the statistical tools required for a detailed analysis of the COVID-19 pandemic dataset. A statistical analysis was conducted to categorize infectious diseases, including COVID-19. A model was developed utilizing data from Kaggle, encompassing steps of data gathering, preparation, analysis via SPSS, and essential statistical measures (intensity, SEM, contrast, homogeneity, dissimilarity, correlation, and density). The findings are articulated through statistical metrics, including the mean intensity of a color channel, the standard error of the mean (SEM), mean contrast, mean homogeneity, mean dissimilarity, mean correlation, and mean density.

4.2. Analysis of big data analytics for pandemic classification

Statistics on the pandemic dataset in this study are discussed in this subsection, which delineates the statistical analysis of four pandemic data conditions: COVID-19, Normal, Pneumonia, and Tuberculosis. For each condition, it includes statistical measurements of image attributes, reported as mean values with standard errors. Table 2 compares the blue channel in images across different health conditions, Table 3 compares the green channel, and Table 4 compares the red channel. Table 5 compares the contrast of the images, Table 6 the dissimilarity, Table 7 the homogeneity, Table 8 the correlation, and Table 9 the density of the images obtained for different health conditions. In COVID-19 and TB cases, the blue and green channels have a higher mean intensity than in Normal and Pneumonia cases. All three channels show similar and reduced mean intensities for Pneumonia and Normal patients.
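The "mean ± SEM" summaries reported per condition can be sketched as below. This is a hedged illustration: it assumes that each image is first reduced to one mean intensity per channel and that the SEM is then taken across images within a class, which the sample sizes in the tables suggest but the text does not state. The (B, G, R) pixel layout, the function names, and the toy values are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def channel_mean(image, channel):
    """Mean intensity of one color channel.
    `image` is a list of rows, each row a list of (B, G, R) tuples."""
    return mean(pixel[channel] for row in image for pixel in row)

def mean_and_sem(values):
    """Sample mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))

# One tiny 1x2 "image": blue values are 10 and 30.
tiny_image = [[(10, 20, 30), (30, 40, 50)]]
blue_mean = channel_mean(tiny_image, channel=0)

# Hypothetical per-image blue-channel means for one class.
per_image_means = [120.0, 124.0, 128.0]
class_mean, class_sem = mean_and_sem(per_image_means)
print(f"{class_mean:.2f} ± {class_sem:.2f}")
```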

| Case | Sample size | Mean intensity ± SEM |
|---|---|---|
| COVID-19 | 554 | 134.87 ± 0.94 |
| Normal | 1779 | 122.15 ± 0.33 |
| Pneumonia | 4061 | 122.91 ± 1.05 |
| Tuberculosis | 729 | 133.77 ± 1.05 |

p-value: < 0.001; 95% CI: 124.29-125.24.
a,b: indicate significant difference (p < 0.05); CI: confidence interval.
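As a worked check on the table above, the sample-size-weighted mean of the four class means can be compared with the reported 95% CI. The sketch below uses only the numbers in the table; reading the CI as an interval for the overall mean across all images is an interpretation made here, not something the text states, and the 1.96 multiplier assumes a normal approximation.

```python
# (sample size, mean intensity) per class, copied from the table above.
table = {
    "COVID-19": (554, 134.87),
    "Normal": (1779, 122.15),
    "Pneumonia": (4061, 122.91),
    "Tuberculosis": (729, 133.77),
}

n_total = sum(n for n, _ in table.values())
pooled_mean = sum(n * m for n, m in table.values()) / n_total

# Reported 95% confidence interval.
ci_low, ci_high = 124.29, 125.24
ci_midpoint = (ci_low + ci_high) / 2

# Standard error implied by the CI half-width under a normal approximation.
implied_sem = (ci_high - ci_low) / (2 * 1.96)

print(f"pooled mean = {pooled_mean:.2f}, CI midpoint = {ci_midpoint:.3f}")
print(f"implied SEM of the overall mean = {implied_sem:.3f}")
```

The pooled mean lands almost exactly on the midpoint of the interval, which is consistent with the CI describing the overall mean rather than any single class.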

| Case | Sample size | Mean intensity ± SEM |
|---|---|---|
| COVID-19 | 554 | 134.59 ± 0.94 |
| Normal | 1779 | 122.15 ± 0.33 |
| Pneumonia | 4061 | 122.91 ± 0.31 |
| Tuberculosis | 729 | 129.22 ± 1.02 |

p-value: < 0.001; 95% CI: 123.80-124.74.
a,b,c: indicate significant difference (p < 0.05); CI: confidence interval.

The blue, green, and red channel images from COVID-19 patients showed significantly (p < 0.001) the highest intensity compared to patients with different health conditions, as indicated in Tables 2, 3, and 4, respectively. The mean contrast of the images from COVID-19 patients (134.36 ± 4.37) was significantly lower (p < 0.001) than the mean contrast of the images from normal patients (163.81 ± 1.08), as shown in Table 5.
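The contrast comparison above can be sanity-checked directly from the reported means and SEMs. The sketch below is a rough large-sample z approximation for two independent means; it is not the ANOVA and Tukey procedure used in the study, and the function names are illustrative.

```python
from math import erfc, sqrt

def z_from_summary(mean_a, sem_a, mean_b, sem_b):
    """Large-sample z statistic for the difference of two independent
    means, using their standard errors."""
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))

# Mean contrast ± SEM as reported: Normal vs. COVID-19.
z = z_from_summary(163.81, 1.08, 134.36, 4.37)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

A z value this far from zero is in line with the reported p < 0.001, although the exact p-value from the study's ANOVA would differ.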

Additionally, the mean dissimilarity and homogeneity indicate that Pneumonia and Tuberculosis demonstrate the highest dissimilarity (0.98 ± 0.01) and homogeneity (0.38 ± 0.00), respectively, signifying greater diversity in pixel intensity, as shown in Tables 6 and 7. COVID-19 and Pneumonia exhibit comparable homogeneity (0.26-0.27; Table 7). The mean correlation indicates that Normal images exhibit the highest correlation (12.06 ± 0.01), as shown in Table 8. The mean density analysis indicates that Normal cases demonstrate the highest mean density (19.20 ± 0.12; Table 9). Meanwhile, the mean contrast, dissimilarity, homogeneity, correlation, and density were lower (p < 0.05) for COVID-19 patients than for normal patients.

5. Conclusion and Recommendation

Statistical data analysis has changed infectious and pandemic illness research and management. However, statistical approaches have been underutilized in pattern identification and analysis of infectious and pandemic illness epidemics, and global pandemic data studies lack statistical analysis. For pandemic analysis, statistical measures such as mean intensity of a color channel, SEM, mean contrast, homogeneity, dissimilarity, correlation, and density provide depth of findings for informed decision-making. This research suggests that pandemics can be controlled more readily, and mortality reduced, when statistical methods are used to analyze pandemic datasets and to advise doctors, policymakers, and other stakeholders on disease spread. This study examines statistics, data science, machine learning, and AI in relation to environmental science, natural sciences, medicine, and technology. It shows how statistics may be used to model and analyze big data in pandemics, covers the state of the art for practical analysis, and shows how to implement the methodologies. The paper thus contributes to the field of pandemic analytics. Future research may compare results using machine learning techniques, including ML modeling and validation, to validate the results of this investigation. Multiple data sources enable a more complete picture of disease spread; for example, maps of COVID-19 hotspots have been created using social media, contact tracing apps, and real-time hospital data.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

[1] S. A. Ajagbe, Mudali, M. O. Adigun, Assessing data-driven of discriminative deep learning models in classification task using synthetic pandemic dataset, in: Southern African Conference for Artificial Intelligence Research, Springer, 2024, pp. 282-299.

[2] Akinlade, Vakaj, Dridi, Tiwari, Ortiz-Rodriguez, Semantic segmentation of the lung to examine the effect of covid-19 using unet model, in: International Conference on Applied Machine Learning and Data Analytics, Springer, 2022, pp. 52-63.

[3] Ugbomeh, Yiye, Ibeke, C. P. Ezenkwu, Sharma, Alkhayyat, Machine learning algorithms for stroke risk prediction leveraging on explainable artificial intelligence techniques (xai), in: 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), volume 1, IEEE, 2024, pp. 1-6.

[4] Yiye, Ugbomeh, C. P. Ezenkwu, Ibeke, Sharma, Alkhayyat, Investigating key contributors to hospital appointment no-shows using explainable ai, in: 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), volume 1, IEEE, 2024, pp. 1-6.

[5] S. A. Ajagbe, Mudali, M. O. Adigun, An empirical assessment of discriminative deep learning models for multiclassification of covid-19 x-rays, 2024.

[6] Min, Song, Zheng, C. B. King, Deng, Hong, Applied statistics in the era of artificial intelligence: A review and vision, arXiv preprint arXiv:2412.10331 (2024).

[7] Florez, Singh, Online dashboard and data analysis approach for assessing covid-19 case and death data, F1000Research 9 (2020) 570.

[8] Mwamnyange, E. T. Luhanga, S. R. Thodge, Big data analytics framework for childhood infectious disease surveillance and response system using modified mapreduce algorithm, International Journal of Advanced Computer Science and Applications (IJACSA) (2021).

[9] G. T. Igwama, J. A. Olaboye, C. C. Maha, M. D. Ajegbile, Abdul, Big data analytics for epidemic forecasting: Policy frameworks and technical approaches, International Journal of Applied Research in Social Sciences 6 (2024) 1449-1460.

[10] M. N. Sadiku, S. A. Ajayi, J. O. Sadiku, Predictive analytics for supply chain, Int J Trend Res Dev (IJTRD) 12 (2025) 112.

[11] Hong, Lian, Xu, Min, Wang, L. J. Freeman, Deng, Statistical perspectives on reliability of artificial intelligence systems, Quality Engineering 35 (2023) 56-78.

[12] Z. S. Wong, Zhou, Q. Zhang, Artificial intelligence for infectious disease big data analytics, Infection, disease & health 24 (2019) 44-48.

[13] Fei, Ryeznik, Sverdlov, C. W. Tan, W. K. Wong, An overview of healthcare data analytics with applications to the covid-19 pandemic, IEEE Transactions on Big Data 8 (2021) 1463-1480.

[14] Rasouli Panah, Madanian, Yu, Integration of ai and big data analysis with public health systems for infectious disease outbreak detection, 2023.

[15] B. O. Adegoke, Odugbose, Adeyemi, Data analytics for predicting disease outbreaks: A review of models and tools, International journal of life science research updates [online] 2 (2024) 1-9.

[16] Pastore y Piontti, Perra, Rossi, Samay, Vespignani, Infectious disease spreading: From data to models, in: Charting the Next Pandemic: Modeling Infectious Disease Spreading in the Data Science Age, Springer, 2018, pp. 3-10.

[17] Zhou, E. W. J. Lee, Wang, Lin, Xuan, Wu, Lin, Shen, Infectious diseases prevention and control using an integrated health big data system in china, BMC infectious diseases 22 (2022) 344.

[18] Michael, Krishnan, Big data analytics for big outcomes in healthcare, in: 2019 ASEE Midwest Section Conference, 2020.