<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>PLoS ONE</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/I</article-id>
      <title-group>
        <article-title>Predicting COVID-19 post-vaccination mortality in persons with cardiovascular disease risk factors using explainable AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Taiwo Kolajo</string-name>
          <email>taiwo.kolajo@fulokoja.edu.ng</email>
          <email>taiwo.kolajo@up.ac.za</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olawande Daramola</string-name>
          <email>wande.daramola@up.ac.za</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Federal University Lokoja</institution>
          ,
          <addr-line>PMB 1154, Lokoja</addr-line>
          ,
          <country country="NG">Nigeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pretoria</institution>
          ,
          <addr-line>Hatfield Campus, Pretoria</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>18</volume>
      <issue>2</issue>
      <fpage>4192</fpage>
      <lpage>4203</lpage>
      <abstract>
        <p>Coronavirus disease (COVID-19) vaccination was adopted worldwide following the advent of the COVID-19 pandemic in 2019. However, many post-vaccination adverse events, such as death and severe illness, were reported. So far, the specific case of post-vaccination adverse events in persons with cardiovascular risk factors and comorbidities has not been explored empirically, which limits the understanding of the underlying causes of adverse reactions to vaccination in this category of persons. This paper used Explainable AI (XAI) to identify the critical determinants of post-vaccination mortality in persons with cardiovascular risk factors. To do this, we extracted 16657 records of persons with cardiovascular risk factors from the Vaccine Adverse Event Reporting System (VAERS) open dataset (from 2020 to May 2024). We then employed predictive modelling using a four-stage process. The first stage involved extracting relevant data from VAERS, data preprocessing, and handling class imbalance. In the second stage, we conducted a comparative performance evaluation of seven machine learning (ML) algorithms (Logistic Regression - LR, K-Nearest Neighbour - KNN, Deep Multilayer Perceptron - Deep MLP, Support Vector Machines - SVM, Random Forest - RF, Extreme Gradient Boosting - XGBoost, and Categorical Boosting - CatBoost). In the third stage, we compared the performance of two stacked ensemble models, each composed of six base models, using CatBoost and XGBoost as the respective meta-learners. The fourth stage involved using SHapley Additive exPlanations (SHAP) to interpret the predictions of the best-performing model. The results showed that CatBoost had the best performance among the base ML models (Acc = 0.96, F1 = 0.96, AUC = 0.96), while Stacked ensemble - XGBoost had the best overall performance (Acc = 0.96, F1 = 0.96, AUC = 0.99). We also identified the important predictors of post-vaccination mortality in persons with cardiovascular comorbidity. Generally, older age and a higher number of days spent in the hospital increase the risk of mortality, while the absence of current illness, life-threatening condition, hospitalization, prolonged hospitalization, disability, birth defect, doctor visit, and emergency care, together with vaccination dose completion, enhances the probability of survival. In addition, the presence of diabetes, high cholesterol, high blood pressure, and other illnesses increases the risk of mortality. This study's findings contribute to a better understanding of critical factors that could enable better handling of post-vaccination adverse events in persons with cardiovascular disease comorbidity.</p>
      </abstract>
      <kwd-group>
        <kwd>post-vaccination mortality</kwd>
        <kwd>COVID-19</kwd>
        <kwd>cardiovascular disease risk factors</kwd>
        <kwd>explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Cardiovascular disease poses a serious risk to the quality of life of vulnerable populations, particularly
middle-aged and older individuals. There is a long history of significant morbidity, death, and financial
losses associated with cardiovascular illnesses and their consequences [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Precision diagnosis is
difficult to achieve and places a significant strain on the healthcare system and the economy due to the
prevalence of various chronic diseases.
      </p>
      <p>
        In December 2019, China saw the start of the global pandemic of coronavirus disease 2019
(COVID-19), which is caused by SARS-CoV-2. Over 776 million confirmed cases and 7 million deaths were
reported worldwide by September 2024 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Global health and the economy were significantly impacted
by the COVID-19 pandemic [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Younger people without significant underlying medical conditions
may also experience fatal complications. However, elderly patients with underlying chronic disorders
like cardiovascular disease are thought to be at greater risk for death due to immunocompromised
conditions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Among the many instruments at our disposal for lowering morbidity and death, vaccinations are by
far the most successful [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. People's reluctance to get vaccinated frequently results from worries
about possible adverse consequences, which differ depending on the individual [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. There are two
main types of COVID-19 vaccines: nucleic acid vaccines and viral vector vaccines [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Nucleic
acid vaccines can be either deoxyribonucleic acid (DNA) or ribonucleic acid (RNA)
vaccines, which work by using the host’s cellular transcriptional and translational machinery to create
viral proteins that the immune system can then detect [13]. Because they are relatively easy to produce,
nucleic acid vaccines are considered advantageous over other kinds of vaccines [14]. The nucleic acid
COVID-19 vaccines belong to the RNA subcategory (although a DNA variant is being developed). Examples are
Pfizer-BioNTech and Moderna. In viral vector vaccines, the viral vector serves as a vehicle for delivering
the target immunogen [15]. The vector allows the body to build an immune response by delivering viral genes that
produce antigens against an infectious pathogen. Examples include AstraZeneca, Janssen, and CanSino.
The most frequent cardiac incident following COVID-19 vaccination was found to be myocarditis,
with an overall prevalence of about 1.62 percent. mRNA vaccines accounted for almost 90% of
post-COVID-19 vaccination myocarditis cases; in contrast, vector-based and/or inactivated vaccines
accounted for fewer cases of this condition [16][17].
      </p>
      <p>Because Artificial Intelligence (AI) is being utilized to help healthcare professionals forecast patient
outcomes, it is continuously transforming biomedical research and healthcare management [18]. However, it is
difficult for the general public and medical professionals to understand the outcomes of AI systems,
which hinders their acceptance. Explainable AI (XAI) is the product of research attempts
to make the decisions or outcomes of AI systems clearly comprehensible. XAI aids in comprehending
the reasoning behind the results of machine learning algorithms [19][20]. Finding possible weaknesses
and locating the sources of errors more quickly can be facilitated by explaining how AI systems operate
internally [21][22].</p>
      <p>In addition, ethics and transparency are necessary for people to have faith in medical devices
that use AI [23]. Furthermore, by combining rigorous validation of AI
decision-support systems in real-life clinical settings with valuable explanations, explainability promotes ethically
sound medical decision-making [24]. It follows that the healthcare sector is one of the sectors where
user acceptability of AI algorithms depends on both explainability and accuracy [25]. Explainability
can also be used to evaluate whether the system makes fair decisions and allows users to confirm that
it does not rely on noise or artefacts in the training data, especially in situations where the training
data may present a partial or biased image of the population [26]. Moreover, explanations can help us
better grasp what the algorithm is optimized for and the associated trade-offs, as well as provide fresh
perspectives on what the AI system has learnt from the data [27]. This paper uses explainable AI to
understand post-vaccination mortality in persons with cardiovascular risk factors.</p>
      <p>The rest of the paper is structured as follows: Section 2 discusses the related work. Section 3 presents
the methodology. Section 4 presents the results of the research. Section 5 presents the discussion.
Section 6 discusses the limitations of the study, while Section 7 presents the conclusion and further
work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Authors in [28] examined vaccination adverse events using the Vaccine Adverse Event Reporting
System (VAERS) database through the application of ontology and machine learning. To
provide easy access to side effect information for patients, healthcare practitioners, and officials,
a relational/graph database was established. Machine learning algorithms were also used to predict important
symptoms that lead to hospitalization and treatment. Using the VAERS dataset, [29] developed a
prognostic tool to determine the risks connected to COVID-19 vaccinations. Hospitalization, mortality,
and COVID-19 outcomes were predicted using machine learning models such as Multilayer
Feedforward Perceptron, Random Forest, Naive Bayes, Light Gradient Boosting Algorithm, and Linear
Regression. Males between the ages of 50 and 70 and those with significant diseases were found
to be the most vulnerable. Authors in [30] used patient data to identify common factors in adverse
reactions and develop strategies to reduce their incidence. They found that factors such as prior
illnesses, hospital admission, and SARS-CoV-2 reinfection were significantly associated with poor
patient reactions. Patient characteristics such as age, gender, and medication use were also significant
predictors. Machine learning classifiers trained on medical history were successful in predicting
complication-free vaccinations with an accuracy score above 90%.</p>
      <p>Authors in [31] employed machine learning to classify COVID-19 vaccination adverse effects in
people with sensitivities to foods, animals, and weather. Machine learning classifiers were used
to predict adverse responses to COVID-19 vaccinations in patients with allergies, which helped in
identifying those who are more likely to have side effects. Authors in [32] studied COVID-19 vaccine
adverse events by gender, age, manufacturer, and dose using the Vaccine Adverse Event Reporting
System datasets. They found a higher frequency of adverse events in women, and that different characteristics
of vaccine adverse events, including gender, manufacturer, age, and underlying diseases, were associated with fatal cases.</p>
      <p>
        Authors in [33] proposed a multi-label classification method for Vaccine Adverse Event (VAE) detection,
utilizing term- and topic-based label selection strategies. They used one-vs-rest (OvsR), problem transformation (PT),
algorithm adaptation (AA), and deep learning methods. Experimental results showed that topic-based PT
methods improved accuracy by up to 33.69%, OvsR methods achieved an optimal accuracy of 98.88%, and AA
methods increased accuracy by 87.36%. The method enhances model accuracy and VAE interpretability.
Authors in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] employed machine learning algorithms to predict and determine the severity of adverse
responses to COVID-19 and influenza vaccinations. A total of 2111 participants used wearables such as the Garmin
Vivosmart 4 and smartphone applications to participate in the study. The XGBoost model achieved
ROC values of 0.69 and 0.74 for detecting and predicting mild to severe side effects, respectively.
      </p>
      <p>To identify ongoing research efforts that use machine learning techniques to comprehend
comorbidity processes and make clinical predictions that take these intricate patterns into account, [18]
conducted a review. Four predictive analytics tasks were identified: risk prediction, network analysis,
clustering, and illness comorbidity data extraction. The results show that certain machine
learning-driven applications interpret the model and identify important risk factors for the development of
comorbidity while also addressing intrinsic data inadequacies in healthcare datasets.</p>
      <p>Authors in [34] investigated the impact of COVID-19 immunization in intubated patients with
acute respiratory distress syndrome associated with COVID-19 in Taiwan. Data on patients who
were intubated between May 1, 2022, and October 31, 2022, owing to COVID-19 pneumonia were
retrospectively evaluated by the authors. A person was deemed completely immunized if they received
two or more doses of the vaccine. There were 84 patients in all (40 completely immunized and 44
controls). The two groups had comparable baseline characteristics on the day of intubation, such as
age, comorbidities, and Sequential Organ Failure Assessment (SOFA) score. There was no statistically
significant difference in the Intensive Care Unit (ICU) death rate between the completely vaccinated and
control groups. Body mass index (BMI) and SOFA score had a strong correlation with ICU mortality.</p>
      <p>Numerous adverse occurrences following vaccination, including fatalities and serious illnesses,
have been documented in the literature [35][36][37]. This paper examines a particular instance that
has not been experimentally studied: post-vaccination adverse events involving individuals who had
comorbidities and cardiovascular risk factors. The findings of this paper will foster an understanding of
the underlying factors that could lead to mortality when post-vaccination adverse events occur
in this category of patients.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section presents the methods and techniques used to predict post-vaccination mortality
in persons with cardiovascular risk factors using XAI. The work was done in four stages. Stage 1
covers data collection, preprocessing, and feature engineering. Stage 2 covers model selection,
hyperparameter tuning, model training, and model evaluation. Stage 3 covers the architecture design,
hyperparameter tuning, training, and evaluation of the stacked ensembles. Stage 4 covers
model interpretation and explainability. The process workflow is presented in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Data collection</title>
        <p>The reported incidences of COVID-19 post-vaccination adverse events were extracted from the
Vaccine Adverse Event Reporting System (VAERS), which can be accessed at
https://vaers.hhs.gov/data/datasets.html. The datasets for the five (5) years from
2020 to May 2024 were downloaded in CSV format. Each year's dataset has three (3) CSV files tagged
VAERSDATA (history/profile), VAERSSYMPTOMS (symptoms), and VAERSVAX (vaccine). The three
files for each year were merged and named 2020COVIDCOMBINED, 2021COVIDCOMBINED,
2022COVIDCOMBINED, 2023COVIDCOMBINED, and 2024COVIDCOMBINED.</p>
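        <p>The per-year merge described above can be sketched as a join on the report identifier. This is a minimal illustration rather than the procedure actually used: it assumes the three files share a VAERS_ID key, as in the public VAERS layout, and the toy rows and the helper names read_rows and merge_year are hypothetical.</p>
        <preformat><![CDATA[
```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for one year's three VAERS files (the real files are much wider).
DATA_CSV = "VAERS_ID,AGE_YRS,DIED\n1,70,Y\n2,55,\n"
VAX_CSV = "VAERS_ID,VAX_TYPE,VAX_MANU\n1,COVID19,MODERNA\n2,COVID19,JANSSEN\n"
SYMPTOMS_CSV = "VAERS_ID,SYMPTOM1\n1,Chest pain\n1,Dyspnoea\n2,Headache\n"


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_year(data_csv, vax_csv, symptoms_csv):
    """Join the three tables on VAERS_ID, giving one merged record per report."""
    # Simplification: if a report lists several vaccines, only the last is kept.
    vax_by_id = {row["VAERS_ID"]: row for row in read_rows(vax_csv)}
    symptoms_by_id = defaultdict(list)
    for row in read_rows(symptoms_csv):
        symptoms_by_id[row["VAERS_ID"]].append(row["SYMPTOM1"])

    merged = []
    for row in read_rows(data_csv):
        record = dict(row)
        record.update(vax_by_id.get(row["VAERS_ID"], {}))
        # A report can span several symptom rows, so collect them in a list.
        record["SYMPTOMS"] = symptoms_by_id[row["VAERS_ID"]]
        merged.append(record)
    return merged


combined = merge_year(DATA_CSV, VAX_CSV, SYMPTOMS_CSV)
```
]]></preformat>
        <p>Keeping one record per report avoids duplicating patient-level fields such as AGE_YRS when a report has several symptom rows.</p>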
        <p>Based on our interest in examining patients with cardiovascular disease risk factors, we extracted
the records in which any two of high blood pressure, diabetes, and high cholesterol had been
acknowledged as a preexisting condition by a patient who reported post-vaccination adverse events.
We selected these records from each year and then merged them across the five (5) years under
consideration. To retrieve the relevant patient records, each risk factor was expanded into its related
medical terms as follows: High blood pressure/Hypertension; Diabetes/Type 1 Diabetes/Type 2 Diabetes
(including closely related wording such as diabetic, diabetes mellitus, Type 1
diabetic, and Type 2 diabetic); High cholesterol/Hyperlipidemia/Hypercholesterolemia. This yielded a total
of 16657 records with 52 features, of which 34 were categorical and 18 were
numerical.</p>
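        <p>The keyword expansion and the two-of-three inclusion rule can be illustrated with a short sketch. The regular expressions below are assumptions made for illustration; the exact matching rules used in the study are not stated, and the names PATTERNS, risk_flags, and has_two_risk_factors are hypothetical.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical patterns built from the terms listed in the text.
PATTERNS = {
    "HBP": re.compile(r"high blood pressure|hypertension", re.IGNORECASE),
    "DIABETES": re.compile(r"diabet", re.IGNORECASE),
    "H_CHOL": re.compile(
        r"high cholesterol|hyperlipida?emia|hypercholesterola?emia", re.IGNORECASE
    ),
}


def risk_flags(history):
    """Return a Y/N flag per risk factor found in a free-text history."""
    return {
        name: "Y" if pattern.search(history) else "N"
        for name, pattern in PATTERNS.items()
    }


def has_two_risk_factors(history):
    """Inclusion rule: at least two of the three risk factors are mentioned."""
    return sum(flag == "Y" for flag in risk_flags(history).values()) >= 2


example = risk_flags("Type 2 diabetic, hypertension")
```
]]></preformat>
        <p>Plain keyword matching does not handle negations such as "no history of diabetes", so a real pipeline would likely need additional checks.</p>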
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data preprocessing</title>
        <p>The records of patients less than 18 years of age (11 records) were removed because the focus was on
adult patients. Patient records with an unknown vaccine (50 records) and patients with an unknown vaccine
dosage (1552 records) were also removed. Redundant features irrelevant to our analysis and empty fields
were also dropped. We introduced additional fields, as shown in Table 1.</p>
        <p>DOSE_COMPLETE: We entered Y for a complete dose and N for an incomplete dose. For the Moderna,
Pfizer-BioNTech, and Novavax vaccines, the patient must take 2 primary doses for a complete
dosage. For Janssen, Pfizer-BioNTech Bivalent, and Moderna Bivalent, only 1 primary dose is required.
We removed the records of patients who took more than the required dose (3,843 records)
because they did not follow the prescription.</p>
        <p>Symptoms: We entered Y if the patient showed at least one symptom; otherwise, N was entered.</p>
        <p>HBP, DIABETES, H_CHOL, and OTHER_ILL: Since our focus was on the three cardiovascular disease risk
factors (high blood pressure, diabetes, and high cholesterol), we created four additional fields for
high blood pressure (HBP), diabetes (DIABETES), high cholesterol (H_CHOL), and other illnesses
(OTHER_ILL). The values for the three risk factors were extracted automatically from the patient's HISTORY
field of the dataset. Because the dataset is user-generated, there was no uniformity in the
reporting by the users, and automatically extracting whether a patient had other illnesses was
difficult given the many other illnesses reported. We therefore manually checked the entries one after the
other to determine whether a patient had other illnesses apart from at least two of
the three diseases of interest (high blood pressure, diabetes, and high cholesterol).</p>
        <p>Vaccine type: We represented each type of vaccine with the letters A to F for easy representation and
analysis. A: JANSSEN; B: MODERNA; C: MODERNA BIVALENT; D: NOVAVAX; E: PFIZER-BIONTECH;
F: PFIZER-BIONTECH BIVALENT.</p>
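        <p>The dose-completion rule and the vaccine letter codes described above can be written as simple lookups. This is a sketch under the rules stated in the text; the function name dose_complete and the use of None to mark a dropped record are illustrative assumptions.</p>
        <preformat><![CDATA[
```python
# Primary doses required per vaccine, as stated in the text above.
REQUIRED_DOSES = {
    "MODERNA": 2,
    "PFIZER-BIONTECH": 2,
    "NOVAVAX": 2,
    "JANSSEN": 1,
    "PFIZER-BIONTECH BIVALENT": 1,
    "MODERNA BIVALENT": 1,
}

# Letter codes used for the vaccine types in the text above.
VACCINE_CODES = {
    "JANSSEN": "A",
    "MODERNA": "B",
    "MODERNA BIVALENT": "C",
    "NOVAVAX": "D",
    "PFIZER-BIONTECH": "E",
    "PFIZER-BIONTECH BIVALENT": "F",
}


def dose_complete(vaccine, doses_taken):
    """Return 'Y' or 'N', or None when the record should be dropped.

    Records with more doses than required are dropped, mirroring the text.
    """
    required = REQUIRED_DOSES[vaccine]
    if doses_taken > required:
        return None
    return "Y" if doses_taken == required else "N"


status = dose_complete("MODERNA", 2)
```
]]></preformat>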
        <sec id="sec-3-2-1">
          <title>3.2.1. Handling missing data</title>
          <p>To handle the missing values in the dataset, we used univariate imputation or data removal techniques
as follows:
1. AGE_YRS: There are 173 cases of missing age values. We calculated the mean (67.24) and median
(68) of the patients’ age from the existing observations. Since the mean and median are close, we
picked the median value and inserted it in place of the missing observations.</p>
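          <p>Median imputation of this kind can be sketched in a few lines. The ages below are toy values rather than VAERS data, and the helper name impute_with_median is hypothetical.</p>
          <preformat><![CDATA[
```python
from statistics import median


def impute_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


ages = [72, None, 65, 80, None, 58]  # toy ages, not VAERS data
imputed = impute_with_median(ages)
```
]]></preformat>
          <p>The median is less sensitive to extreme ages than the mean, although here the two are reported to be close.</p>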
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Feature engineering</title>
        <p>There are 22 attributes consisting of 19 categorical features and 3 numerical features (See Table 2).</p>
        <p>The categorical features were transformed using one-hot encoding, and Min-Max scaling was applied
to the numerical features. LASSO regression and Random Forest (RF) were used for feature
selection. We split the data into training and test sets in the ratio 70:30. GridSearchCV
was used to select the best value of the alpha hyperparameter for LASSO regression, which was
1e-05. The two feature selection algorithms
were compared using log-likelihood and AUC-ROC scores. The LASSO log-likelihoods were
-0.17485207557765708 (DIED_N) and -0.1748520755782875 (DIED_Y); the Random Forest log-likelihoods were
0.0006035625338038024 (DIED_1) and -0.0006035625338038024 (DIED_2), as shown in Figure 2. The AUC-ROC scores
for LASSO regression and Random Forest are presented in Figure 3.</p>
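        <p>As an illustration of the two transformations mentioned above, the following sketch implements one-hot encoding and Min-Max scaling directly. It is not the implementation used in the study (which is not stated); the function names are hypothetical and the inputs are toy values.</p>
        <preformat><![CDATA[
```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the given categories."""
    return [1 if value == category else 0 for category in categories]


def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


vaccine_codes = ["A", "B", "C", "D", "E", "F"]
encoded = one_hot("E", vaccine_codes)
scaled_ages = min_max_scale([18, 60.5, 103])
```
]]></preformat>
        <p>In practice the minimum and maximum would be taken from the training split only, so that the test split does not influence the scaling.</p>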
        <p>The RF selected the top 19 features and LASSO also has 19 distinct selected features. The results of
Log-likelihood (which shows the measure of goodness of fit), and AUC-ROC Scores showed that RF
performed better than Lasso. Therefore, we chose features selected by RF for further analysis.</p>
        <p>Based on the features selected, the original, distinct features were then loaded from Excel in CSV
format. At the end of feature selection, we had a total of 20 distinct features: 18 categorical
variables (including the target variable) and 2 numerical variables, as shown in Table 3. Table 3 provides
the selected features and their descriptions.</p>
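        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Selected features and their descriptions.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Categorical variable (feature name/description)</th><th>Data value</th><th>Frequencies</th></tr>
            </thead>
            <tbody>
              <tr><td></td><td>Male/Female</td><td>4835, 6242</td></tr>
              <tr><td></td><td>Yes/No</td><td>326, 10751</td></tr>
              <tr><td></td><td>Yes/No</td><td>4160, 6917</td></tr>
              <tr><td></td><td>Yes/No</td><td>11, 11066</td></tr>
              <tr><td></td><td>Yes/No</td><td>313, 10764</td></tr>
              <tr><td></td><td>Yes/No/Unknown</td><td>3913, 4493, 2671</td></tr>
              <tr><td></td><td>Yes/No</td><td>1752, 9325</td></tr>
              <tr><td></td><td>Yes/No</td><td>11040, 37</td></tr>
              <tr><td></td><td></td><td>6958, 4119</td></tr>
              <tr><td></td><td></td><td>6947, 4130</td></tr>
              <tr><td></td><td></td><td>9142, 1935</td></tr>
              <tr><td></td><td></td><td>516, 10561</td></tr>
              <tr><td></td><td></td><td>6, 11071</td></tr>
              <tr><td></td><td></td><td>2870, 8207</td></tr>
              <tr><td></td><td></td><td>2492, 8585</td></tr>
              <tr><td></td><td></td><td>5317, 5760</td></tr>
              <tr><td></td><td>A/B/C/D/E/F</td><td>768, 5089, 35, 2, 5141, 42</td></tr>
              <tr><td>DIED: Vaccine recipient died or not</td><td>Yes/No</td><td>1035, 10042</td></tr>
            </tbody>
          </table>
          <table>
            <thead>
              <tr><th>Numerical variable</th><th>Feature description</th><th>Data range</th><th>Mean</th><th>Standard deviation</th></tr>
            </thead>
            <tbody>
              <tr><td>AGE_YRS</td><td>The age of the vaccine recipient in years</td><td>18-103</td><td>66.62</td><td>13.29</td></tr>
              <tr><td>HOSPDAYS</td><td>Number of days hospitalised</td><td>0-91</td><td>2.00</td><td>5.02</td></tr>
            </tbody>
          </table>
        </table-wrap>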
        <sec id="sec-3-3-1">
          <title>3.3.1. Handling class imbalance</title>
          <p>Since our data is largely skewed based on our target variable, we have a class imbalance as presented in
Figure 4 with Alive (N) = 10042 and Died (Y) = 1035 records.</p>
          <p>We used Random Oversampling and the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset. We compared the
performance of Random Oversampling with that of SMOTE using the AUC-ROC and Average Precision metrics.
SMOTE performed better on both metrics. Hence, SMOTE was selected as
the resampling technique used to balance the dataset (see Table 4).</p>
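          <p>The core idea behind SMOTE is to create synthetic minority records by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that idea under illustrative assumptions (toy two-dimensional points, k = 2, a fixed seed); it is not the reference SMOTE implementation, and the function name smote_like_samples is hypothetical.</p>
          <preformat><![CDATA[
```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create synthetic minority points by interpolating towards near neighbours.

    A simplified sketch of the SMOTE idea, not the reference implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority, n_new=5)
```
]]></preformat>
          <p>Random Oversampling, by contrast, would simply duplicate existing minority records instead of interpolating new ones.</p>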
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Model training</title>
        <p>We trained seven (7) ML algorithms on the dataset: Logistic Regression (LR),
K-Nearest Neighbour (KNN), CatBoost, XGBoost, Support Vector Machine (SVM), Deep Multilayer
Perceptron (DMLP), and Random Forest (RF). We also trained two stacked ensembles, each with 6
base models and a meta-learner: Stack Ensemble-CatBoost (CatBoost as the
meta-learner) and Stack Ensemble-XGBoost (XGBoost as the meta-learner). We had a total of 20084
records after resampling with SMOTE. We split the data into training and testing sets in the ratio of
70:30, that is, 14058 and 6026 records, respectively.</p>
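        <p>The record counts quoted above can be checked with simple arithmetic. The sketch assumes that balancing brings the minority class up to the majority class size and that the test share is rounded up; both are assumptions, although they reproduce the reported figures.</p>
        <preformat><![CDATA[
```python
import math

alive, died = 10042, 1035            # class counts before resampling (from the text)
synthetic_needed = alive - died      # minority records added to reach balance
total = 2 * alive                    # both classes have the same size after balancing

test_size = math.ceil(0.30 * total)  # rounding the test share up is an assumption
train_size = total - test_size
```
]]></preformat>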
        <sec id="sec-3-4-1">
          <title>3.4.1. Hyperparameter tuning</title>
          <p>To select the best hyperparameters, we tuned the seven models using GridSearchCV and the two
stacked ensembles using Bayesian Optimization. For the stacked ensembles, the best hyperparameters were selected
out of fifty (50) trials. The best hyperparameters for each model are presented in Table 5.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Addressing overfitting</title>
          <p>During hyperparameter optimization, we employed k-fold cross-validation, which averages the scores
across all folds to obtain the final assessment of each model. The ensemble
techniques used in this study also help to mitigate overfitting. In addition, the models were
evaluated on the test set of the data used in this study.</p>
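          <p>The mechanics of k-fold cross-validation can be sketched as follows. The value of k is not stated in the text, so k = 5 below is an assumption, and the helper names k_fold_indices and cross_val_mean are hypothetical.</p>
          <preformat><![CDATA[
```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def cross_val_mean(scores):
    """Average the per-fold scores into the final cross-validation estimate."""
    return sum(scores) / len(scores)


folds = list(k_fold_indices(n_samples=10, k=5))  # k = 5 is an assumed value
```
]]></preformat>
          <p>Each record is used for validation exactly once, so the averaged score is less dependent on any single split.</p>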
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this study, we explored post-vaccination adverse events in patients with cardiovascular
comorbidity indicators, with the outcome (alive or died) as the prediction target. We used the following metrics to
assess the performance of the seven algorithms and the two stacked ensembles: Accuracy, Precision,
Recall, F1, area under the ROC curve (AUC), and Matthews correlation coefficient (MCC). We also plotted the
confusion matrix and ROC curve for the algorithms. SHAP
was used to determine the importance of the features and their impact on the best-performing
model. The results of the seven models and the two stacked ensemble models are presented in Table 6.</p>
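      <p>For reference, the threshold-based metrics listed above can all be derived from the four confusion-matrix counts. The sketch below uses toy counts chosen for illustration; they are not results from this study, and the function name classification_metrics is hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Toy counts chosen for illustration; they are not results from the study.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```
]]></preformat>
      <p>The sketch does not guard against empty classes, which would cause a division by zero.</p>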
      <p>From Table 6, CatBoost had the best performance among all the individual models that were trained
and tested. As a result, CatBoost was used as a meta-model for the stacked ensemble, while the remaining
six models were used as base models. However, we noticed a slight drop in the performance of Stack
ensemble-CatBoost compared to the performance of CatBoost as an individual model. We went on to
pick the individual model that performed next to CatBoost, which was XGBoost, as the meta-learner of
the stacked ensemble, and the result showed a superior performance.</p>
      <p>Figure 5 shows the receiver operating characteristic (ROC) curves for all the models. The
ROC curve is used to evaluate the performance of a binary classification method; it indicates
how well the model can distinguish between the classes. All the models show outstanding
performance, with values above 0.9, except the logistic regression model, which has 0.89. The ROC results
show that the models can clearly distinguish between the two classes, Alive and Died, in this
study.</p>
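      <p>The area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are toy values, and the function name roc_auc is hypothetical.</p>
      <preformat><![CDATA[
```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


auc = roc_auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```
]]></preformat>
      <p>Here three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.</p>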
      <sec id="sec-4-1">
        <title>4.1. Model prediction explanation</title>
        <p>SHapley Additive exPlanations (SHAP) is used to explain the output of a machine learning model. It
uses game theory to assign credit for a model’s prediction to each feature or feature value [38][39]. Of
the seven (7) base models, CatBoost performed best. The CatBoost
model’s predictions on the test data were therefore used as input to SHAP to explain the model’s
predictions. The SHAP summary plot is presented in Figure 6.</p>
        <p>In Figure 6, the Y-axis lists the feature names in order of importance from top to bottom. The
X-axis represents the SHAP value, which indicates the degree of change in log odds. The colour of each
point on the graph represents the value of the corresponding feature, with red indicating high values
and blue indicating low values. Each point represents a row of data from the original dataset.
Figure 6 shows that the feature ‘AGE_YRS’ has the highest impact on the model, while the feature
‘HBP’ has the least impact.</p>
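        <p>To make the notion of a SHAP value concrete, the sketch below computes exact Shapley values by enumerating all feature subsets for a toy additive log-odds model. The contributions and baseline are hypothetical numbers, not fitted values from this study, and brute-force enumeration is only practical for a handful of features; SHAP libraries rely on faster model-specific algorithms.</p>
        <preformat><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1."""
    features = range(n_features)
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                gain = value_fn(set(subset) | {i}) - value_fn(set(subset))
                phi[i] += weight * gain
    return phi


# Hypothetical log-odds contributions for a toy additive model (not fitted values).
CONTRIBUTIONS = [1.5, 0.5, -0.25]
BASELINE = -2.0


def toy_log_odds(present):
    """Model output when only the features in `present` take their actual values."""
    return BASELINE + sum(CONTRIBUTIONS[j] for j in present)


phi = shapley_values(toy_log_odds, n_features=3)
```
]]></preformat>
        <p>For an additive model each Shapley value equals the feature's own contribution, and the values sum to the difference between the full prediction and the baseline.</p>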
        <p>We identified the significant determinants of post-vaccination mortality in individuals with
cardiovascular comorbidities from the SHAP summary plot. The risk of death generally rises with age and
the number of days spent in the hospital, whereas the likelihood of survival increases with
the absence of current illness, life-threatening conditions, hospitalization, prolonged hospitalization,
disability, birth defects, doctor visits, and emergency care. In addition, the risk of death is raised
by diabetes, high blood pressure, high cholesterol, and other conditions. The results of this study add to
our knowledge of important variables to consider when managing post-vaccination adverse events for
patients who also have cardiovascular disease comorbidity indicators.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Authors in [40] identified the top ten factors that increase the risk of cardiovascular disease (CVD) as
unhealthful nutrition, physical inactivity, dyslipidemia (high cholesterol), hyperglycemia (high blood
sugar), high blood pressure (hypertension), obesity, considerations of select populations (older age,
race/ethnicity, and sex differences), thrombosis/smoking, kidney dysfunction, and genetics/familial
hypercholesterolemia. This study investigates how COVID-19 post-vaccination adverse events affect
persons with CVD risk factors, particularly focusing on the co-occurrences of dyslipidemia (high
cholesterol), hyperglycemia (high blood sugar), diabetes, and high blood pressure (hypertension). Our
results show that the risk of death from COVID-19 post-vaccination adverse events increases for persons
with diabetes, high blood pressure, high cholesterol, and other illness conditions. This observation
agrees with the results of previously reported clinical studies.</p>
      <p>A study of 187 COVID-19 patients, of whom 144 survived and
43 died, found that the probability of death was higher (69.44%) for those with underlying
CVD and elevated troponin T (TnT) levels [41]. Also, patients with underlying CVD were more likely to
exhibit elevated TnT levels (54.5%) compared with patients without CVD (13.2%). In addition,
patients with elevated TnT levels had evidence of more severe respiratory dysfunction and developed
more frequent complications. Similarly, [32], a systematic review that captured findings from 3912
participants, found that patients with preexisting CVD had worse outcomes and an increased risk of
death from COVID-19. CVD risk factors such as hypertension, diabetes mellitus, and obesity
were also associated with the severity of COVID-19 infection, intensive care unit admission, and poor
prognosis. Thus, the results of this study add to our knowledge of important variables to consider
when managing post-vaccination adverse events for patients who also have cardiovascular disease
comorbidity indicators.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations of the study</title>
      <p>The VAERS dataset used in this research work is user-generated information, and no validation
was done on the reports. The type of features in the dataset also constitutes a limitation, as most of the
features extracted from the dataset are categorical variables; the dataset did not have the actual values
of cholesterol, blood pressure, and glycemia as numerical variables. The absence of these numerical
variables limits the depth of computational analysis that is possible.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and further work</title>
      <p>This paper explored the critical determinants of mortality in post-vaccination adverse events for
persons with cardiovascular disease risk factors using XAI. We extracted 16657 records (from 2020
to May 2024) of COVID-19 vaccinated persons with preexisting cardiovascular disease risk factors (any
two of high blood pressure, diabetes, and high cholesterol) from the VAERS public dataset. We then
applied a four-stage predictive modelling process: 1) data preprocessing; 2) model training and
performance evaluation of seven ML algorithms: Logistic Regression (LR), K-Nearest Neighbour (KNN),
Deep Multilayer Perceptron (Deep MLP), Support Vector Machine (SVM), Random Forest (RF), Extreme
Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost); 3) modelling using two stacked
ensembles, each consisting of six base models, with CatBoost and XGBoost as the respective
meta-learners; and 4) model explainability using SHAP.</p>
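      <p>As a hedged illustration of stage 4, the sketch below computes exact Shapley values, the quantity
on which SHAP is based, for a toy risk score. The scoring function, its coefficients, the feature order,
and the reference profile are hypothetical and are not taken from the models trained in this study. The
brute-force enumeration shown here is only practical for a handful of features.</p>
      <preformat>
```python
from itertools import combinations
from math import factorial


def toy_risk_score(x):
    """Hypothetical risk score over (age, diabetes, hypertension); not the paper's model."""
    age, diabetes, hypertension = x
    # Made-up coefficients, including one interaction term between the two conditions.
    return 0.02 * age + 0.5 * diabetes + 0.3 * hypertension + 0.4 * diabetes * hypertension


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition are set to their baseline value."""
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [instance[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = instance[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical patient (age 70, diabetic, hypertensive) compared against a
# reference profile (age 50, neither condition).
patient = [70, 1, 1]
reference = [50, 0, 0]
contributions = shapley_values(toy_risk_score, patient, reference)
print(contributions)
```
      </preformat>
      <p>In this sketch, the per-feature contributions sum to the difference between the patient's score and
the reference score, and the interaction term is shared equally between the two conditions involved.</p>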
      <p>The results show that the stacked ensemble with XGBoost as the meta-learner had the best overall
performance (Acc = 0.96, F1 = 0.96, AUC = 0.99), whereas CatBoost had the best performance among the
individual machine learning models (Acc = 0.96, F1 = 0.96, AUC = 0.96). We also identified the critical
determinants of mortality in persons with cardiovascular disease risk comorbidity when a
post-vaccination adverse event occurs. The risk of death generally rises with age and the number of days
spent in the hospital; conversely, the likelihood of survival increases with the absence of current
illness, life-threatening conditions, hospitalization, prolonged hospitalization, disability, birth
defects, doctor visits, and emergency care. The risk of death is also raised by diabetes, high blood
pressure, high cholesterol, and other conditions. The results of this study foster an understanding of
the critical factors that could enable better handling of post-vaccination adverse events in patients
with cardiovascular disease comorbidities.</p>
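      <p>The Acc, F1, and AUC figures reported above follow standard definitions, sketched below on
made-up labels and scores (not data from this study). The sketch also suggests why two models can share
the same accuracy and F1 at a fixed threshold yet differ in AUC, which measures ranking quality across
all thresholds. The function names and the 0.5 decision threshold are assumptions for illustration.</p>
      <preformat>
```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = death, 0 = survival)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up outcomes and predicted probabilities for eight reports.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, f1, auc)
```
      </preformat>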
      <p>In further research, we will explore the severity of COVID-19 post-vaccination adverse events in
patients with cardiovascular disease risk factors. We will also explore real-life datasets for the
same scenario proposed in this study.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The research was supported by the National Research Foundation (NRF) of South Africa and the
University of Pretoria, South Africa.</p>
      <p>[13] W. W. Leitner, H. Ying, N. P. Restifo, DNA and RNA-based vaccines: principles, progress and
prospects, Vaccine 18(9-10) (1999) 765-777. doi: 10.1016/s0264-410x(99)00271-6.</p>
      <p>[14] N. P. Restifo, H. Ying, L. Hwang, W. W. Leitner, The promise of nucleic acid vaccines, Gene Ther.
7(2) (2000) 89-92. doi: 10.1038/sj.gt.3301117.</p>
      <p>[15] Centers for Disease Control and Prevention, Vaccines and immunizations: viral vector COVID-19
vaccines. Atlanta, GA: Centers for Disease Control and Prevention (2021).
www.cdc.gov/vaccines/covid-19/hcp/viral-vector-vaccine-basics.html.</p>
      <p>[16] M. H. Paknahad, F. B. Yancheshmeh, A. Soleimani, Cardiovascular complications of COVID-19
vaccines: a review of case-report and case-series studies, Heart Lung 59 (2023) 173-180. doi:
10.1016/j.hrtlng.2023.02.003.</p>
      <p>[17] Z. Akhtar, M. Trent, A. Moa, T. C. Tan, O. Fröbert, C. R. MacIntyre, The impact of
COVID-19 and COVID vaccination on cardiovascular outcomes, European Heart Journal Supplements
25(Supplement_A) (2023) A42–A49. https://doi.org/10.1093/eurheartjsupp/suac123.</p>
      <p>[18] D. Xu, Z. Xu, Machine learning applications in preventive healthcare: A systematic literature review
on predictive analytics of disease comorbidity from multiple perspectives, Artificial Intelligence in
Medicine 156 (2024) 102950. https://doi.org/10.1016/j.artmed.2024.102950.</p>
      <p>[19] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti,
J. Del Ser, N. Díaz-Rodríguez, F. Herrera, Explainable artificial intelligence (XAI): What we know
and what is left to attain trustworthy artificial intelligence, Information Fusion 99 (2023) 101805.
https://doi.org/10.1016/j.inffus.2023.101805.</p>
      <p>[20] T. Kolajo, O. Daramola, Human-centric and semantics-based explainable event detection: a survey,
Artificial Intelligence Review 56 (2023) 119-158. https://doi.org/10.1007/s10462-023-10525-0.</p>
      <p>[21] J. Amann, D. Vetter, S. N. Blomberg, H. C. Christensen, M. Coffee, S. Gerke, To explain or not
to explain?—Artificial intelligence explainability in clinical decision support systems, PLOS Digit
Health 1(2) (2022) e0000016. https://doi.org/10.1371/journal.pdig.0000016.</p>
      <p>[22] U. Bhatt, A. Xiang, S. Sharma, A. Weller, A. Taly, Y. Jia, J. Ghosh, R. Puri, J. M. F. Moura, P. Eckersley,
Explainable machine learning in deployment, Proceedings of the 2020 Conference on Fairness,
Accountability, and Transparency, pp. 648-657.</p>
      <p>[23] L. Farah, J. M. Murris, I. Borget, A. Guilloux, N. M. Martelli, S. I. Katsahian, Assessment of
performance, interpretability, and explainability in artificial intelligence–based health technologies:
what healthcare stakeholders need to know, Mayo Clinic Proceedings: Digital Health 1(2) (2023)
120-138. https://doi.org/10.1016/j.mcpdig.2023.02.004.</p>
      <p>[24] A. Gerdes, The role of explainability in AI-supported medical decision-making, Discov Artif Intell
4 (2024) 29. https://doi.org/10.1007/s44163-024-00119-2.</p>
      <p>[25] V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, K. Huang, S. Scardapane, I. Spinelli, M.
Mahmud, A. Hussain, Interpreting black-box models: a review on explainable artificial intelligence,
Cogn Comput 16 (2024) 45-74. https://doi.org/10.1007/s12559-023-10179-8.</p>
      <p>[26] T. Kirat, O. Tambou, V. Do, A. Tsoukiàs, Fairness and explainability in automatic decision-making
systems. a challenge for computer science and law, EURO Journal on Decision Processes 11 (2022)
100036. https://doi.org/10.1016/j.ejdp.2023.100036.</p>
      <p>[27] M. M. Soliman, E. Ahmed, A. Darwish, A. E. Hassanien, Artificial intelligence powered metaverse:
analysis, challenges and future perspectives, Artif Intell Rev 57 (2024) 36.
https://doi.org/10.1007/s10462-023-10641-x.</p>
      <p>[28] Z. Liu, X. Gao, C. Li, Modeling COVID-19 vaccine adverse effects with a visualized knowledge
graph database, Healthcare 10 (2022) 1419. https://doi.org/10.3390/healthcare10081419.</p>
      <p>[29] H. Gupta, O. M. Verma, Vaccine hesitancy in the post-vaccination COVID-19 era: a machine
learning and statistical analysis driven study, Evolutionary Intelligence 16 (2023) 739–757.
https://doi.org/10.1007/s12065-022-00704-3.</p>
      <p>[30] M. Ahamad, S. Aktar, J. Uddin, R. Al-Mahfuz, A. K. M. Azad, S. Uddin, S. A. Alyami, I. H. Sarker, A.
Khan, P. Liò, J. M. W. Quinn, M. A. Moni, Adverse effects of COVID-19 vaccination: machine
learning and statistical approach to identify and classify incidences of morbidity and postvaccination
reactogenicity, Healthcare 11 (2023) 31. https://doi.org/10.3390/healthcare11010031.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Risk stratification of cardiovascular disease according to age groups in new prevention guidelines: a review</article-title>
          ,
          <source>Journal of Lipid and Atherosclerosis</source>
          <volume>12</volume>
          (
          <issue>2</issue>
          ) (
          <year>2023</year>
          )
          <fpage>96</fpage>
          -
          <lpage>105</lpage>
          . https://doi.org/10.12997/jla.2023.12.2.96.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Vaughn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tabet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Ranking age-specific modifiable risk factors for cardiovascular disease and mortality: evidence from a population-based longitudinal study</article-title>
          ,
          <source>eClinicalMedicine</source>
          <volume>64</volume>
          (
          <year>2023</year>
          )
          <fpage>102230</fpage>
          . doi: 10.1016/j.eclinm.2023.102230.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>WHO</surname>
          </string-name>
          ,
          <article-title>Number of COVID-19 deaths reported to WHO (cumulative total)</article-title>
          (
          <year>2024</year>
          ). https://data.who.int/dashboards/covid19/deaths?n=c
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hosseinzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zareipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Baljani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.R.</given-names>
            <surname>Moradali</surname>
          </string-name>
          ,
          <article-title>Social consequences of the COVID-19 pandemic, A systematic review</article-title>
          .
          <source>Investig. Educ. Enferm</source>
          .
          <volume>40</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Elharake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Akbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gilliam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Omer</surname>
          </string-name>
          ,
          <article-title>Mental health impact of COVID-19 among children and college students: a systematic review</article-title>
          ,
          <source>Child Psychiatry Hum. Dev</source>
          .
          <volume>54</volume>
          (
          <issue>3</issue>
          ) (
          <year>2022</year>
          )
          <fpage>913</fpage>
          -
          <lpage>925</lpage>
          . https://doi.org/10.1007/s10578-021-01297-1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. L.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Characteristics of SARS-CoV-2 and COVID-19</article-title>
          .
          <source>Nat. Rev. Microbiol</source>
          .
          <volume>19</volume>
          (
          <year>2021</year>
          )
          <fpage>141</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <article-title>Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study</article-title>
          ,
          <source>Lancet Respir Med</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>475</fpage>
          -
          <lpage>81</lpage>
          . https://doi.org/10.1016/S2213-2600(20)30079-5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dayaramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Leon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Reiss</surname>
          </string-name>
          ,
          <article-title>Cardiovascular Disease Complicating COVID-19 in the elderly</article-title>
          .
          <source>Medicina</source>
          ,
          <volume>57</volume>
          (
          <issue>8</issue>
          ) (
          <year>2021</year>
          )
          <fpage>833</fpage>
          . https://doi.org/10.3390/medicina57080833
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Levi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Brandeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shmueli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yamin</surname>
          </string-name>
          ,
          <article-title>Prediction and detection of side effects severity following COVID-19 and influenza vaccinations: utilizing smartwatches and smartphones</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (
          <year>2024</year>
          )
          <fpage>6012</fpage>
          . doi: 10.1145/1188913.1188915.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Mustapha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <article-title>The nature and extent of COVID-19 vaccination hesitancy in healthcare workers</article-title>
          .
          <source>J. Commun. Health</source>
          <volume>46</volume>
          (
          <year>2021</year>
          )
          <fpage>1244</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarpanah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Farhadloo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vahidov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pilote</surname>
          </string-name>
          ,
          <article-title>Vaccine hesitancy: evidence from an adverse events following immunization database, and the role of cognitive biases</article-title>
          .
          <source>BMC Public Health</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Muhar</surname>
          </string-name>
          ,
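          <string-name>
            <given-names>J.</given-names>
            <surname>Nehira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Kotchoni</surname>
          </string-name>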
          ,
          <article-title>The race for COVID-19 vaccines: the various types and their strengths and weaknesses</article-title>
          ,
          <source>J Pharm Pract</source>
          .
          <volume>36</volume>
          (
          <issue>4</issue>
          ) (
          <year>2023</year>
          )
          <fpage>953</fpage>
          -
          <lpage>966</lpage>
          . doi: 10.1177/08971900221097248.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>