<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CURE: An Effective COVID-19 Remedies based on Machine Learning Prediction Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Poonam Phogat</string-name>
          <email>poonamphogat07@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajat Chaudhary</string-name>
          <email>rajat@biet.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>COVID-19, Machine Learning, Prediction Model</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science &amp; Engineering, Bharat Institute of Engineering &amp; Technology</institution>
          ,
          <addr-line>Hyderabad, Telangana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science &amp; Engineering, SGT University</institution>
          ,
          <addr-line>Gurugram, Haryana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <fpage>415</fpage>
      <lpage>424</lpage>
      <abstract>
        <p>Coronavirus disease (COVID-19) is a severe pandemic infectious disease caused by a virus that enters the healthy cells of a living body. The virus replicates in the organs of the host, which ultimately leads to the death of some healthy cells and therefore weakens the immune system. In the mild stage it mainly affects the respiratory tract; by the last stage it leads to pneumonia, organ failure, and death. This paper focuses on the early detection of COVID-19 patients based on the positive symptoms of the disease. The COVID-19 Remedies (CURE) scheme is proposed, based on machine learning prediction models, for the treatment of COVID patients. For the experimental results, the performance of the CURE scheme is evaluated in Python and tested using the Kaggle dataset from Johns Hopkins University.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first diagnosed in late December 2019 during an investigation into an outbreak in Wuhan, China. As cases increased rapidly throughout the world, the WHO declared the disease a pandemic on March 11, 2020. Currently, the transmission of COVID-19 has become uncontrollable because the number of cases has reached the threshold limit [1]. The virus enters healthy cells of a living body and replicates in the organs of the host, which ultimately leads to the death of some healthy cells and therefore weakens the immune system. In the mild stage it mainly affects the respiratory tract; by the last stage it leads to pneumonia, organ failure, and death [2]. The disease is most prominent in elderly people who have a weak immune system and pre-existing conditions such as diabetes, high blood pressure, and cardiovascular and respiratory diseases [3].</p>
      <p>Figure 1 shows the global statistics up to July 30, 2020, on the total confirmed cases, active cases, total deaths, and total cured cases. Figure 1(a) presents the total number of coronavirus cases across different countries, which shows that the virus is spreading rapidly, with the highest number of cases in the USA followed by India. There are 2,18,69,976 confirmed positive cases across the world, of which 26,47,663 are in India. Figure 1(b) shows the statistics of the active cases: 65,04,303 active cases have occurred globally. Figure 1(c) presents the total deaths and, finally, Figure 1(d) shows the total cured cases [4].</p>
      <p>The virus is communicable and spreads through respiratory droplets present in the air. These aerosols are released into the open environment when an infected person sneezes or coughs; they enter other persons through the mouth and nostrils and reach the lungs. There is no precise treatment to cure COVID-19. Some steps are being taken to eliminate the virus using different medicines such as hydroxychloroquine, an antimalarial drug. It is currently used to treat coronavirus patients: it helps to inhibit infection by increasing the endosomal pH, which gives the immune system enough strength to fight the viral disease [5].</p>
      <p>ORCID: 0000-0002-6554-918X (R. Chaudhary)</p>
      <sec id="sec-2-1">
        <p>Some preventive measures are necessary to manage this pandemic. From the very beginning of COVID-19, the governments of almost all countries have taken strict actions such as complete lockdown, social distancing, and the use of sanitizer and masks to reduce all the causing elements [6]. Across the studies explored, machine learning appears to be the best approach for forecasting the increasing number of COVID-19 infected cases. The regression and classification approaches of ML work according to the availability of data to diagnose this problem.</p>
        <p>Figure 1: Global COVID-19 statistics up to July 30, 2020, in four panels, each giving the value for India, India's share (%), and the world. (a) Total cases. (b) Active cases: India 6,76,900 (10.4%), world 65,04,303. (c) Total deaths: India 50,921 (6.6%), world 7,73,741. (d) Total cured: India 19,19,842 (13.3%), world 1,45,91,932.</p>
        <sec id="sec-2-1-1">
          <title>1.1. Contributions</title>
          <p>The contributions of the paper are summarized below.</p>
          <p>• Diagnose the symptoms of COVID-19 patients based on the classification of the diseases.</p>
          <p>• To recover the COVID-19 patients, the CURE scheme is proposed based on machine learning prediction models.</p>
          <p>• Finally, the performance evaluation is compared across the five classifiers, and the most efficient outcome is predicted using the Python platform.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>1.2. Paper Organization</title>
          <p>The rest of the paper is structured as follows: Section II discusses the literature review of the existing schemes. Section III presents the system model, followed by the proposed CURE scheme in Section IV. Section V comprises the performance evaluation of the prediction models.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Literature Review</title>
      <p>Researchers have introduced several machine learning methods for classification. The simplest is the linear regression method, which minimizes the sum of squared differences between real and predicted data. The drawbacks of this model are its ineffectiveness on data that do not follow a linear trend and its sensitivity to deviations [7]. In the logistic regression model, the probability of the outcome is based on the logistic function. Its advantage is that it is free of complications, but it fails to assume linearity. The Naive Bayes model needs only limited training data to estimate the necessary parameters and deals efficaciously with real-world data. Another model, k-nearest neighbors, is reported to work well with multi-class problems [8], [9]. Pinter et al. [10] proposed machine learning approaches with a multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) and an adaptive network-based fuzzy inference system (ANFIS) for prediction of the COVID-19 confirmed positive and death cases. This model maintains its accuracy for the next 9 days, which gives reassuring results [11]. The government and the public have to appreciate the researchers and help to lower the case numbers by maintaining social distancing and following other precautions [12]. Hamzeh et al. [13] work on the Susceptible-Exposed-Infectious-Recovered (SEIR) model and report that it performs well on moderate data. The outbreak of this infectious disease may cause variations in the data prediction.</p>
      <p>Jia et al. [14] define four stages for COVID-19 cases. The first stage involves the travel history of a person having COVID-19 symptoms, which leads to lockdown. When the infected person comes into contact with other persons, the virus reaches the second stage; social distancing is applied to prevent the case numbers from increasing. In the third stage there is neither travel history nor contact with an infected person, so the chances of the virus spreading through respiratory droplets become high, and the use of masks and sanitizers is necessary. The last stage is an uncontrollable stage in which the cases have reached the threshold limit. Tuli et al. [15] improved COVID-19 prediction by using a machine learning model. In this model a data-driven approach is used to help the government and the public. After covering the data with ML and AI, researchers can forecast the time scale and the regions where the possibility of spread of this disease is maximum. It is predicted that, using different ML models, COVID-19 cases can be controlled or eliminated in all the countries of the world that are facing this critical situation.</p>
    </sec>
    <sec id="sec-4">
      <title>3. System Model</title>
      <p>Prediction from the symptoms of COVID-19 patients is made harder by imbalanced data. In medical data the class imbalance problem is frequent; it occurs when some classes have many more cases than others. To handle an imbalanced dataset, several solutions are available at both the algorithmic and the data level. In this paper, the performance of five classifiers and regressions is compared on the imbalanced dataset obtained while studying the prediction of COVID-19. On the basis of the results of these regressions and classifiers, the impact of SMOTE (Synthetic Minority Oversampling Technique), an approach that deals with imbalanced datasets, is thoroughly evaluated.</p>
      <p>In this method, the k samples closest to each minority sample within the minority class are found, using the standard Euclidean distance. The imbalanced dataset is taken with the number of cases in the minority and majority classes. Based on the independent variable, the original dataset is partitioned into two sets, a training set (80%) and a test set (20%), using stratified random sampling. By applying the SMOTE technique, the training set is oversampled to find the class distribution best suited to the dataset. This yields 8 training sets: the original set and 7 oversampled sets with different oversampling rates.</p>
      <p>Figure 2: Workflow of the proposed CURE Scheme for the treatment of COVID patient. The blocks run from the COVID-19 dataset through data pre-processing, the training dataset and the trained model to the prediction models, which are compared using the ten performance metrics (H-measure, Gini, AUC, AUCH, KS, MER, MWL, Spec.Sens95, Sens.Spec95, ER).</p>
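      <p>The stratified 80/20 split and the SMOTE oversampling of the training set that this paper describes can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names <monospace>stratified_split</monospace> and <monospace>smote</monospace>, the toy points, the choice of k = 3, and the fixed random seed are all assumptions made for the example. The minority class needs at least two points for the interpolation to work.</p>
      <preformat><![CDATA[
```python
import math
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points.

    Each new point lies on the segment between a random minority point and
    one of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Toy imbalanced data (hypothetical): 20 majority points, 5 minority points.
majority = [[float(i % 5), float(i // 5)] for i in range(20)]
minority = [[10.0, 10.0], [10.5, 10.2], [11.0, 9.8], [10.2, 10.9], [10.8, 10.5]]
rows = majority + minority
labels = [0] * len(majority) + [1] * len(minority)

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_minority = [rows[i] for i in train_idx if labels[i] == 1]
train_majority = [rows[i] for i in train_idx if labels[i] == 0]

# Oversample only the training set, here up to a 1:1 class ratio.
synthetic = smote(train_minority, n_new=len(train_majority) - len(train_minority))
balanced_rows = [rows[i] for i in train_idx] + synthetic
balanced_labels = [labels[i] for i in train_idx] + [1] * len(synthetic)
```
]]></preformat>
      <p>Calling <monospace>smote</monospace> with different values of <monospace>n_new</monospace> would give training sets with different oversampling rates, while the test set is left untouched so that the evaluation still reflects the original class distribution.</p>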
    </sec>
    <sec id="sec-5">
      <title>4. Proposed CURE Scheme</title>
      <p>The proposed CURE scheme uses a wide range of methods and tools for prediction. By combining different models, namely SVM (Support Vector Machine), LR (Linear Regression), k-NN (k-Nearest Neighbors), Naïve Bayes classification and the R tool, a machine learning model is proposed for forecasting the COVID-19 infection rate. The collected dataset is cleaned before further processing, which is considered the first step in knowledge discovery in databases. For written-character classification problems, this data cleansing process is applied using machine learning techniques. Data cleaning is the process that implements methods to detect missing and incorrect data, correct errors, and explore databases; it involves reassembling and disintegrating data. Data cleansing is practiced on numerous merged databases in which duplicate records appear. Four dimensional qualities are proposed: certainty, correctness, integrity, and consistency.</p>
      <p>Figure 2 presents the workflow of the proposed CURE scheme for the treatment of COVID patients. Initially, the input is the dataset taken from Johns Hopkins University. Then the symptoms of positive cases are analyzed and categorized into 3 sub-parts: severe, moderate, and mild symptoms. A patient having severe symptoms, which include throttling, must face a harsh period. Moderate symptoms include shortness of breath, fever, and cough. Mild symptoms include fever, cough, and headache. The proposed scheme for COVID-19 outbreak analysis is trained and tested on real-time data.</p>
      <p>Primary symptoms of this disease include loss of taste and smell, headache, fever, dizziness, tiredness, and shortness of breath. According to their seriousness, symptoms are classified into three categories: mild, moderate, and severe. Mild symptoms comprise fever, cough, and headache, and the seriousness is low at this stage. Then comes the moderate stage, in which shortness of breath is the main symptom along with high fever and cough.</p>
      <p>In the severe stage, the patient reaches a critical situation and becomes profoundly serious. Respiratory problems are the main problem the patients must face. The virus mainly affects the lungs: it damages the alveoli, which are responsible for the supply of oxygen to all parts of the body through blood vessels and RBCs. The virus damages the alveolus wall and thickens it, so that the transfer of oxygen to the RBCs is reduced, which ultimately leads to hypoxia. Due to the insufficient intake of oxygen, the chances of organ failure remain high. The collected data is first used for training and then for testing with different models: SVM (Support Vector Machine), LR (Linear Regression), k-NN (k-Nearest Neighbors), and Naïve Bayes classification. These prediction methods are explained below.</p>
      <sec id="sec-5-1">
        <title>4.1. Linear Regression (LR)</title>
        <p>LR is the most widely used statistical technique for predictive analysis in machine learning. Linear regression is a supervised machine learning algorithm that performs a regression task. The LR prediction model uses the given data points to obtain the optimal fit line for the training dataset. A simple equation of a line is <inline-formula><tex-math>y = mx + c</tex-math></inline-formula>, where <italic>y</italic> is the dependent variable, <italic>x</italic> is the independent variable, and <italic>m</italic>, <italic>c</italic> are constants whose values are computed using calculus. Figure 3(a) shows an example of an LR prediction model that takes the features as input and predicts a continuous output by obtaining a linear curve for a given problem. The output of the LR model is computed by using the equation</p>
        <disp-formula><label>(1)</label><tex-math>y = \beta_0 + \beta_1 x_1 + \epsilon,</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\beta_0</tex-math></inline-formula> represents the y-intercept, <inline-formula><tex-math>\beta_1</tex-math></inline-formula> represents the slope, <inline-formula><tex-math>x_1</tex-math></inline-formula> is the input value, <inline-formula><tex-math>\epsilon</tex-math></inline-formula> represents the error term, and <italic>y</italic> is the output value of the model. At the start of training, <inline-formula><tex-math>\beta</tex-math></inline-formula> is initialized randomly; during training it is corrected for each feature such that the loss (the deviation between the desired and the predicted output) is minimized. The loss is measured by the mean squared error (MSE). The advantages of LR are that it is easy and simple to implement, fast to train, can be regularized to avoid overfitting, and is easily updated with new data using gradient descent. The disadvantages of the LR model are that it performs poorly for non-linear relationships, is not flexible enough to capture complex patterns, and that adding polynomial terms can be time-consuming.</p>
        <p>To generate a discrete output, i.e., 0 or 1, the logistic regression (binary classification) model is used instead. Figure 3(b) shows an example of logistic regression, which calculates the aggregate sum of the input variables as the LR model does but passes the result through a non-linear sigmoid function to generate the output:</p>
        <disp-formula><label>(2)</label><tex-math>y = \frac{1}{1 + e^{-x}},</tex-math></disp-formula>
        <p>where <italic>x</italic> is the input value, <italic>y</italic> is the output value of the model, and <italic>e</italic> is the exponential. The LR prediction model can be implemented in Python.</p>
        <p><bold>4.2. Support Vector Machine (SVM)</bold></p>
        <p>SVM is a supervised ML algorithm used for both classification and regression. An example of an SVM classifier is shown in Figure 3(c), which represents the different classes in a decision plane or hyperplane in n-dimensional space. In this figure, the support vectors are the data points nearest to the hyperplane. The data points are divided into classes by the separating lines (<italic>w</italic><sub>1</sub>, <italic>w</italic><sub>2</sub>, <italic>w</italic><sub>3</sub>). Here, the margin is defined as the gap, or perpendicular distance, from the line to the support vectors. The objective of SVM is to separate the dataset into classes by finding the maximum marginal hyperplane. Initially, SVM iteratively finds hyperplanes that isolate the classes; it then selects the hyperplane that divides the classes in the best way.</p>
        <p>SVM can perform non-linear classification efficiently in addition to linear classification. It is extremely effective in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples. SVM transforms the input vector into an n-dimensional space known as the feature space (f) by using a non-linear function; a linear function is then fitted in that space. SVM is implemented in Python by using SVM kernels. The types of SVM kernels are the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel.</p>
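        <p>For the regression models of Section 4.1, the training loop can be sketched in a few lines: fit the intercept and slope of a line by gradient descent on the mean squared error, and apply the sigmoid used by logistic regression. This is a minimal sketch under stated assumptions, not the paper's code: the names <monospace>fit_linear_regression</monospace>, <monospace>mse</monospace> and <monospace>sigmoid</monospace>, the learning rate, the epoch count, and the toy data are illustrative, and the coefficients start at zero rather than at random values so that the run is reproducible.</p>
        <preformat><![CDATA[
```python
import math


def mse(xs, ys, b0, b1):
    """Mean squared error of the line y = b0 + b1 * x on the data."""
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit intercept b0 and slope b1 by gradient descent on the MSE."""
    b0, b1 = 0.0, 0.0  # assumption: zero start instead of random initialisation
    n = len(xs)
    for _ in range(epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


def sigmoid(x):
    """Logistic function that maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_linear_regression(xs, ys)
final_loss = mse(xs, ys, b0, b1)

# Logistic regression passes the same linear sum through the sigmoid and
# thresholds the result at 0.5 to obtain a 0/1 output.
probability = sigmoid(b0 + b1 * 0.5)
label = 1 if probability >= 0.5 else 0
```
]]></preformat>
        <p>On this toy data the fitted values approach an intercept of 1 and a slope of 2. A learning rate that is too large for the scale of the inputs would make the loop diverge, which is one reason feature scaling is commonly applied before gradient descent.</p>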
        <sec id="sec-5-1-1">
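          <p>Linear kernel: It is the dot product between two observations. The linear kernel function is defined by</p>
          <disp-formula><label>(3)</label><tex-math>K(x, x_i) = \sum (x \cdot x_i),</tex-math></disp-formula>
          <p>where <inline-formula><tex-math>x</tex-math></inline-formula> and <inline-formula><tex-math>x_i</tex-math></inline-formula> are two vectors.</p>
          <p>Polynomial kernel: It discriminates curved or non-linear input space and is defined by</p>
          <disp-formula><label>(4)</label><tex-math>K(x, x_i) = \left(1 + \sum (x \cdot x_i)\right)^{d},</tex-math></disp-formula>
          <p>where <italic>d</italic> is the degree of the polynomial, which is set manually in the learning algorithm.</p>
          <p>Radial basis function (RBF) kernel: It transforms the input space into a multi-dimensional space and is defined by</p>
          <disp-formula><label>(5)</label><tex-math>K(x, x_i) = \exp\left(-\gamma \sum (x - x_i)^2\right),</tex-math></disp-formula>
          <p>where <inline-formula><tex-math>\gamma</tex-math></inline-formula> lies between 0 and 1; it is set manually and its default value is 0.1.</p>
          <p>The steps to implement an SVM classifier for text classification are as follows: (i) import the required packages; (ii) load the input dataset; (iii) select features from the dataset; (iv) plot the SVM boundaries with the original data; (v) generate the values of the regularization parameter; (vi) create the SVM classifier object using a kernel (linear, polynomial, or RBF); (vii) the final output is the text classification. The advantages of SVM classifiers are high accuracy in multi-dimensional spaces and low memory use, since only a subset of the training points is used. The disadvantages are that SVM does not scale to larger datasets because of its high training time, and that it does not perform well with overlapping classes. Thus, decision trees are usually preferred over SVM for large datasets.</p>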
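          <p>The three kernel types named in this section can be written directly as functions of two feature vectors. The sketch below is illustrative only: the function names are assumptions, the polynomial kernel uses the common form (1 + x·x<sub>i</sub>)<sup>d</sup>, and the default <monospace>gamma=0.1</monospace> follows the default value stated in the text. A kernel (Gram) matrix built from such a function is what a kernel SVM solver would consume.</p>
          <preformat><![CDATA[
```python
import math


def linear_kernel(x, xi):
    """Dot product of two observations."""
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(x, xi, d=2):
    """Polynomial kernel of degree d, assuming the form (1 + x . xi) ** d."""
    return (1 + linear_kernel(x, xi)) ** d


def rbf_kernel(x, xi, gamma=0.1):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, kernel):
    """Pairwise kernel values for a list of points."""
    return [[kernel(p, q) for q in points] for p in points]


points = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]
K_linear = gram_matrix(points, linear_kernel)
K_rbf = gram_matrix(points, rbf_kernel)
```
]]></preformat>
          <p>With the RBF kernel every point has similarity 1 to itself and the similarity decays with distance; a larger <monospace>gamma</monospace> makes that decay faster, which gives a more flexible but more easily overfitted decision boundary.</p>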
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>4.3. k-NN (k-Nearest Neighbors)</title>
        <p>The k-nearest neighbors (k-NN) algorithm is a supervised ML technique that is generally used for classification problems, although it can be used for both classification and regression. The k-NN method classifies documents based on resemblance measurements: the similarity between two data points is quantified by factors such as distance and proximity, and each data point is classified based on its nearest neighbors. Figure 3(d) shows an example of a k-NN model, which assumes the closeness of two data points (similar data points). k-NN works on the principle of feature similarity in order to predict the values of new data points: a new data point is assigned a value based on how closely it matches the data points in the training set.</p>
        <p>The steps involved in the k-NN algorithm are as follows: (i) Load the training and testing datasets. (ii) Select the value of k (an integer), i.e., the number of closest data points. (iii) For each point in the test data, compute the distance between the test point and each row of the training data using the Euclidean or Hamming distance, and sort the distance values in ascending order. (iv) Select the top k rows from the sorted array and allocate a class to the test point based on the most frequent class of these rows. (v) Return the final output.</p>
        <p>The k-NN algorithm can be implemented in Python by using the following approach: (i) import the necessary Python packages, (ii) download the Kaggle COVID-19 dataset, (iii) assign column names to the dataset, (iv) read the dataset into a pandas dataframe, (v) perform data preprocessing, (vi) split the data into train and test datasets (60% training data and 40% testing data), (vii) perform data scaling, (viii) train the model using the k-nearest neighbors classifier class of sklearn, (ix) obtain predictions, (x) output the results: confusion matrix, classification report, and accuracy. The benefits of the k-NN algorithm are that it is simple, useful for non-linear data, and highly accurate. Its limitations are that it is a costly algorithm because it stores all the training data; in addition, it requires more memory, and prediction is slow for large datasets.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.4. Naïve Bayes</title>
        <p>Naïve Bayes is a classification method based on Bayes' theorem. It works on the strong assumption of conditional independence: the existence of a feature in a class is independent of the existence of any other feature in the same class. Consider the example of a smart 4K TV. A TV is placed in the smart category if it has features such as an Internet connection, high definition, Bluetooth, USB ports, HDMI connectivity, and support for multiple applications. Although these features depend on each other, each feature is taken to contribute independently to the probability that the 4K TV is a smart TV. Naïve Bayes is a highly scalable algorithm that can also be trained on a small dataset.</p>
        <p>Figure 3(e) shows an example of a Naïve Bayes model that classifies the data points, based on the posterior probability of the class, into three different classes: classifier 1 (red data points), classifier 2 (orange data points), and classifier 3 (blue data points). The expression of the Naïve Bayes algorithm based on Bayes' theorem is defined as follows:</p>
        <disp-formula><label>(6)</label><tex-math>P(A|B) = \frac{P(B|A)\,P(A)}{P(B)},</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>P(A|B)</tex-math></inline-formula> indicates the posterior probability of the class, <inline-formula><tex-math>P(B|A)</tex-math></inline-formula> indicates the likelihood of the predictor given the class, <inline-formula><tex-math>P(A)</tex-math></inline-formula> refers to the prior probability of the class, and <inline-formula><tex-math>P(B)</tex-math></inline-formula> refers to the marginal probability, or prior probability, of the predictor.</p>
        <p>For building the prediction model, the Naïve Bayes classifier is categorized into three types: (i) Gaussian Naïve Bayes (GNB), (ii) Bernoulli Naïve Bayes (BNB), and (iii) Multinomial Naïve Bayes (MNB). The Python library Scikit learn is the most useful library for building a Naïve Bayes model in Python, and it provides all three types. GNB classifier: it assumes that the data from each label is drawn from a simple Gaussian distribution. MNB classifier: the features are considered to be drawn from a simple multinomial distribution, which is most suitable for features that represent discrete counts. BNB classifier: BNB considers the features to be binary (0s and 1s), for example, text classification with a ‘bag of words’ model. The applications of the Naïve Bayes classifier are real-time prediction, multi-class prediction, and text classification.</p>
        <p>Figure 3 panels: (a) Linear Regression Model, (b) Logistic Regression Model, (c) SVM Model, (d) k-NN Model, (e) Naive Bayes Classifier, (f) Decision Tree based on three key features of COVID patient. Panel (f) covers 600 patients and splits on lactic dehydrogenase (LDH) &lt; 365 U l<sup>-1</sup>, high-sensitivity C-reactive protein (hs-CRP) &lt; 41.2 mg l<sup>-1</sup>, and lymphocytes &gt; 14.7%, with intermediate nodes of 426 and 35 patients. Its leaves are: cured (Total: 391; True: 391, False: 0), death (Total: 174; True: 172, False: 2), cured (Total: 23; True: 22, False: 1), and death (Total: 12; True: 12, False: 0). True: number of correctly classified patients; False: number of misclassified patients; Total: number of patients in a dataset.</p>
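        <p>To make the posterior of Bayes' theorem concrete for the Gaussian variant, the sketch below fits a per-class mean and variance for every feature and normalises prior times likelihood over the classes. It is a hand-written illustration under stated assumptions rather than the Scikit learn implementation: the names <monospace>fit_gaussian_nb</monospace> and <monospace>posterior</monospace>, the small variance floor, and the two-feature toy data are all hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small floor (assumption) keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def posterior(model, row):
    """P(class | row) via Bayes' theorem with independent Gaussian features."""
    log_joint = {}
    for label, (prior, stats) in model.items():
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        log_joint[label] = total
    # Normalising by the evidence P(B); subtract the peak for numerical stability.
    peak = max(log_joint.values())
    unnormalised = {label: math.exp(v - peak) for label, v in log_joint.items()}
    evidence = sum(unnormalised.values())
    return {label: v / evidence for label, v in unnormalised.items()}


# Hypothetical two-feature data: class 0 near (1, 1), class 1 near (5, 5).
rows = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
probs = posterior(model, [1.1, 1.0])
predicted = max(probs, key=probs.get)
```
]]></preformat>
        <p>Working in log space avoids multiplying many small likelihoods together; the same structure applies to the Bernoulli and multinomial variants, with the Gaussian density replaced by the corresponding likelihood.</p>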
        <p>The steps involved in implementing the GNB classifier in Python are as follows: (i) import the GNB packages from the Scikit learn Python library; (ii) obtain blobs of points with a Gaussian distribution by using the make_blobs() function of Scikit learn; (iii) for the GNB model, import GaussianNB and create its object; (iv) perform prediction after obtaining some new data; (v) plot the new data to find its boundaries; (vi) compute the posterior probabilities of the labels; (vii) output the resulting array.</p>
        <p>The benefits of the Naïve Bayes classifier are fast and easy implementation, a need for less training data, faster convergence than discriminative models such as logistic regression, and suitability for both continuous and discrete data. Its limitations are the zero-frequency problem and the independence assumption. If a variable takes a category that was not observed in the training dataset, the Naïve Bayes classifier assigns it a zero probability and does not give a prediction. Moreover, in real-life applications it is difficult to obtain a set of features that are completely independent of each other.</p>
        <p><bold>4.5. Decision Tree Induction Classifier</bold></p>
        <p>The decision tree induction classifier is a simple, easily understandable, non-parametric classifier based on the flexible decision tree algorithm. It can perform both classification and regression. To build the model from the original dataset, the training data is selected at random. The working of the decision tree algorithm involves the following steps: (i) select random samples from the given dataset; (ii) construct a decision tree for every sample and compute the prediction result from every decision tree; (iii) perform voting over the predicted results; (iv) choose the most voted prediction result as the output of the prediction algorithm.</p>
        <p>The decision tree is implemented in Python by using the following approach: (i) import the necessary Python packages, (ii) download the Kaggle dataset, (iii) assign column names to the dataset, (iv) read the dataset into a pandas dataframe, (v) perform data pre-processing by using script lines, (vi) divide the data into a train and test split (for example, 70% training data and 30% testing data), (vii) train the decision tree model. A noted limitation is that the approach is time-consuming in comparison to other prediction models.</p>
        <p>Accuracy (Acc): The accuracy on a given dataset is the ratio of the total correct predictions made by the classifier (TP + TN) to the total number of data points, where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives. The value of Acc lies between 0 and 1.</p>
        <disp-formula><tex-math>Acc = \frac{TP + TN}{TP + TN + FP + FN}</tex-math></disp-formula>
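        <p>The accuracy measure can be tied to one of the classifiers of this section with a short worked example. The sketch below implements the k-NN voting steps of Section 4.3 (Euclidean distance, sort, take the k closest rows, majority vote) and scores the predictions with the accuracy ratio. The function names, the choice k = 3, and the toy points are assumptions made for illustration; the paper's own experiments use the k-nearest neighbors classifier of sklearn on the Kaggle data.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training rows."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(actual, predicted):
    """Share of correct predictions, i.e. (TP + TN) / (TP + TN + FP + FN)."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical two-class toy data.
train_rows = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_labels = [0, 0, 0, 1, 1, 1]
test_rows = [[1.1, 0.9], [5.1, 5.0], [0.8, 1.2], [4.9, 4.8]]
test_labels = [0, 1, 0, 1]

predicted = [knn_predict(train_rows, train_labels, row, k=3) for row in test_rows]
acc = accuracy(test_labels, predicted)
```
]]></preformat>
        <p>Every prediction requires a pass over all training rows, which is the memory and speed cost noted for k-NN on large datasets. An odd k avoids tied votes in a two-class problem.</p>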
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Prediction Models Performance Evaluation</title>
    </sec>
    <sec id="sec-7">
      <p>The performance of prediction models can be assessed using a variety of metrics, listed as follows: (1) H-measure, (2) Gini index, (3) Area Under the Curve (AUC), (4) Area Under the convex Hull of the ROC Curve (AUCH), (5) Kolmogorov-Smirnov statistic (KS), (6) Minimum Error Rate (MER), (7) Minimum Cost-Weighted Error Rate (MWL), (8) Specificity when Sensitivity is held fixed at 95% (Spec.Sens95), (9) Sensitivity when Specificity is held fixed at 95% (Sens.Spec95), and (10) Error Rate (ER).</p>
      <sec id="sec-7-4">
        <p>H-measure: The H-measure is an important measure of classification performance that measures the accuracy of the model. The primary statistics of interest are the so-called misclassification counts, i.e., the number of False Negatives (FN) and False Positives (FP). There are four scenarios in prediction modeling. (i) True positives (TP): actuals are positives and are predicted as positives. (ii) False positives (FP): actuals are negatives and are predicted as positives. (iii) False negatives (FN): actuals are positives and are predicted as negatives. (iv) True negatives (TN): actuals are negatives and are predicted as negatives. An example of a false positive is an occurrence where a disease is mistakenly diagnosed, and an example of a false negative is an occurrence where the presence of a disease is missed.</p>
        <p>MWL: It is related to the KS statistic. The threshold value acts as a free parameter; in this measure, cost guides the threshold value.</p>
        <p>Specificity and Sensitivity: These are the True Positive Rate (TPR), or Sensitivity (Sens), and the True Negative Rate (TNR), also called Specificity (Spec.):</p>
        <disp-formula><label>(11)</label><tex-math>Sens = \frac{TP}{TP + FN}, \qquad Spec. = \frac{TN}{TN + FP}.</tex-math></disp-formula>
        <p>Figure 7 computes the H-measure by using five classifiers.</p>
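        <p>The four prediction scenarios and the rates derived from them can be checked with a small worked example. The sketch below counts TP, FP, FN, and TN from paired actual and predicted labels and then computes sensitivity, specificity, accuracy, and error rate. The function names and the ten toy labels are illustrative assumptions, and the code assumes that both classes occur in the actual labels so that no denominator is zero.</p>
        <preformat><![CDATA[
```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a != positive and p == positive:
            fp += 1  # actual negative, predicted positive
        elif a == positive and p != positive:
            fn += 1  # actual positive, predicted negative
        else:
            tn += 1  # actual negative, predicted negative
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy and error rate from the four counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
    }


# Hypothetical labels: 1 = disease present, 0 = disease absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(actual, predicted)
metrics = summary_metrics(*counts)
```
]]></preformat>
        <p>Here one diseased patient is missed (a false negative) and two healthy patients are flagged (false positives), so sensitivity and specificity differ even though a single accuracy figure summarises both kinds of error.</p>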
      </sec>
      <sec id="sec-7-5">
        <p>The normalised cost is computed on the X-axis. Let <inline-formula><tex-math>c \in [0, 1]</tex-math></inline-formula> denote the cost of misclassifying a class 0 object as class 1 (FP), and let <inline-formula><tex-math>1 - c</tex-math></inline-formula> represent the cost of misclassifying a class 1 object as class 0 (FN). This asymmetry can be seen to underlie the KS statistic, which is a simple linear transformation of the MWL when <inline-formula><tex-math>c = \pi_1</tex-math></inline-formula>, <inline-formula><tex-math>1 - c = \pi_0</tex-math></inline-formula>. The severity ratio (SR) is defined as the ratio between the two costs, where SR = 1 represents symmetric costs:</p>
        <disp-formula><label>(12)</label><tex-math>SR = \frac{c}{1 - c}, \qquad c = \frac{SR}{1 + SR},</tex-math></disp-formula>
        <p>where the Y-axis represents the weighted cost. The H-measure is computed for all five classifiers and, finally, the mean value of the severity ratio (SR) is 1.12. We pre-process the data to make the experimental data more efficient and to remove redundancy.</p>
        <sec id="sec-7-5-1">
          <title>5.1. Dataset</title>
          <p>To validate the performance of the proposed CURE scheme, the data is collected from the Kaggle COVID-19 patient pre-condition dataset [16]. The Kaggle dataset is provided by Johns Hopkins University through a GitHub repository, which contains the real-time updated record of the total active cases, death cases, and recovered cases of the COVID-19 pandemic. In a time of technological advancement and all-round progress, such threatening diseases challenge both people and medical science to be mentally and physically prepared and attentive. According to the reports disclosed by the World Health Organization (WHO), the health curve (infectious cases and cured cases) changes abruptly every day, which makes it burdensome for the medical and other departments engaged in serving the world to estimate the total requirements of health-related equipment and resources. It would be very helpful for the medical departments and other concerned authorities if corona patients could be provided with all the resources they need to fight the lethal disease. In this context, the collected data contains 23 features of 5,66,603 patients.</p>
        </sec>
        <sec id="sec-7-5-2">
          <title>5.2. Results and Discussion</title>
          <p>The experiments are implemented in Python. The results
are organised around finding the missing values, the heatmap function, feature
selection, and the comparison of the machine learning models.
The discussion of the results is summarized below.</p>
          <p>5.2.1. Missing Values
The initial step is to find the missing values in the Kaggle
dataset [16] and plot them. Figure 4
visualizes the histogram of the missing values in the COVID
dataset. As a substitute for these, we computed the mean
and replaced each missing value with it. The default
input is a numeric array with levels 0 and 1, where
the minimum value is 0 and the maximum value is 1.</p>
          <p>5.2.2. Heatmap Representation
As the Kaggle COVID-19 dataset we collected does not
contain any missing or redundant values, we
represent the complete dataset in Figure 5. It is drawn using
the heatmap function of Python and presents
a diagrammatic view of the dataset. The
parameters of the COVID patients are placed on the X and
Y axes.</p>
          <p>5.2.3. Feature selection
As shown in Figure 6, we selected 10 of the
23 features in the COVID patient dataset. The
selection was made by analyzing the features after
computing the feature importance score in the form of the Gini index
through a decision tree method.</p>
          <p>5.2.4. Machine Learning Model
As discussed in the CURE scheme, the machine learning
models are applied to the pre-processed data. However,
there are different methods to enhance the performance
of the prediction models, depending on the
technique involved. One such technique is to construct
ensemble models: once several models each produce a score for a
particular outcome, we can integrate them to produce
ensemble scores. Figure 7 shows the H-measure of the
ensemble model; ensembling can improve the area
under the curve for these models even further. For example,
consider a decision tree classifier and a logistic regression
model, both predicting standard risks. A new score can
be calculated as the average of the two classifiers' scores and
then assessed as a further model. The area under
the curve usually improves for such ensemble models.</p>
          <p>After experimentation, the results are reported in
Table 1.</p>
          <p>6. Conclusion
In this paper, a CURE scheme is proposed based on
machine learning prediction models for the treatment of
COVID patients through remote e-healthcare. The
performance of the proposed scheme is evaluated
on the Python platform and tested using the Kaggle COVID-19 patient
pre-condition dataset from Johns Hopkins University. First, features are extracted from the
datasets of the COVID patients for diagnosing the
symptoms of the coronavirus. Next, the collected data is used to
train and then test different machine learning
prediction models (such as SVM, LR, k-NN, and Naive
Bayes) that classify the features of the COVID patient
for forecasting the infection rate. Finally, the performance
of the prediction models is assessed using a variety
of metrics: H-measure, Gini
index, Area Under Curve (AUC), AUCH, KS, Minimum
Error Rate (MER), Minimum Cost-Weighted Error Rate
(MWL), Spec.Sens95, Sens.Spec95, and Error Rate (ER). The
performance evaluation shows that the CURE scheme
outperforms the existing approach, which deals with an
imbalanced dataset.</p>
          <p>In future, we will ensure the secrecy of the
coronavirus data, as patients' sensitive credentials can be
leaked during data transmission through wireless
channels (Internet).</p>
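          <p>The pre-processing, feature-scoring, and score-averaging steps described in Sections 5.2.1, 5.2.3, and 5.2.4 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the CURE implementation: missing values are assumed to be replaced by the per-feature mean, the Gini-based importance is simplified to the impurity decrease of a single split on a binary feature (a stand-in for a full decision tree), and all function names, scores, and labels are hypothetical.</p>
          <preformat>
```python
def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)


def gini_importance(feature, labels):
    """Impurity decrease from one split on a binary 0/1 feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - child


def auc(scores, labels):
    """AUC as the chance that a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_average(scores_a, scores_b):
    """Average the scores of two classifiers, patient by patient."""
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


# Hypothetical toy data: 3 infected (1) and 3 healthy (0) patients.
labels = [1, 1, 1, 0, 0, 0]
age = mean_impute([30, None, 50, 40])      # the gap is filled with the mean, 40.0
informative = [1, 1, 1, 0, 0, 0]           # feature that separates the classes
noisy = [1, 0, 1, 0, 1, 0]                 # feature weakly related to the label
tree_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]   # stand-in for a decision tree
logit_scores = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2]  # stand-in for logistic regression
ensemble = ensemble_average(tree_scores, logit_scores)

print("importance:", gini_importance(informative, labels),
      round(gini_importance(noisy, labels), 4))
print("AUC tree / LR / ensemble:", round(auc(tree_scores, labels), 4),
      round(auc(logit_scores, labels), 4), auc(ensemble, labels))
```
          </preformat>
          <p>In this toy example each individual scorer misranks one patient pair, while the averaged score ranks every infected patient above every healthy one. It illustrates why averaging can raise the area under the curve, although such an improvement is not guaranteed in general.</p>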
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>[1]</label>
        <mixed-citation>Punn, Narinder Singh, Sanjay Kumar Sonbhadra, and Sonali Agarwal. "COVID-19 Epidemic Analysis using Machine Learning and Deep Learning Algorithms." medRxiv (2020), doi: https://doi.org/10.1101/2020.04.08.20057679.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>[2]</label>
        <mixed-citation>Jamshidi, M., Lalbakhsh, A., Talla, J., Peroutka, Z., [...] L., Mirmozafari, M., Dehghani, M. and Sabet, A. "Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment." IEEE Access, vol. 8, pp. 109581-109595, Jun. 2020.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>[3]</label>
        <mixed-citation>Yan, Li, Hai-Tao Zhang, Yang Xiao, Maolin Wang, Chuan Sun, Jing Liang, Shusheng Li et al. "Prediction of survival for severe Covid-19 patients with three clinical features: development of a machine learning-based prognostic model with clinical data in Wuhan." medRxiv (2020).</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>[4]</label>
        <mixed-citation>"COVID-19 Worldwide Dashboard - WHO Live World Statistics." Online available: https://covid19.who.int/, accessed on 31 July, 2020.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>[5]</label>
        <mixed-citation>Rehman, Suriya, Tariq Majeed, Mohammad Azam [...] Suhaimi. "Current scenario of COVID-19 in pediatric [...] response." Saudi Journal of Biological Sciences (2020).</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>[6]</label>
        <mixed-citation>Nguyen, Thanh Thi. "Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions." Preprint, DOI 10 (2020).</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>[7]</label>
        <mixed-citation>Zhang, Jian, and Yiming Yang. "Robustness of regu[...]tion." In Proceedings of the 26th annual international [...] in information retrieval, pp. 190-197. 2003.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>[8]</label>
        <mixed-citation>Tan, Yuxuan. "An improved KNN text classification algorithm based on K-medoids and rough set." In 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 1, pp. 109-113. IEEE, 2018.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>[9]</label>
        <mixed-citation>Samuel, Jim, G. G. Ali, Md Rahman, Ek Esawi, and Yana Samuel. "Covid-19 public sentiment insights and [...]tion, vol. 11, no. 6, Jun. (2020).</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>[10]</label>
        <mixed-citation>Pinter, Gergo, Imre Felde, Amir Mosavi, Pedram Ghamisi, and Richard Gloaguen. "COVID-19 Pan[...] Learning Approach." Mathematics, vol. 8, no. 6 (2020): 890.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>[11]</label>
        <mixed-citation>Yan, Li, Hai-Tao Zhang, Yang Xiao, Maolin Wang, [...] of criticality in patients with severe Covid-19 infec[...] medRxiv (2020).</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>[12]</label>
        <mixed-citation>Lin, Leesa, Rachel F. McCloud, Cabral A. Bigman, [...] of MERS in the USA." Journal of Public Health 39, no. 2 (2017): 282-289.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>[13]</label>
        <mixed-citation>Hamzah, FA Binti, C. Lau, H. Nazri, D. V. Ligot, [...] COVID-19 outbreak data analysis and prediction." Bull World Health Organ 1 (2020): 32.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>[14]</label>
        <mixed-citation>Jia, Lin, Kewen Li, Yu Jiang, and Xin Guo. "Prediction and analysis of Coronavirus Disease 2019." arXiv preprint arXiv:2003.05447 (2020).</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>[15]</label>
        <mixed-citation>Tuli, Shreshth, Shikhar Tuli, Rakesh Tuli, and Sukhpal Singh Gill. "Predicting the Growth and Trend of COVID-19 Pandemic using Machine Learning and Cloud Computing." Internet of Things (2020): 100222.</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>[16]</label>
        <mixed-citation>"COVID-19 patient pre-condition dataset", 2020. Online Available: https://www.kaggle.com/tanmoyx/covid19-patient-precondition-dataset/notebooks</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>