
CURE: An Effective COVID-19 Remedies based on Machine Learning Prediction Models

Poonam Phogat

poonamphogat07@gmail.com 1 2

Rajat Chaudhary

rajat@biet.ac.in 0 2

COVID-19, Machine Learning, Prediction Model

0 Computer Science & Engineering, Bharat Institute of Engineering & Technology, Hyderabad, Telangana, India
1 Computer Science & Engineering, SGT University, Gurugram, Haryana, India
2 Workshop Proceedings

415 424

Coronavirus disease (COVID-19) is a severe pandemic infectious disease caused by a virus that enters the healthy cells of a living body. The virus makes copies of itself in the organs of the host body, which ultimately leads to the death of some healthy cells and thereby weakens the immune system. In the mild stage, it mainly affects the respiratory tract; on reaching the last stage, it leads to pneumonia, organ failure, and death. This paper focuses on the early detection of COVID-19 patients based on the positive symptoms of the disease. The COVID-19 Remedies (CURE) scheme is proposed, based on machine learning prediction models, for the treatment of COVID patients. The performance of the CURE scheme is evaluated on the Python platform and tested using the Kaggle dataset from Johns Hopkins University.


1. Introduction

The virus that induces COVID-19 is severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which was first diagnosed in late December 2019 during an investigation into an outbreak in Wuhan, China. As cases increased rapidly throughout the world, the WHO declared the disease a pandemic on March 11, 2020. Currently, the transmission of COVID-19 has become uncontrollable because the number of cases has reached the threshold limit [1]. The virus enters the healthy cells of a living body and makes copies of itself in the organs of the host body, which ultimately leads to the death of some healthy cells and thereby weakens the immune system. In the mild stage, it mainly affects the respiratory tract; on reaching the last stage, it leads to pneumonia, organ failure, and death [2]. The disease is prominent in elderly people who have a weak immune system and pre-existing conditions such as diabetes, high blood pressure, and cardiovascular and respiratory diseases [3].

Figure 1 shows the global statistics until July 30, 2020, on the total confirmed cases, active cases, total deaths, and total cured cases of the COVID-19 virus. Figure 1(a) presents the total number of coronavirus cases across different countries, which shows that the virus is spreading rapidly, with the highest number of cases in the USA followed by India. The total confirmed positive cases across the world are 2,18,69,976, of which 26,47,663 cases are in India. Figure 1(b) shows the statistics of the active cases, with 65,04,303 active cases globally. Figure 1(c) presents the total death cases, and finally, Figure 1(d) shows the total cured cases [4].

The virus is communicable and spreads through respiratory droplets present in the air. These aerosols reach the open environment when an infected person sneezes or coughs; they enter other persons through the mouth and nostrils and reach the lungs. There is no precise treatment to cure COVID-19. Some steps are being taken to eliminate the virus using different medicines such as Hydroxychloroquine, which is an antimalarial drug. It is currently used to treat coronavirus patients; it helps to inhibit infection by increasing the endosomal pH, which gives the immune system enough strength to fight the viral disease [5].

ORCID: 0000-0002-6554-918X (R. Chaudhary)

Some preventive measures are necessary to control this pandemic. From the very beginning of COVID-19, the governments of almost all countries have taken strict actions such as complete lockdown, social distancing, and the use of sanitizer and masks to reduce all the causing elements [6]. From an exploration of various studies, Machine Learning seems to be the best prediction approach for forecasting the increasing number of COVID-19 infected cases. The regression and classification approaches of ML are applied to this problem according to the availability of data.

[Figure 1: Global COVID-19 statistics until July 30, 2020, comparing India, India's share (%), and the world. Panels: (a) Total Cases, (b) Active Cases, (c) Total Deaths, (d) Total Cured. Values recoverable from the figure: Active Cases: India 6,76,900 (10.4%), World 65,04,303; Total Deaths: India 50,921 (6.6%), World 7,73,741; Total Cured: India 19,19,842 (13.3%), World 1,45,91,932.]

1.1. Contributions

The contributions of the paper are summarized below.

• Diagnose the symptoms of COVID-19 patients based on the classification of the diseases.

• To recover the COVID-19 patients, the CURE scheme is proposed based on machine learning prediction models.

• Finally, the performance evaluation is compared across the five classifiers, and the most efficient outcome is predicted using the Python platform.

1.2. Paper Organization

The rest of the paper is structured as follows: Section II discusses the literature review of the existing schemes. Section III presents the system model, followed by the proposed CURE scheme in Section IV. Section V comprises the performance evaluation of the prediction models, and Section VI concludes the paper.

2. Literature Review

Researchers have introduced several Machine Learning methods for classification. The simplest is the Linear Regression method, which minimizes the sum of squared differences between real and predicted data. The drawbacks of this model are its ineffectiveness with data that are not linearly aligned and its sensitivity to deviations [7]. In the Logistic Regression model, the probability of the outcome is based on the logistic function. The advantage of this model is that it is free of complications, but it is limited by its linearity assumption. The Naive Bayes model requires only limited training data to estimate the necessary parameters and deals efficaciously with real-world data. Another model, k-Nearest Neighbor, works well with multi-classification problems [8], [9]. Pinter et al. [10] proposed Machine Learning approaches based on the multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) and the adaptive network-based fuzzy inference system (ANFIS) for prediction of the COVID-19 confirmed positive and death cases. This model maintains its accuracy for the next 9 days, which gives reassuring results [11].

The government and the public have to support the researchers and help lower the number of cases by maintaining social distancing and following other precautions [12]. Hamzah et al. [13] work on the Susceptible-Exposed-Infectious-Recovered (SEIR) model, which performs well on moderate data. The outbreak of this infectious disease may cause variations in the data prediction.

Jia et al. [14] define four stages for COVID-19 cases. The first stage involves the travel history of a person with COVID-19 symptoms, which leads to lockdown. When the infected person comes in contact with other persons, the virus reaches the second stage. To prevent the number of cases from increasing, social distancing is applied. In the third stage, there is neither a travel history nor contact with an infected person, so the chances of the virus spreading through respiratory droplets become high. Hence, the use of masks and sanitizers is necessary. The last stage is an uncontrollable stage in which the cases have reached the threshold limit. Tuli et al. [15] improved COVID-19 prediction by using a Machine Learning model. In this model, a data-driven approach is used to help the government and the public. After covering the data with ML and AI, researchers can forecast the time scale and the regions where the possibility of the disease spreading is at its maximum. It is predicted that, by using different ML models, COVID-19 cases can be controlled or eliminated in all the countries of the world which are facing this critical situation.

3. System Model

Figure 2 presents the workflow of the proposed CURE scheme for the treatment of COVID patients. Initially, the input is the dataset taken from Johns Hopkins University. Then the symptoms of positive cases are analyzed and categorized into three sub-parts: severe, moderate, and mild symptoms. A patient with severe symptoms, which include throttling, must face a harsh period. Moderate symptoms include shortness of breath, fever, and cough. Mild symptoms include fever, cough, and headache. The proposed scheme for COVID-19 outbreak analysis is trained and tested on real-time data using the symptoms of COVID-19 patients.

[Figure 2: Workflow of the proposed CURE Scheme for the treatment of COVID patient.]

The problem is heightened by the imbalance of the data. In medical data, the class imbalance problem is frequent; it occurs when cases of some classes dominate over others. To handle the imbalanced dataset, several solutions are appropriate at both the algorithmic and the data level. In this paper, the performances of 5 classifiers and regressions are compared on the imbalanced dataset obtained while studying the prediction of COVID-19. On the basis of the performance of these regressions and classifiers, the impact of SMOTE (Synthetic Minority Oversampling Technique), an approach that deals with imbalanced datasets, is thoroughly evaluated.

In this method, the k samples closest to each minority sample within the minority class are found, and the standard Euclidean distance is used to measure this closeness. The imbalanced dataset is taken with the number of cases in the minority and majority classes. Based on the independent variable, the original dataset is partitioned into two sets, a training set (80%) and a test set (20%), using stratified random sampling. By applying the SMOTE technique, the training set is oversampled to find the class distribution best suited to the dataset; 8 training sets are obtained, of which 1 is the original set and the other 7 are oversampled sets with different rates.

4. Proposed CURE Scheme

The proposed CURE scheme uses a wide range of methods and tools for prediction. By combining different models, namely SVM (Support Vector Machine), LR (Linear Regression), k-NN (k-Nearest Neighbors), Naïve Bayes classification, and the R tool, a machine learning model is proposed for forecasting the COVID-19 infection rate. The collected dataset is cleaned before further processing; this is considered the first step in knowledge discovery in databases. For written-character classification problems, this data cleansing process is applied using Machine Learning techniques. Data cleaning is the process that implements methods to detect missing and incorrect data, correct errors, and explore databases; it involves reassembling and disintegrating data. Data cleansing is practiced on numerous merged databases in which duplicate records appear. Four quality dimensions are proposed: certainty, correctness, integrity, and consistency.

Primary symptoms of this disease include loss of taste and smell, headache, fever, dizziness, tiredness, and shortness of breath. Based on seriousness, symptoms are classified into three categories, i.e., mild, moderate, and severe. Mild symptoms include fever, cough, and headache. The seriousness is low at this stage. Then comes the moderate stage, in which shortness of breath is the main symptom along with high fever and cough.

In the severe stage, the patient reaches a critical situation and becomes profoundly ill. Respiratory problems are the main problem the patients must face. The virus mainly affects the lungs; it damages the alveoli responsible for the supply of oxygen to all parts of the body through RBCs and blood vessels. The virus damages the alveolar wall and causes it to thicken, due to which the transfer of oxygen to RBCs decreases, which ultimately leads to hypoxia. Due to insufficient intake of oxygen, the chances of organ failure remain high. The collected data is first trained and then tested using different models: SVM (Support Vector Machine), LR (Linear Regression), k-NN (k-Nearest Neighbors), and Naïve Bayes classification. These prediction methods are explained below.

4.1. Linear Regression (LR)

LR is the most widely used statistical technique for predictive analysis in Machine Learning. Linear regression is a supervised Machine Learning algorithm which performs a regression task. The LR prediction model uses the given data points to obtain the optimal fit line to train the dataset. A simple equation of a line is $y = mx + c$, where $y$ is the dependent variable, $x$ is the independent variable, and $m$, $c$ are constants whose values are computed by using calculus. Figure 3(a) shows an example of an LR prediction model that takes the features as input and predicts a continuous output as a result by obtaining a linear curve for a given problem. The output of the LR model is computed by using the equation

$y = \beta_0 + \beta_1 x_1 + \epsilon$, (1)

where $\beta_0$ represents the y-intercept, $\beta_1$ represents the slope, $x_1$ is the input value, $\epsilon$ represents the error term, and $y$ is the output value of the model. At the start of training, $\beta$ is initialized randomly, but it is corrected during training for each feature such that the loss (the deviation between the desired and predicted output) is minimized. The loss is calculated using mean squared error (MSE). The advantages of LR are that it is easy and simple to implement, fast to train, can be regularized to avoid overfitting, and is easily updated with new data using gradient descent. The disadvantages of the LR model are that it performs poorly for non-linear relationships, is not flexible enough to capture complex patterns, and adding polynomials can be time consuming. However, to generate a discrete output, i.e., 0 or 1, the logistic regression (binary classification) model is used. Figure 3(b) shows an example of logistic regression, which calculates the aggregate sum of the input variables similar to the LR model but runs the result through a non-linear sigmoid function to generate the output:

$y = \frac{1}{1 + e^{-x}}$, (2)

where $x$ is the input value, $y$ is the output value of the model, and $e$ is the exponential constant. The LR prediction model can be implemented in Python.

4.2. Support Vector Machine method (SVM)

SVM is a supervised ML algorithm used for both classification and regression. An example of an SVM classifier is shown in Figure 3(c), which represents different classes in a decision plane or hyperplane in n-dimensional space. In this figure, the support vectors are the data points nearest to the hyperplane. These data points are divided into classes by using separating lines ($w_1$, $w_2$, $w_3$). Here, a margin is defined as the gap or perpendicular distance from the line to the support vectors. The objective of SVM is to separate the dataset into classes by finding the maximum marginal hyperplane. Initially, SVM iteratively finds hyperplanes that isolate the classes; based on these, SVM selects the hyperplane that divides the classes in the best way. SVM can perform efficiently on non-linear classification while also performing linear classification. It is extremely effective in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples. SVM transforms the input vector to an n-dimensional space known as a feature space (f) by using a non-linear function; then a linear function of linear regression is applied in that space. It is implemented in Python by using SVM kernels. The types of SVM kernels are the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel.

Linear Kernel: It is the dot product between two observations. The linear kernel function is defined by the equation

$K(x, x_i) = x \cdot x_i$, (3)

where $x$, $x_i$ are two vectors.

Polynomial Kernel: It discriminates curved or non-linear input space and is defined by the equation

$K(x, x_i) = (1 + x \cdot x_i)^d$, (4)

where $d$ is the degree of the polynomial, which is set manually in the learning algorithm.

Radial Basis Function (RBF) Kernel: It transforms the input space into a multi-dimensional space and is defined by the equation

$K(x, x_i) = \exp(-\gamma \|x - x_i\|^2)$, (5)

where $\gamma$ lies between 0 and 1, is set manually, and has a default value of 0.1.

The steps to be followed in implementing an SVM classifier for text classification are as follows: (i) import packages. (ii) load the input dataset. (iii) select features from the dataset. (iv) plot SVM boundaries with the original data. (v) set the value of the regularization parameter. (vi) create the SVM classifier object using a kernel (linear, polynomial, or RBF). (vii) the final output is the text classification. The advantages of SVM classifiers are high accuracy in multi-dimensional spaces and low memory use, since they use only a subset of the training points. The disadvantages of SVM classifiers are that their performance does not scale to larger datasets due to high training time, and that they do not perform well with overlapping classes. Thus, decision trees are usually preferred over SVM for large datasets.
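The linear, polynomial, and RBF kernels named in this section can be evaluated directly. The following is a minimal sketch in plain Python, not the paper's implementation: the function names, the sample vectors, and the default degree d = 2 are assumptions made for illustration, while gamma = 0.1 follows the default value quoted in the text. The polynomial kernel is written in its standard form (1 + x . xi)^d.

```python
import math


def linear_kernel(x, xi):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(x, xi, d=2):
    """(1 + x . xi) ** d; the degree d is set manually (assumed default: 2)."""
    return (1 + linear_kernel(x, xi)) ** d


def rbf_kernel(x, xi, gamma=0.1):
    """exp(-gamma * ||x - xi||^2); gamma = 0.1 is the default quoted in the text."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)


x, xi = [1.0, 2.0], [3.0, 4.0]      # hypothetical feature vectors
print(linear_kernel(x, xi))          # 11.0
print(polynomial_kernel(x, xi))      # 144.0
print(rbf_kernel(x, xi))             # exp(-0.8), since ||x - xi||^2 = 8
```

The RBF value depends only on the distance between the two points, so identical points always give 1.0, whereas the linear and polynomial kernels grow with the magnitude of the vectors.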

4.3. k-NN (k-Nearest Neighbors)

The k-nearest neighbors (k-NN) algorithm is a supervised ML technique which is generally used for classification problems, although it can be used for both classification and regression. The k-NN method classifies documents based on resemblance measurements: by estimating factors such as distance and proximity, the similarity between two data points is quantified, and each data point is classified based on its nearest neighbors. Figure 3(d) shows an example of a k-NN model, which assumes the closeness of two data points (similar data points). k-NN works on the principle of feature similarity in order to predict the values of new data points. Thus, a new data point is assigned a value based on how closely it matches the data points in the training set. The steps involved in the k-NN algorithm are as follows: (i) Load the training and testing dataset. (ii) Select the value of k (an integer), i.e., the number of closest data points. (iii) For each point in the test data, compute the distance between the test point and each row of the training data using the Euclidean or Hamming distance, and sort the distance values in ascending order. (iv) Select the top k rows from the sorted array and allocate a class to the test point based on the most frequent class of these rows. (v) Final output.

The k-NN algorithm can be implemented in Python by using the following approach: (i) import the necessary Python packages, (ii) download the Kaggle COVID-19 dataset, (iii) assign column names to the dataset, (iv) read the dataset into a pandas dataframe, (v) perform data preprocessing, (vi) split the data into train and test datasets (60% training data and 40% testing data), (vii) perform data scaling, (viii) train the model using the K-nearest neighbors classifier class of sklearn, (ix) obtain predictions, (x) output the results: confusion matrix, classification report, and accuracy. The benefits of the k-NN algorithm are that it is simple, useful for non-linear data, and highly accurate. The limitations of the k-NN algorithm are that it is costly because it stores all the training data; in addition, it requires more memory, and prediction is slow for large datasets.

4.4. Naïve Bayes

Naïve Bayes is a classification method based on Bayes' theorem, which works on the strong assumption of conditional independence: the existence of a feature in a class is independent of the existence of any other feature in the same class. Consider the example of a smart 4K TV: a TV is categorized as smart if it has features such as an Internet connection, high definition, Bluetooth, USB ports, HDMI connectivity, and support for multiple applications. Although these features depend on each other, each feature contributes independently to the probability that the 4K TV is a smart TV. Naïve Bayes is a highly scalable algorithm that can be trained on a small dataset.

Figure 3(e) shows an example of a Naïve Bayes model that classifies the data points, based on the posterior probability of the class, into three different classes, i.e., classifier 1 (red data points), classifier 2 (orange data points), and classifier 3 (blue data points). The Naïve Bayes algorithm based on Bayes' theorem is defined as follows:

$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$, (6)

where $P(A|B)$ indicates the posterior probability of the class, $P(B|A)$ indicates the likelihood of the predictor given the class, $P(A)$ refers to the prior probability of the class, and $P(B)$ refers to the marginal probability or prior probability of the predictor. For building the prediction model, the Naïve Bayes classifier is categorized into three types: (i) Gaussian Naïve Bayes (GNB), (ii) Bernoulli Naïve Bayes (BNB), and (iii) Multinomial Naïve Bayes (MNB). Scikit-learn is the most useful Python library for building a Naïve Bayes model, and it provides the following three types of Naïve Bayes model.

GNB Classifier: It assumes that the data from each label is drawn from a simple Gaussian distribution.

MNB Classifier: Here, the features are assumed to be drawn from a simple Multinomial distribution, which is most suitable for features that represent discrete counts.

BNB Classifier: BNB considers the features to be binary (0s and 1s), for example, in text classification with the 'bag of words' model.
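Bayes' theorem and the conditional-independence assumption of Naïve Bayes can be checked with a small numeric example. The sketch below is illustrative only: the function name and all probabilities (a 10% prior for the positive class and the per-symptom likelihoods) are hypothetical values, not figures from the dataset.

```python
from math import prod


def naive_bayes_posterior(prior_a, lik_given_a, lik_given_not_a):
    """Posterior P(A | B1..Bn) for a binary class A.

    Under the naive assumption, the features B1..Bn are conditionally
    independent given the class, so their likelihoods are multiplied.
    The denominator P(B) is expanded over A and not-A.
    """
    joint_a = prior_a * prod(lik_given_a)
    joint_not_a = (1.0 - prior_a) * prod(lik_given_not_a)
    return joint_a / (joint_a + joint_not_a)


# Hypothetical numbers: A = "patient is positive", B1 = fever, B2 = cough.
one_feature = naive_bayes_posterior(0.10, [0.80], [0.20])
two_features = naive_bayes_posterior(0.10, [0.80, 0.60], [0.20, 0.30])
print(round(one_feature, 4))   # 0.3077
print(round(two_features, 4))  # 0.4706
```

With a single feature the function reduces to the plain form of Bayes' theorem; adding a second feature raises the posterior because both assumed symptoms are more likely in the positive class.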

[Figure 3: Examples of the prediction models: (a) Linear Regression Model, (b) Logistic Regression Model, (c) SVM Model, (d) k-NN Model, (e) Naive Bayes Classifier, (f) Decision Tree based on three key features of COVID patient. Panel (f) starts from 600 patients and splits on lactic dehydrogenase (LDH) < 365 U/L, high-sensitivity C-reactive protein (hs-CRP) < 41.2 mg/L, and lymphocytes > 14.7 %. Its leaves are: cured (Total: 391; True: 391, False: 0), cured (Total: 23; True: 22, False: 1), death (Total: 12; True: 12, False: 0), and death (Total: 174; True: 172, False: 2). True: number of correctly classified patients; False: number of misclassified patients; Total: number of patients in a dataset.]

The steps involved in implementing the GNB classifier in Python are as follows: (i) import the GNB packages under the Scikit-learn Python library. (ii) obtain blobs of points with a Gaussian distribution by using the make_blobs() function of Scikit-learn. (iii) for the GNB model, import GaussianNB and create its object. (iv) perform prediction after obtaining some new data. (v) plot the new data to find its boundaries. (vi) compute the posterior probabilities of the labels. (vii) output array. The benefits of the Naïve Bayes classifier are that it is fast and easy to implement, needs less training data, converges faster than discriminative models like logistic regression, and is suitable for both continuous and discrete data. The limitations of the Naïve Bayes classifier are: zero frequency, i.e., if a variable has a category that was not observed in the training data set, the Naïve Bayes classifier assigns it a zero probability and does not give a prediction; and feature independence, since in real-life applications it is difficult to have a set of features which are completely independent of each other. The applications of the Naïve Bayes classifier are real-time prediction, multi-class prediction, and text classification.

4.5. Decision Tree Induction Classifier

The decision tree is a simple, easily understandable non-parametric classifier based on a flexible decision tree algorithm. It can perform both classification and regression. With the help of the algorithms used to formulate this model from the original dataset, an unpremeditated (random) selection of training data is accomplished. The steps involved in the working of the decision tree algorithm are as follows. (i) select random samples from a given dataset. (ii) construct a decision tree for every sample and compute the prediction result from every decision tree. (iii) perform voting over the predicted results. (iv) choose the most voted prediction result as the output of the prediction algorithm.

The decision tree is implemented in Python by using the following approach: (i) import the necessary Python packages, (ii) download the Kaggle dataset, (iii) assign column names to the dataset, (iv) read the dataset into a pandas dataframe, (v) perform data pre-processing, (vi) divide the data into train and test splits (for example, 70% training data and 30% testing data), (vii) train the decision tree model. [...] time-consuming in comparison to other prediction models.
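The four working steps listed for the decision tree algorithm (random sampling, one tree per sample, voting, and choosing the most voted result) can be sketched in plain Python. This is a toy illustration under stated assumptions rather than the paper's implementation: each "tree" is a one-level decision stump on a single numeric feature, and the data, function names, number of trees, and random seed are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-level tree: the threshold and leaf labels with fewest errors.

    sample: list of (value, label) pairs with labels 0 or 1.
    Returns (threshold, label_below, label_at_or_above).
    """
    best = None
    for threshold, _ in sample:
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if v < threshold else above) != y
                         for v, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best[1:]


def predict_stump(stump, value):
    threshold, below, above = stump
    return below if value < threshold else above


def fit_voting_trees(data, n_trees=15, seed=0):
    """Steps (i)-(ii): draw random samples with replacement, one stump each."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def predict_by_vote(stumps, value):
    """Steps (iii)-(iv): every stump votes; return the most voted label."""
    votes = Counter(predict_stump(s, value) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D data: low values belong to class 0, high values to class 1.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
stumps = fit_voting_trees(data)
print(predict_by_vote(stumps, 1.5), predict_by_vote(stumps, 9.5))
```

Voting over several trees is what makes the result less sensitive to any single random sample; a deeper tree per sample would follow the same pattern.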

5. Prediction Models Performance Evaluation

The performance of prediction models can be assessed using a variety of metrics, listed as follows: (1) H-measure, (2) Gini-Index, (3) Area Under Curve (AUC), (4) Area Under the convex Hull of the ROC Curve (AUCH), (5) Kolmogorov-Smirnov statistic (KS), (6) Minimum Error Rate (MER), (7) Minimum Cost Weighted Error Rate (MWL), (8) Specificity when Sensitivity is held fixed at 95% (Spec.Sens95), (9) Sensitivity when Specificity is held fixed at 95% (Sens.Spec95), and (10) Error Rate (ER).

H-measure: H-measure is an important measure of classification performance that measures the accuracy of the model. The primary statistics of interest are the so-called misclassification counts, i.e., the number of False Negatives (FN) and False Positives (FP). There are four scenarios in prediction modeling. (i) True positives (TP): actuals are positives and are predicted as positives. (ii) False positives (FP): actuals are negatives and are predicted as positives. (iii) False negatives (FN): actuals are positives and are predicted as negatives. (iv) True negatives (TN): actuals are negatives and are predicted as negatives. An example of a false positive is an occurrence where a disease is mistakenly diagnosed, and an example of a false negative is an occurrence where the presence of a disease is missed.

Accuracy (ACC): The accuracy on a given dataset is the ratio of the total correct predictions by the classifier (TP + TN) to the total number of data points. The value of ACC lies between 0 and 1.

$ACC = \frac{TP + TN}{TP + TN + FP + FN}$

[...] the threshold value acts as a free parameter.

MWL: It is related to the KS statistic. Here, cost guides the threshold value in this measure.

Specificity and Sensitivity: The True Positive Rate (TPR) is called Sensitivity (Sens), and the True Negative Rate (TNR) is called Specificity (Spec.):

$Sens = \frac{TP}{TP + FN}$, $Spec. = \frac{TN}{TN + FP}$. (11)

Figure 7 computes the H-measure by using five classifiers. The normalised cost is shown on the X-axis. Let us assume that $c \in [0, 1]$ denotes the cost of misclassifying a class 0 object as class 1 (FP), and $1 - c$ represents the cost of misclassifying a class 1 object as class 0 (FN). This asymmetry can be seen to underlie the KS statistic, which is a simple linear transformation of the MWL when $c = 1$, $1 - c = 0$. The severity ratio (SR) is defined as the ratio between the two costs, where SR = 1 represents symmetric costs:

$SR = \frac{c}{1 - c}$, $c = \frac{SR}{1 + SR}$, (12)

where the Y-axis represents the weighted cost. The H-measure is computed for all five classifiers, and finally, the mean value of the Severity Ratio (SR) is 1.12. We pre-process the data to make the experimental data more efficient and to remove redundancy.
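The count-based metrics of this section follow directly from the four confusion-matrix scenarios. The sketch below is a minimal illustration, not the evaluation code used in the paper: the function names and the ten example labels are hypothetical, and the severity ratio is written as SR = c / (1 - c) with c the cost of a false positive, which is an assumed reading of the ratio between the two costs.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn


def summary_metrics(actual, predicted):
    """Accuracy, error rate, sensitivity (TPR), and specificity (TNR)."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def severity_ratio(c):
    """SR = c / (1 - c); c = 0.5 gives symmetric costs (SR = 1)."""
    return c / (1.0 - c)


# Hypothetical labels for ten patients.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(actual, predicted))  # (3, 2, 1, 4)
print(summary_metrics(actual, predicted))
print(severity_ratio(0.5))                  # 1.0
```

In this example one positive patient is missed (FN = 1) and two negative patients are flagged (FP = 2), so sensitivity is 0.75 while specificity is 4/6.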

5.1. Dataset

To validate the performance of the proposed CURE scheme, the dataset is collected from the Kaggle COVID-19 patient pre-condition dataset [16]. The Kaggle dataset is provided by Johns Hopkins University through a GitHub repository, which contains the real-time updated record of the total active cases, death cases, and recovered cases of the COVID-19 pandemic. In an era of technological advancement and all-round progress, studying such threatening diseases is both helpful and challenging, as it makes people as well as medical science more prepared and attentive, mentally and physically. As per the reports disclosed by the World Health Organization (WHO), the health curve (infectious cases and cured cases) changes abruptly every day, which makes it burdensome for the medical and other departments engaged in serving the world to estimate the total requirements of health-related equipment and resources. It would be very helpful for the entire medical department and other concerned authorities if corona patients could be provided with all the resources they need to fight the lethal disease. In this context, the data collected contains 23 features of 5,66,603 patients.
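Section 5.2.1 states that each missing value in the dataset is replaced by the mean of its feature. The following is a minimal sketch of that step in plain Python; the function name, the use of None as the missing-value marker, and the four example records are assumptions for illustration and are not taken from the Kaggle data.

```python
def impute_with_mean(rows):
    """Replace None entries in each column by the mean of that column.

    rows: list of equal-length lists of numbers, with None marking a
    missing value. Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


# Hypothetical records: [age, binary pre-condition flag]
records = [[30, 1], [None, 0], [50, None], [40, 1]]
print(impute_with_mean(records))
# [[30, 1], [40.0, 0], [50, 0.6666666666666666], [40, 1]]
```

For a binary feature with levels 0 and 1, the imputed mean lies between the two levels, which is consistent with the stated minimum of 0 and maximum of 1.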

5.2. Results and Discussion

The experiments are implemented in Python. The results are computed based on finding the missing values, the heatmap function, feature selection, and the comparison of the machine learning models. The discussion of the results is summarized below.

5.2.1. Missing Values

The initial step is to find the missing values in the Kaggle dataset [16] and plot them. Figure 4 visualizes the histogram of the missing values in the COVID dataset. As a substitute, we computed the mean of each feature and replaced each missing value with that mean. The default input is a numeric array with levels 0 and 1, where the minimum value is 0 and the maximum value is 1.

5.2.2. Heatmap Representation

As the Kaggle COVID-19 dataset we collected does not contain any missing or redundant values, we represent the complete dataset in Figure 5. It is drawn using the heatmap function of Python and presents a diagrammatic view of the dataset. The parameters of the COVID patients are shown on the X and Y axes.

5.2.3. Feature selection

As shown in Figure 6, we have selected 10 of the 23 features from the COVID patient dataset. This selection is made by analyzing the features after computing the feature importance score, in the form of the Gini index, through the decision tree method.

5.2.4. Machine Learning Model

As discussed in the CURE scheme, the machine learning models are applied to the pre-processed data. However, there are different methods to enhance the performance of the prediction models, which depend on the technique involved. One such technique is to construct ensemble models: once several models produce a score for a particular outcome, the scores can be integrated to produce ensemble scores. Figure 7 computes the H-measure of the ensembled model, which can be used to improve the area under the curve for these models even further. Let us assume a decision tree classifier and a logistic regression model, both predicting standard risks. A new score can be calculated as the average of these two classifiers and then assessed as a further model. Usually the area under the curve improves for these ensemble models.

After experimentation, the results are computed in Table 1.

6. Conclusion

In this paper, a CURE scheme is proposed based on machine learning prediction models for the treatment of COVID patients through remote e-healthcare. The performance of the proposed scheme is evaluated on the Python platform and tested using the Kaggle dataset from Johns Hopkins University on COVID-19 patient pre-condition. Then, the features are extracted from the datasets of the COVID patients for diagnosing the symptoms of the coronavirus. Next, the collected data is first trained and then tested using different machine learning prediction models (such as SVM, LR, k-NN, and Naive Bayes) that classify the features of the COVID patient for forecasting the infection rate. Finally, the performance of the prediction models is assessed using a variety of metrics: (1) H-measure, (2) Gini Index, (3) Area Under Curve (AUC), AUCH, KS, Minimum Error Rate (MER), Minimum Cost Weighted Error Rate (MWL), Spec.Sens95, Sens.Spec95, and Error Rate (ER). The performance evaluation shows that the CURE scheme outperforms the existing approach which deals with the imbalanced dataset.

In future, we will ensure the secrecy of the coronavirus data, as the patients' sensitive credentials can be leaked during data transmission through wireless channels (Internet).

References

[1] Punn, Narinder Singh, Sanjay Kumar Sonbhadra, and Sonali Agarwal. "COVID-19 Epidemic Analysis using Machine Learning and Deep Learning Algorithms." medRxiv (2020), doi: https://doi.org/10.1101/2020.04.08.20057679.

[2] Jamshidi, M., Lalbakhsh, A., Talla, J., Peroutka, Z., [...] L., Mirmozafari, M., Dehghani, M., and Sabet, A. "Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment." IEEE Access, vol. 8, pp. 109581-109595, Jun. 2020.

[3] Yan, Li, Hai-Tao Zhang, Yang Xiao, Maolin Wang, Chuan Sun, Jing Liang, Shusheng Li et al. "Prediction of survival for severe Covid-19 patients with three clinical features: development of a machine learning-based prognostic model with clinical data in Wuhan." medRxiv (2020).

[4] "COVID-19 Worldwide Dashboard - WHO Live World Statistics." Online available: https://covid19.who.int/, accessed on 31 July, 2020.

[5] Rehman, Suriya, Tariq Majeed, Mohammad Azam [...] Suhaimi. "Current scenario of COVID-19 in pediatric age group and physiology of immune and thymus response." Saudi Journal of Biological Sciences (2020).

[6] Nguyen, Thanh Thi. "Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions." Preprint, DOI 10 (2020).

[7] Zhang, Jian, and Yiming Yang. "Robustness of regularized linear classification methods in text categorization." In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 190-197. 2003.

[8] Tan, Yuxuan. "An improved KNN text classification algorithm based on K-medoids and rough set." In 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 1, pp. 109-113. IEEE, 2018.

[9] Samuel, Jim, G. G. Ali, Md Rahman, Ek Esawi, and Yana Samuel. "Covid-19 public sentiment insights and machine learning for tweets classification." Information, vol. 11, no. 6, Jun. (2020).

[10] Pinter, Gergo, Imre Felde, Amir Mosavi, Pedram Ghamisi, and Richard Gloaguen. "COVID-19 Pandemic Prediction for Hungary; a Hybrid Machine Learning Approach." Mathematics, vol. 8, no. 6 (2020): 890.

[11] Yan, Li, Hai-Tao Zhang, Yang Xiao, Maolin Wang, et al. "Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan." MedRxiv (2020).

[12] Lin, Leesa, Rachel F. McCloud, Cabral A. Bigman, [...] of MERS in the USA." Journal of Public Health 39, no. 2 (2017): 282-289.

[13] Hamzah, FA Binti, C. Lau, H. Nazri, D. V. Ligot, [...] COVID-19 outbreak data analysis and prediction." Bull World Health Organ 1 (2020): 32.

[14] Jia, Lin, Kewen Li, Yu Jiang, and Xin Guo. "Prediction and analysis of Coronavirus Disease 2019." arXiv preprint arXiv:2003.05447 (2020).

[15] Tuli, Shreshth, Shikhar Tuli, Rakesh Tuli, and Sukhpal Singh Gill. "Predicting the Growth and Trend of COVID-19 Pandemic using Machine Learning and Cloud Computing." Internet of Things (2020): 100222.

[16] "COVID-19 patient pre-condition dataset," 2020. Online Available: https://www.kaggle.com/tanmoyx/covid19-patient-precondition-dataset/notebooks