A Classification Approach to Fetal Cardiotocography Dataset using R

R. Hephzibah a, A. Hepzibah Christinal a, R. Jayanthi a, S. Jebasingh a, D. Abraham Chandy b, Chandrajit Bajaj c

a Department of Mathematics, Karunya University, Coimbatore, India
b Department of Electronics and Communication Engineering, Karunya University, Coimbatore, India
c Computational Applied Mathematics Chair in Visualization, Institute for Computational Engineering and Sciences, University of Texas, Austin

Abstract
Electronic fetal heart monitoring is commonly used to check fetal status during pregnancy. Cardiotocography is a technique that helps obstetricians obtain clear details at the time of childbirth and monitor fetal health, especially in pregnant women who are at risk of complications. This paper deals with the classification of the fetal cardiotocography dataset using R. A supervised machine learning approach is applied to categorize the fetal records as normal, suspect, or pathologic using the random forest classifier. The classifier achieves an accuracy of 99.94% in training and 93.57% in testing. It also provides the best results in terms of sensitivity and specificity for the normal, suspect, and pathologic classes in both the training and testing datasets. The method is found to provide greater accuracy than the other methods it is compared with.

Keywords
Machine learning, random forest classifier, cardiotocography, fetal heart rate

1. Introduction
Machine learning is an advancing research field in engineering and computer science, and its algorithms play a major role in medicine. It supports the computation of image features, which helps in the classification and better detection of diseases [1]. It also helps in learning from empirical data and making accurate decisions using complex algorithms [2].
In the general classification of machine learning, supervised learning includes regression, classification, and reinforcement learning; clustering, blind source estimation, and density estimation come under unsupervised learning; and information systems and semi-supervised classification are part of semi-supervised learning [3]. In medical image processing, pixel-based machine learning is an evolving field that deals directly with the pixels or voxels of images. It avoids the loss of information caused by improper segmentation or feature computation [4]. Machine learning libraries such as Torch are freely available. Machine learning algorithms include the support vector machine, Parzen windows, AdaBoost, k-nearest neighbours, hidden Markov models, the multi-layer perceptron, bagging, Bayes classifiers, etc. [5]. Classifiers include linear classifiers such as logistic regression, the naive Bayes classifier, and the perceptron, as well as quadratic classifiers, support vector machines, boosting, decision trees (which random forests aggregate), Bayesian networks, and neural networks [6]. In artificial intelligence, mathematical algorithms are applied to data points to support diagnosis of the human body [7]. Machine learning is very useful for improving prediction accuracy for cancer and related deaths [8], is involved in predicting cardiac risk [9], and helps the diagnostic accuracy of magnetic resonance imaging [10] and computerized tomography scans in radiological investigations.

CVMLH-2022: Workshop on Computer Vision and Machine Learning for Healthcare, April 22–24, 2022, Chennai, India.
EMAIL: hepzia@yahoo.com (A. Hepzibah Christinal)
ORCID: 0000-0003-3965-3183 (A. Hepzibah Christinal)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
To reduce difficulties with classification outcomes, cardiotocography (CTG) interpretation has been automated by professionals in medicine and engineering [11]. The use of electronic fetal heart monitoring to check fetal status during labor is common. Despite the lack of evidence for its usefulness, the method is widely used in every current labor and delivery hospital in industrialized countries. To maximize patient safety and outcomes, everyone who provides health care to the woman in labor and her newborn must have a comprehensive understanding of the pathophysiology underlying fetal heart monitoring, as well as an appreciation of the course of labor and of problems as they develop [12].

In gynaecology, fetal abnormalities are the most likely cause of pregnancy complications. If the environment inside the womb is unsuitable, the health of the fetus is likely to worsen. Cardiotocography records the fetal heart rate and uterine contractions simultaneously. In one study, decision tree, support vector machine, and naive Bayes classifiers were applied using RStudio. The dataset was taken from the UCI Machine Learning Repository and categorized by fetal state into normal, suspect, and pathological classes; the algorithms were trained, tested, and compared using different performance measures [13]. To prevent intrapartum hypoxic-ischaemic injury, the fetal heart rate is examined with a cardiotocograph to identify variations during labor [14]. Classification is required to predict the health of newborns, especially in urgent circumstances. Cardiotocography is a technique that helps obstetricians obtain precise details during gestation in order to monitor fetal health, especially in pregnant women at high risk.
According to obstetricians, CTG is a continuous electronic record of the baby's heart rate taken from the mother's abdomen. The information obtained is important for assessing the health of the fetus and allows early intervention before the fetus suffers a permanent impairment. Machine learning methods aim to use the characteristics of the collected data to solve problems. One study compared the classification capabilities of eight machine learning methods on antepartum cardiotocography data [15].

The most important technique for detecting fetal distress is to check the fetal heart rate (FHR), since this distress leads to complications. The main diagnostic tool for measuring FHR is cardiotocography. Incorrect readings of the CTG graph could result in a significant loss. Decision tree, k-nearest neighbours, logistic regression, support vector machine, random forest, and naive Bayes are the six algorithms presented in one study for the categorization of CTG data. A classification-based feature selection methodology is used to remove unnecessary features from the dataset and improve the performance of the classifiers. The classification algorithms are evaluated in terms of precision, accuracy, and recall [16,17].

To bring out the difference between normal and abnormal fetal heart rate signals, the bagging ensemble machine learning technique has been used. The F-measure, ROC area, and accuracy serve as indicators of the classifiers' success. The bagging ensemble classifier generated favorable results in experiments, and bagging combined with random forest reached an accuracy of 99.02 percent [18].

An open-access software tool built with MATLAB has been introduced for fetal heart rate signals. It is freely available for research, and the software details are given.
In addition to the non-linear, linear, morphological, and time-frequency characteristics, the software uses a new approach called image-based time-frequency features for analyzing the fetal heart rate signals. CTG-OAS was also used in an experimental investigation on the publicly available CTU-UHB database to test the dependability of the software. In that investigation the accuracy was 77.81 percent, the sensitivity 76.83 percent, the specificity 78.27 percent, and the geometric mean 77.29 percent [19].

The least-squares support vector machine (LS-SVM) with a binary decision tree has been used to evaluate the fetal state for cardiotocography classification. Particle swarm optimization is used to tune the LS-SVM parameters. The robustness of the method is tested using 10-fold cross-validation, and its performance is assessed in terms of accuracy. A cobweb representation and receiver operating characteristic analysis are presented to examine and display the performance. According to the experimental results, the method achieves a classification accuracy of 91.62 percent [20].

Another work used genetic algorithms (GA) and support vector machines (SVM) to provide a new technique for evaluating fetal well-being from cardiotocograph (CTG) data. Obstetricians commonly use CTG recordings to determine fetal well-being because they contain the fetal heart rate and uterine contraction signals. An SVM-based classifier was constructed using features collected from normal and abnormal uterine contraction and fetal heart rate signals. The GA is then used to identify the best feature subset for the classifier to separate normal from pathological cases in the data. The performance of the proposed system was evaluated using comprehensive CTG data classified by three professional obstetricians [11].

2. Methods
In this paper, we used supervised classification based on a machine learning method.
The main task of supervised learning is classification, where various techniques are used to create a function that maps each input to the appropriate output. The learner learns a function that assigns an input vector to one of several classes from examples of input-output pairs [28]. Classifiers used for this purpose include multinomial logistic regression, the support vector machine, the multilayer perceptron, random forests, and so on [27]. Bootstrap aggregating (bagging) is an ensemble technique that is useful for increasing robustness. Random forest is a favorable method that combines decision trees and bagging. Here we used the random forest classifier for classification [31].

We used RStudio for the implementation. The random forest package provides additional details such as variable importance and a proximity measure. The random forest performs classification if the response is a factor; if it is not a factor, it performs regression [29]. We used the random forest classifier to categorize the fetal state as normal, suspect, or pathologic.

The dataset used is the cardiotocography dataset taken from the publicly accessible UCI Machine Learning Repository. It contains features based on the assessment of fetal heart rate and uterine contraction, as classified by obstetricians. It consists of 2126 fetal cardiotocograms (CTGs) that were processed automatically and measured. We select the attributes for our method; the attributes used in this paper are described in Table 2. Classification can be based on the 3-class experiment, which gives the fetal state as normal, suspect, or pathologic, or on the 10-class experiment, which involves the morphological patterns [21]. The random forest classifier builds a group of decision trees from selected subsets of the training dataset.
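The bootstrap aggregating idea described above can be illustrated with a minimal sketch. It is written in plain Python purely for illustration and is not the R implementation used in this paper. The toy rows, the one-split "stump" base learner, and the function names (bootstrap_sample, fit_stump, fit_bagged, predict_bagged) are all assumptions introduced here; a real random forest grows full decision trees and samples candidate features at every split rather than once per tree.

```python
import random
from collections import Counter

# Hypothetical toy rows: ((feature_1, feature_2), label). The values are
# invented for illustration and are NOT taken from the cardiotocography dataset.
TOY_ROWS = [
    ((140.0, 0.5), "normal"), ((135.0, 0.6), "normal"), ((142.0, 0.4), "normal"),
    ((150.0, 1.5), "suspect"), ((152.0, 1.7), "suspect"),
    ((165.0, 3.0), "pathologic"), ((170.0, 3.2), "pathologic"),
]

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(sample, feature):
    """Toy base learner: split one feature at its mean and predict the
    majority class on each side of the split."""
    threshold = sum(x[feature] for x, _ in sample) / len(sample)
    overall = Counter(y for _, y in sample).most_common(1)[0][0]
    left = Counter(y for x, y in sample if x[feature] <= threshold)
    right = Counter(y for x, y in sample if x[feature] > threshold)
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return feature, threshold, left_label, right_label

def predict_stump(stump, x):
    """Return the label of the side of the split that x falls on."""
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label

def fit_bagged(rows, n_trees=25, seed=0):
    """Fit n_trees stumps, each on its own bootstrap sample and on one
    randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    ensemble = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        feature = rng.randrange(n_features)
        ensemble.append(fit_stump(sample, feature))
    return ensemble

def predict_bagged(ensemble, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in ensemble)
    return votes.most_common(1)[0][0]

ensemble = fit_bagged(TOY_ROWS, n_trees=25, seed=0)
print(predict_bagged(ensemble, (138.0, 0.5)))
```

Because each base learner sees a different resample of the data, no single unusual record dominates the ensemble, which is the robustness benefit attributed to bagging above.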
The votes from the distinct trees are gathered to make the final decision. The individual trees are grown in the following way:
i. Consider N samples to be used as the training set for the development of the tree.
ii. Consider Q input variables and q<