=Paper= {{Paper |id=Vol-3338/ICCS_CVMLH_04 |storemode=property |title=A Classification Approach to Fetal Cardiotocography Dataset using R |pdfUrl=https://ceur-ws.org/Vol-3338/ICCS_CVMLH_04.pdf |volume=Vol-3338 |authors=R. Hephzibah,A. Hepzibah Christinal,R. Jayanthi,S. Jebasingh,D. Abraham Chandy,Chandrajit Bajaj }} ==A Classification Approach to Fetal Cardiotocography Dataset using R== https://ceur-ws.org/Vol-3338/ICCS_CVMLH_04.pdf

A Classification Approach to Fetal Cardiotocography Dataset
using R
R. Hephzibah a, A. Hepzibah Christinal a, R. Jayanthi a, S. Jebasingh a, D. Abraham Chandy b,
Chandrajit Bajaj c
a Department of Mathematics, Karunya University, Coimbatore, India
b Department of Electronics and Communication Engineering, Karunya University, Coimbatore, India
c Computational Applied Mathematics Chair in Visualization, Institute for Computational Engineering and
Sciences, University of Texas, Austin

Abstract
Electronic fetal heart monitoring is commonly used to check fetal status during pregnancy.
Cardiotocography is a technique that helps obstetricians obtain clear information during
childbirth and monitor fetal health, especially in pregnant women who are at risk of
complications. This paper deals with the classification of the fetal cardiotocography
dataset using R. A supervised machine learning approach is applied to categorize the fetal
records as normal, suspect, or pathologic using a random forest classifier. The classifier
achieves an accuracy of 99.94% in training and 93.57% in testing. It also gives the best
results in terms of sensitivity and specificity for the normal, suspect, and pathologic
classes on both the training and testing datasets. The method is found to provide higher
accuracy than the other methods considered.

Keywords
Machine learning, Random Forest classifier, cardiotocography, fetal heart rate

1. Introduction
Machine learning is an advancing field of research in engineering and computer science, and
machine learning algorithms play a major role in the medical field. It helps in computing image
features, which supports classification and better detection of diseases [1]. It also helps in
learning from empirical data and making accurate decisions using complex algorithms [2].
Machine learning is generally divided into several categories. Supervised learning includes regression
and classification, and reinforcement learning is usually treated as a separate category. Clustering,
blind source estimation, and density estimation come under unsupervised learning, and information
systems and semi-supervised classification are part of semi-supervised learning [3]. In medical image
processing, pixel-based machine learning is an evolving field that deals directly with the pixels or
voxels of images. It avoids the loss of information caused by improper segmentation or feature
computation [4]. Machine learning libraries such as Torch are freely available software libraries. There
are different machine learning algorithms, such as support vector machines, Parzen windows, AdaBoost,
K-nearest neighbours, hidden Markov models, multi-layer perceptrons, bagging, Bayes classifiers, etc. [5].
Classifiers include linear classifiers such as logistic regression, the naive Bayes classifier, and the
perceptron, as well as quadratic classifiers, support vector machines, boosting, decision trees, which
are aggregated in random forests, and Bayesian and neural networks [6]. In artificial intelligence,
mathematical algorithms are applied to data points to diagnose the human body [7]. It is also very
useful for improving

CVMLH-2022: Workshop on Computer Vision and Machine Learning for Healthcare, April 22 – 24, 2022, Chennai, India.
EMAIL: hepzia@yahoo.com (A. Hepzibah Christinal)
ORCID: 0000-0003-3965-3183 (A. Hepzibah Christinal)
©️ 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

the prediction accuracy for cancer and related death [8], is involved in predicting cardiac risk [9],
and helps improve the diagnostic accuracy of magnetic resonance imaging [10] and computerized
tomography scans in radiological investigations. To reduce problems with classification outcomes,
medical and engineering professionals have automated the interpretation of cardiotocography (CTG) [11].
Electronic fetal heart monitoring is commonly used to check fetal status during labor. Despite the lack
of evidence for its usefulness, the method is widely used in every modern labor and delivery hospital
in industrialized countries. To maximize patient safety and outcomes, everyone who provides health
care to the woman in labor and her newborn must have a comprehensive understanding of the
pathogenesis underlying fetal heart monitoring, as well as an appreciation of the course of labor and
of complications as they develop [12]. In gynaecology, fetal abnormalities are the most likely cause of
pregnancy complications. If the fetus’s environment inside the womb is unsuitable, the fetus’s health
is likely to worsen. Cardiotocography records the fetal heart rate and uterine contractions
simultaneously. Decision tree, support vector machine, and naive Bayes classifiers implemented in
RStudio have been used in that research. The dataset was taken from the UCI Machine Learning
Repository, with the fetal state categorized as normal, suspect, or pathological; the classifiers were
trained and tested, and then compared using different performance measures [13]. To prevent
intrapartum hypoxic-ischaemic injury, fetal heart rate monitoring with a cardiotocograph is used to
identify variations in the fetal heart rate during labor [14]. Classification is required to predict the
health of newborns, especially in urgent circumstances. Cardiotocography helps obstetricians obtain
precise information during gestation and is a method of monitoring fetal health, especially in pregnant
women who are at high risk.
According to obstetricians, CTG is a continuous electronic record of the baby's heart rate taken from
the mother's abdomen. The information obtained is important for assessing the health of the fetus and
allows for early intervention before the fetus suffers permanent impairment. Machine learning methods
aim to exploit the properties of the collected data to solve problems. In that study, the authors
compared the classification capabilities of eight machine learning methods using antepartum
cardiotocography data [15]. Checking the fetal heart rate (FHR) is the most important way to detect
fetal distress, which leads to complications. The main diagnostic tool for measuring FHR is
cardiotocography. Misinterpreting the CTG trace could result in significant harm. Decision tree,
K-nearest neighbours, logistic regression, support vector machine, random forest, and naive Bayes are
the six algorithms presented in that study for the categorization of CTG data. A classification-based
feature selection methodology is used to remove unnecessary features from the dataset and so improve
the performance of the classifiers. Evaluation metrics are used to measure the precision, accuracy,
and recall of the classification algorithms [16,17]. To bring out the difference between normal and
abnormal fetal heart rate signals, the bagging ensemble machine learning technique was used. The
F-measure, ROC area, and accuracy are used as indicators of the classifiers' success. The bagging
ensemble classifier generated favorable results in experiments, and bagging combined with random
forest reached an accuracy of 99.02 percent [18]. Open-access software built on MATLAB has been
introduced to analyze fetal heart rate signals.
It is freely available for research, and the details of the software are given. In addition to
non-linear, linear, morphological, and time-frequency characteristics, the software uses a new approach
called image-based time-frequency features for analyzing fetal heart rate signals. The software
(CTG-OAS) was also tested for dependability in an experimental investigation on the publicly available
CTU-UHB database. In that investigation the accuracy was 77.81 percent, the sensitivity was 76.83
percent, the specificity was 78.27 percent, and the geometric mean was 77.29 percent [19]. The
least-squares support vector machine (LS-SVM) with a binary decision tree is used to evaluate the fetal
state for cardiotocography classification. Particle swarm optimization is used to tune the LS-SVM
parameters. The method's robustness is tested using a 10-fold cross-validation procedure, and its
performance is assessed in terms of accuracy. To examine and display the method's performance, a
cobweb representation is presented along with receiver operating characteristic analysis. According to
the experimental results, this method achieves a classification accuracy of 91.62 percent [20]. Another
work used genetic algorithms (GA) and support vector machines (SVM) to provide a new technique for
evaluating fetal well-being from cardiotocograph (CTG) data. Obstetricians commonly employ CTG
recordings to determine fetal well-being because they contain the fetal heart rate and uterine
contraction signals. An SVM-based classifier was constructed using features

collected from normal and abnormal uterine contraction and fetal heart rate signals. The GA was then
used to identify the best feature subset for the classifier to distinguish normal from pathological
cases in the data. The performance of the new system was evaluated using comprehensive CTG data
classified by three professional obstetricians [11].
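
Several of the studies above report accuracy, sensitivity, and specificity. For a three-class problem such as normal, suspect, and pathologic, sensitivity and specificity are commonly computed per class in a one-vs-rest fashion from a confusion matrix. The following is a minimal Python sketch of that calculation, given only for illustration (the paper itself works in R). The function names and the confusion-matrix counts are assumptions made up for this example; they are not results from this paper or from any cited study.

```python
# Illustrative only: the counts below are invented for the example and are
# not results from this paper or from any study cited above.
CLASSES = ["normal", "suspect", "pathologic"]

# confusion[i][j] = number of records whose true class is CLASSES[i]
# and whose predicted class is CLASSES[j].
confusion = [
    [90, 8, 2],
    [6, 30, 4],
    [1, 3, 16],
]


def accuracy(matrix):
    """Fraction of records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def one_vs_rest(matrix, k):
    """Return (sensitivity, specificity) for class index k, treating k as
    the positive class and all other classes together as the negative class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # true class differs, predicted as k
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(confusion):.3f}")
    for k, name in enumerate(CLASSES):
        sens, spec = one_vs_rest(confusion, k)
        print(f"{name}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Reporting the two measures per class, rather than a single overall figure, shows whether a classifier that is accurate overall still misses the rarer suspect or pathologic cases.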

2. Methods
In this paper, we used supervised classification based on a machine learning (ML) method.
Classification is a main task of supervised learning, in which various techniques are used to create a
function that maps each input to the appropriate output. Here the learner learns a function that maps
an input vector to one of several classes from examples of input-output pairs [28]. Various classifiers
are used for classification, including multinomial logistic regression, support vector machines,
multilayer perceptrons, random forests, and so on [27]. Bootstrap aggregating (bagging) is an ensemble
technique that is useful for increasing robustness. Random forest is a favorable method that combines
decision trees and bagging. Here we used the random forest classifier for classification [31]. We used
RStudio for implementation. The random forest package provides additional information such as
variable importance and a proximity measure.
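
The idea of bootstrap aggregating mentioned above can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the implementation used in this paper (which relies on a random forest package in R): each base learner is a one-split decision stump, a deliberate simplification of a full decision tree, trained on a bootstrap replicate of the data, and the ensemble predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) records with replacement (one bootstrap replicate)."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_stump(rows, labels):
    """Fit a one-split 'tree': search every feature and threshold for the
    split with the fewest training errors, predicting the majority label
    on each side of the threshold."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda r: left_label if r[f] <= t else right_label


def fit_bagged_stumps(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap replicate."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(rows, labels, rng)) for _ in range(n_trees)]


def predict(ensemble, row):
    """Majority vote over the predictions of all base learners."""
    votes = Counter(model(row) for model in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-class data, invented for the example (not CTG measurements).
    rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
    labels = ["A"] * 4 + ["B"] * 4
    ensemble = fit_bagged_stumps(rows, labels)
    print(predict(ensemble, [0, 0]), predict(ensemble, [10, 10]))
```

Because each stump sees a slightly different resampled dataset, individual errors tend to differ between stumps, and the vote averages them out; this is the robustness gain that bagging is used for.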
The random forest performs classification if the response is a factor; otherwise it performs
regression [29]. We used the random forest classifier to categorize the fetal state as normal, suspect,
or pathologic. The dataset used is the cardiotocography dataset taken from a publicly accessible ML
repository. It contains features derived from the assessment of fetal heart rate and uterine
contractions, organized by obstetricians. It consists of 2126 fetal cardiotocograms (CTGs) that were
automatically processed and measured. We select the attributes for our method; the attributes used in
this paper are described in Table 2. The classification is mainly based on the 3-class experiment, which
gives the fetal state as normal, suspect, or pathologic, and also on the 10-class experiment, which
involves morphological patterns [21]. The random forest classifier builds a group of decision trees
from selected subsets of the training dataset. The votes from the individual trees are gathered to make
the final decision. The individual trees are grown in the following way:
i. Consider N samples to be used as the training set for the development of the tree.
ii. Consider Q input variables and q<