=Paper= {{Paper |id=Vol-2786/Paper16 |storemode=property |title=Semantic Ontology-Based Approach to Enhance Text Classification |pdfUrl=https://ceur-ws.org/Vol-2786/Paper16.pdf |volume=Vol-2786 |authors=Sonika Malik,Sarika Jain |dblpUrl=https://dblp.org/rec/conf/isic2/Malik021 }} ==Semantic Ontology-Based Approach to Enhance Text Classification== https://ceur-ws.org/Vol-2786/Paper16.pdf




Semantic Ontology-Based Approach to Enhance Text
Classification
Sonika Malik (a, b), Sarika Jain (a)
(a) National Institute of Technology, Kurukshetra
(b) Maharaja Surajmal Institute of Technology
sonika.malik@gmail.com, jasarika@nitkkr.ac.in



          Abstract
          Text classification is the process of assigning pre-defined classes to free text. It has been
          one of the most researched areas in machine learning, with applications such as sentiment
          analysis, topic labeling, language detection and spam filtering. The efficiency of text
          classification improves when some relation or pattern in the data is known, and an ontology
          can provide this; it further helps in reducing the size of the dataset. An ontology is a
          collection of data items that stores and represents data in a way that preserves its patterns
          and the semantic relationships among its items. We have attempted to verify the improvement
          provided by the use of ontology in classification algorithms. The code and method developed in
          this research are generic and could be extended to any ontology-based text classification
          system. In this paper, we present an enhanced architecture that uses ontology to provide an
          effective text classification mechanism. We introduce an ontology-based text classification
          algorithm that utilizes the rich semantic information in the Disease Ontology (DOID). We
          summarize the existing work and finally advocate that the ontology-based text classification
          strategy is better than conventional text classification in terms of metrics such as
          Accuracy, Precision, Recall and F-measure.

          Keywords
          Text Classification, Ontology, Semantic AI, Symbolic AI, Statistical AI, Classifier.


ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, Delhi, India
Email: sonika.malik@gmail.com (S. Malik); jasarika@nitkkr.ac.in (S. Jain)
ORCID: 0000-0003-2721-1951 (S. Malik)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

The classification of entities based on the available data is the foundation for classification techniques. The available data could be of two types: the information that we have on hand, and the information that we have previously used for classification. Either way, an accurate and precise classification relies on the amount of information that is available to us. The ways of processing and analysing information have been transformed through digitization. There is a plethora of textual data everywhere we look, from magazines to journals to papers. There is a need to systematically categorize and interpret this information without compromising time. Automated text classification [1] is one of the most helpful tools for this. It is one of the most important and fundamental tasks in Natural Language Processing (NLP) [2], with broad applications such as sentiment analysis [3], topic labelling, spam detection [4], and intent detection [5].

Text classifiers [6] are made and meant to be implemented on a diverse range of textual datasets. Text classification can work on both structured and unstructured datasets. To understand the process of classification and how ontology fits in this process, there is a hierarchical progression as shown in Figure 1.

Artificial Intelligence (AI) is anticipated to produce hundreds of billions of dollars in economic value. However, considering that technology forms part of our




everyday lives, many people remain suspicious. Their key concern is that AI approaches behave like black boxes and seem to generate results without any explanation. In addition, many industries have recognised knowledge graphs (KGs) as an effective method for data processing, management and enrichment [53]. KGs are also increasingly recognised as a foundation of AI systems that enable explainable AI via the design concept called "Human-in-the-Loop" (HITL). The promise of AI is to automatically derive patterns and rules from massive datasets using machine learning algorithms such as deep learning. This fits particular problems very well and helps to simplify classification activities in many situations. Machine learning algorithms gain knowledge from historical information, but they cannot derive new results from it. Without explanation, there is no confidence. Explainability ensures that trustworthy agents in the system are able to understand and justify the AI agent's decisions [50].

Semantic AI integrates symbolic AI and statistical AI. It incorporates approaches such as machine learning, information analysis, semantic web and text mining. It combines the benefits of AI techniques, primarily neural networks and semantic reasoning. It is an improvement of the existing framework used primarily to create AI-based systems. This brings fast learning from less training data; for example, chatbots can be developed without the cold-start problem. Semantic AI incorporates a radically different approach and therefore requires complementary skills from additional stakeholders, whereas conventional machine learning is primarily performed by data or information scientists. At the heart of the Semantic Enriched Artificial Intelligence architecture, a semantic knowledge graph is used, providing the means for more automated data quality management [7]. Semantically enhanced data works as a base for better-quality data and more options in feature extraction. It gives better accuracy of classification and prediction by machine learning algorithms. Semantic AI aims to have an infrastructure to address the knowledge asymmetries between designers of AI applications and other stakeholders, including customers and decision makers, in direct reference to AI systems which 'work like magic', where only some of the analysts actually recognise the fundamental techniques [8].

Ontology- An ontology specifies a conceptualization of a domain in terms of concepts, attributes, and relations [49]. In simple terms, an ontology is analogous to a dictionary, which stores the information about entities. This information usually consists of the features and relations of the said entities [51, 52]. Ontology is of immense importance in research fields such as data science, where it eases information processing because of its organised structure, as compared to the more conventional ways of processing raw data. A formal ontology thus represents data in an organised way and is used as a framework [9].

Ontology based text Classification- For Machine Learning (ML) style classification, algorithms such as Naive Bayes (NB) [10] or Support Vector Machine (SVM) [11] are used, where we train a model to read text as feature vectors and output one of n classes. One use of ontology would be to mark up entities in the text. In our case, we have a medical ontology like DOID, whose nodes have information about various diseases, symptoms, medications, etc. We could look for these entities in our text and mark each of them as a single entity. For example, if we found the string "Lung Cancer" in our text, which is also a node in our ontology, we could replace all occurrences of "Lung Cancer" with a single token "Lung_Cancer" and treat this token as a feature for our classification. Ontology nodes usually contain multiple versions of the string that represents them. For example, "heart attack" is also known as "myocardial infarction", so if our text contains either string, they could be normalized down to one single string and treated as a single feature for classification. For rule-based classifiers such as Bayesian Networks or decision tree algorithms [12], we could also leverage the knowledge in the ontology to create generalized rules.

The remainder of the paper is organised as follows: Section 2 describes the related work in the field of text classification. Section 3 defines the background knowledge. Section 4 presents the assessment of the proposed system. Section 5 describes the comparison and results, and finally the paper ends with the conclusion and future scope.

2. Related Work

Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne and Enrico Motta [13] came up with the Computer Science Ontology (CSO). The CSO consists of up to twenty-six thousand domains, and as many as two hundred and twenty-six thousand interpretable relations between these domains. To support its availability, they also




developed the CSO Portal, a web application which allows users to explore the ontology and send feedback.

Angelo A. Salatino, Francesco Osborne and Enrico Motta [14] introduced the CSO Classifier for automatic classification of research papers according to the Computer Science Ontology (CSO). It is an unsupervised approach. For the metadata of every research paper that the CSO Classifier takes as input, it returns a list of suitable topics that could be used in classifying the said input research paper.

Angelo A. Salatino, Francesco Osborne and Enrico Motta [15] presented a CSO classifier for automatic classification of academic papers according to CSO's rich taxonomy of subjects. The aim is to promote the acceptance of CSO across the various communities involved in scholarly data and enable the creation of new applications that rely on this knowledge base. This paper proposed four stages:




Figure 1. Concept Hierarchy in Semantic AI [45, 46]
(a) constructing research ontology, (b) classifying new research proposals into disciplines, (c) building research proposal clusters using text mining, (d) balancing research proposals and regrouping them by considering applicants' characteristics.

Preet Kaur and Richa Sapra [17] also researched in a similar domain, wherein they proposed ontology-based text mining methods for classification of research proposals as well as external research reviewers.

Chaaminda Manjula Wijewickrema and Ruwan Gamage [18] addressed the fallacies in manual classification and proposed ontology-based methods for fully automatic text classification.

A. Sudha Ramkumar, B. Poorna and B. Saleena [19] used the WordNet ontology to perform ontology-based clustering of sports-related terms, so as to preserve the semantic meaning behind terms while clustering them.

Nayat Sanchez-Pi, Luis Marti and A.C.B. Garcia [20] presented a probing algorithm for the automatic detection of accidents in occupational health control. The proposal has more accurate heuristics because it contrasts the relevance of techniques used with the terms. The basic accident detection problem is divided into three parts: (i) text analysis, (ii) recognition and (iii) classification of failed techniques which caused accidents.

Decker [21] presented a different approach to categorize research papers by using the words present in the papers' abstracts. It is an unsupervised method which evaluates the relevance of suitable topics for the research paper on various time scales.




Herrera et al. [22] devised a way to categorize research papers specific to the domain of physics. They did this with the help of PACS, which stands for Physics and Astronomy Classification Scheme. They created a network-like structure where a PACS code was assigned to every topic node, and a connection between two nodes was possible only if their codes co-occur in at least one paper.

Ohniwa et al. [23] gave a similar analysis in the field of biomedicine. They used the Medical Subject Headings (MeSH).

Mai et al. [24] showed that the performance of their model, which was trained using only titles, was as good as that of models trained by mining the full texts of papers and articles. They developed their approach using deep learning techniques. As training sets, they used scientific papers from EconBiz and PubMed, respectively annotated with the STW Thesaurus for Economics (approximately five thousand classes) and MeSH (approximately twenty-seven thousand classes).

Cook et al. [25] developed a method for optimally allocating papers to reviewers, to aid the selection process.

Arya and Mittendorf [26] suggested a rotation-based method for the assignment of projects.

Choi and Park [27] offered a text-mining-based solution for Research and Development proposal classification.

Girotra [28] proposed a study for the evaluation of portfolio projects.

Sun et al. [29, 30] developed a mechanism for the assessment of reviewers who would evaluate the research papers.

Mehdi Allahyari, Krys J. Kochut and Maciej Janik [31] proposed a way of dynamically classifying textual records into dynamically generated classes.

Rudy Prabowo, Mike Jackson, Peter Burden and Heinz-Dieter Knoell [32] developed a web page classifier. Its classification was with reference to the Dewey Decimal System and Library of Congress Classification schemes.

3. Background Knowledge

In this section we discuss the pre-processing steps for textual data and the Machine Learning classifiers that are used in our research.

3.1. Pre-Processing Textual data

According to the official documentation, the Natural Language Toolkit (NLTK) [48] is a platform for building Python programs that work with human language data, for use in statistical Natural Language Processing (NLP). It is a useful tool in Python, which helps in processing a diverse range of languages by providing algorithms for it. This tool is attractive because it is free and open source, and its official documentation is thorough enough that no special tutorials are needed. The most common algorithms used in NLTK are tokenization, lemmatization, part-of-speech tagging, etc. These algorithms are essentially used to preprocess textual data. The preprocessing involves the following steps:

Tokenization- A token is the fundamental building block of any linguistic structure, such as a sentence or a paragraph. The process of tokenization is to break these structures down into tokens. A tokenizer could be of two types. A sentence tokenizer's tokens are sentences; it therefore breaks paragraphs down into sentences. A word tokenizer identifies words as tokens; it hence breaks sentences down into words.

Stemming- A stem is the root word or phrase from which different forms of that word could be derived. Stemming is the process of identifying all the words that were derived from the same stem and reducing or normalizing them back to their stem form. For example, the words "connection", "connected" and "connecting" reduce to the common word "connect".

Lemmatization- Sometimes, we may encounter words that have different stems but the same final meaning. In such a case, there is a need for a dictionary lookup to further reduce the stems to their common meaning, or the base word. This base word is known as the lemma, hence the name lemmatization. For example, the word "better" has "good" as its lemma. Such cases are missed during stemming because these two words are not at all alike, and would need a dictionary lookup where their meanings can confirm the lemma.

POS Tagging- It stands for Part of Speech tagging, and just as the name suggests, it identifies the various parts of a linguistic structure like a sentence. The different parts could be an adjective, a noun or a verb. It does so by studying the structure of the sentences and observing the arrangement of words and the relation between the various words.

3.2. Text classification and classifiers
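As a brief aside on the pre-processing steps of Section 3.1 (tokenization, stemming and lemmatization), the following is a minimal sketch of what each step does. It is not the NLTK implementation: NLTK ships full versions of these steps (for example its Porter stemmer and WordNet lemmatizer), whereas the sketch below uses only the Python standard library so that it stays self-contained. The suffix list, the lemma dictionary and all function names are illustrative assumptions.

```python
import re

# Illustrative assumptions: a tiny suffix list and a tiny lemma dictionary.
SUFFIXES = ("ion", "ing", "ed", "s")
LEMMAS = {"better": "good", "worse": "bad"}


def sentence_tokenize(text):
    """Break a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Break a sentence into lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup first; fall back to the stem when no lemma is known."""
    return LEMMAS.get(word, stem(word))


if __name__ == "__main__":
    for w in ("connection", "connected", "connecting", "better"):
        print(w, "->", stem(w), "/", lemmatize(w))
```

The dictionary lookup in `lemmatize` is what lets "better" map to "good", which suffix stripping alone cannot do.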




The idea behind text classification is to group text into categories using machine learning. It finds use in many relevant areas such as sentiment analysis, emotion analysis, etc. Many classifiers have been developed for each classification category. As stated previously, text classifiers are made and meant to be implemented on a diverse range of textual datasets. Text classification can work on both structured and unstructured datasets. Both types of datasets find numerous applications in various fields. The classification process in machine learning can be explained very simply. First, we assess and analyze the training dataset to learn the decision boundaries. Next, we predict the class for new data using the information obtained and learned during the training phase. This is essentially the whole process of classification.

Classification could be either supervised or unsupervised. Supervised classification [33] works on the principle of training and testing, and uses labeled data, i.e. predefined classes, for prediction. In the training phase, the model is made to learn some predefined classes by feeding it labeled or tagged data. In the testing phase, the efficiency of the model's prediction or classification is measured by feeding it unobserved data. In other words, it can only predict those classes in the testing phase which it has learnt in the training phase. Some common examples of supervised classification are spam filters, intent detection, etc. Unsupervised classification [34, 35] involves classification by the model without being fed any external information. Here, the algorithm of the model tries to group or cluster data points based on similar traits, patterns and other common features that can be used to tie two data points together. A common example where unsupervised classification is really helpful is search engines. They create data clusters based on insights generated from previous searches. This type of classification is extremely customizable and dynamic, as there is no need for training and tagging for it to work on textual datasets. Thus, unsupervised classification is compatible with any language.

The classifiers used for text classification could be ML based, such as the Naïve Bayes classifier, Decision Tree classifier [36] etc., or they can be based on Neural Network architectures such as Artificial Neural Networks, Convolutional Neural Networks etc. [37]. The machine learning based classifiers that can be used for text classification are:

(a) Naive Bayes classifier [38, 39] - It uses the Bayes theorem to predict values. This algorithm is good for multi-class classification. Consider a data point x in a multi-class scenario with three classes: A, B and C. Using Naïve Bayes, we try to predict whether the data point x belongs to class A, B or C by calculating its probability for the three classes using the Bayes theorem, as given in Eq. 1.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$

This algorithm is called 'Naïve' because it assumes that all the features are independent of each other, as defined in Eq. 2.

$$P(f_1, f_2, f_3, \ldots, f_n) = P(f_1)\,P(f_2) \cdots P(f_n) \tag{2}$$

There are two further categories of the NB classifier: the Gaussian NB classifier and the multinomial NB classifier. The Gaussian Naïve Bayes classifier is used when a dataset has continuous values of data. It uses the Gaussian probability distribution function (values are centred on the mean, and the density decreases with distance from the mean). The Multinomial Naive Bayes algorithm assumes the independence of features and models the features with a multinomial distribution.

(b) Decision Tree [38] - It is a highly intuitive algorithm which uses a greedy approach. To construct a decision tree, we have to perform the following steps: (1) select a feature to split the data, (2) select a method to split the data on the said feature. Its internal working is as follows: (i) Create/select a node. (ii) If the node is pure, output the only class. (iii) If no feature is left to split upon, and the node is impure, output the majority class. (iv) Else find the best feature to split upon, and recursively repeat from step (ii) on each split.

Figure 2. KNN

(c) K-Nearest Neighbor [38, 40] - Consider a scenario where we have to predict to which class the testing point belongs, by considering all the features at once. Such is the working of the KNN algorithm as shown in Figure 2. To predict the class of the testing data point, we check its vicinity. To classify the testing point, we check a specific number of points (1, 3, 5, 7, etc.) and whichever class is in majority among those, that one is




predicted. To select the nearest points, we have to consider their distance from the testing point. The distance metric can be (a) Manhattan distance, (b) Euclidean distance, (c) Minkowski distance.

(d) Random forest [38, 41] - It is an extension of the decision tree classifier. This algorithm uses multiple combinations of decision trees to accurately predict testing data. The random forest classifier overcomes the over-fitting problem of decision trees by building multiple decision trees and going with the majority result. The trees' outputs vary because each tree is built with random data and random features. To generate randomness in trees, we use two techniques:
(i) Bagging: If we have 'm' data points, we select a subset of 'k' out of them. For 'n' trees, n such subsets (n*k data points in total) are selected. Data points can be considered with replacement as the selection is random; therefore, these trees are called bagged trees.
(ii) Feature Selection: In the training phase, some features are selected at random in this technique, with the condition that the selection is performed without replacement.

(e) SVM Classifier [38, 42] - It is a very powerful algorithm and overcomes a limitation of logistic regression. As logistic regression uses the sigmoid function, the value predicted for a testing data point may lie close to 0.5, which can cause incorrect predictions. So, SVM uses the rules of logistic regression only, but exponentially increases the value, so that the values predicted do not fall in the range (-1, 1). The cost function changes to the following form in SVM, as given in Eq. 3.

$$J(\theta) = C \sum_{i} \left[ y^{(i)}\,\mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right) \mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \right] + 0.5 \sum_{i} \theta_i^{2} \tag{3}$$

(f) Logistic Regression [38]: It is a basic classification algorithm which uses the sigmoid function, as in Eq. 4, at its core to perform classification.

$$F(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

Because the sigmoid contains an exponential term, its output moves rapidly towards either 0 or 1 with a slight change in x. The cost function of binary logistic regression is given in Eq. 5.

$$E(h(x)) = \sum_{i} \left( -y_i \log\left(h(x_i)\right) - \left(1 - y_i\right) \log\left(1 - h(x_i)\right) \right) \tag{5}$$

(g) Bagging Classifier [43]: A Bagging Classifier is an ensemble meta-estimator that fits base classifiers on random subsets of the original dataset and then combines their individual predictions to form a final prediction. Usually, such a meta-estimator can be used to minimize the variance of a black-box estimator by randomization.

4. Proposed Study

The classification by Machine Learning algorithms is expected to improve with the use of ontology. We aim to verify this claim by studying and comparing values of metrics such as accuracy, precision, recall and F1 score for ontology-based text classification and conventional text classification.

4.1. Conventional Text Classification

In conventional classification the framework has three main phases: (i) dataset generation, (ii) model training and testing, (iii) analyzing/classifying results, as shown in Figure 3.

1. Dataset Generation: A preliminary knowledge database of disease-symptom associations was available at [45]. It consists of three columns: disease name, count of disease occurrence and the symptoms. However, it needed modification to be used for our research. Some new information was also added to the dataset so that matching could be done precisely. The final dataset thus created is the one that was used for this proposed research. The modified dataset and the ontology are compatible, as both contain the classification (output) feature "disease name" and the matching feature "disease description". After the ontology and a working dataset were obtained, the dataset was cleaned and preprocessed using NLTK. A synthetic dataset was also generated, which involves creating new data using programming techniques. In this research we created multiple entries by random selection of feature values within the same class. For example, consider a disease having 10 symptoms. We randomly select a subset of these 10 symptoms and generate a new entry for the dataset involving fewer symptoms and the disease name. This process helps to bind the symptom values to the disease and generates a strong positive relation between feature values (symptoms) and class (disease).

2. Model Training/Testing: This phase involves taking the dataset and applying machine learning classifiers to it.
                                                                                                              91




As the dataset initially contains text keywords, it needs 2.   3. Analyzing/ Classifying Results: To analyze the
to be converted into numbers using count vectorizer            results, we compare the disease predictions for the
module. After this the training data is ready for feeding      testing data with the actual disease class. After
to the classifier for training. The classifiers used are       comparing we calculate the classification metrics like
KNN, SVM, Logistic Regression, Decision Tree and               accuracy, precision, recall, F1-score. After this
Random Forest etc. After training we can use the model         computation we can compare the performance of
for predictions on testing data. The ratio of training and     multiple classifiers based on metrics. Also we can
testing data is 80 and 20 respectively.                        verify which classes seem to perform well base on
                                                               individual class-wise precision and recall values.
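The synthetic data generation and the 80:20 split described in phases 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `min_size` parameter, the fixed random seed and the symptom list are all assumptions made for the example.

```python
import random

# Illustrative symptom keywords only; not taken from the paper's dataset.
SYMPTOMS = ["fever", "cough", "chills", "fatigue", "headache",
            "nausea", "myalgia", "sneezing", "congestion", "dizziness"]


def generate_synthetic_entries(disease, symptoms, n_entries, min_size=2, seed=0):
    """Create (description, disease) pairs from random symptom subsets.

    Each new entry uses fewer symptoms than the full list, as described
    in the text. min_size and seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    entries = []
    for _ in range(n_entries):
        size = rng.randint(min_size, len(symptoms) - 1)
        subset = rng.sample(symptoms, size)
        entries.append((" ".join(subset), disease))
    return entries


def split_80_20(entries, seed=0):
    """Shuffle the entries and split them 80:20 into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


entries = generate_synthetic_entries("influenza", SYMPTOMS, n_entries=10)
train, test = split_80_20(entries)
```

Every generated entry keeps the disease label and a proper subset of its symptoms, which is what ties the feature values to the class.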




Figure 3. Conventional Text Classification

4.2.    Ontology Based Text Classification

For the purpose of this research, we have used the Human Disease Ontology, which is hosted on the website of the Institute for Genome Sciences, University of Maryland School of Medicine [44]. This ontology is a comprehensive hierarchical controlled vocabulary for human disease representation. It contains a unique label for each disease, which acts as an identifier. The OWL file of the ontology was exported to a CSV file using Protégé. We introduce a second phase between dataset generation and model training/testing, in which a hybrid approach to text classification is used to optimize it. The phases are: (i) Dataset generation, (ii) Ontology matching, (iii) Model training and testing, and (iv) Analyzing and classifying results, as shown in Figure 4(b). Phases (i), (iii) and (iv) are explained in Section 4.1.

Ontology Matching: In this phase the keywords formed from the description of the disease are matched with the keywords of the ontology nodes. All the matched nodes are possible classes, which can be used to create a subset of the data for efficient model training. Priority-based matching helps to limit the classes further. For ontology matching, each keyword is assigned two numbers that specify its priority. The first number describes the frequency of the keyword, and the second indicates whether the keyword can be lemmatized: it is 1 if the keyword cannot be lemmatized and 0 otherwise. Each keyword thus has the syntax (name, first priority number, second priority number). The steps for ontology matching are given in Figure 4(a).




Figure 4. (a) Ontology Matching (b) Ontology Based Text Classification
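The keyword syntax (name, first priority number, second priority number) can be illustrated with a small sketch. Several details here are assumptions rather than the authors' implementation: the first number is taken to be the raw count of the keyword in the description, `can_be_lemmatized` is a toy stand-in for a real lemmatizer, and `priority_of` mirrors the bucketing rule used in Algorithm 2.

```python
from collections import Counter


def can_be_lemmatized(word):
    # Toy stand-in for a real lemmatizer (assumption of this sketch):
    # a word counts as lemmatizable if it looks like a simple plural.
    return len(word) > 3 and word.endswith("s") and not word.endswith("ss")


def form_keywords(description):
    """Return a list of (name, frequency, lemmatization flag) tuples.

    The flag is 1 if the keyword cannot be lemmatized and 0 otherwise.
    """
    tokens = [t.strip(".,;:()").lower() for t in description.split()]
    counts = Counter(t for t in tokens if t)
    return [(name, freq, 0 if can_be_lemmatized(name) else 1)
            for name, freq in counts.items()]


def priority_of(keyword):
    """Bucket a keyword tuple the way Algorithm 2 does."""
    _, first, second = keyword
    if first != 1:
        return 1
    return 2 if second == 1 else 3


# Illustrative description only; not taken from the Disease Ontology.
keywords = form_keywords("Chronic liver disease with liver scarring and nodules")
priorities = {kw[0]: priority_of(kw) for kw in keywords}
```

No stop-word removal is done here, so words such as "with" and "and" are kept; a real pipeline would presumably filter them during preprocessing.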

Algorithm 1: Ontology Based Text Classification

The ontology matching function used in this algorithm refers to Algorithm 2.
DOID: Disease Ontology

    # Helper functions (synthetic_data_generation, Loading_Ontology,
    # Keywords_formation, keywords_selection, reducing_dataset, accuracy,
    # precision, recall, f1) and the count_vectorizer and classifier
    # objects are assumed to be defined elsewhere.
    from nltk.tokenize import word_tokenize
    from sklearn.model_selection import train_test_split

    # Phase 1: dataset generation
    data_x, data_y = synthetic_data_generation(Knowledge_base)
    # 80:20 split, as stated in Section 4.1
    x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.2)

    # Phase 2: ontology matching
    Ontology_tree = Loading_Ontology()
    for i in range(1, len(Knowledge_base)):
        Keywords_for_matching = Keywords_formation(Knowledge_base)
    all_classes = list(set(data_y))
    predictions = []
    for z in range(0, len(x_test)):
        keywords = keywords_selection(Keywords_for_matching, y_test[z])
        possible_classes = ontology_matching(Ontology_tree, keywords)
        # Keep the dataset classes whose name contains a matched ontology class
        final_classes = []
        for i in range(0, len(possible_classes)):
            for j in range(0, len(all_classes)):
                if set(word_tokenize(possible_classes[i])).issubset(
                        set(word_tokenize(all_classes[j]))):
                    final_classes.append(all_classes[j])
        # Restrict the training data to the remaining classes
        indices_to_use = []
        for i in range(0, len(y_train)):
            if y_train[i] in final_classes:
                indices_to_use.append(i)
        reduced_x_train, reduced_y_train = reducing_dataset(
            x_train, y_train, indices_to_use)

        # Phase 3: model training and testing
        training_data = count_vectorizer.fit_transform(reduced_x_train)
        # transform expects a list of documents, not a single string
        testing_data = count_vectorizer.transform([x_test[z]])
        classifier.fit(training_data, reduced_y_train)
        prediction = classifier.predict(testing_data)
        predictions.append(prediction[0])

    # Phase 4: analyzing results
    accuracy_score = accuracy(predictions, y_test)
    precision_score = precision(predictions, y_test)
    recall_score = recall(predictions, y_test)
    f1_score = f1(predictions, y_test)

Algorithm 2: Ontology Matching

    def ontology_matching(tree, keywords):
        # Each keyword is a tuple:
        # (name, first priority number, second priority number)
        priority_1 = []
        priority_2 = []
        priority_3 = []
        for name, first, second in keywords:
            if first != 1:
                priority_1.append(name)
            elif second == 1:
                priority_2.append(name)
            else:
                priority_3.append(name)

        found_nodes = []
        for node in tree:
            node_keywords = [x.lower() for x in node.keywords]
            # Count the keywords of each priority that occur in this node
            priority_1_count = sum(1 for k in priority_1 if k.lower() in node_keywords)
            priority_2_count = sum(1 for k in priority_2 if k.lower() in node_keywords)
            priority_3_count = sum(1 for k in priority_3 if k.lower() in node_keywords)
            if priority_1_count + priority_2_count + priority_3_count >= 1:
                found_nodes.append(node.entity)

        # Remove duplicate classes before returning
        classes = list(set(found_nodes))
        return classes

5. Classification of Various Algorithms with and without Ontology

The parameters used to evaluate and compare the model with and without ontology are accuracy, precision, recall, and F1 score.

Accuracy describes how often the predictions of a model are correct after training. It takes into account all the correctly predicted observations from the list of all predictions.

Accuracy = Number of correct predictions / Total number of predictions

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

Precision: Sometimes a classifier may label a class as true for some raw data when, in fact, it should have been false. This is a false positive. Precision takes the false positives into account.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: When a classifier marks a class as negative for an unobserved data item when, in fact, it should have been true, it is a false negative. Recall accounts for the sensitivity of a model by taking the false negatives into account.

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1 Score is the harmonic mean of Precision and Recall. It therefore factors in false positives as well as false negatives. In the case of an uneven class distribution, the F1 score becomes more important than accuracy. When false negatives and false positives have the same cost, accuracy may be treated as the superior evaluation parameter.

$$\text{F1 Score} = \frac{2 \times (\text{Recall} \times \text{Precision})}{\text{Recall} + \text{Precision}}$$

Where TP- True Positive
      TN- True Negative
      FP- False Positive
      FN- False Negative

With reference to Table 1, the metrics accuracy, precision, recall and F1-score have the same values. This is because the FP and FN values are equal in magnitude, as there are few records per disease. This results in equal values of precision, recall and F1-score for each class, as shown in Table 2.
It can also be observed that the decision tree classifier shows the largest gain in accuracy, precision, recall and F1 score: from 0.758 for simple classification to 0.852 for ontology based classification on 500 test cases (about 9 percentage points), and from 0.74 to 0.80 on 100 test cases (6 percentage points). It is followed by the KNN classifier, which improves by 5 percentage points on 100 test cases and by about 4 percentage points on 500 test cases. The remaining classifiers show improvements of around 1% - 3%, as shown in Table 1.
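One way to see why the four metrics can coincide is sketched below. It assumes the scores are micro-averaged over all classes, which the paper does not state. Under that assumption every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the FP and FN totals are equal and precision, recall, F1 and accuracy all take the same value. The labels are made up for illustration.

```python
def micro_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) using micro-averaging.

    Assumes at least one prediction is correct, so no denominator is zero.
    """
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Illustrative labels only; not taken from the paper's dataset.
y_true = ["asthma", "asthma", "hepatitis", "migraine", "migraine"]
y_pred = ["asthma", "hepatitis", "hepatitis", "migraine", "asthma"]
accuracy, precision, recall, f1 = micro_metrics(y_true, y_pred)
```

With three of the five predictions correct, all four values come out at 0.6, up to floating-point rounding.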




Table 1
Comparison Table for different classifiers based on ontology and without ontology 100 & 500 Test Cases
        Classifier      Parameter                 100 Test Cases                   500 Test Cases

                                             Without        With Ontology   Without       With Ontology
                                             Ontology                       Ontology
      Naïve-Bayes          Accuracy            0.98                1.0        0.97             0.978
                           Precision           0.98                1.0        0.97             0.978
                            Recall             0.98                1.0        0.97             0.978
                           F1 Score            0.98                1.0        0.97             0.978
      Decision Tree        Accuracy            0.74               0.80       0.758             0.852
                           Precision           0.74               0.80       0.758             0.852
                            Recall             0.74               0.80       0.758             0.852
                           F1 Score            0.74               0.80       0.758             0.852
          KNN              Accuracy            0.81               0.86       0.818             0.854
                           Precision           0.81               0.86       0.818             0.854
                            Recall             0.81               0.86       0.818             0.854
                           F1 Score            0.81               0.86       0.818             0.854
     Random Forest         Accuracy            0.96               0.97       0.932             0.952
                           Precision           0.96               0.97       0.932             0.952
                            Recall             0.96               0.97       0.932             0.952
                           F1 Score            0.96               0.97       0.932             0.952
          SVM              Accuracy            0.98                1.0       0.986             0.992
                           Precision           0.98                1.0       0.986             0.992
                            Recall             0.98                1.0       0.986             0.992
                           F1 Score            0.98                1.0       0.986             0.992
         Bagging           Accuracy            0.87               0.89       0.894             0.912
                           Precision           0.87               0.89       0.894             0.912
                            Recall             0.87               0.89       0.894             0.912
                           F1 Score            0.87               0.89       0.894             0.912
        Logistic           Accuracy            0.99                1.0       0.982             0.99
       Regression          Precision           0.99                1.0       0.982             0.99
                            Recall             0.99                1.0       0.982             0.99
                           F1 Score            0.99                1.0       0.982             0.99
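The gain of each classifier can be read off Table 1 directly. The short sketch below copies the accuracy values from the table (the other three metrics are identical) and computes the difference in percentage points. The names `TABLE_1` and `gain_in_points` are introduced only for this example.

```python
# (without ontology, with ontology) accuracy values copied from Table 1,
# keyed by the number of test cases.
TABLE_1 = {
    "Naive-Bayes":         {100: (0.98, 1.0),  500: (0.97, 0.978)},
    "Decision Tree":       {100: (0.74, 0.80), 500: (0.758, 0.852)},
    "KNN":                 {100: (0.81, 0.86), 500: (0.818, 0.854)},
    "Random Forest":       {100: (0.96, 0.97), 500: (0.932, 0.952)},
    "SVM":                 {100: (0.98, 1.0),  500: (0.986, 0.992)},
    "Bagging":             {100: (0.87, 0.89), 500: (0.894, 0.912)},
    "Logistic Regression": {100: (0.99, 1.0),  500: (0.982, 0.99)},
}


def gain_in_points(without, with_ontology):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round((with_ontology - without) * 100, 1)


gains = {
    name: {n: gain_in_points(*pair) for n, pair in runs.items()}
    for name, runs in TABLE_1.items()
}
best_100 = max(gains, key=lambda name: gains[name][100])
best_500 = max(gains, key=lambda name: gains[name][500])
```

On both test sizes the largest gain belongs to the decision tree classifier, followed by KNN.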

Table 2
Values of Precision, recall & F1-score for SVM classifier for each class
             Diseases                   Precision                Recall             F1-Score
      Alcohol Dependence                    1.0                    1.0                 1.0
             Influenza                      1.0                    1.0                 1.0
            Neoplasm                        1.0                    1.0                 1.0
              Hernia                        1.0                    1.0                 1.0
          Fibrous Tumor                     1.0                    1.0                 1.0
          Osteomyelitis                     1.0                    1.0                 1.0
           Pancreatitis                     1.0                    1.0                 1.0
           Cholecystitis                    1.0                    1.0                 1.0
          Pneumocystis                      1.0                    1.0                 1.0
                   Dysphagia                       1.0                 1.0                        1.0
               Psychotic disorder                  1.0                 1.0                        1.0
                   Hepatitis C                     1.0                 1.0                        1.0
            Cerebrovascular Disease                1.0                 1.0                        1.0
                   Depression                      1.0                 1.0                        1.0
                    Hepatitis                      1.0                 1.0                        1.0
                Hypothyroidism                     1.0                 1.0                        1.0
                   Hepatitis B                     1.0                 1.0                        1.0
                 Arthrogryposis                    1.0                 1.0                        1.0
                Manic Disorder                     1.0                 1.0                        1.0
             Tonic-clonic Seizures                 1.0                 1.0                        1.0
                    Migraine                       1.0                 1.0                        1.0
                    Anxiety                        1.0                 1.0                        1.0
           Hepatocellular Carcinoma                1.0                 1.0                        1.0
                    Asthma                         1.0                 1.0                        1.0
            Congestive Heart Failure               1.0                 1.0                        1.0
                  Hypertensive                     1.0                 1.0                        1.0
                 Chronic Kidney                    1.0                 1.0                        1.0
                    Cirrhosis                      1.0                 1.0                        1.0
               Blood Coagulation                   1.0                 1.0                        1.0
                   Melanoma                        1.0                 1.0                        1.0
           Lymphatic System Disease                1.0                 1.0                        1.0
                  Dependence                       1.0                 1.0                        1.0
                Bipolar Disorder                   1.0                 1.0                        1.0
                   Candidiasis                     1.0                 1.0                        1.0
                 Hypoglycemia                      1.0                 1.0                        1.0
               Sinus Tachycardia                   1.0                 1.0                        1.0
          Transient Cerebral Ischemia              1.0                 1.0                        1.0

The order of the classifiers with respect to the magnitude of their metrics (values for 100 test cases in Table 1) is as follows. The bagging classifier follows next, with all parameters at 0.87 in simple text classification and 0.89 in ontology based classification. The Random Forest classifier comes next, with values of 0.96 and 0.97 respectively. The Naïve-Bayes classifier and the SVM classifier have the same values for all parameters: 0.98 for simple classification and 1.0 for ontology based text classification. Logistic Regression can be labelled the best classifier, with metrics of 0.99 for simple text classification and 1.0 for ontology based text classification. There is only a minute difference between the metric values of these classifiers. The order of increasing accuracy of the classifiers (with or without ontology) is:
Decision Tree Classifier < KNN Classifier < Bagging Classifier < Random Forest Classifier < Naïve Bayes Classifier < SVM classifier < Logistic Regression.

6. Conclusion and Future Scope

In this paper, the observations show that ontology based classification performs better than classification without ontology. The general pattern points towards more accurate and precise classification when an ontology is used. All the parameters used (accuracy, precision, recall, and F1 Score) showed an elevation of 1% to 3% when the classification was done with the help of ontology. It can be deduced that using the ontology increased the efficiency of classification. This advantage can be attributed to the fact that the number of possible classes for classification is reduced, which in turn reduces the time taken for training. The results also indicate a more comparable accuracy level amongst the classifiers when the ontology




was used. This study, while proving the importance and benefits of ontology, still has a lot of scope for future improvements. Future work is needed to improve the dataset of diseases used in this project, as there was no official dataset available for the Human Disease Ontology. Most of the work has been done on a limited dataset obtained by converting the available ontology into a dataset. There is also a need to optimize the code used for ontology matching after the data preprocessing has been done. One could also move on from machine learning towards deep learning and build a neural network for this dataset to further improve the classification results in the future.

References

1. V. Korde, “Text classification and classifiers: A survey,” International Journal of Artificial Intelligence & Applications (IJAIA) 3(2) (2012): 85-99.
2. A. Gelbukh, Natural Language Processing, IEEE Fifth International Conference on Hybrid Intelligent Systems (HIS'05), Rio de Janeiro, Brazil (2006).
3. A. Agarwal, B. Xie, I. Vovsha, O. Rambow and R. Passonneau, “Sentiment Analysis of Twitter Data” (2011). In Proc. WLSM-11.
4. Bhowmick and S.M. Hazarika, E-mail spam filtering: a review of techniques and trends. In: Kalam A, Das S, Sharma K (eds) Advances in electronics, communication and computing. Lecture notes in electrical engineering, 443. Springer, Singapore, 583–590. 2018. https://doi.org/10.1007/978-981-10-4765-7_61
5. S. Akulick and E.S. Mahmoud, Intent Detection through Text Mining and Analysis. In Proceedings of the Future Technologies Conference (FTC), Vancouver, Canada, 29–30 November 2017; 493–496.
6. K. Das and R.N. Behera, “A Survey on Machine Learning: Concept, Algorithms and Applications,” International Journal of Innovative Research in Computer and Communication Engineering 2(2), 2017.
7. Blumauer, PoolParty Semantic Suite, 2018, URL: https://www.poolparty.biz/semantic-ai/
8. https://www.slideshare.net/semwebcompany/semantic-ai
9. T. Berners-Lee, James Hendler and Ora Lassila, Scientific American: Feature Article: The Semantic Web: May 2001
10. K.A. Vidhya, G. Aghila, “A Survey of Naïve Bayes Machine Learning approach in Text Document Classification”, (IJCSIS) International Journal of Computer Science and Information Security, 7, 2010.
11. T. Joachims, Text categorization with support vector machines: learning with many relevant features. In Proceedings of ECML-98, 10th European Conference on Machine Learning (Chemnitz, Germany, 1998), 137-142.
12. D. E. Johnson, F. J. Oles, T. Zhang, T. Goetz, “A decision-tree-based symbolic rule induction system for text Categorization”, IBM Systems Journal, 41(3) 2002.
13. A.A. Salatino, T. Thanapalasingam, A. Mannocci, F. Osborne and E. Motta, “Classifying Research Papers with the Computer Science Ontology,” Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK, 2018.
14. A.A. Salatino, F. Osborne and E. Motta, The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas, 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part II
15. A. Salatino, F. Osborne, E. Motta. The CSO classifier: Ontology-driven detection of research topics in scholarly articles. In: A. Doucet et al. (eds.) TPDL 2019: 23rd International Conference on Theory and Practice of Digital Libraries. Cham, Switzerland: Springer, 2019, pp. 296–311. doi: 10.1007/978-3-030-30760-8_26.
16. J. Ma, W. Xu, Y. Sun, E. Turban, S. Wang, O. Liu, “An Ontology-Based Text-Mining Method to Cluster Proposals for Research Project Selection,” IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 42(3), 2012.
17. P. Kaur, R. Sapra, “Ontology Based Classification and Clustering of Research Proposals and External Research Reviewers”, International Journal of Computers & Technology, 5(1) 2013, ISSN 2277-3061
18. C.M. Wijewickrema, R. Gamage, An Ontology Based Fully Automatic Document Classification System Using an Existing Semi-Automatic System, National Institute of Library and Information Sciences, University of Colombo, Colombo, Sri Lanka, 2013.
19. Sudha Ramkumar, B. Poorna, B. Saleena, Ontology based text document clustering for sports, Journal of Engineering and Applied Sciences, 2018.
20. Nayat Sanchez-Pi, Luis Marti and A.C.B. Garcia, “Improving ontology-based text classification: An occupational health and security application,” Article




    Journal of Applied Logic · September 2015                                Comparison, International Journal of Computer
21. S.L. Decker, B. Aleman-meza, D. Cameron and I.B.                         Trends and Technology (IJCTT), 48(3), 2017.
    Arpinar, Detection of Bursty and Emerging Trends
    towards Identification of Researchers, the Early Stage
    of Trends (2007).
22. M. Herrera, D.C. Roberts and N. Gulbahce, Mapping
    the evolution of scientific fields. PLoS ONE. 5(5),
    2010.
23. R.L. Ohniwa, A. Hibino and K. Takeyasu, Trends in
    research foci in life science fields over the last 30 years
    monitored by emerging topics, Scientometrics. 85(1),
    2010.
24. F. Mai, L. Galke, and A. Scherp, Using Deep Learning
    for Title Based Semantic Subject Indexing to Reach
    Competitive Performance to Full-Text, JCDL ’18
    Proceedings of the 18th ACM/IEEE on Joint
    Conference on Digital Libraries (Fort Worth, Texas,
    USA, Jun. 2018).
25. W. D. Cook, B. Golany, M. Kress, M. Penn, and T.
    Raviv, “Optimal allocation of proposals to reviewers
    to facilitate effective ranking,” Manage. Sci., 51(4),
    655–661, 2005.
26. Arya and B. Mittendorf, “Project assignment when
    budget padding taints resource allocation,” Manage.
    Sci., vol. 52, no. 9, pp. 1345–1358, Sep. 2006.
27. Choi and Y. Park, “R&D proposal screening system
    based on text mining approach,” Int. J. Technol. Intell.
    Plan, 2(1), 61–72, 2006.
28. K. Girotra, C. Terwiesch, and K. T. Ulrich, “Valuing
    R&D projects in a portfolio: Evidence from the
    pharmaceutical industry,” Manage. Sci., 53(9), 1452–
    1466, 2007.
29. Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, “A group
    decision support approach to evaluate experts for
    R&D project selection,” IEEE Transactions of
    Engineering management, 55(1), 158–170, 2008.
30. Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, “A hybrid
    knowledge and model approach for reviewer
    assignment,” Expert System Applications, 34(2), 817–
    824, Feb. 2008.
31. M. Allahyari, K. J. Kochut and M. Janik, Ontology-
    based text classification into dynamically defined
    topics, Semantic Computing (ICSC), 273-278, 2014.
32. R. Prabowo, M. Jackson, P. Burden and H. Knoell,
    "Ontology-Based Automatic Classification for the Web
    Pages Design Implementation and Evaluation", Proc.
    of the 3rd International Conference on Web
    Information Systems Engineering, 2002.
33. F.Y. Osisanwo, J.E.T. Akinsola, O. Awodele, J.O.
    Hinmikaiye, O. Olakanmi and J. Akinjobi, Supervised
    Machine Learning Algorithms: Classification and
34. M. Khanum, T. Mahboob, W. Imtiaz, H.A. Ghafoor
    and R. Sehar, A Survey on Unsupervised Machine
    Learning Algorithms for Automation, Classification
    and Maintenance, International Journal of Computer
    Applications (0975 – 8887) 119(13), 2015.
35. Q. Guo, J. Wentian, S. Zhong and E. Zhou, "The
    Analysis of the Ontology-based K-Means Clustering
    Algorithm", Proceedings of the 2nd International
    Conference on Computer Science and Electronics
    Engineering (ICCSEE 2013), [online] Available:
    https://www.atlantis-press.com/proceedings/iccsee-13/4617.
36. P. Vateekul and M. Kubat, Fast Induction of Multiple
    Decision Trees in Text Categorization From Large
    Scale, Imbalanced, and Multi-label Data, IEEE
    International Conference on Data Mining Workshops,
    2009.
37. S. Dargan, M. Kumar, M. Ayyagari and G.
    Kumar, A Survey of Deep Learning and Its
    Applications: A New Paradigm to Machine
    Learning, Springer, June 2019.
38. https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html
39. S. Xu, Y. Li and Z. Wang, Bayesian multinomial
    naïve bayes classifier to text classification. In:
    Advanced multimedia and ubiquitous
    engineering. Springer, 347–352, 2017.
40. G. Guo, H. Wang, D. Bell, Y. Bi and K. Greer,
    KNN Model-Based Approach in Classification,
    Proc. ODBASE, pp. 986–996, 2003.
41. G. Biau, “Analysis of a Random Forests Model”,
    Journal of Machine Learning Research 13 (2012)
    1063-1095.
42. Y. Qin, X. Wang, Study on Multi-label Text
    Classification Based on SVM, Sixth International
    Conference on Fuzzy Systems and Knowledge
    Discovery, 2009.
43. https://scikitlearn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html
44. https://bioportal.bioontology.org/ontologies/DOID
45. http://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html
46. Y. Freund and R.E. Schapire, “A Short
    Introduction to Boosting”, Journal of Japanese
    Society for Artificial Intelligence, 14(5), 771-780,
    1999.
47. V.N. Garla, C. Brandt, Ontology-Guided Feature
    Engineering for Clinical Text Classification,
    Journal of Biomedical Informatics, 45(5): 992–
    998. doi: 10.1016/j.jbi.2012.04.010
48. https://github.com/nltk/nltk
49. D. Fensel. Ontologies: Silver Bullet for
    Knowledge Management and Electronic
    Commerce. Springer-Verlag, 2001.
50. https://www.forbes.com/sites/forbestechcouncil/2019/12/30/explainable-ai-the-rising-role-of-knowledge-scientists/#62bc6193603f
51. S. Malik, S. Mishra, N. K. Jain, S. Jain. Devising
    a super ontology, Procedia Computer Science, pp.
    785–792, 2015.
52. S. Malik, S. Jain. Ontology based context aware
    model. In Proceedings of the international
    conference on computational intelligence in data
    science (ICCIDS), p. 1-6, 2017.
53. S. Jain, “Understanding Semantics-based Decision
    Support”, Nov 2020, 152 pages, CRC Press,
    Taylor & Francis Group. ISBN: 9780367443139
    (HB)