=Paper=
{{Paper
|id=Vol-2786/Paper16
|storemode=property
|title=Semantic Ontology-Based Approach to Enhance Text Classification
|pdfUrl=https://ceur-ws.org/Vol-2786/Paper16.pdf
|volume=Vol-2786
|authors=Sonika Malik,Sarika Jain
|dblpUrl=https://dblp.org/rec/conf/isic2/Malik021
}}
==Semantic Ontology-Based Approach to Enhance Text Classification==
Semantic Ontology-Based Approach to Enhance Text
Classification
Sonika Malik (a, b), Sarika Jain (a)
(a) National Institute of Technology, Kurukshetra
(b) Maharaja Surajmal Institute of Technology
sonika.malik@gmail.com, jasarika@nitkkr.ac.in
Abstract
Text classification is the process of assigning pre-defined classes to free text. It has been one of the most researched areas in machine learning, with applications such as sentiment analysis, topic labeling, language detection and spam filtering. The efficiency of text classification improves when some relation or pattern in the data is known, and an ontology can provide this knowledge. It further helps in reducing the size of the dataset. An ontology is a collection of data items that stores and represents data in a way that preserves its patterns and the semantic relationships among the items. We have attempted to verify the improvement provided by the use of an ontology in classification algorithms. The code and the method developed in this research are generic and could be extended to any ontology-based text classification system. In this paper, we present an enhanced architecture that uses an ontology to provide an effective text classification mechanism. We introduce an ontology-based text classification algorithm that utilizes the rich semantic information in the Disease Ontology (DOID). We summarize the existing work and finally advocate that the ontology-based text classification strategy performs better than conventional text classification in terms of metrics such as accuracy, precision, recall and F-measure.
Keywords
Text Classification, Ontology, Semantic AI, Symbolic AI, Statistical AI, Classifier.
ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, Delhi, India
ORCID: 0000-0003-2721-1951 (S. Malik)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

The classification of entities based on the available data is the foundation for classification techniques. The available data could be of two types: the information that we have on hand, and the information that we have previously used for classification. Either way, an accurate and precise classification relies on the amount of information that is available to us. The ways of processing and analysing information have been transformed through digitization.

There is a plethora of textual data everywhere we look, from magazines to journals to papers. There is a need to systematically categorize and interpret this information without compromising time. Automated text classification [1] is one of the most helpful tools for this. It is one of the most important and fundamental tasks in Natural Language Processing (NLP) [2], with broad applications such as sentiment analysis [3], topic labelling, spam detection [4], and intent detection [5].

Text classifiers [6] are made and meant to be implemented on a diverse range of textual datasets. Text classification can work on both structured and unstructured datasets. To understand the process of classification and how ontology fits in this process, there is a hierarchical progression as shown in Figure 1.

Figure 1. Concept Hierarchy in Semantic AI [45, 46]

Artificial Intelligence (AI) is anticipated to produce hundreds of billions of dollars in economic value. However, even though the technology forms part of our everyday lives, many people remain suspicious. Their key issue is that AI approaches perform like black boxes and seem to generate ideas without any explanation. In addition, many industries have recognised knowledge graphs (KGs) as an effective method for data processing, management and enrichment [53]. KGs are also increasingly recognised as a foundation of AI systems that enable explainable AI via the design concept called "Human-in-the-Loop" (HITL). The promise of AI is to automatically derive patterns and rules from massive datasets based on machine learning algorithms such as deep learning. This fits very well with particular issues and helps to simplify classification activities in many situations. Machine learning algorithms gain knowledge from historical information, but they cannot derive new results from it. Without explanation, there is no confidence. Explainability ensures that trustworthy agents in the system are able to understand and justify the AI agent's decisions [50].

Semantic AI integrates symbolic AI and statistical AI. It incorporates approaches like machine learning, information analysis, semantic web and text mining. It combines the benefits of AI techniques, primarily neural networks and semantic reasoning. It is an improvement of the existing framework used primarily to create AI-based systems. This brings fast learning from less training data; for example, chatbots can be developed without the cold-start problem. Semantic AI incorporates a radically different approach and therefore complementary skills for additional stakeholders. Conventional Machine Learning is primarily performed by data or information scientists, who are also the ones involved in Explainable AI or Semantic AI. At the heart of a Semantic Enriched Artificial Intelligence architecture, a semantic knowledge graph is used, providing the means for more automated data quality management [7]. Semantically enhanced data works as a base for better quality data and more options in feature extraction. It gives better accuracy of the classification and prediction produced by machine learning algorithms. Semantic AI aims to have an infrastructure to address the knowledge asymmetries between designers of AI applications and other stakeholders, including customers and decision makers, in direct reference to AI systems which 'work like magic', where only some of the analysts actually recognise the fundamental techniques [8].

Ontology- An ontology specifies a conceptualization of a domain in terms of concepts, attributes, and relations [49]. In simple terms, an ontology is analogous to a dictionary, which stores the information about entities. This information usually consists of the features and relations of the said entities [51, 52]. Ontologies are of immense importance in research fields such as data science, where their organised structure eases information processing compared to the more conventional ways of processing raw data. A formal ontology thus represents data in an organised way and is used as a framework [9].

Ontology based text classification- For Machine Learning (ML) style classification, algorithms such as Naive Bayes (NB) [10] or Support Vector Machine (SVM) [11] are used, where we train a model to read text as feature vectors and output one of n classes. One use of an ontology would be to mark up entities in the text. In our case, we have a medical ontology like DOID, whose nodes have information about various diseases, symptoms, medications, etc. We could look for these entities in our text and mark each of them as a single entity. For example, if we found the string "Lung Cancer" in our text, which is also a node in our ontology, we could replace all occurrences of "Lung Cancer" with a single token "Lung_Cancer" and treat this token as a feature for our classification. Ontology nodes usually contain multiple versions of the string that represents them. For example, "heart attack" is also known as "myocardial infarction", so if our text contains either string, they could be normalized down to one single string and treated as a single feature for classification. For rule-based classifiers such as Bayesian Networks or decision tree algorithms [12], we could also leverage the knowledge in the ontology to create generalized rules.

The remaining paper is organised as follows: Section 2 describes the related work in the field of text classification. Section 3 defines the background knowledge. Section 4 presents the assessment of the proposed system. Section 5 describes the comparison and results, and finally the paper ends with the conclusion and future scope.

2. Related Work

Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne and Enrico Motta [13] came up with the Computer Science Ontology (CSO). The CSO consists of up to twenty-six thousand domains and as many as two hundred and twenty-six thousand interpretable relations between these domains. To support its availability, they also developed the CSO Portal, a web application which allows users to explore the ontology and send feedback.

Angelo A. Salatino, Francesco Osborne and Enrico Motta [14] introduced the CSO Classifier for automatic classification of research papers according to the Computer Science Ontology (CSO). It is an unsupervised approach. For the metadata of every research paper that it takes as input, the CSO Classifier returns a list of suitable topics that could be used in classifying that paper.

Angelo A. Salatino, Francesco Osborne and Enrico Motta [15] presented a CSO classifier for automatic classification of academic papers according to CSO's rich taxonomy of subjects. The aim is to promote the acceptance of CSO across the various communities involved in scholarly data and enable the creation of new applications that rely on this knowledge base. This paper proposed four stages: (a) constructing a research ontology, (b) classifying new research proposals into disciplines, (c) building research proposal clusters using text mining, and (d) balancing research proposals and regrouping them by considering applicants' characteristics.

Preet Kaur and Richa Sapra [17] also researched in a similar domain, wherein they proposed ontology-based text mining methods for classification of research proposals as well as external research reviewers.

Chaaminda Manjula Wijewickrema and Ruwan Gamage [18] addressed the fallacies in manual classification and proposed ontology based methods for fully automatic text classification.

A Sudha Ramkumar, B Poorna and B. Saleena [19] used the WordNet ontology to perform ontology based clustering of sports related terms, so as to preserve the semantic meaning behind terms while clustering them.

Nayat Sanchez-Pi, Luis Marti and A.C.B. Garcia [20] presented a probing algorithm for the automatic detection of accidents in occupational health control. The proposal has more accurate heuristics because it contrasts the relevance of techniques used with the terms. The basic accident detection problem is divided into three parts: (i) text analysis, (ii) recognition and (iii) classification of failed techniques which caused accidents.

Decker [21] presented a different approach to categorize research papers by using the words present in the papers' abstracts. It is an unsupervised method which evaluates the relevance of suitable topics for the research paper on various time scales.
Herrera et al. [22] devised a way to categorize research papers specific to the domain of physics. They did this with the help of PACS, which stands for Physics and Astronomy Classification Scheme. They created a network-like structure where a PACS code was assigned to every topic node, and a connection between two nodes was possible only if their codes co-occur in at least one paper.

Ohniwa et al. [23] gave a similar analysis in the field of biomedicine. They used the Medical Subject Headings (MeSH).

Mai et al. [24] showed that the performance of their model, which was only trained using titles, was as good as the models trained by mining the full texts of papers and articles. They developed their approach using deep learning techniques. As training sets, they used scientific papers from EconBiz and PubMed, respectively annotated with the STW Thesaurus for Economics (approximately five thousand classes) and MeSH (approximately twenty-seven thousand classes).

Cook et al. [25] developed a method for optimally allocating papers to reviewers, to aid the selection process.

Arya and Mittendorf [26] suggested a rotation based method for the assignment of projects.

Choi and Park [27] offered a text mining based solution for Research and Development proposal classification.

Girotra [28] proposed a study for the evaluation of portfolio projects.

Sun et al. [29, 30] developed a mechanism for the assessment of reviewers who would evaluate the research papers.

Mehdi Allahyari, Krys J. Kochut and Maciej Janik [31] proposed a way of dynamic classification of textual records into dynamically generated classes.

Rudy Prabowo, Mike Jackson, Peter Burden and Heinz-Dieter Knoell [32] developed a web page classifier. Its classification was with reference to the Dewey Decimal System and Library of Congress Classification schemes.

3. Background Knowledge

In this section we discuss the pre-processing steps for textual data and the Machine Learning classifiers that are used in our research.

3.1. Pre-Processing Textual data

According to the official documentation, the Natural Language Toolkit (NLTK) [48] is a platform used for building Python programs that work with human language data for applying in statistical Natural Language Processing (NLP). It is a useful tool in Python, which helps in processing a diverse range of languages by providing algorithms for it. The tool is free and open source, and its official documentation is detailed enough that no special tutorials are needed to use it. The most common algorithms used in NLTK are tokenization, lemmatization, part of speech tagging etc. These algorithms are essentially used to preprocess textual data. The preprocessing involves the following steps:

Tokenization- A token is the fundamental building block of any linguistic structure, such as a sentence or a paragraph. The process of tokenization is to break these structures down into tokens. A tokenizer could be of two types. A sentence tokenizer's tokens are sentences; it therefore breaks paragraphs down into sentences. A word tokenizer identifies words as tokens; it hence breaks sentences down into words.

Stemming- A stem is the root word or phrase from which different forms of that word could be derived. Stemming is the process of identifying all the words that were derived from the same stem and reducing or normalizing them back to their stem form. For example, the words connection, connected and connecting reduce to the common word "connect".

Lemmatization- Sometimes, we may encounter words that have different stems but the same final meaning. In such a case, there is a need for a dictionary lookup to further reduce the stems to their common meaning, or the base word. This base word is known as the lemma, hence the name lemmatization. For example, the word "better" has "good" as its lemma. Such cases are missed during stemming because these two words are not at all alike, and would need a dictionary lookup where their meanings can confirm the lemma.

POS Tagging- It stands for Part of Speech tagging, and just as the name suggests, it identifies the various parts of a linguistic structure like a sentence. The different parts could be an adjective, a noun or a verb. It does so by studying the structure of the sentences and observing the arrangement of words and the relations between the various words.

3.2. Text classification and classifiers
The idea behind text classification is to group text into categories using machine learning. It finds use in many relevant areas such as sentiment analysis, emotion analysis, etc. Many classifiers have been developed for each classification category. As stated previously, text classifiers are made and meant to be implemented on a diverse range of textual datasets. Text classification can work on both structured and unstructured datasets. Both types of datasets find numerous applications in various fields. The classification process in machine learning can be explained very simply. First, we assess and analyze the training dataset to learn the class boundaries. Next, we predict the class for new data using the information learned during the training phase. This is essentially the whole process of classification.

Classification could be either supervised or unsupervised. Supervised classification [33] works on the principle of training and testing, and uses labeled data, i.e. predefined classes, for prediction. In the training phase, the model is made to learn some predefined classes by feeding it labeled or tagged data. In the testing phase, the efficiency of the model's prediction or classification is measured by feeding it unobserved data. In other words, it can only predict those classes in the testing phase which it has learnt in the training phase. Some common examples of supervised classification are spam filters, intent detection, etc. Unsupervised classification [34, 35] involves classification by the model without being fed external information. Here, the algorithm of the model tries to group or cluster data points based on similar traits, patterns and other common features that can be used to tie two data points together. A common example where unsupervised classification is really helpful is search engines. They create data clusters based on insights generated from previous searches. This type of classification is extremely customizable and dynamic, as there is no need for training and tagging for it to work on textual datasets. Thus, unsupervised classification is compatible with any language.

The classifiers used for text classification could be ML based, such as the Naïve Bayes classifier, the Decision Tree classifier [36] etc., or based on a neural network architecture such as an Artificial Neural Network, a Convolutional Neural Network etc. [37].

The machine learning based classifiers that can be used for text classification are:

(a) Naive Bayes classifier [38, 39] - It uses Bayes' theorem to predict values. This algorithm is good for multi-class classification. Consider a data point x in a multi-class scenario with three classes A, B and C. Using Naïve Bayes, we try to predict whether the data point x belongs to class A, B or C by calculating its probability for the three classes as given in Eq. 1.

<math>P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}</math> (1)

This algorithm is called 'Naïve' because it assumes that all the features are independent of each other, as defined in Eq. 2.

<math>P(f_1, f_2, f_3, \ldots, f_n) = P(f_1)\,P(f_2) \cdots P(f_n)</math> (2)

There are two further categories of the NB classifier: the Gaussian NB classifier and the Multinomial NB classifier. The Gaussian Naïve Bayes classifier is used when a dataset has continuous values of data. It uses the Gaussian probability distribution function (values cluster around the mean and become less likely farther away from it). The Multinomial Naive Bayes algorithm assumes the independence of features, and the multinomial component of this classifier ensures that the distribution is multinomial in its features.

(b) Decision Tree [38] - It is a highly intuitive algorithm which uses a greedy approach. To construct a decision tree, we have to perform the following steps: (1) select a feature to split the data, (2) select a method to split the data on the said feature. Its internal working algorithm is: (i) Create/select a node. (ii) If the node is pure, output the only class. (iii) If no feature is left to split upon and the node is impure, output the majority class. (iv) Else find the best feature to split upon, and recursively apply the procedure to each resulting split.

(c) K-Nearest Neighbor [38, 40] - Consider a scenario where we have to predict the class to which a testing point belongs by considering all the features at once. This is how the KNN algorithm works, as shown in Figure 2. To predict the class of the testing data point, we check its vicinity: we look at a specific number of nearest points (1, 3, 5, 7, etc.), and whichever class is in the majority among them is predicted. To select the nearest points, we have to consider the distance of the testing point from the other points. The distance metric can be (a) Manhattan distance, (b) Euclidean distance, or (c) Minkowski distance.

Figure 2. KNN

(d) Random forest [38, 41] - It is an extension of the decision tree classifier. This algorithm uses multiple combinations of decision trees to accurately predict testing data. The random forest classifier overcomes the over-fitting problem of decision trees by building multiple decision trees and going with the majority result. The trees' outputs vary because each tree is built with random data and random features. To generate randomness in trees, we use two techniques:
(i) Bagging: If we have 'm' data points, we select a subset of 'k' out of them. For 'n' trees, n subsets of size k are selected. Data points can be selected with replacement as the selection is random; therefore, these trees are called bag trees.
(ii) Feature Selection: In the training phase, some features are selected at random, with the condition that the selection is performed without replacement.

(e) SVM Classifier [38, 42] - It is a very powerful algorithm and overcomes a limitation of logistic regression. Logistic regression uses the sigmoid function, so when the value predicted for a testing data point is close to 0.5, the prediction is uncertain and may be incorrect. SVM keeps the form of the logistic regression cost, but replaces the log terms with cost functions that penalize any score <math>\theta^T x</math> inside the range (-1, 1), so that confident predictions fall outside this range. The cost function changes to the following equation in SVM, as given in Eq. 3.

<math>J(\theta) = C \sum_{i} \left[ y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j} \theta_j^2</math> (3)

(f) Logistic Regression [38] - It is a primitive classification algorithm which uses the sigmoid function, as in Eq. 4, at its core to perform classification.

<math>F(x) = \frac{1}{1 + e^{-x}}</math> (4)

As the sigmoid contains an exponential function, its graph moves quickly towards either 0 or 1 with a slight change in x. The cost function of binary logistic regression is given in Eq. 5.

<math>E(h(x)) = \sum_{i} \left( -y_i \log(h(x_i)) - (1 - y_i) \log(1 - h(x_i)) \right)</math> (5)

(g) Bagging Classifier [43] - A Bagging classifier is an ensemble meta-estimator that fits base classifiers on random subsets of the original dataset and then combines their individual predictions to form a final prediction. Usually, such a meta-estimator can be used to reduce the variance of a black-box estimator through randomization.

4. Proposed Study

The classification by Machine Learning algorithms is supposed to improve with the use of ontology. We aim to verify this by studying and comparing the values of metrics such as accuracy, precision, recall and F1 score for ontology based text classification and conventional text classification.

4.1. Conventional Text Classification

In conventional classification the framework has three main phases: (i) Dataset generation, (ii) Model training and testing, and (iii) Analyzing/Classifying results, as shown in Figure 3.

1. Dataset Generation: A preliminary knowledge database of disease-symptom associations was available on [45]. It consists of three columns: disease name, count of disease occurrence, and symptoms. However, it needed modification to be used for our research, and some new information was added to the dataset so that matching could be done precisely. The final dataset created in this way is the one that was used for this research. The modified dataset and the ontology are compatible, as they consist of the classification/output feature "disease name" and the matching feature "disease description". After the ontology and a working dataset were obtained, the dataset was cleaned and preprocessed using NLTK. A synthetic dataset is also generated, which involves creating new data using programming techniques. In this research we created multiple entries using random selection of the feature values of the same class. For example, consider a disease having 10 symptoms. We randomly select a subset of these 10 symptoms and generate a new entry for the dataset involving fewer symptoms and the disease name. This process helps to bind the symptom values to the disease and generates a strong positive relation between feature values (symptoms) and class (disease).

2. Model Training/Testing: This phase involves taking the dataset and applying machine learning classifiers to it. As the dataset initially contains text keywords, it needs to be converted into numbers using the count vectorizer module. After this, the training data is ready to be fed to the classifier for training. The classifiers used are KNN, SVM, Logistic Regression, Decision Tree, Random Forest etc. After training we can use the model for predictions on testing data. The ratio of training to testing data is 80:20.

3. Analyzing/Classifying Results: To analyze the results, we compare the disease predictions for the testing data with the actual disease class. After comparing, we calculate the classification metrics accuracy, precision, recall and F1-score. After this computation we can compare the performance of multiple classifiers based on these metrics. We can also verify which classes seem to perform well based on individual class-wise precision and recall values.
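The synthetic-entry idea from the dataset generation phase can be sketched in a few lines. This is a minimal illustration and not the authors' code: the `knowledge_base` dictionary, the function names `synthetic_entries` and `split_80_20`, and the parameters `per_disease` and `min_symptoms` are assumptions introduced here, and the symptom lists are made-up placeholders.

```python
import random

# Hypothetical knowledge base: disease name -> list of symptoms (made up).
knowledge_base = {
    "influenza": ["fever", "cough", "sore throat", "fatigue", "headache"],
    "migraine": ["headache", "nausea", "light sensitivity", "aura"],
}

def synthetic_entries(knowledge_base, per_disease=5, min_symptoms=2, seed=0):
    """Create (symptom text, disease) pairs from random symptom subsets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    entries = []
    for disease, symptoms in knowledge_base.items():
        for _ in range(per_disease):
            # Pick a random subset size, then a random subset of that size.
            size = rng.randint(min_symptoms, len(symptoms))
            subset = rng.sample(symptoms, size)
            entries.append((" ".join(subset), disease))
    return entries

def split_80_20(entries, seed=0):
    """Shuffle the entries and split them into 80% training, 20% testing."""
    rng = random.Random(seed)
    shuffled = entries[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

data = synthetic_entries(knowledge_base)
train, test = split_80_20(data)
```

Every generated entry keeps the label of the disease it was sampled from, which is what ties partial symptom sets to the class.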
Figure 3. Conventional Text Classification
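The three phases of the conventional pipeline can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: a plain-Python bag-of-words function stands in for the count vectorizer module, a hand-written KNN with Minkowski distance stands in for the library classifiers, and the toy texts and labels are invented for illustration.

```python
from collections import Counter

def fit_vocabulary(texts):
    """Map every distinct token in the training texts to a column index."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def count_vector(text, vocab):
    """Turn a text into a vector of token counts; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan and p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_vecs, train_labels, vec, k=3, p=2):
    """Predict the majority class among the k nearest training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: minkowski(pair[0], vec, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled data (made up for illustration).
train_texts = ["fever cough fatigue", "fever cough headache", "cough fatigue",
               "headache nausea aura", "nausea aura", "headache aura nausea"]
train_labels = ["influenza", "influenza", "influenza",
                "migraine", "migraine", "migraine"]
test_texts = ["fever fatigue", "aura headache"]
test_labels = ["influenza", "migraine"]

# Phase 2: vectorize, then classify each test text.
vocab = fit_vocabulary(train_texts)
train_vecs = [count_vector(t, vocab) for t in train_texts]
predictions = [knn_predict(train_vecs, train_labels, count_vector(t, vocab))
               for t in test_texts]

# Phase 3: compare predictions with the actual classes.
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Changing `p` in `knn_predict` switches between the Manhattan (`p=1`) and Euclidean (`p=2`) distances listed in Section 3.2.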
4.2. Ontology Based Text Classification

For the purpose of this research, we have used the Human Disease Ontology, which was hosted at the website of the Institute of Genome Sciences, Maryland School of Medicine [44]. This ontology is a comprehensive hierarchical controlled vocabulary for human disease representation. It contains a unique label for each disease, which acts as an identifier. The OWL file of the ontology was exported to a CSV file using Protégé. We have introduced an additional phase between dataset generation and model training/testing, in which a hybrid approach for text classification is used to optimize it. The presented phases are: (i) Dataset generation, (ii) Ontology matching, (iii) Model training and testing, and (iv) Analyzing and classifying results, as shown in Figure 4 (b). Phases (i), (iii) and (iv) are explained earlier in Section 4.1.

Ontology Matching: In this phase the keywords formed from the description of the disease are matched with the keywords of the ontology nodes. All the matched nodes are possible classes, which can be used to create the subset of the data for efficient model training. The use of priority based matching helps us to further limit the classes. In our research, each keyword used for ontology matching is assigned two numbers to specify its priority. The first number describes the frequency of the keyword, and the second number describes whether the keyword can be lemmatized or not: it is 1 if the keyword cannot be lemmatized and 0 otherwise. Thus each keyword has the syntax (name, first priority number, second priority number). The steps for ontology matching are given in Figure 4(a).
Figure 4. (a) Ontology Matching (b) Ontology Based Text Classification
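The keyword syntax (name, first priority number, second priority number) can be made concrete with a short sketch. Everything here is an illustrative assumption rather than the authors' code: `STOP_WORDS` and `LEMMAS` are tiny hand-made stand-ins for a real stop-word list and lemmatizer, "can be lemmatized" is read as "has an entry in `LEMMAS`", and the description and ontology nodes are invented. The `priority_bucket` function follows the branching used in Algorithm 2.

```python
from collections import Counter

# Hypothetical stand-ins for a stop-word list and a lemma dictionary.
STOP_WORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "with"}
LEMMAS = {"lungs": "lung", "tumors": "tumor", "characterized": "characterize"}

def form_keywords(description):
    """Return (name, frequency, cannot_lemmatize) tuples for a description."""
    tokens = [t.strip(".,;:()").lower() for t in description.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    keywords = []
    for name, freq in Counter(tokens).items():
        # Second number: 1 if the keyword cannot be lemmatized, otherwise 0.
        cannot_lemmatize = 0 if name in LEMMAS else 1
        keywords.append((name, freq, cannot_lemmatize))
    return keywords

def priority_bucket(keyword):
    """Assign priority 1, 2 or 3 with the same branching as Algorithm 2."""
    _, first, second = keyword
    if first != 1:
        return 1
    return 2 if second == 1 else 3

def match_nodes(keywords, ontology_nodes):
    """Return labels of ontology nodes that share at least one keyword."""
    names = {name for name, _, _ in keywords}
    return sorted(label for label, node_keywords in ontology_nodes.items()
                  if names & {k.lower() for k in node_keywords})

description = "Cancer of the lungs characterized by tumors in the lungs."
ontology_nodes = {  # made-up node labels and keywords
    "lung cancer": ["lungs", "cancer", "tumors"],
    "migraine": ["headache", "aura"],
}
keywords = form_keywords(description)
candidates = match_nodes(keywords, ontology_nodes)
```

The matched labels in `candidates` play the role of the possible classes that are later used to reduce the training data.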
Algorithm 1: Ontology Based Text Classification indices_to_use. append[i]
else
continue
The Ontology matching function used in this Algorithm refers reduced_x_train, reduced_y_train = reducing_dataset
to Algorithm 4.2 (x_train, y_train, indices_to_use)
DOID: Disease Ontology
data_x, data_t= synthetic_data_generation (Knowledge_base)
training_data = count_vectorizer.fit_transform (reduced.
// Phase1 x_train) //Phase 3
x_train, x_test, y_train, y_test =train_test_split (data_x, data_y) testing_data = count_vectorizer. transform (x_test[z])
Ontology_tree= Loading_Ontology () classifer.fit (training_data, reduced_y_train)
//Phase 2 prediction = classifier. predict(testing_data)
for i in range (1, len (Knowledge_base)) predictions. append (prediction)
Keywords_for_matching =Keywords_formation
accuracy_score = accuracy (predictions, y_test)
(Knowledge_base)
//Phase 4
all_classes= set (data_y)
precision_score = precision (predictions, y_test)
for z in range (0, len (x_test))
recall_score = recall (predictions, y_test)
keywords=keywords_selection (Keywords_for_matching,
F1-score = F1-score (predictions, y_test)
y_test[z])
possible_classes=Ontology_matching (tree, keywords)
for i in range (0, len (possible_classes))
Algorithm 2: Ontology Matching
for j in range (0, len (all_classes))
if set (word_tokenize (possible _classes[i])). subset (set
def ontology_mathing (tree, keywords):
(word_tokenize (all_classes[j]))) nodes_to_search= []
final_classes. append(all_classes[j]) found_nodes = []
else priority_1 = []
continue priority_2 = []
for i in range (0, len(y_train)) priority_3 = []
if y_train[i] in final _classes for i in range (0, len (keywords)):
93
if keywords[i][1]! = 1: TP+TN
priority_1. append (keywords[i][0]) Accuracy =
else:
𝑇𝑃+𝐹𝑃+𝐹𝑁+𝑇𝑁
if keywords[i][2] == 1:
priority_2. append (keywords[i][0]) Precision- Sometimes, a classifier may label a class
else: as true for classification of some raw data, when in
priority_3. append (keywords[i][0]) fact, it should have been false. This is the case of a
priority_1_count = 0 false positive. Precision takes into account the false
priority_2_count = 0
priority_3_count = 0 positives as well.
for i in range (0, len(tree)): 𝑇𝑃
Precision =
        # Count, case-insensitively, how many keywords of each priority level
        # appear among the keywords of the current ontology node.
        priority_1_count = 0
        priority_2_count = 0
        priority_3_count = 0
        for j in range(0, len(priority_1)):
            if priority_1[j].lower() in [x.lower() for x in tree[i].keywords]:
                priority_1_count = priority_1_count + 1
        for j in range(0, len(priority_2)):
            if priority_2[j].lower() in [x.lower() for x in tree[i].keywords]:
                priority_2_count = priority_2_count + 1
        for j in range(0, len(priority_3)):
            if priority_3[j].lower() in [x.lower() for x in tree[i].keywords]:
                priority_3_count = priority_3_count + 1
        # Keep the node if at least one keyword of any priority matched.
        if priority_1_count + priority_2_count + priority_3_count >= 1:
            found_nodes.append(tree[i].entity)
    # Remove duplicate entities before returning the candidate classes.
    classes = list(set(found_nodes))
    return classes

5. Classification of Various Algorithms with and without Ontology

The parameters used to evaluate and compare the model with ontology against the model without
ontology are accuracy, precision, recall, and F1 score.

Accuracy is the most intuitive measure of a trained model: it is the proportion of correctly
predicted observations among all predictions.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: when a classifier marks an unseen data item as negative for a class although it should
have been marked positive, it is a case of a false negative. Recall measures the sensitivity of a
model by taking the false negatives into account.

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1 score is the harmonic mean of precision and recall. Therefore, it factors in false positives as
well as false negatives. In case of an uneven class distribution, F1 score becomes more important
than accuracy. When false negatives and false positives have the same cost, accuracy may be
treated as the superior evaluation parameter.

$$\text{F1 Score} = \frac{2 \cdot (\text{Recall} \cdot \text{Precision})}{\text{Recall} + \text{Precision}}$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

In Table 1, the metrics accuracy, precision, recall, and F1 score have the same value for each
classifier. This is because the FP and FN counts are equal in magnitude, as there are few records
per disease. It also results in equal values of precision, recall, and F1 score for each class, as
shown in Table 2.

It can also be observed that the decision tree classifier shows the largest gain in accuracy,
precision, recall, and F1 score: about 0.75 for simple classification and 0.85 for ontology-based
classification on 500 test cases. This is an improvement of about 10 percentage points for 500
test cases and 6 percentage points for 100 test cases. It is followed by the KNN classifier, which
improves by 5 percentage points for 100 test cases (0.81 to 0.86) and by about 3.6 percentage
points for 500 test cases (0.818 to 0.854). The remaining classifiers improve by around 1 to 3
percentage points, as shown in Table 1.
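The metric definitions in this section can be sketched in Python as follows. This is only an illustrative sketch, not the code used in the paper: the function names `precision_recall_f1` and `micro_scores` and the example labels are hypothetical, and the pooling of counts over all classes (micro-averaging) is an assumption, since the averaging scheme is not stated. Under that assumption, every misclassified record adds one false positive and one false negative, which is one way the four metrics can come out identical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def micro_scores(y_true, y_pred):
    """Accuracy plus micro-averaged precision, recall and F1 (single-label data)."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    # Assumption: counts are pooled over all classes. Each wrong prediction
    # is a false positive for the predicted class and a false negative for
    # the true class, so the pooled FP and FN totals are equal.
    fp = fn = len(y_true) - tp
    accuracy = tp / len(y_true)
    return (accuracy,) + precision_recall_f1(tp, fp, fn)


# Hypothetical labels, not taken from the paper's dataset.
y_true = ["Asthma", "Migraine", "Hernia", "Asthma", "Anxiety"]
y_pred = ["Asthma", "Migraine", "Asthma", "Asthma", "Anxiety"]
accuracy, precision, recall, f1 = micro_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

With four of the five hypothetical records classified correctly, all four values come out at about 0.8.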
Table 1
Comparison of different classifiers with and without ontology for 100 and 500 test cases

Classifier           Parameter   100 Test Cases                     500 Test Cases
                                 Without Ontology   With Ontology   Without Ontology   With Ontology
Naïve-Bayes          Accuracy    0.98               1.0             0.97               0.978
                     Precision   0.98               1.0             0.97               0.978
                     Recall      0.98               1.0             0.97               0.978
                     F1 Score    0.98               1.0             0.97               0.978
Decision Tree        Accuracy    0.74               0.80            0.758              0.852
                     Precision   0.74               0.80            0.758              0.852
                     Recall      0.74               0.80            0.758              0.852
                     F1 Score    0.74               0.80            0.758              0.852
KNN                  Accuracy    0.81               0.86            0.818              0.854
                     Precision   0.81               0.86            0.818              0.854
                     Recall      0.81               0.86            0.818              0.854
                     F1 Score    0.81               0.86            0.818              0.854
Random Forest        Accuracy    0.96               0.97            0.932              0.952
                     Precision   0.96               0.97            0.932              0.952
                     Recall      0.96               0.97            0.932              0.952
                     F1 Score    0.96               0.97            0.932              0.952
SVM                  Accuracy    0.98               1.0             0.986              0.992
                     Precision   0.98               1.0             0.986              0.992
                     Recall      0.98               1.0             0.986              0.992
                     F1 Score    0.98               1.0             0.986              0.992
Bagging              Accuracy    0.87               0.89            0.894              0.912
                     Precision   0.87               0.89            0.894              0.912
                     Recall      0.87               0.89            0.894              0.912
                     F1 Score    0.87               0.89            0.894              0.912
Logistic Regression  Accuracy    0.99               1.0             0.982              0.99
                     Precision   0.99               1.0             0.982              0.99
                     Recall      0.99               1.0             0.982              0.99
                     F1 Score    0.99               1.0             0.982              0.99
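The gain from using the ontology can be read off Table 1 with a few lines of arithmetic. The sketch below copies the accuracy values from Table 1 and reports the difference in percentage points; the dictionary layout and the helper name `gains` are illustrative choices, not part of the original implementation.

```python
# Accuracy values copied from Table 1: (without ontology, with ontology).
table_1 = {
    "Naive-Bayes": {100: (0.98, 1.0), 500: (0.97, 0.978)},
    "Decision Tree": {100: (0.74, 0.80), 500: (0.758, 0.852)},
    "KNN": {100: (0.81, 0.86), 500: (0.818, 0.854)},
    "Random Forest": {100: (0.96, 0.97), 500: (0.932, 0.952)},
    "SVM": {100: (0.98, 1.0), 500: (0.986, 0.992)},
    "Bagging": {100: (0.87, 0.89), 500: (0.894, 0.912)},
    "Logistic Regression": {100: (0.99, 1.0), 500: (0.982, 0.99)},
}


def gains(table, n_cases):
    """Return (classifier, gain in percentage points), largest gain first."""
    rows = []
    for name, values in table.items():
        without, with_ontology = values[n_cases]
        rows.append((name, round((with_ontology - without) * 100, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for n_cases in (100, 500):
    print(n_cases, gains(table_1, n_cases))
```

Sorting by gain puts the decision tree first for both test-set sizes, followed by KNN.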
Table 2
Precision, recall, and F1-score of the SVM classifier for each class
Diseases Precision Recall F1-Score
Alcohol Dependence 1.0 1.0 1.0
Influenza 1.0 1.0 1.0
Neoplasm 1.0 1.0 1.0
Hernia 1.0 1.0 1.0
Fibrous Tumor 1.0 1.0 1.0
Osteomyelitis 1.0 1.0 1.0
Pancreatitis 1.0 1.0 1.0
Cholecystitis 1.0 1.0 1.0
Pneumocystis 1.0 1.0 1.0
Dysphagia 1.0 1.0 1.0
Psychotic disorder 1.0 1.0 1.0
Hepatitis C 1.0 1.0 1.0
Cerebrovascular Disease 1.0 1.0 1.0
Depression 1.0 1.0 1.0
Hepatitis 1.0 1.0 1.0
Hypothyroidism 1.0 1.0 1.0
Hepatitis B 1.0 1.0 1.0
Arthrogryposis 1.0 1.0 1.0
Manic Disorder 1.0 1.0 1.0
Tonic-clonic Seizures 1.0 1.0 1.0
Migraine 1.0 1.0 1.0
Anxiety 1.0 1.0 1.0
Hepatocellular Carcinoma 1.0 1.0 1.0
Asthma 1.0 1.0 1.0
Congestive Heart Failure 1.0 1.0 1.0
Hypertensive 1.0 1.0 1.0
Chronic Kidney 1.0 1.0 1.0
Cirrhosis 1.0 1.0 1.0
Blood Coagulation 1.0 1.0 1.0
Melanoma 1.0 1.0 1.0
Lymphatic System Disease 1.0 1.0 1.0
Dependence 1.0 1.0 1.0
Bipolar Disorder 1.0 1.0 1.0
Candidiasis 1.0 1.0 1.0
Hypoglycemia 1.0 1.0 1.0
Sinus Tachycardia 1.0 1.0 1.0
Transient Cerebral Ischemia 1.0 1.0 1.0
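Per-class values such as those in Table 2 can be obtained by treating each disease in turn as the positive class and all other diseases as negative (one-vs-rest). The following is a minimal sketch under that assumption; the function name `per_class_report` and the example labels are hypothetical and do not come from the paper's dataset.

```python
def per_class_report(y_true, y_pred):
    """Map each class to (precision, recall, f1) using one-vs-rest counts."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = (precision, recall, f1)
    return report


# Hypothetical labels for illustration only.
y_true = ["Asthma", "Asthma", "Migraine", "Hernia"]
y_pred = ["Asthma", "Migraine", "Migraine", "Hernia"]
for label, (p, r, f) in per_class_report(y_true, y_pred).items():
    print(f"{label:10s} {p:.2f} {r:.2f} {f:.2f}")
```

A class reaches 1.0 on all three measures, as every row of Table 2 does, only when it has no false positives and no false negatives.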
The bagging classifier follows next, with all parameters at 0.87 in simple text classification
and 0.89 in ontology-based classification (values for 100 test cases). The Random Forest
classifier comes next, with values of 0.96 and 0.97 respectively. The Naïve-Bayes classifier and
the SVM classifier have the same values for all parameters: 0.98 for simple classification and 1.0
for ontology-based text classification. Logistic Regression can be labelled the best classifier,
with metric values of 0.99 for simple text classification and 1.0 for ontology-based text
classification. The differences among these classifiers are minute. The order of increasing
accuracy of the classifiers (with or without ontology) is as follows:
Decision Tree Classifier < KNN Classifier < Bagging Classifier < Random Forest Classifier <
Naïve Bayes Classifier < SVM Classifier < Logistic Regression.

6. Conclusion and Future Scope

In this paper, the observations show that ontology-based classification performs better than
classification without ontology. The general pattern points towards more accurate and precise
classification when an ontology is used. All the parameters used (accuracy, precision, recall, and
F1 score) improved by 1% to 3% when the classification was done with the help of the ontology. It
can be deduced that using the ontology increased the efficiency of classification. This advantage
can be attributed to the fact that the number of possible classes for classification is reduced,
which in turn reduces the time taken for training. The results also indicate a more comparable
accuracy level amongst the classifiers when the ontology was used. This study, while demonstrating
the importance and benefits of ontology, still leaves a lot of scope for future improvement.
Future work is needed to improve the dataset of diseases used in this project, as no official
dataset was available for the human disease ontology. Most of the work has been done on a limited
dataset obtained by converting the available ontology into a dataset. There is also a need to
optimize the code used for ontology matching after data preprocessing. One could also move from
machine learning towards deep learning and build a neural network for this dataset to further
improve the classification results in the future.
13. A.A. Salatino, T. Thanapalasingam, A. Mannocci, F.
References Osborne and E. Motta, 󠇮“Classifying 󠇮Research 󠇮Papers 󠇮
with 󠇮 the 󠇮 Computer 󠇮 Science 󠇮 Ontology,” 󠇮 Knowledge 󠇮
Media Institute, The Open University, MK7 6AA,
1. V. Korde “Text classification and classifiers: A Milton Keynes, UK, 2018.
survey,” 󠇮 International Journal of Artificial 14. A.A. Salatino, F. Osborne and E. Motta, The
Intelligence & Applications (IJAIA) 3(2) (2012): 85-
Computer Science Ontology: A Large-Scale
99.
Taxonomy of Research Areas, 17th International
2. A. Gelbukh, Natural Language Processing, IEEE Fifth Semantic Web Conference, Monterey, CA, USA,
International Conference on Hybrid Intelligent October 8-12, 2018, Proceedings, Part II
Systems (HIS'05), Rio de Janeiro, Brazil (2006). 15. A. Salatino, F. Osborne, E. Motta. The CSO classifier:
3. A. Agarwal, B. Xie, I. Vovsha, O. Rambow and R. Ontology-driven detection of research topics in
Passonneau, 󠇮 “Sentiment 󠇮 Analysis 󠇮 of 󠇮 Twitter 󠇮 Data scholarly articles. In: A. Doucet et al. (eds.) TPDL
(2011). In Proc. WLSM-11. 2019: 23rd International Conference on Theory and
4. Bhowmick and S.M. Hazarika, E-mail spam filtering: Practice of Digital Libraries. Cham, Switzerland:
a review of techniques and trends. In: Kalam A, Das Springer, 2019, pp. 296–311. doi: 10.1007/978- 3-
S, Sharma K (eds) Advances in electronics, 030-30760-8_26.
communication and computing. Lecture notes in 16. J. Ma, W. Xu, Y. Sun, E. Turban, S. Wang, O. Liu,
electrical engineering, 443. Springer, Singapore, 583– “An 󠇮Ontology-Based Text-Mining Method to Cluster
590. 2018. https://doi.org/10.1007/978-981-10-4765- Proposals 󠇮 for 󠇮 Research 󠇮 Project 󠇮 Selection,” 󠇮 IEEE 󠇮
7_61 Transactions on Systems, Man, and Cybernetics Part
5. S. Akulick and E.S.Mahmoud, Intent Detection A: Systems and Humans, 42(3), 2012.
through Text Mining and Analysis. In Proceedings of 17. P. Kaur, R. Sapra, “Ontology 󠇮Based 󠇮Classification and
the Future Technologies Conference (FTC), Clustering of Research Proposals and External
Vancouver, Canada, 29–30 November 2017; 493– Research 󠇮 Reviewers”, 󠇮 International 󠇮 Journal 󠇮 of 󠇮
496. Computers & Technology, 5(1) 2013, ISSN 2277-
6. K.Das and R.N. Behera, 󠇮 “A 󠇮 Survey 󠇮 on 󠇮 Machine 󠇮 3061
Learning: 󠇮 Concept, 󠇮 Algorithms 󠇮 and 󠇮 Applications,” 󠇮 18. C.M. Wijewickrema, R. Gamage, An Ontology Based
International Journal of Innovative Research in Fully Automatic Document Classification System
Computer and Communication Engineering 2(2), Using an Existing Semi-Automatic System, National
2017. Institute of Library and Information Sciences,
7. Blumauer, PoolParty Semantic Suite, University of Colombo, Colombo, Sri Lanka, 2013.
2018,URL:https://www.poolparty.biz/semantic-ai/ 19. Sudha Ramkumar, B. Poorna, B. Saleena, Ontology
8. https://www.slideshare.net/semwebcompany/semanti based text document clustering for sports, Journal of
c-ai Engineering and Applied Sciences, 2018.
9. T. Berners-Lee, James Hendler and Ora Lassila, 20. Nayat Sanchez-Pi, Luis Marti and A.C.B. Garcia,
Scientific American: Feature Article: The Semantic “Improving 󠇮 ontology-based 󠇮 text 󠇮 classification: 󠇮 An 󠇮
Web: May 2001 occupational 󠇮health 󠇮and 󠇮security 󠇮application,” 󠇮Article
97
Journal of Applied Logic · September 2015 Comparison, International Journal of Computer
21. S.L. Decker, B. Aleman-meza, D. Cameron and I.B. Trends and Technology (IJCTT), 48(3), 2017.
Arpinar, Detection of Bursty and Emerging Trends 34. M. Khanum, T. Mahboob, W. Imtiaz, H.A. Ghafoor
towards Identification of Researchers, the Early Stage and R. Sehar, A Survey on Unsupervised Machine
of Trends (2007). Learning Algorithms for Automation, Classification
22. M. Herrera, D.C. Roberts and N. Gulbahce, Mapping and Maintenance, International Journal of Computer
the evolution of scientific fields. PLoS ONE. 5(5), Applications (0975 – 8887) 119(13), 2015.
2010. 35. Q. Guo, J. Wentian, S. Zhong and E. Zhou, "The
23. R.L. Ohniwa, A. Hibino and K. Takeyasu, Trends in Analysis of the Ontology-based K-Means Clustering
research foci in life science fields over the last 30 years Algorithm", Proceedings of the 2nd International
monitored by emerging topics, Scientometrics. 85(1), Conference on Computer Science and Electronics
2010. Engineering (ICCSEE 2013), [online] Available:
24. F. Mai, L. Galke, and A. Scherp, Using Deep Learning https://www.atlantis-press.com/proceedings/iccsee-
for Title Based Semantic Subject Indexing to Reach 13/4617.
Competitive Performance to Full-Text, JCDL 󠇮 ’18 󠇮 36. P. Vateekul and M. Kubat, Fast Induction of Multiple
Proceedings of the 18th ACM/IEEE on Joint Decision Trees in Text Categorization From Large
Conference on Digital Libraries (Fort Worth, Texas, Scale,Imbalanced, and Multi-label Data, IEEE
USA, Jun. 2018) International Conference on Data
25. W. D. Cook, B. Golany, M. Kress, M. Penn, and T. MiningWorkshops 2009.
Raviv, 󠇮“Optimal 󠇮allocation 󠇮of 󠇮proposals to reviewers 37. S. Dargan, M. Kumar, M. Ayyagari and G.
to 󠇮 facilitate 󠇮 effective 󠇮 ranking,” 󠇮 Manage. 󠇮 Sci., 󠇮 51(4), Kumar, A Survey of Deep Learning and Its
655–661, 2005. Applications: A New Paradigm to Machine
26. Arya 󠇮 and 󠇮 B. 󠇮 Mittendorf, 󠇮 “Project 󠇮 assignment 󠇮 when 󠇮
Learning, Springer, June 2019.
budget 󠇮 padding 󠇮 taints 󠇮 resource 󠇮 allocation,” 󠇮 Manage. 󠇮
Sci., vol. 52, no. 9, pp. 1345–1358, Sep. 2006. 38. https://jmlr.csail.mit.edu/papers/v12/pedregosa1
27. Choi and Y. Park, 󠇮“R&D 󠇮 proposal 󠇮screening 󠇮system 󠇮 1a.html
based on text mining 󠇮approach,” 󠇮Int. 󠇮J. Technol. Intell. 39. S. Xu, Y. Li and Z. Wang, Bayesian multinomial
Plan, 2(1), 61–72, 2006. naïve bayes classifier to text classification. In:
28. K. 󠇮Girotra, 󠇮C. 󠇮Terwiesch, 󠇮and 󠇮K. 󠇮T. 󠇮Ulrich, 󠇮“Valuing 󠇮 Advanced multimedia and ubiquitous
R&D projects in a portfolio: Evidence from the engineering. Springer, 347–352, 2017.
pharmaceutical 󠇮industry,” 󠇮Manage. Sci., 53(9) 1452– 40. G. Guo, H. Wang, D. Bell, Y. Bi and K. Greer,
1466, 2007. KNN Model-Based Approach in Classification,
29. Y. 󠇮H. 󠇮Sun, 󠇮J. 󠇮Ma, 󠇮Z. 󠇮P. 󠇮Fan, 󠇮and 󠇮J. 󠇮Wang, 󠇮“A 󠇮group 󠇮 Proc. ODBASE pp- 986 – 996, 2003
decision support approach to evaluate experts for
41. G. 󠇮Biau, 󠇮“Analysis 󠇮of 󠇮a 󠇮Random 󠇮Forests 󠇮Model”, 󠇮
R&D project selection,” 󠇮 IEEE 󠇮 Transactions 󠇮 of 󠇮
Journal of Machine Learning Research 13 (2012)
Engineering management, 55(1), 158–170, 2008.
30. Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, 󠇮“A 󠇮hybrid 󠇮 1063-1095
knowledge and model approach for reviewer 42. Y. Qin, X. Wang, Study on Multi-label Text
assignment,” 󠇮Expert 󠇮System Applications, 34(2), 817– Classification Based on SVM, Sixth International
824, Feb. 2008. Conference on Fuzzy Systems and Knowledge
31. M. Allahyari, K. J. Kochut and M. Janik, Ontology- Discovery 2009
based text classification into dynamically defined 43. https://scikitlearn.org/stable/modules/generated/s
topics, Semantic Computing (ICSC), 273-278, 2014. klearn.ensemble.BaggingClassifier.html
32. R. Prabowo, M. Jackson, P. Burden and H. Knoell, 44. https://bioportal.bioontology.org/ontologies/DOI
Ontology-Based Automatic Classification for the Web D
Pages Design Implementation and Evaluation", Proc. 45. http://people.dbmi.columbia.edu/~friedma/Proje
Of the 3rd International Conference on Web
cts/DiseaseSymptomKB/index.html
Information Systems Engineering, 2002.
33. F.Y. Osisanwo, J.E.T. Akinsola, O. Awodele, J.O. 46. Y. Freund and R.E. Schapire, A Short
Hinmikaiye, O. Olakanmi and J. Akinjobi Supervised Introduction 󠇮 to 󠇮 Boosting” 󠇮 Journal 󠇮 of 󠇮 Japanese 󠇮
Machine Learning Algorithms: Classification and Society for Artificial Intelligence, 14(5), 771-780,
98
1999.
47. V.N. Garla, C. Brandt, Ontology-Guided Feature
Engineering for Clinical Text Classification,
Journal of Biomedical Informatics, 45(5): 992–
998. doi: 10.1016/j.jbi.2012.04.010
48. https://github.com/nltk/nltk
49. D. Fensel. Ontologies: Silver Bullet for
Knowledge Management and Electronic
Commerce. Springer-Verlag, 2001.
50. https://www.forbes.com/sites/forbestechcouncil/
2019/12/30/explainable-ai-the-rising-role-of-
knowledge scientists/#62bc6193603f
51. S. Malik, S. Mishra, N. K. Jain, S. Jain. Devising
a super ontology, Procedia Computer Science PP.
785–792, 2015.
52. S. Malik, S. Jain. Ontology based context aware
model. In Proceedings of the international
conference on computational intelligence in data
science (ICCIDS), p. 1-6, 2017.
53. S. Jain, Understanding Semantics-based Decision
Support”, 󠇮 Nov 󠇮 2020, 󠇮 152 󠇮 pages, 󠇮 CRC 󠇮 Press, 󠇮
Taylor& Francis Group. ISBN: 9780367443139
(HB)