=Paper= {{Paper |id=Vol-2786/Paper43 |storemode=property |title=A method of knowledgebase curation using RDF Knowledge Graph and SPARQL for a knowledge-based clinical decision support system |pdfUrl=https://ceur-ws.org/Vol-2786/Paper43.pdf |volume=Vol-2786 |authors=Xavierlal J Mattam,Ravi Lourdusamy |dblpUrl=https://dblp.org/rec/conf/isic2/MattamL21 }} ==A method of knowledgebase curation using RDF Knowledge Graph and SPARQL for a knowledge-based clinical decision support system== https://ceur-ws.org/Vol-2786/Paper43.pdf
A method of knowledgebase curation using RDF Knowledge Graph and SPARQL for a knowledge-based clinical decision support system

Xavierlal J Mattam and Ravi Lourdusamy
Sacred Heart College (Autonomous), Tirupattur, Tamil Nadu, India

Abstract
Clinical decisions are crucial and often lifesaving. Healthcare workers are at times overworked, and lapses in judgement or decision can lead to tragic consequences. Clinical decision support systems are therefore very important in assisting health workers. Yet in spite of much effort to build a perfect clinical decision support system, such a system is yet to see the light of day. Any clinical decision support system is only as good as its knowledgebase, so the knowledgebase should be consistently maintained with the updated knowledge available in medical literature. The challenge lies in the huge amount of data on the web in varied formats. This article proposes a method of knowledgebase curation using RDF Knowledge Graph and SPARQL queries.

Keywords
Clinical Decision Support System, RDF Knowledge Graph, Knowledgebase Curation.

ISIC 2021, February 25-27, 2021, New Delhi, India
EMAIL: xaviermattam@gmail.com (X. Mattam); ravi@shctpt.edu (R. Lourdusamy)
ORCID: 0000-0002-5182-3627 (X. Mattam)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Decision support has been a crucial part of a healthcare unit. In every area of a healthcare facility, critical and urgent decisions have to be made. In such extreme situations, leaving lives at stake entirely to human knowledge and memory is a very big risk. It can often lead to untold misery for the stakeholders and disaster for such facilities. In 2009, when the Health Information Technology for Economic and Clinical Health (HITECH) Act was promulgated in the United States of America, monetary aid was disbursed for success in the implementation of Clinical Decision Support Systems (CDSS). This was because CDSS, although far from a perfect system, was found to be better than mere human decisions. Since then, a lot of study and research has been done to perfect the CDSS.

One of the enlightening issues that came to the forefront during the recent pandemic outbreak was the lack of widespread knowledge and awareness in the diagnosis and treatment of the disease. Although many breakthroughs were published in medical literature globally, their down-to-earth use was slow and far between. This would not have been the case had there been a CDSS capable of automatically acquiring reliable knowledge from authenticated medical literature. Such a CDSS could alert health workers with all-round advanced knowledge at the moment of crucial decisions.

This article describes some aspects of the recent advances in the technology used in CDSS, together with related work on the development of knowledge-based CDSS, before explaining the proposed method of knowledgebase curation. Section 2 gives a brief background on recent developments in knowledge-based CDSS. Section 3 mentions some recent works on possible methods of knowledgebase curation. The proposed method is explained in section 4, followed by a brief discussion of it in section 5. Finally, section 6 gives a summarized conclusion.

2. Background

CDSS has evolved gradually with the ample technological developments that have happened in the past few decades. The system is essentially centered on high adaptation and effective use of constantly updated knowledge. With the evolution of CDSS over the years, there has also been a consistent evolution in its definition, from a mere use of information technology for data entry to a hi-tech complex system that provides individual-specific, intelligently filtered and efficiently presented knowledge for clinicians, staff, patients, or other individuals [1, 2].

CDSS can be broadly classified as knowledge-based CDSS and non-knowledge-based CDSS. Knowledge-based CDSS are designed to mimic the knowledge processed by a human expert. Such systems were earlier termed expert systems. Non-knowledge-based systems, on the other hand, rely on the statistical data that is available to help in decision making. These systems make full use of machine learning and neural network algorithms to predict possible outcomes [3, 4, 5, 6].

2.1. Knowledge-based CDSS

Knowledge-based systems evolved from expert systems. While expert systems were built on the knowledge of human experts, knowledge-based systems have the capability to acquire knowledge from different sources and build upon it. So, while an expert system could be ranked according to the knowledge of the expert, knowledge-based systems had the capacity for greater knowledge [7, 8].

The knowledgebase of a knowledge-based CDSS ultimately determines the effectiveness of the CDSS [9]. The acquisition, representation and integration of the knowledgebase in the workflow are vital for the success of CDSS [10]. The process of selecting, organizing, and looking after the knowledge in the knowledgebase makes the knowledgebase efficient and the CDSS successful. The two important facets of the curation process are the method of knowledge acquisition and knowledge representation [11].

One way to build a cost-effective knowledge-based CDSS is to use commercially available knowledgebases. This could reduce the development time of the CDSS and its cost and, because such knowledgebases are commonly available, it could also be cost-effective in terms of price [12]. But such knowledge acquisition might lead to the knowledge-acquisition bottleneck, as certain standards and formalization will have to be maintained for knowledge portability. Such a bottleneck could harm the effectiveness of the CDSS, and freeing the CDSS of the bottleneck makes the knowledge acquisition process complex and difficult [12], [13]. Another bottleneck in the curation of a knowledgebase lies in its maintenance [14]. Together with the creation of the knowledgebase, its verification and constant updating are equally important. The verification and validation of the knowledgebase involve transparency, updatability, adaptability, and learnability [15].

A knowledgebase is judged by its accuracy, completeness and the quality of its data. So, the construction of the knowledgebase is done in such a manner that these three factors are enhanced to the maximum. The methods of constructing knowledgebases can be classified into four main groups: closed methods, in which the knowledgebase is manually fixed by experts; open methods, in which knowledgebases are curated by volunteers; automated semi-structured methods, in which the knowledgebases are procured automatically from semi-structured texts by using rules that are programmed into the system; and automated unstructured methods, which use artificial intelligence algorithms to extract knowledge from unstructured texts [16].

2.2. Knowledge Graphs for knowledge base

Knowledge graphs are knowledgebases in which knowledge is expressed in a graph structure, with nodes representing the concepts or entities and edges between the nodes representing the relationships between those entities or concepts [16, 17, 18, 19, 20, 21, 22, 23]. There are diverse definitions of knowledge graphs, varying according to the purpose for which the knowledge graph is created or according to the knowledge graph model [24]. Although Google is credited with the popularity of knowledge graphs from 2012 [24, 25, 26], the term knowledge graph was used in a report in 1973 with a very similar meaning [27], and later in 1982 the term was used to represent textual
concepts using graphs. There have been decades of study in representing knowledge using graphs [28].

The maintenance of knowledge graphs comprises the processes of creation, hosting, curation and deployment. The process of creation can be manual, as in the case of expert systems, or semi-automatic or automatic. Apart from these, there is also a method of annotation that maps the knowledgebase entities to the source without actually keeping the entities in the knowledgebase. Hosting or storage processes use various methods of keeping knowledge in the knowledgebase. The curation processes involve three steps, namely, the assessment of new knowledge, its cleaning, and its enrichment by detecting the source of the knowledge, integrating it with the existing knowledgebase, detecting duplication and correcting entity relations. Once the knowledgebase is ready, it is deployed in an appropriate application [29].

There are various sources of knowledge that can be utilized for the creation of a knowledgebase. Textual knowledge, which can be in the form of newspapers, books, scientific articles, social media, emails, web crawls, and so on, is a very rich source of knowledge for building a knowledge graph for CDSS. However, the process of extracting knowledge from text is complex and involves the application of Natural Language Processing (NLP) and Information Extraction (IE) techniques. Curation of the knowledgebase using these techniques may follow a combination of five stages. In the pre-processing stage, the text is analyzed for atomic terms and symbols. Some of the techniques used in the pre-processing stage are Tokenization, Part-of-Speech (POS) tagging, Dependency Parsing and Word Sense Disambiguation (WSD). After the pre-processing stage comes the Named Entity Recognition (NER) stage, in which the various concepts or entities that form the nodes of the graph are identified. The NER is followed by the Entity Linking (EL) stage, in which an association is made between the entities identified in the text and the entities in the existing knowledge graphs so that similar entities can be placed side by side. During the Relation Extraction (RE) stage, the relations between the various entities taken from the text are determined using various RE techniques. Finally, in the last stage of the text processing, the extracted relations are joined to the entities [22].

2.3. RDF Knowledge Graphs

Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) specification for representing knowledge in the form of triples (subject, predicate, object) containing references, literals or blank nodes [30], [31]. RDF can be modelled as directed labelled graphs in which the subject and object are represented by the vertices or nodes and the corresponding predicates are represented by the labelled edges [32, 33, 34, 35]. RDF graphs are widely used to represent knowledge graphs, as in the cases of Freebase, Yago, and Linked Data, since the billions of triples scattered across the web can be captured and integrated with the existing knowledge using a powerful abstraction for representing heterogeneous, partial, scant, and potentially noisy knowledge graphs [36], [37]. Property graphs are also a quite popular representation of knowledge graphs because of the properties and values they attach to edges. Unlike property graphs, however, RDF knowledge graphs use metadata in a way that allows the convenient distributed storage of knowledge. That also makes RDF graphs more flexible than property graphs [37, 38].

RDF knowledge graphs are stored as triples in triple stores, also called RDF stores. The flexibility of RDF stores is their greatest advantage. Since an RDF knowledge graph can link any number of entities with their relations, RDF stores are also flexible enough to store them without restriction on size. Moreover, any kind of knowledge can be expressed and stored using RDF knowledge graphs, which allows its extraction and reuse by different applications [39].

In the case of textual knowledge, RDF knowledge graphs are helpful in finding the Thematic Scope or the Topic Model of a text. The topic category or the semantic entities in a set of documents are abstracted using the Thematic Scope or the Topic Model [38, 40]. Since the RDF knowledge graph of a document is represented as a set of triples in which each triple is considered a word or an entity, it is possible to detect word and phrase patterns automatically by clustering the word groups that best characterize a document. Some of the methods of Topic Modelling of RDF knowledge graphs are Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Indexing (pLSI) and Latent Dirichlet
Allocation (LDA). The challenges faced in Topic Modelling include sparseness of the entities, a lack of context (especially when the words used have multiple meanings and the entities are sparse), and the use of unnatural language such as special characters or unusual casing of letters, which are normally removed while pre-processing the text. These challenges are usually overcome by supplementing the text or modifying the method of Topic Modelling [40, 41, 42]. Entity summarization, the task of best summarizing an entity by identifying a limited number of ordered RDF triples, is one of the problems solved using Topic Models of RDF knowledge graphs [41, 42]. Entity summarization has many applications, such as search engines, and is useful for research activities. The existing entity summarization techniques can be classified into generic techniques, which apply to a wide range of domains, applications and users, and specific techniques, which make use of external resources or factors that are effective only in specific domains or applications. While the generic techniques make use of universal features like frequency and centrality, informativeness, and diversity and coverage, the specific techniques make use of domain knowledge, context awareness and personalization [43].

3. Related works

Extracting knowledge from unstructured textual sources has been a challenge. Several studies have been done to solve the problem of retrieving meaningful and relevant knowledge from literature, since it is crucial for decision support in systems like the CDSS. Some relevant techniques have been dealt with in the earlier sections on knowledge graphs and RDF knowledge graphs. As part of the proposed method, certain other related techniques have to be explained in order to give a complete picture of the complexity of the problem of curating the knowledgebase for CDSS.

3.1. Question Answering using SPARQL

Question-answering (QA) is a process of retrieving knowledge from different sources using a part or the whole expression of a question in natural language. The question can be interrogative, in which case it is a factoid type of question and its answer is a fact from the knowledge source, or it can be a statement, in which case the answer takes the form of a list, a definition, a hypothetical statement, a causal remark, a relationship description, a procedural explanation or just a confirmation. The sources of knowledge are normally unstructured and consist of a set of documents, video clips, audio clips, or text files that are given as input to the systems [44]. QA systems make use of Information Retrieval, Information Extraction and Natural Language Processing (NLP) techniques [45].

QA systems have a long history of development, starting with the earliest popular systems, BASEBALL in 1961 and LUNAR in 1972. The Text REtrieval Conference (TREC), which began in 1992 for large-scale information retrieval, accelerated interest and growth in QA systems. Other forums and campaigns, such as the Cross Language Evaluation Forum (CLEF) and the NII Test Collection for IR Systems (NTCIR), also advanced QA systems. Noteworthy QA systems include IBM Watson and commercial personal assistant software like Apple’s Siri, Amazon’s Alexa, Google Assistant and Microsoft’s Cortana [44, 45, 46, 47, 48, 49].

SPARQL is a query language in which a pattern in a query is matched with that in a graph from different sources. The matching is done in three stages. It begins with pattern matching, which involves features like optional parts, union of patterns, nesting, filtering or constraining values of matches, and selecting the data source to be matched by a pattern. Once these features are applied, the output is computed using standard operators like projection, distinct, order, limit, and offset to modify the values obtained from pattern matching. Finally, the result of the SPARQL query is given in one of several forms: Boolean answers, the pattern matching values, new triples constructed from the values, or resource descriptions. Working with RDF knowledge graphs requires the use of SPARQL [50, 51]. In order to apply the NLP techniques used in QA systems over SPARQL queries, query builders such as QUaTRO2, OptiqueVQS, NITELIGHT, QueryVOWL, Smeagol,
SPARQL Assist language-neutral query composer, XSPARQL-Viz, Ontology Based Graphical Query Language, NL-Graphs and so on are used [52]. Some of the challenges in the use of SPARQL for QA systems include the lexical gap, ambiguity, multilingualism, the use of complex operators, distributed knowledge, and the use of procedural, temporal and spatial templates [53].

3.2. Ranked RDF Triples and Federated Search

Ranking SPARQL query results is an important process for applications involving searches, QA and entity summarization techniques. Ranking of RDF triples can be over resources, properties, or triples as a whole, but a combined ranking of both the triples and their entities is important for RDF knowledge graphs for faster and more efficient searches in the knowledgebase [54]. Ranking can be done on structured data using structured queries that result in a structured graph. Such rankings are structure-based rankings and mostly use an extension of the ranking algebra that was earlier used in relational databases. Content-based ranking, on the other hand, tries to rank the content of structured or unstructured data. In content-based ranking, the ranking is done according to the match between the pattern of the query and its holistic match in the knowledgebase. Query ranking can be further classified as keyword queries on unstructured data like documents, structured queries on structured data, keyword queries on structured data, and keyword-augmented structured queries on structured data [55]. Ranking can also be based on the relevance or importance of the SPARQL query results to the topic on which the query is made. Relevance ranking requires the subject of the query to be clearly defined so that the results of the query can be ranked according to their relevance to the subject. Importance ranking, on the other hand, specifies the importance that is given to the query result. In importance ranking, factors such as authoritativeness and trustworthiness are used for ranking, for which human cognitive results are taken into consideration [56]. The ranking is stored along with the triple using tokens in an Extended Knowledge Graph [57] or, more prevalently, using graph embeddings [58, 59].

Knowledge graphs can be centralized or distributed. Both centralized and distributed knowledge graphs have their advantages and disadvantages [60]. When it comes to distributed knowledge graphs, federated query processing is used, in which the result of the query is computed from different data sources. Federated query processing accesses different autonomous, distributed, and heterogeneous data sources without having any control over the sources. It is more complex than the centralized system because of the many parameters involved in the query processing. Federated query processing makes use of a federated query engine to search for the results over distributed sources [61]. Federated SPARQL query processing can be done over different SPARQL endpoints, over linked data or over Distributed Hash Tables. It can also be classified as catalog or index assisted processing, as catalog or index free processing, or as a combination of both [62].

3.3. Open Information Extraction

Information Extraction (IE) is an automated process of collecting a set of corresponding information of interest from a given sequence of unstructured data. IE has many applications such as part-of-speech tagging, named entity recognition, shallow parsing, table extraction, contact information extraction and so on. Methods used for IE can be classified as rule learning based extraction methods, classification based extraction methods, and sequential labeling based extraction methods [63]. Open Information Extraction (OIE) is a text IE paradigm that enables relation discovery independent of the domain and that is readily scalable to the variations in size and content of the web. OIE is technically capable of meeting the challenges of IE, namely, automation of the process, heterogeneity of the web corpus and efficiency in the extraction of information [64]. OIE systems like TextRunner, ClausIE, OLLIE, and the like were data-based, using training data represented either by dependency parsing or parts-of-speech tagged text. The rule-based OIE systems were manually programmed using the dependency
                                                                                                         352




trees or parts-of-speech tagged text. Two examples of rule-based OIE are ClausIE and ExtrHech [65]. Another method of OIE is linguistic analysis, which identifies the canonical ways in which verbs in a text are used to express relationships between entities. REVERB, ARGLEARNER and R2A2 (a combination of REVERB and ARGLEARNER) are examples of the linguistic analysis method. The earlier methods were based on the label, learn and extract stages of IE. The main drawback of the three-stage process was that the information extracted was either incoherent or uninformative and therefore of little use to applications. Linguistic-statistical analysis, on the other hand, identifies more meaningful and informative relation phrases [66].

The biggest advantage of OIE is its ability to extract relationships between entities, allowing queries like "(?, kill, bacteria)" or "(Bill Gates, ?, Microsoft)" to recover the missing element of a relationship from a text corpus. Moreover, OIE results in compressed data for a knowledgebase [67]. Other than populating knowledgebases, OIE is also used for question answering, semantic indexing, semantic search and similar target applications. Converting OIE triples into an RDF knowledge graph is possible since longer sentences are broken into triples consisting of entities and the relationships between them, leaving out the determiners and prepositions. Knowledgebase population using OIE has been a very useful application domain [68, 69]. Integrating the OIE triples with the existing knowledgebase has been a research challenge, and many solutions have been proposed for it. These include the predicate or attribute level schema, where similarities in names, types, descriptions, instances, and so on are mapped, and the universal schema, which applies inferences obtained from OIE and the existing knowledge mapped at the instance level. One of the problems of using the universal schema is that the process ignores unseen entities and entity pairs and tries to overfit the space of entities to a large number of parameters. Rowless Universal Schema attempts to find inferences between predicates and relations so that the problem of unseen entities and entity pairs is solved. But it tends to completely ignore the existence of entities, and thus it functions like the predicate or attribute level schema [69].

4. Proposed Method of Knowledgebase Curation

A CDSS is only as efficient as its knowledgebase. If the knowledgebase of the CDSS is adaptive enough to update itself automatically and constantly, reflecting recent advances and local practice, the result will be a robust CDSS. The flexibility of the knowledgebase to accept knowledge from diverse sources and its portability across various practice settings will make the knowledgebase more effective [9, 70].

4.1. Motivation

A CDSS requires quick and reliable knowledge. Therefore, a centralized knowledgebase will be better than a federated search. Since knowledge of most advances is found in the medical literature, the knowledge extracted from that literature has to be present in the knowledgebase. The information extraction from medical literature can be done using OIE. If the user interface of the CDSS allows natural language questions to be asked, the questions can be converted by a QA system into a SPARQL query that is run against the knowledge graph. If answers are not found in the existing knowledgebase of the CDSS, the query can then be passed through the OIE to relevant medical literature, and the resultant knowledge can be integrated into the existing knowledgebase. The knowledgebase is thereby enhanced so that most answers to queries will be found in it, and the updating will be done automatically once new knowledge is found in any form on the web.

There are many advantages of such centralized knowledge graphs. Centralized knowledge graphs can be controlled by a single entity when it comes to strategic issues such as symptoms for diagnostic systems of the CDSS. Such control increases the survivability and robustness of the CDSS. The uniform use of terms in a centralized knowledgebase makes it more stable. The fixed curation method of the knowledgebase in centralized systems makes the knowledge consistent and improves its quality. Moreover, the fixed schema of centralized knowledge graphs allows uniform usage. The knowledge graphs allow the use of
application programming interface (API) for knowledge retrieval and query processing [60].

4.2. Proposed Approach

The proposed approach of knowledgebase curation for CDSS has three stages. In the first stage, new knowledge is extracted from the medical literature using an OIE application. An OIE application is used for knowledge extraction because it produces triples that can be integrated into the existing knowledge graph of the CDSS. In the second stage, the triples obtained from the OIE application on the medical literature are queried for relevant knowledge through a QA system, using keywords from the CDSS interface. In the third stage, if the query results in new knowledge being found, those triples, in RDF form, are added to the existing knowledge graph of the CDSS. A table is maintained with the list of medical literature already checked for knowledge so that it need not be searched for new knowledge again. The algorithm for the proposed system is shown in Algorithm 1.

Algorithm 1.
Curating the CDSS knowledge base using RDF Knowledge Graph

URI (Uniform Resource Identifier) table U = {u1, u2, …, un}
RDF Knowledge Graph G = {V, E} where V = {v1, v2, …, vn} and E = {e1, e2, …, en}
RDF triple S = (s, p, o) (subject (s), predicate (p), object (o))
Keyword K = {k1, k2, …, kn} taken from the QA system of CDSS

1  Read ui                        // URI of a new document
2  If ui ∈ U GOTO Step 12
3  Else use OpenIE to create G
4     For every ki in K
5        use QA system with SPARQL query to find ki in G
6        For every S found
7           If S not in CDSS knowledge base
8              Append S to CDSS knowledge base
9        END For loop
10    END For loop
11    Add ui to U; END Else       // record the document as checked
12 If another document exists GOTO Step 1
13 Else STOP

4.3. Evaluation of the Proposed Method

For the evaluation of the proposed algorithm, precision and recall are used, as they are the typical evaluation metrics in information extraction. During information extraction or retrieval, two types of knowledge may be obtained from the knowledge source: knowledge that can be considered important to the application, and knowledge relevant to the query. In the proposed approach, since the query is based on keywords from the QA system, only knowledge relevant to the query is selected rather than all knowledge that is deemed important in the knowledge source. Therefore, the results of the OIE are restricted to just the knowledge relevant to the query. The evaluation metrics are applied under the same restriction: the relevant knowledge found by the system is measured as a proportion of all the relevant knowledge that can be manually counted in the same medical literature, rather than against all the important knowledge present in the literature used in the test. The precision evaluation metric is given by the formula in equation (1).

\[ \text{Precision} = \frac{|\{\text{relevant RDF triples}\} \cap \{\text{retrieved RDF triples}\}|}{|\{\text{retrieved RDF triples}\}|} \tag{1} \]

A contingency matrix can be formed using the relevance of the RDF triples, as shown in Table 1. If a triple retrieved by the system is relevant to the query, then it is a true positive; otherwise it is a false positive. Likewise, if a relevant triple in the knowledge source is not retrieved by the system, then it is a false negative, and if a triple that is not relevant is ignored by the system, it is a true negative.

Table 1
Contingency matrix according to the relevance of RDF triples

                            Relevant          Not relevant
RDF triple retrieved        True positive     False positive
RDF triple not retrieved    False negative    True negative

The formula to calculate the precision of the system using the contingency matrix can be given as in equation (2).

\[ \text{Precision} = \frac{\text{total number of true positives}}{\text{total number of true positives} + \text{total number of false positives}} \tag{2} \]
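The curation loop of Algorithm 1 and the contingency counts of Table 1 can be illustrated together with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: `extract_triples` is a hypothetical stand-in for the OIE step that simply parses `subject | predicate | object` lines, `find_by_keyword` replaces the QA system and its SPARQL query with a plain substring match, and the document, its URI and the hand-labelled set of relevant triples are invented toy data. Recall is computed from the same counts in the standard way, as true positives over true positives plus false negatives.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def extract_triples(document_text: str) -> Set[Triple]:
    """Hypothetical stand-in for the OIE step (Step 3 of Algorithm 1).

    Assumes one 'subject | predicate | object' statement per line;
    a real system would call an OIE tool here instead.
    """
    triples = set()
    for line in document_text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.add((parts[0], parts[1], parts[2]))
    return triples


def find_by_keyword(graph: Set[Triple], keyword: str) -> Set[Triple]:
    """Stand-in for the QA/SPARQL lookup (Step 5): triples mentioning the keyword."""
    needle = keyword.lower()
    return {t for t in graph if any(needle in term.lower() for term in t)}


def curate(documents: Dict[str, str], checked_uris: Set[str],
           keywords: Iterable[str], knowledge_base: Set[Triple]) -> Set[Triple]:
    """Run the loop of Algorithm 1 and return the triples that were newly added."""
    keywords = list(keywords)
    added = set()
    for uri, text in documents.items():               # Steps 1 and 12
        if uri in checked_uris:                        # Step 2
            continue
        graph = extract_triples(text)                  # Step 3
        for keyword in keywords:                       # Step 4
            for triple in find_by_keyword(graph, keyword):  # Steps 5-6
                if triple not in knowledge_base:       # Step 7
                    knowledge_base.add(triple)         # Step 8
                    added.add(triple)
        checked_uris.add(uri)                          # record the document as checked
    return added


def precision_recall(retrieved: Set[Triple], relevant: Set[Triple]) -> Tuple[float, float]:
    """Precision and recall from the contingency counts of Table 1."""
    tp = len(retrieved & relevant)   # retrieved and relevant
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Toy data (illustrative only, not taken from the paper).
documents = {
    "urn:example:doc1": "\n".join([
        "penicillin | kills | bacteria",
        "aspirin | relieves | headache",
        "bacteria | are studied in | microbiology",
        "bacteria | cause | infection",
        "amoxicillin | treats | strep throat",
    ]),
}
knowledge_base: Set[Triple] = set()
checked: Set[str] = set()
added = curate(documents, checked, ["bacteria"], knowledge_base)

# Hypothetical manual judgement of which triples in the document answer the query.
relevant = {
    ("penicillin", "kills", "bacteria"),
    ("bacteria", "cause", "infection"),
    ("amoxicillin", "treats", "strep throat"),
}
precision, recall = precision_recall(added, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run the keyword match retrieves one triple that the manual judgement marks as not relevant and misses one relevant triple that never mentions the keyword, so both kinds of error appear in the counts. Running `curate` a second time over the same documents adds nothing, because the URI is already in the table of checked documents.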

The percentage of the precision can also be calculated as in equation (3).

\[ \text{Precision \%} = \frac{\text{total number of true positives}}{\text{total number of true positives} + \text{total number of false positives}} \times 100 \tag{3} \]

For recall, we take into consideration the proportion of the relevant triples that are retrieved to the total relevant triples, as in equation (4).

\[ \text{Recall} = \frac{|\{\text{relevant RDF triples}\} \cap \{\text{retrieved RDF triples}\}|}{|\{\text{relevant RDF triples}\}|} \tag{4} \]

Using the contingency matrix, the formula to calculate recall is as in equation (5).

\[ \text{Recall} = \frac{\text{total number of true positives}}{\text{total number of true positives} + \text{total number of false negatives}} \tag{5} \]

5. Discussion for Further Study and Development

The proposed method of knowledgebase curation using RDF Knowledge Graph and SPARQL for a knowledge-based CDSS is straightforward and simple. Its efficiency depends on the underlying OIE method chosen for extracting knowledge. The system provides automatic updating of the knowledgebase and in turn offers reliability to the CDSS. Because the system is centralized, the fixed curation method of the knowledgebase will consistently improve its quality and make it more useful in decision-making processes.

However, there are many possible improvements that can make the system much more efficient and robust. One improvement that can be worked into the system is the use of ranked RDF triples, which can serve in two ways. First, ranking can give weightage to the decision suggestion; second, it can help in removing redundant triples from the knowledgebase, allowing the CDSS to work faster. The ranked triples can be evaluated using precision and recall curves, which can give a better appraisal of the system. Another approach to removing redundant knowledge is to formalize forgetting. Formalizing forgetting in knowledge graphs implies a method of removing either the redundant entities or the redundant relations. Entities may not exist without relations. So, removing relations would mean new, updated relations replacing old relations.

6. Conclusion

CDSS has been considered a very important system in the healthcare sector. That is the reason for the numerous studies that have been done on developing a perfect system that is highly efficient while at the same time reliable. Since the CDSS requires a very quick response to queries, a centralized system is to be considered. At the same time, the knowledgebase of such a system needs to be maintained with constant and consistent updating from various sources of medical literature. Knowledge graphs have proved to be a formidable approach to representing the huge amount of knowledge that is now available on the web. RDF triples are a reliable storage method for knowledge graphs, in the form of RDF knowledge graphs. Curation of the RDF knowledge graph can be done through a QA system that converts natural language questions into SPARQL queries which, when matched with RDF triples from an OIE process, can enhance the knowledgebase. Therefore, a method is proposed to curate the knowledgebase of a CDSS using an RDF knowledge graph. It is possible to evaluate the system using precision and recall and to give an appraisal of its efficiency in acquiring knowledge from various sources.

References

[1] B. Middleton, D. F. Sittig, A. Wright, Clinical Decision Support: a 25 Year Retrospective and a 25 Year Vision, Yearbook of Medical Informatics 25 (2016) S103–S116. URL: http://www.thieme-connect.de/DOI/DOI?10.15265/IYS-2016-s034. doi:10.15265/IYS-2016-s034.
[2] J. A. Osheroff, J. M. Teich, B. Middleton, E. B. Steen, A. Wright, D. E. Detmer, A Roadmap for National Action on Clinical Decision Support, Journal of the American Medical Informatics Association 14 (2007) 141–145. URL: https://academic.oup.com/jamia/article-lookup/doi/10.1197/jamia.M2334. doi:10.1197/jamia.M2334.
[3] A. M. Shahsavarani, Abadi, E. A. Marz, M. H. Kalkhoran2, S. Jafari, S. Qaranli, Clinical Decision Support Systems (CDSSs): State of the art Review
of Literature, International Journal of Medical Reviews 2 (2015) 299–308. URL: http://www.ijmedrev.com/article_68717.html.
[4] R. T. Sutton, D. Pincock, D. C. Baumgart, D. C. Sadowski, R. N. Fedorak, K. I. Kroeker, An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success, npj Digital Medicine 3 (2020) 17. URL: http://www.nature.com/articles/s41746-020-0221-y. doi:10.1038/s41746-020-0221-y.
[5] R. A. Greenes, Definition, Scope, and Challenges, in: R. A. Greenes (Ed.), Clinical Decision Support: The Road to Broad Adoption, second ed., Elsevier, London, 2014, pp. 3–47. URL: https://linkinghub.elsevier.com/retrieve/pii/C20120003043. doi:10.1016/C2012-0-00304-3.
[6] E. S. Berner, T. J. La Lande, Overview of Clinical Decision Support Systems, in: E. S. Berner (Ed.), Clinical decision support systems, third ed., Springer, Cham, 2016, pp. 1–17. URL: http://link.springer.com/10.1007/978-3-319-31913-1_1. doi:10.1007/978-3-319-31913-1_1.
[7] D. Graham, Introduction to Knowledge-Based Systems, in: Knowledge-Based Image Processing Systems, Springer London, London, 1997, pp. 3–13. URL: http://link.springer.com/10.1007/978-1-4471-0635-7_1. doi:10.1007/978-1-4471-0635-7_1.
[8] D. Dinevski, U. Bele, T. Sarenac, U. Rajkovic, O. Sustersic, Clinical Decision Support Systems, in: G. Graschew, S. Rakowsky (Eds.), Telemedicine Techniques and Applications, IntechOpen, 2011, pp. 185–210. URL: http://www.intechopen.com/books/telemedicine-techniques-and-applications/clinical-decision-support-systems. doi:10.5772/25399.
[9] G. P. Purcell, What makes a good clinical decision support system, BMJ (Clinical research ed.) 330 (2005) 740–741. URL: https://europepmc.org/articles/PMC555864. doi:10.1136/bmj.330.7494.740.
[10] R. A. Greenes, D. W. Bates, K. Kawamoto, B. Middleton, J. Osheroff, Y. Shahar, Clinical Decision Support Models and Frameworks: Seeking to Address Research Issues Underlying Implementation Successes and Failures, Journal of Biomedical Informatics 78 (2018) 134–143. URL: https://linkinghub.elsevier.com/retrieve/pii/S1532046417302757. doi:10.1016/j.jbi.2017.12.005.
[11] G. Kong, D.-L. Xu, J.-B. Yang, Clinical Decision Support Systems: A Review on Knowledge Representation and Inference Under Uncertainties, International Journal of Computational Intelligence Systems 1 (2008) 159–167. URL: https://doi.org/10.1080/18756891.2008.9727613. doi:10.1080/18756891.2008.9727613.
[12] A. Al-Badareen, M. Selamat, M. Samat, Y. Nazira, O. Akkanat, A Review on Clinical Decision Support Systems in Healthcare, in: Journal of Convergence Information Technology (JCIT), volume 9, 2014, pp. 125–135.
[13] M. A. Musen, J. van der Lei, Knowledge engineering for clinical consultation programs: modeling the application area, Methods of information in medicine 28 (1989) 28–35. URL: http://www.ncbi.nlm.nih.gov/pubmed/2649771.
[14] S. A. Spooner, Mathematical Foundations of Decision Support Systems, Springer International Publishing, Cham, 2016, pp. 19–43. URL: https://doi.org/10.1007/978-3-319-31913-1_2. doi:10.1007/978-3-319-31913-1_2.
[15] K. von Michalik, M. Kwiatkowska, K. Kielan, Application of Knowledge-Engineering Methods in Medical Knowledge Management, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 205–214. URL: https://doi.org/10.1007/978-3-642-36527-0_14. doi:10.1007/978-3-642-36527-0_14.
[16] M. Nickel, K. Murphy, V. Tresp, E. Gabrilovich, A Review of Relational Machine Learning for Knowledge Graphs, Proceedings of the IEEE 104 (2016) 11–33. URL: https://ieeexplore.ieee.org/document/7358050/. doi:10.1109/JPROC.2015.2483592.
[17] K. Balog, Meet the Data, in: Entity-Oriented Search, first ed., Springer, Cham, 2018, pp. 25–53. URL: http://link.springer.com/10.1007/978-3-319-93935-3_2. doi:10.1007/978-3-
319-93935-3_2.
[18] J. Yan, C. Wang, W. Cheng, M. Gao, A. Zhou, A retrospective of knowledge graphs, Frontiers of Computer Science 12 (2018) 55–74. URL: http://link.springer.com/10.1007/s11704-016-5228-9. doi:10.1007/s11704-016-5228-9.
[19] P. A. Bonatti, S. Decker, A. Polleres, V. Presutti, Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web, Dagstuhl Reports 8 (2019) 29–111. URL: https://drops.dagstuhl.de/opus/volltexte/2019/10328/. doi:10.4230/DagRep.8.9.29.
[20] R. Popping, Text Analysis for Knowledge Graphs, Quality & Quantity 41 (2007) 691–709. URL: http://link.springer.com/10.1007/s11135-006-9020-z. doi:10.1007/s11135-006-9020-z.
[21] R. Popping, Knowledge Graphs and Network Text Analysis, Social Science Information 42 (2003) 91–106. URL: http://journals.sagepub.com/doi/10.1177/0539018403042001798. doi:10.1177/0539018403042001798.
[22] A. Hogan, E. Blomqvist, M. Cochez, C. D’Amato, G. de Melo, C. Gutierrez, J. E. L. Gayo, S. Kirrane, S. Neumaier, A. Polleres, R. Navigli, A. C. N. Ngomo, S. M. Rashid, A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge Graphs (2020). URL: http://arxiv.org/abs/2003.02320. arXiv:2003.02320.
[23] S. Seufert, P. Ernst, S. J. Bedathur, S. K. Kondreddi, K. Berberich, G. Weikum, Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs, in: Proceedings of the 25th International Conference Companion on World Wide Web - WWW ’16 Companion, ACM Press, New York, New York, USA, 2016, pp. 251–254. URL: http://dl.acm.org/citation.cfm?doid=2872518.2890528. doi:10.1145/2872518.2890528.
[24] L. Ehrlinger, W. Wöß, Towards a Definition of Knowledge Graphs, in: M. Martin, M. Cuquet, E. Folmer (Eds.), Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS’16) co-located with the 12th International Conferen, Sun SITE Central Europe (CEUR), Leipzig, Germany, 2016.
[25] M. Färber, F. Bartscherer, C. Menne, A. Rettinger, Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO, Semantic Web 9 (2018) 77–129. doi:10.3233/SW-170275.
[26] D. Fensel, U. Şimşek, K. Angele, E. Huaman, E. Kärle, O. Panasiuk, I. Toma, J. Umbrich, A. Wahler, How to Build a Knowledge Graph, in: Knowledge Graphs, Springer International Publishing, Cham, 2020, pp. 11–68. URL: http://link.springer.com/10.1007/978-3-030-37439-6_2. doi:10.1007/978-3-030-37439-6_2.
[27] E. W. Schneider, Course Modularization Applied: The Interface System and its Implications for Sequence Control and Data Analysis, Technical Report, Human Resources Research Organization (HumRRO), Virginia, 1973. URL: https://files.eric.ed.gov/fulltext/ED088424.pdf.
[28] S. S. Nurdiati, C. Hoede, 25 years development of knowledge graph theory: the results and the challenge, Technical Report, Department of Applied Mathematics, University of Twente, Enschede, 2008. URL: https://research.utwente.nl/en/publications
[29] D. Fensel, U. Şimşek, K. Angele, E. Huaman, E. Kärle, O. Panasiuk, I. Toma, J. Umbrich, A. Wahler, Introduction: What Is a Knowledge Graph?, in: Knowledge Graphs, Springer International Publishing, Cham, 2020, pp. 1–10. URL: http://link.springer.com/10.1007/978-3-030-37439-6_1. doi:10.1007/978-3-030-37439-6_1.
[30] A. Harth, S. Decker, Optimized Index Structures for Querying RDF from the Web, in: Proceedings of the Third Latin American Web Congress, LA-WEB ’05, IEEE Computer Society, USA, 2005, p. 71. URL: https://doi.org/10.1109/LAWEB.2005.25. doi:10.1109/LAWEB.2005.25.
[31] B. Villazon-Terrazas, N. Garcia-Santa, Y. Ren, A. Faraotti, H. Wu, Y. Zhao, G. Vetere, J. Z. Pan, Knowledge Graph Foundations, Springer International Publishing, Cham, 2017, pp. 17–55. URL: https://doi.org/10.1007/978-3-319-
45654-6_2. doi:10.1007/978-3-319-45654-6_2.
[32] W. Zheng, L. Zou, W. Peng, X. Yan, S. Song, D. Zhao, Semantic SPARQL Similarity Search over RDF Knowledge Graphs, Proc. VLDB Endow. 9 (2016) 840–851. URL: https://doi.org/10.14778/2983200.2983201. doi:10.14778/2983200.2983201.
[33] T. Yang, J. Chen, X. Wang, Y. Chen, X. Du, Efficient SPARQL Query Evaluation via Automatic Data Partitioning, in: W. Meng, L. Feng, S. Bressan, W. Winiwarter, W. Song (Eds.), Database Systems for Advanced Applications DASFAA 2013, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 244–258.
[34] O. Curé, G. Blin, RDF and the Semantic Web Stack, in: O. Curé, G. Blin (Eds.), RDF Database Systems, first ed., Elsevier, Waltham, MA, 2015, pp. 41–80. URL: https://linkinghub.elsevier.com/retrieve/pii/B9780127999579000031. doi:10.1016/B978-0-12-799957-9.00003-1.
[35] K. Hose, R. Schenkel, RDF Stores, in: L. Liu, M. T. Özsu (Eds.), Encyclopedia of Database Systems, Springer New York, New York, NY, 2017, pp. 3100–3106. URL: http://link.springer.com/10.1007/978-1-899-7993-3_80676-1. doi:10.1007/978-1-4899-7993-3_80676-1.
[36] F. Du, Y. Chen, X. Du, Partitioned Indexes for Entity Search over RDF Knowledge Bases, in: S.-g. Lee, Z. Peng, X. Zhou, Y.-S. Moon, R. Unland, J. Yoo (Eds.), Database Systems for Advanced Applications DASFAA 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 141–155.
[37] A. Mohamed, G. Abuoda, A. Ghanem, Z. Kaoudi, A. Aboulnaga, RDFFrames: Knowledge Graph Access for Machine Learning Tools (2020). URL: http://arxiv.org/abs/2002.03614. arXiv:2002.03614.
[38] R. Denaux, Y. Ren, B. Villazon-Terrazas, P. Alexopoulos, A. Faraotti, H. Wu, Knowledge Architecture for Organisations, Springer International Publishing, Cham, 2017, pp. 57–84. URL: https://doi.org/10.1007/978-3-319-45654-6_3. doi:10.1007/978-3-319-45654-6_3.
[39] D. C. Faye, O. Curé, G. Blin, A survey of RDF storage approaches, Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées 15 (2012) 11–35. URL: https://hal.inria.fr/hal-01299496.
[40] J. Sleeman, T. W. Finin, A. Joshi, Topic Modeling for RDF Graphs, in: A. L. Gentile, Z. Zhang, C. D’Amato, H. Paulheim (Eds.), Proceedings of the Third International Workshop on Linked Data for Information Extraction (LD4IE2015), CEUR Workshop Proceedings (CEUR-WS.org), Bethlehem, Pennsylvania, USA, 2015, pp. 48–62.
[41] S. Pouriyeh, M. Allahyari, K. Kochut, G. Cheng, H. R. Arabnia, Combining Word Embedding and Knowledge-Based Topic Modeling for Entity Summarization, in: 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Institute of Electrical and Electronics Engineers (IEEE), Laguna Hills, CA, USA, 2018, pp. 252–255. doi:10.1109/ICSC.2018.00044.
[42] S. Pouriyeh, M. Allahyari, K. Kochut, G. Cheng, H. R. Arabnia, ES-LDA: Entity Summarization using Knowledge-based Topic Modeling, in: G. Kondrak, T. Watanab (Eds.), Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Asian Federation of Natural Language Processing, Taipei, Taiwan, 2017, pp. 316–325. URL: https://www.aclweb.org/anthology/I17-1032.pdf.
[43] Q. Liu, G. Cheng, K. Gunaratna, Y. Qu, Entity Summarization: State of the Art and Future Challenges, ArXiv abs/1910.0 (2019). URL: http://arxiv.org/abs/1910.08252. arXiv:1910.08252.
[44] O. Kolomiyets, M. F. Moens, A Survey on Question Answering Technology from an Information Retrieval Perspective, Information Sciences 181 (2011) 5412–5434. URL: http://www.sciencedirect.com/science/article/pii/S0020025511003860. doi:10.1016/j.ins.2011.07.047.
[45] E. M. Nabil Alkholy, M. Hassan Haggag, A. Aboutabl, Question Answering Systems: Analysis and Survey, International Journal of Computer Science & Engineering Survey 09 (2018) 1–13. URL: http://aircconline.com/ijcses/V9N6/9618ijcses01.pdf.
     doi:10.5121/ijcses.2018.9601.
[46] S. K. Dwivedi, V. Singh, Research and Reviews in Question Answering System, Procedia
     Technology 10 (2013) 417–424. URL:
     https://linkinghub.elsevier.com/retrieve/pii/S2212017313005409.
     doi:10.1016/j.protcy.2013.12.378.
[47] E. Dimitrakis, K. Sgontzos, Y. Tzitzikas, A Survey on Question Answering Systems Over
     Linked Data and Documents, Journal of Intelligent Information Systems 55 (2020) 233–259.
     URL: https://doi.org/10.1007/s10844-019-0584-7. doi:10.1007/s10844-019-00584-7.
[48] A. M. N. Allam, M. H. Haggag, The Question Answering Systems: A Survey, International
     Journal of Research and Reviews in Information Sciences (IJRRIS) 2 (2012) 211–221. URL:
     https://pdfs.semanticscholar.org/b294/4a85b9cb428a28e30bdd236471e712667e91.pdf.
[49] D. Diefenbach, V. Lopez, K. Singh, P. Maret, Core Techniques of Question Answering
     Systems over Knowledge Bases: A Survey, Knowledge and Information Systems 55 (2018)
     529–569. URL: https://doi.org/10.1007/s10115-017-1100-y. doi:10.1007/s10115-017-1100-y.
[50] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, in: I. Cruz,
     S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, L. M. Aroyo (Eds.),
     The Semantic Web - ISWC 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006,
     pp. 30–43.
[51] R. Angles, C. Gutierrez, The Expressive Power of SPARQL, in: A. Sheth, S. Staab,
     M. Dean, M. Paolucci, D. Maynard, T. Finin, K. Thirunarayan (Eds.), The Semantic Web -
     ISWC 2008, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 114–129.
[52] P. Grafkin, M. Mironov, M. Fellmann, B. Lantow, K. Sandkuhl, A. V. Smirnov, SPARQL Query
     Builders: Overview and Comparison, in: B. Johansson, F. Vencovský (Eds.), Joint
     Proceedings of the BIR 2016 Workshops and Doctoral Consortium co-located with 15th
     International Conference on Perspectives in Business Informatics Research (BIR 2016),
     Prague, Czech Republic, September 14 - 16, 2016, volume 1684 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2016, p. 12. URL: http://ceur-ws.org/Vol-1684/paper21.pdf.
[53] K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann, A.-C. N. Ngomo, Survey on
     Challenges of Question Answering in the Semantic Web, Semantic Web 8 (2017) 895–920.
     URL: https://content.iospress.com/articles/semantic-web/sw247. doi:10.3233/SW-160247.
[54] A.-C. N. Ngomo, M. Hoffmann, R. Usbeck, K. Jha, Holistic and Scalable Ranking of RDF
     Data, in: 2017 IEEE International Conference on Big Data (Big Data), IEEE, Boston, MA,
     2017, pp. 746–755. doi:10.1109/BigData.2017.8257990.
[55] S. Elbassuoni, M. Ramanath, R. Schenkel, M. Sydow, G. Weikum, Language-Model-Based
     Ranking for Queries on RDF-Graphs, in: Proceedings of the 18th ACM Conference on
     Information and Knowledge Management, CIKM ’09, Association for Computing Machinery,
     New York, NY, USA, 2009, pp. 977–986. URL: https://doi.org/10.1145/1645953.1646078.
     doi:10.1145/1645953.1646078.
[56] A. Buikstra, H. Neth, L. Schooler, A. Ten Teije, F. Van Harmelen, Ranking Query Results
     from Linked Open Data Using a Simple Cognitive Heuristic, in: Proceedings of the 2011
     International Conference on Discovering Meaning On the Go in Large Heterogeneous Data,
     LHD’11, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2011, pp. 55–60.
[57] M. Yahya, D. Barbosa, K. Berberich, Q. Wang, G. Weikum, Relationship Queries on Extended
     Knowledge Graphs, in: Proceedings of the Ninth ACM International Conference on Web
     Search and Data Mining, WSDM ’16, Association for Computing Machinery, New York, NY,
     USA, 2016, pp. 605–614. URL: https://doi.org/10.1145/2835776.2835795.
     doi:10.1145/2835776.2835795.
[58] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge Graph Embedding: A Survey of Approaches and
     Applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 2724–2743.
     URL: http://ieeexplore.ieee.org/document/8047276/. doi:10.1109/TKDE.2017.2754499.
[59] G. A. Gesese, R. Biswas, M. Alam,
     H. Sack, A Survey on Knowledge Graph Embeddings with Literals: Which model links better
     Literal-ly?, ArXiv abs/1910.1 (2019).
[60] P. Bonatti, S. Decker, A. Polleres, V. Presutti, Knowledge Graphs: New Directions for
     Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371), Dagstuhl Reports
     8 (2018) 106–110. URL: https://drops.dagstuhl.de/opus/volltexte/2019/10328/.
     doi:10.4230/DagRep.8.9.29.
[61] K. M. Endris, M.-E. Vidal, D. Graux, Federated Query Processing, Springer International
     Publishing, Cham, 2020, pp. 73–86. URL: https://doi.org/10.1007/978-3-030-53199-7_5.
     doi:10.1007/978-3-030-53199-7_5.
[62] M. Saleem, Y. Khan, A. Hasnain, I. Ermilov, A.-C. N. Ngomo, A Fine-grained Evaluation of
     SPARQL Endpoint Federation Systems, Semantic Web 7 (2016) 493–518.
[63] J. Tang, M. Hong, D. L. Zhang, J. Li, Information Extraction: Methodologies and
     Applications, in: H. A. do Prad, E. Ferneda (Eds.), Emerging Technologies of Text
     Mining, IGI Global, 2008, pp. 1–33. URL:
     http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-59904-373-9.ch001.
     doi:10.4018/978-1-59904-373-9.ch001.
[64] O. Etzioni, M. Banko, S. Soderland, D. S. Weld, Open Information Extraction from the
     Web, Commun. ACM 51 (2008) 68–74. URL: https://doi.org/10.1145/1409360.1409378.
     doi:10.1145/1409360.1409378.
[65] S. Ali, H. Mousa, M. Hussien, A Review of Open Information Extraction Techniques, IJCI.
     International Journal of Computers and Information 6 (2019) 20–28. URL:
     https://ijci.journals.ekb.eg/article_35099.html
     https://ijci.journals.ekb.eg/article_35099_2751a97dec8ca23f3e6ca98f27cee4b6.pdf.
     doi:10.21608/ijci.2019.35099.
[66] O. Etzioni, A. Fader, J. Christensen, S. Soderland, M. Mausam, Open Information
     Extraction: The Second Generation, in: Proceedings of the Twenty-Second International
     Joint Conference on Artificial Intelligence - Volume One, IJCAI’11, AAAI Press, 2011,
     pp. 3–10.
[67] Mausam, Open Information Extraction Systems and Downstream Applications, in: Proceedings
     of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16,
     AAAI Press, 2016, pp. 4074–4077.
[68] A. Zhila, E. Yagunova, O. Makarova, Bringing The Output of Open Information Extraction
     to The RDF/XML Format: A Case Study, 2015.
[69] G. Angeli, M. J. J. Premkumar, C. D. Manning, Leveraging Linguistic Structure For Open
     Domain Information Extraction, in: Proceedings of the 53rd Annual Meeting of the
     Association for Computational Linguistics and the 7th International Joint Conference on
     Natural Language Processing of the Asian Federation of Natural Language Processing, ACL
     2015, July 26-31, 2015, Beijing, The Association for Computer Linguistics, 2015,
     pp. 344–354. URL: https://doi.org/10.3115/v1/p15-1034. doi:10.3115/v1/p15-1034.
[70] I. Sim, P. Gorman, R. A. Greenes, R. B. Haynes, B. Kaplan, H. Lehmann, P. C. Tang,
     Clinical decision support systems for the practice of evidence-based medicine, Journal
     of the American Medical Informatics Association: JAMIA 8 (2001) 527–534. URL:
     https://pubmed.ncbi.nlm.nih.gov/11687560
     https://www.ncbi.nlm.nih.gov/pmc/articles/PMC130063/.
     doi:10.1136/jamia.2001.0080527.