=Paper= {{Paper |id=Vol-1982/paper7 |storemode=property |title=SeDAn: a Plausible Reasoning Approach for Semantics-based Data Analytics in Healthcare |pdfUrl=https://ceur-ws.org/Vol-1982/paper7.pdf |volume=Vol-1982 |authors=Hossein Mohammadhassanzadeh,Samina Raza Abidi,Mohammad Salman Shah,Mehdi Karamollahi,Syed Sibte Raza Abidi |dblpUrl=https://dblp.org/rec/conf/aiia/Mohammadhassanzadeh17 }} ==SeDAn: a Plausible Reasoning Approach for Semantics-based Data Analytics in Healthcare== https://ceur-ws.org/Vol-1982/paper7.pdf
  SeDAn: a Plausible Reasoning Approach for Semantics-
           based Data Analytics in Healthcare

 Hossein Mohammadhassanzadeh1, Samina Raza Abidi2, Mohammad Salman Shah1,
               Mehdi Karamollahi3 and Syed Sibte Raza Abidi 1
     1
         NICHE Research Group, Faculty of Computer Science, Dalhousie University, Canada
 {hassanzadeh, raza.abidi}@dal.ca, msalmanshah1987@gmail.com
              2 Medical Informatics, Faculty of Medicine, Dalhousie University, Canada

                                            samina.abidi@dal.ca
              3 Faculty of Computer Science, The University of Rome La Sapienza, Italy

                    karamollahi.1771946@studenti.uniroma1.it


          Abstract. Plausible Reasoning (PR) is an inferencing mechanism to derive solu-
          tions when dealing with incomplete knowledge. When developing data-driven
          models for clinical decision support, the completeness of the data is always a
          consideration. PR provides a practical approach to extend the knowledge-base of
          a clinical decision support system by abstracting plausible assertions from health
          data. Implementation of plausible reasoning relies on fine-grained knowledge of
          how different concepts are semantically related. The Semantic Web provides for-
          malisms to semantically represent knowledge at various levels of expressivity,
          and to reason over the knowledge to perform semantic analytics based on
          healthcare data. This paper proposes a SEmantics-based Data ANalytics frame-
          work (SeDan) to investigate the potential of implementing plausible reasoning
          using the Semantic Web technologies. In particular, we will evaluate the efficacy
          of the proposed framework in healthcare to perform effective semantic analytics
          using partial health data to make better decisions in disease diagnosis and long-
          term care. We demonstrate the efficacy of SeDan by answering medical queries
          posed by BioASQ challenges using Disease ontology, DrugBank and Semantic
          MEDLINE databases.

          Keywords: Plausible Reasoning, OWL, Semantic Analytics, Semantic Web
          Reasoning.


1         Introduction

    The massive volume of diverse data from clinical practice, healthcare and biomedi-
cal research is an opportunity for medical big-data analytics. However, due to the in-
trinsic nature of data that may be incomplete and inaccurate, the interpretation of data
and its associations might be a serious challenge [1]. In this regard, innovative methods,
algorithms and tools are needed to facilitate knowledge representation, exchange and
reasoning in a form that is understandable to both humans and machines [2].
    In applying data analytics to real health applications, especially with large and com-
plex datasets, the patient data is typically sparse and incomplete. To deal with missing

D. Impedovo and G. Pirlo (Eds.), Workshop on Artificial Intelligence with Application in Health, Bari, Italy, November 14, 2017.
Copyright held by the authors.


data, two approaches exist: (i) removing the objects (entities) or features with incomplete
data, and (ii) filling in the data based on experts' experience, or on fuzzy or Bayesian
models, for a best-guess estimate [3]. The former considerably reduces the size of the
data, and the latter needs expert input, the calculation of statistical associations between
data, or a probability distribution that may not always be available. Plausible Reasoning
(PR) is an alternative reasoning method that derives solutions when complete data is
lacking or non-existent.
    This parallels the clinical decision-making process, in which physicians draw on the
available knowledge to make a diagnosis or order a treatment. If the existing knowledge
is not sufficient, physicians leverage their own tacit knowledge to discover the
correlations within existing medical data, draw new relationships and infer the missing
knowledge [1]. Plausible reasoning follows the physicians' thinking process to generate
new hypotheses. Plausible reasoning does not conform to strict logical formalisms, but
it provides a mechanism to infer new knowledge, albeit through weaker inference, especially
when working with the Open World Assumption (OWA) [4]. In such cases, PR can
infer new and missing relationships by leveraging how different concepts are semantically
interrelated [5].
    The Semantic Web (SW) framework provides logic-based formalisms to semanti-
cally represent knowledge at various levels of expressivity. The SW also offers effec-
tive built-in support for deduction-based reasoning, including Description Logic (DL)
reasoning and rule-based languages, that conform to the OWA. The results, therefore,
are demonstrative and consistent with the knowledge. Despite the great potential of
SW technologies in different domains, including healthcare, there is currently a lack of
support in the SW framework for representing and reasoning with uncertainty and
incompleteness, which are an unavoidable part of daily life [6], [7]. This shortcoming
limits the use of the SW-based approaches in clinical decision support systems that
require efficient handling of incompleteness [8].
    This drawback of SW has led to several approaches [6] introducing probabilistic
variants and fuzzy extensions [9] to the Web Ontology Language (OWL) to deal with
vague information. Such probabilistic/fuzzy OWL extensions improve the capability of
SW reasoners in dealing with uncertainty. However, they are only applicable to cases
where the truth of facts has some degree of ambiguity (qualitative uncertainty), not to
cases where uncertainty results from a lack of knowledge (quantitative uncertainty) [10].
    In this regard, [1] implemented a multi-strategy reasoning framework, including de-
ductive, inductive and analogical reasoning, within the SW framework. They leveraged
ontological knowledge to increase the expressivity and accuracy of plausible reasoning
methods. They showed that implementing plausible reasoning methods can extend the
coverage of an incomplete KB, and that exploiting enriched OWL ontologies can significantly
increase the accuracy of the results. However, there is still a lack of non-deductive
reasoning support in the logic layer of the SW. The current study aims to introduce
plausible reasoning as one non-deductive approach targeting the logic layer of the SW.
    In this research, we propose the concept of semantic analytics as the analysis of
semantically annotated data, i.e., data represented in the Resource Description Framework
(RDF), to infer new knowledge, whilst adhering to the SW's OWA about knowledge
incompleteness, by using expressive semantics and semantics-relevant reasoning methods
[11]. We believe that RDF Schema and OWL, which express additional semantics on
top of RDF, are one way to achieve semantic analytics. There are a number of ways in
which new facts can be inferred when we have complete knowledge; within the OWA,
however, we need to account for incomplete knowledge, which may call for non-deductive
reasoning, and plausible reasoning is one such reasoning approach.
    This research aims to investigate the potential of implementing plausible reasoning
within the SW, targeting a semantic analytics framework for health data analytics, es-
pecially when working with large health datasets. In line with this objective, we aim to:
(i) introduce additional markup (a plausible extension to OWL) that extends OWL semantics
to better capture and represent plausible semantics, (ii) develop a semantic analytics
framework that uses a query-rewriting algorithm to discover new associations between
underlying domain-specific data, and (iii) evaluate the framework using health data.


2        Plausible Reasoning

   Plausible Reasoning, which is non-demonstrative, ampliative and non-monotonic, is
a weak inference approach that identifies the associations between the question and the
knowledge retrieved from memory and draws the line of inference based on those
associations. Plausible reasoning performs inferencing by using a set of frequently recurring
patterns that do not occur in formal logic [12]. A plausible reasoning stack is introduced
in [1]. The stack comprises a set of plausible patterns and three plausible reasoning
mechanisms that use the patterns to infer new rules and facts. [1] also classifies plausible
patterns into three groups (Table 1): hierarchy-based, order-based and hybrid patterns.
                                    Table 1. Plausible Patterns [1]
    Plausible Pattern            | Description
    Generalization (a)           | Passing from a given set of objects to a larger set that contains the given set.
    Specialization (a)           | Passing from a given set of objects to a smaller set that is contained in the given one.
    Interpolation (b)            | Creating a new relation from observation space $X$ to conclusion space $Y$, where $x_i \in X$ is not mapped to any $y \in Y$ (unknown relation), but other relations from $x_h, x_j\ (\neq x_i)$ to $Y$ and $x_h < x_i < x_j$ are known.
    A Fortiori (b)               | An inference from a proposition with high degree of confidence to a less confident proposition that is not clearly specified but is implicit in the first one.
    Similarity/Dissimilarity (c) | Moving between any two comparable nodes (siblings) in the concept hierarchy.
    (a) Hierarchy-based patterns, (b) Order-based patterns, (c) Hybrid patterns

   Hierarchy-based patterns move between the nodes in hierarchical structure, from
parent to child or vice versa, to perform a hierarchical plausible inference. Order-based
patterns leverage measurable properties (partial order) to compare concepts regarding
their size, order, location, ranking, etc. and infer new pieces of knowledge. Hybrid
patterns, in contrast, use both hierarchical relations and the partial order of concepts to
infer a plausible answer; they probe the hierarchy and move between any two comparable
nodes, or consider concepts that are analogous with respect to some measurable
properties. The utility of order-based patterns within inductive and analogical


reasoning has been studied and confirmed [1]. The current study investigates the efficiency
of all the plausible patterns together, working either alone or in combination
with other patterns. Definition 1 provides a formal notation for PR.
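
The hierarchy-based moves described above can be illustrated with a small sketch. This is not the authors' implementation: the `PARENT` dictionary, the function names and the drug name `HypotheticalDrugB` are assumptions introduced for illustration; only the Migalastat/alpha-Galactosidase link is taken from the paper's later example.

```python
# Toy concept hierarchy stored as child -> parent.
# "HypotheticalDrugB" and "Enzyme" are made-up nodes for illustration only.
PARENT = {
    "Migalastat": "alpha-Galactosidase",
    "HypotheticalDrugB": "alpha-Galactosidase",
    "alpha-Galactosidase": "Enzyme",
}


def generalize(concept):
    """Hierarchy-based move from a node to its parent (empty for a root)."""
    parent = PARENT.get(concept)
    return [parent] if parent is not None else []


def specialize(concept):
    """Hierarchy-based move from a node to its direct children."""
    return sorted(c for c, p in PARENT.items() if p == concept)


def siblings(concept):
    """Similarity-style move between comparable nodes sharing a parent."""
    parent = PARENT.get(concept)
    if parent is None:
        return []
    return sorted(c for c, p in PARENT.items() if p == parent and c != concept)
```

Order-based patterns would additionally need a comparable property on the nodes; that part is left out of this sketch.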
   Definition 1: Let $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$ be a knowledge base consisting of terminological
constructs $\mathcal{T}$ and incomplete assertional knowledge $\mathcal{A}$, and let $\mathcal{Q}$ be a query. A plausible
reasoner plbRes($\mathcal{Q}$, $\mathcal{T}$, $\mathcal{A}$) returns a set of solutions for $\mathcal{Q}$:
                      $$P_{\mathcal{Q}}^{\mathcal{K}} = \{\, \langle plbAns, \{\pi_1, \dots, \pi_n\} \rangle \mid \pi_i \in \Pi,\ 1 \le n \,\}$$
  where plbAns is a plausibly inferred solution, $\Pi$ is the set of plausible patterns, and
$\{\pi_1, \dots, \pi_n\}$ lists the plausible pattern(s) involved in the reasoning process.
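
As a rough illustration of Definition 1, each solution can be modelled as an answer paired with a non-empty set of patterns. The class name `PlausibleSolution` and its validation rules are assumptions made for this sketch; the pattern abbreviations are the ones listed in the input of Algorithm 1.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Pattern abbreviations as listed in the input of Algorithm 1.
PATTERNS = frozenset({"GEN", "SPEC", "SIM", "DIS", "FORT", "INTP"})


@dataclass(frozen=True)
class PlausibleSolution:
    """One element of the solution set: an answer plus the patterns used."""

    answer: str
    patterns: FrozenSet[str]

    def __post_init__(self):
        # Definition 1 requires at least one pattern (1 <= n).
        if not self.patterns:
            raise ValueError("at least one plausible pattern is required")
        unknown = self.patterns - PATTERNS
        if unknown:
            raise ValueError(f"unknown patterns: {sorted(unknown)}")
```

A reasoner following this sketch would return a collection of such objects, so that every plausible answer carries its own justification.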


3      Query Rewriting within the Semantic Web

   Query Rewriting (QR) algorithms use ontological constructs to transform a given
query to an expanded version that extracts both explicit (what a KB knows) and implicit
(what it assumes) knowledge from the data [13], [14]. Therefore, QR can be used as a
technique to implement plausible patterns and solve queries over an incomplete KB.
Within the SW framework, the OWL 2 QL profile provides a query rewriting mechanism
to query data through an ontology. OWL 2 QL is underpinned by the DL-Lite family of
description logics. The OWA made in DLs makes OWL 2 QL suitable for working with
incomplete knowledge in SW scenarios [14], [15]. Independence from the data and
support for other variants of DL-Lite have made QL a suitable approach to Ontology
Based Data Access (OBDA) in large RDF stores with different levels of expressivity.
   However, the DL-Lite logic underlying OWL 2 QL restricts the allowed operators,
which limits its expressivity in domains with uncertainty and incompleteness [16].
The axioms within QL support a variety of inferences in OBDA, but they may not cover
all the plausible semantics. The goal of our work is to introduce a plausible extension
to OWL QL to support plausible relations and properties.
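
To make the idea of ontology-mediated query rewriting concrete, the sketch below expands a class-membership query with subclass axioms, so that implicit members are retrieved from explicitly asserted data. It is a simplified illustration rather than an OWL 2 QL implementation; the class names, the `SUBCLASS_OF` and `ABOX` data and both function names are assumptions.

```python
# Toy TBox: (subclass, superclass) pairs. Names are illustrative only.
SUBCLASS_OF = [("Carcinoma", "Cancer"), ("ProstateCarcinoma", "Carcinoma")]

# Toy ABox: (individual, asserted class) pairs.
ABOX = [("case1", "ProstateCarcinoma"), ("case2", "Cancer")]


def rewrite_class_atom(cls, axioms):
    """Return cls plus every class that the axioms make a subclass of it."""
    result = [cls]
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in result:
                result.append(sub)
                frontier.append(sub)
    return result


def instances_of(cls, axioms, abox):
    """Evaluate the rewritten query (a union over classes) on the ABox."""
    classes = rewrite_class_atom(cls, axioms)
    return sorted(ind for ind, asserted in abox if asserted in classes)
```

Without the rewriting step, a lookup for members of `Cancer` would find only `case2`; with it, `case1` is returned as well, which is the explicit-plus-implicit behaviour described above.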


4      SeDan: Semantics based Data Analytics Framework

   To achieve semantic analytics, we propose a framework (Fig. 1) that implements
a plausible reasoner to infer new knowledge from RDF knowledge bases. This reasoner
realizes plausible reasoning patterns by manipulating the underlying graph directly
through SPARQL query rewriting using OWL DL constructs.
   The proposed framework includes three main modules: knowledge sources, a plausible
reasoner and a user interface. Knowledge sources provide terminological constructs
to be consumed during the reasoning process, and assertional knowledge against which
the extended query is evaluated. The plausible reasoner (discussed further in the following
sections) delivers semantic analytics by running a query rewriting algorithm to apply
plausible patterns and infer a set of so-called certain solutions. The system accepts the
query with a list of desired plausible patterns via the user interface and, in return,
delivers the plausible answer(s) and their justifications.




                     Fig. 1. Proposed semantics based data analytics framework
4.1       Plausible Extension to OWL
   Standard reasoning capabilities within the OWL QL profile support various types of
ontology-based inference: rdfs:subClassOf represents hierarchical relations and
owl:sameAs supports similarity. However, QL does not support all the semantics required
by plausible patterns, such as partial order or context. Therefore, we need to consider
how OWL can be extended in the cases where it lacks sufficient expressivity. In this
regard, the plausible extension to OWL defines new classes, followed by new properties
that use the new (and existing) classes to express new relations. Table 2 shows a
subset of the proposed extension to OWL.
                          Table 2. Plausible extension to OWL (PLOWL)
      Class Name         | Super Class      | On Property
      OrderedProperty    | ObjectProperty   | -
      Context            | Class            | hasContext
      PlausiblePattern   | Class            | inferredViaPattern

      Property Name      | Type             | Domain           | Range             | Inverse Property
      standsBefore       | Ordered Property | Entity           | Entity            | standsAfter
      standsAfter        | Ordered Property | Entity           | Entity            | standsBefore
      hasContext         | Object Property  | Entity           | Context           | -
      inferredViaPattern | Object Property  | Plausible Answer | Plausible Pattern | -
   An ordered property is a property that reflects the partial order of two entities w.r.t. a
measurable property (plowl:Context). More formally, if P is a plowl:OrderedProperty, any
instance of P, such as (x,y), implies that x is bigger, older, etc. than y, or vice versa. From
this, the plausible reasoner is able to conduct interpolation and a fortiori reasoning.
In Table 2, plowl:standsAfter and plowl:standsBefore are instances of
plowl:OrderedProperty that express how entities are comparable. Similarly, plowl:hasContext
indicates the specific context in which the ordered property is meaningful.
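
The behaviour of an ordered property and its inverse can be sketched as follows. This is an illustration under stated assumptions: the stage names are hypothetical, the context is taken to be fixed ("disease stage"), and transitivity is assumed because the text speaks of a partial order.

```python
# Hypothetical plowl:standsBefore assertions within one context
# ("disease stage"); the stage names are illustrative only.
STANDS_BEFORE = {("stage I", "stage II"), ("stage II", "stage III")}


def inverse(pairs):
    """Derive the inverse property (plowl:standsAfter) by swapping each pair."""
    return {(y, x) for x, y in pairs}


def precedes(x, y, pairs):
    """Return True if x stands before y, following the pairs transitively."""
    seen, frontier = set(), [x]
    while frontier:
        current = frontier.pop()
        for a, b in pairs:
            if a == current and b not in seen:
                if b == y:
                    return True
                seen.add(b)
                frontier.append(b)
    return False
```

A reasoner could use a check like `precedes` to decide whether an a fortiori or interpolation step between two entities is admissible in the given context.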

4.2       Query Rewriting Algorithm
   In this section, we present the proposed QR algorithm (Algorithm 1) that supports
plausible reasoning patterns in the SeDan reasoning engine. We make use of the GCLRR
algorithm [17] which transforms a query Q into a Union of Conjunctive Queries
(UCQs) by applying the TBox axioms to the body atoms of the query. UCQ is one of


the most common approaches for computing a so-called perfect rewriting of a query. A
UCQ is a set of conjunctive queries of the same arity and the same query predicate.
Algorithm 1 demonstrates the proposed algorithm.
    To start the rewriting, the algorithm needs an initial query, a set of preferred plausible
patterns to which the extended query is limited, and an ontology based on
$DL\text{-}Lite_T$ axioms that is semantically enriched with the introduced Plausible OWL
extension. Starting with the initial query, the algorithm tries to replace a body atom $D$ of
the query with a new atom $D'$ (step 7). The atom $D'$ should be (i) semantically related
to $D$ ($\exists \alpha \in \mathcal{T} : \alpha(D, D')$), and (ii) applicable to the preferred plausible patterns (step 6).
For example, rdfs:subClassOf is applicable to generalization, the instance-of relation
(rdf:type) is used in specialization, owl:sameAs supports (dis)similarity, and
plowl:standsAfter is applicable to a fortiori and interpolation. The new conjunctive query
resulting from replacing an atom is added to $R$, the set of conjunctive queries. The
algorithm keeps formulating new queries until there is no unique query left to be added.
        Algorithm 1. The proposed QR algorithm
        Input: A query $Q$ in a triple format, a set of plausible patterns
               $\pi \in \Pi$: {GEN, SPEC, SIM, DIS, FORT, INTP}, a $DL\text{-}Lite_T$ TBox $\mathcal{T}$ enriched with the PL-OWL extension
        Output: $R$, a set of rewritten queries.
          1: $R = \{Q\}$;
          2: repeat
          3:    foreach query $Q \in R$ do
          4:      foreach atom $D$ in $Q$ do
          5:        foreach axiom $\alpha \in \mathcal{T}$ do
          6:          if $\alpha$ is applicable to any $\pi \in \Pi$, w.r.t. $D$ then
          7:             $Q' = \exists D'.\ Q(D \to D') \wedge \alpha(D, D')$;
          8:             $R = R \cup \{Q'\}$;
          9:          end if
         10:        end foreach
         11:      end foreach
         12:    end foreach
         13: until no unique query can be added to $R$;
         14: return $R$;
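
The loop structure of Algorithm 1 can be approximated by the following minimal sketch. It is not the SeDan implementation: queries are simplified to tuples of (subject, predicate, object) atoms, TBox axioms to (D, predicate, D') triples, and, unlike step 7, the conjunct α(D, D') is dropped from the rewritten query. The `APPLICABLE` dictionary encodes a subset of the predicate-to-pattern mapping mentioned in the text; the function name and data layout are assumptions.

```python
# Which plausible patterns an axiom predicate can serve (subset of the
# mapping described in the text; the encoding itself is illustrative).
APPLICABLE = {
    "rdfs:subClassOf": {"GEN"},
    "owl:sameAs": {"SIM", "DIS"},
    "plowl:standsAfter": {"FORT", "INTP"},
}


def rewrite(query, tbox, preferred):
    """Expand `query` (a tuple of (s, p, o) atoms) into a set of rewritings.

    `tbox` holds axioms (D, predicate, D_prime); an axiom is used only if
    its predicate serves at least one of the `preferred` patterns.
    """
    rewritings = {query}
    changed = True
    while changed:  # repeat ... until no new query is added
        changed = False
        for q in list(rewritings):
            for i, atom in enumerate(q):
                for d, pred, d_prime in tbox:
                    if not (APPLICABLE.get(pred, set()) & preferred):
                        continue  # axiom not applicable to any preferred pattern
                    if d not in atom:
                        continue  # axiom does not mention a term of this atom
                    new_atom = tuple(d_prime if t == d else t for t in atom)
                    new_q = q[:i] + (new_atom,) + q[i + 1:]
                    if new_q not in rewritings:
                        rewritings.add(new_q)
                        changed = True
    return rewritings
```

Because the set of terms is finite, the set of possible rewritings is finite too, so the loop terminates even when the axioms contain cycles (for example, symmetric owl:sameAs links).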




5      Improving clinical decision support using SeDan

   The SeDan framework has the potential to be used for decision-making and problem
solving in any domain that suffers from incomplete knowledge. However, we have
focused on healthcare applications for the following reasons:
─ Semantic analytics is very relevant to healthcare, as it is predominantly a
   knowledge-intensive domain. The opportunity to capture and leverage semantics via
   inference or query processing is vital for supporting both disease diagnosis and
   long-term care (e.g. predictive and preventive diagnosis of chronic diseases) [18].
─ A vast amount of health data is available from many diverse automated information
   systems including Electronic Health Records (EHR), Personal Health Records
   (PHR) and Electronic Medical Records (EMR). Effective semantic analytics of data
   enables the extraction of potential relationships existing in healthcare data to provide
   insights that can assist healthcare providers in making better decisions.
  To demonstrate the efficacy of SeDan and the feasibility of our query rewriting al-
gorithm, we provide two case studies where we attempt to answer two questions from
BioASQ challenges [19] using DrugBank [20], Disease Ontology [21] and Semantic


MEDLINE database (SemMedDB; see footnote 1) [22]. The BioASQ challenges are a series
of tasks in which participants are asked to respond to a set of questions posed by medical
experts. DrugBank is a bioinformatics and cheminformatics resource that includes detailed
drug data. The Disease Ontology is a standardized ontology for human disease, and the
Semantic MEDLINE database is a repository of 89.2 million semantic triples extracted from
PubMed articles. For the sake of simplicity, in the examples below we discuss only one
conjunctive query out of the possible dozens of queries resulting from the query rewriting
algorithm.

5.1      Example 1: Migalastat treats Fabry Disease?

  In this case study, when the question “Is Migalastat used for treatment of Fabry
Disease?” (BioASQ challenge, Task 5b) is posed to SemMedDB, the traditional
approach returns the response ‘No’ because it cannot find any matching triple. The initial
SPARQL form of the question can be written as below:
      Initial SPARQL query:
        PREFIX sem:
        ASK { "Migalastat" sem:treats "Fabry Disease" }
      Answer: No

                 Code 1. Initial query answering if Migalastat treats Fabry Disease
   When the failed query is posed to the SeDan framework, the framework uses ontological
semantics to conduct the query rewriting. The QR algorithm explores the domain ontology
to find any hierarchical/ordered relationships that match any one, or a combination, of the
subject, object and predicate of the triple in the question. The query transformation is then
performed by replacing the matching atom in the triple with the new atom.
   Regarding the failed query (Code 1), and using the DrugBank ontology, we know that
Migalastat is an alpha-Galactosidase (DrugBank: DB05018). Based on the generalization
pattern, the QR algorithm replaces the subject of the triple (Migalastat) with its super class
(alpha-Galactosidase), following the logic that “if a category of drugs can treat a disease,
then any subclass or instance of that category would treat the disease as well”. Using
the relevant ontology axiom, transformation $t$ (Fig. 2) can be conducted:
                           ("Migalastat", sem:treats, "Fabry Disease")
                              t: Migalastat db:isa alpha-Galactosidase
                               →
                      ("alpha-Galactosidase", sem:treats, "Fabry Disease")

                        Fig. 2. Rewritten triple using the ontology axiom
   Considering the transformation above, the initial SPARQL query can be rewritten
as below (Code 2). By posing this new query over SemMedDB, we get a ‘Yes’
answer, as the database contains the matching triple. As seen in Code 2, the plausible
answer ‘Yes’ is accompanied by the plausible pattern, generalization, that is involved
in the QR. It shows which plausible patterns have led to this plausible answer.
      Rewritten SPARQL query:
        PREFIX sem:
        ASK { "alpha-Galactosidase" sem:treats "Fabry Disease" }
      Plausible Answer: (Yes, {GEN})
                Code 2. Rewritten query answering if Migalastat treats Fabry Disease
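
The first case study can be replayed on a toy triple store to show how the answer and its justification travel together. This sketch assumes an in-memory set of triples and a one-level `ISA` lookup; both stand in for SemMedDB and the DrugBank ontology and are not their real interfaces.

```python
# Toy stand-ins for SemMedDB (triples) and the DrugBank hierarchy (ISA).
TRIPLES = {("alpha-Galactosidase", "sem:treats", "Fabry Disease")}
ISA = {"Migalastat": "alpha-Galactosidase"}


def ask(triple):
    """Plain ASK: succeed only if the exact triple is stored."""
    return triple in TRIPLES


def ask_with_generalization(triple):
    """Retry a failed ASK after replacing the subject with its super class."""
    if ask(triple):
        return ("Yes", set())          # answered directly, no pattern needed
    subject, predicate, obj = triple
    parent = ISA.get(subject)
    if parent is not None and ask((parent, predicate, obj)):
        return ("Yes", {"GEN"})        # plausible answer with its justification
    return ("No", set())
```

Calling `ask_with_generalization(("Migalastat", "sem:treats", "Fabry Disease"))` on this toy data yields the pair `("Yes", {"GEN"})`, mirroring the plausible answer shown in Code 2.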

1 https://skr3.nlm.nih.gov/SemMedDB/index.html


5.2      Example 2: Herceptin treats Prostate Cancer?

    In this case study, we ask another Yes/No question, “Is Herceptin of potential
use in the treatment of prostate cancer?” (BioASQ challenge, Task 2b), over
SemMedDB. Among the existing triples in the database, there is no triple that matches
the question. Consequently, the answer will be ‘No’. The initial
SPARQL form of the question is as below:
      Initial SPARQL query:
        PREFIX sem:
        ASK { "Herceptin" sem:treats "Prostate cancer" }
      Answer: No

                 Code 3. Initial query answering if Herceptin treats Prostate Cancer
  Utilizing Disease ontology axioms (DOID:10286) and existing triples in
SemMedDB, we know:
               ("Herceptin", sem:treats, "Malignant neoplasms")             (1)
               ("Malignant neoplasms", sem:occurs_in, "Prostate carcinoma") (2)
               ("Prostate carcinoma", do:isa, "Prostate cancer")            (3)

                        Fig. 3. Rewritten triple using the ontology axiom
    In the triples above, the treats predicate (Fig. 3.1) shows a disease (malignant
neoplasms) that could be treated by Herceptin. The occurs_in relationship (Fig. 3.2)
characterizes the “order” of occurrence of two phenomena, in this case two phases of a
disease: malignant neoplasms and prostate carcinoma. The is_a relationship (Fig. 3.3)
represents a hierarchical relationship between two diseases, prostate carcinoma and
prostate cancer. Using the semantics above, the QR algorithm exploits the specialization
and a fortiori patterns to transform the initial query into the expanded query below:
    Rewritten SPARQL query:
       PREFIX do:  <http://disease-ontology.org/term#>
       PREFIX sem:
       ASK
       { "Herceptin" sem:treats "Malignant neoplasms" .
         "Malignant neoplasms" sem:occurs_in "Prostate carcinoma" .
         "Prostate carcinoma" do:isa "Prostate cancer" }
    Plausible Answer: (Yes, {SPEC, AFORT})
                Code 4. Rewritten query answering if Herceptin treats Prostate Cancer
    By posing the new query over SemMedDB, we get a plausible positive answer
that is inferred via both the specialization and a fortiori patterns. The inference above
means: Herceptin could treat prostate cancer, as Herceptin could treat malignant neoplasms,
which are an earlier phase (ordered relationship) of prostate carcinoma, which in turn is a
type of (hierarchical relationship) prostate cancer. In other words, Herceptin could plausibly
treat prostate cancer as it is administered in some prior phases of the disease.
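
The chained inference of the second case study can be sketched in the same style. The three triples are the ones listed in Fig. 3; the in-memory `TRIPLES` set and the function `plausibly_treats` are illustrative assumptions, and the returned pattern labels follow the plausible answer given with Code 4.

```python
# The three triples of Fig. 3, held in a toy in-memory store.
TRIPLES = {
    ("Herceptin", "sem:treats", "Malignant neoplasms"),
    ("Malignant neoplasms", "sem:occurs_in", "Prostate carcinoma"),
    ("Prostate carcinoma", "do:isa", "Prostate cancer"),
}


def plausibly_treats(drug, disease):
    """Look for a direct triple first, then for the treats/occurs_in/isa chain."""
    if (drug, "sem:treats", disease) in TRIPLES:
        return ("Yes", set())
    for s1, p1, treated in TRIPLES:
        if s1 != drug or p1 != "sem:treats":
            continue
        for s2, p2, later in TRIPLES:
            if s2 != treated or p2 != "sem:occurs_in":
                continue
            if (later, "do:isa", disease) in TRIPLES:
                # Ordered step (a fortiori) plus hierarchical step.
                return ("Yes", {"SPEC", "AFORT"})
    return ("No", set())
```

The nested loops correspond to the conjunction of three triple patterns in the rewritten ASK query; a real deployment would delegate this join to the SPARQL engine.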


6        Discussion

   Medical experts can draw plausible conclusions because they know the semantics and
understand the relationships between the concepts. They also utilize plausible patterns to
draw tentative associations that are currently missing. The case studies above, and similar
inferences, might therefore seem straightforward to practitioners making clinical decisions.
However, the examples above showed that even with a large database like SemMedDB (with


over 89 million predications from PubMed citations), conventional clinical reasoning
engines cannot guarantee an answer. This drawback is due to the lack of support
for handling uncertainty resulting from missing associations between data attributes.
    Although PR does not follow the strict logical formalisms of traditional reasoning, the
case studies above showed that PR, as a weak form of inference, can infer new knowledge
by exploiting the semantics between data. In the first case study, the QR algorithm replaced
the subject of the triple in the question with its parent in the hierarchy to apply the
generalization pattern, following the logic that “when something is true about a set of
objects, it might be true for any subset of it”. In the second case study, the plausible answer
is the result of a combination of specialization and a fortiori. The rationale behind the
specialization pattern contrasts with that of generalization: “when something is true about
a class/entity, it might be true about its super class (parent) as well”. The a fortiori pattern,
as an order-based pattern, conducts the query transformation based on the belief that “if
something is true about a stage of a phenomenon, then it might be true for any stages after that”.
    The efficiency of SeDan depends on (i) the collaboration between the plausible
patterns, much as in the human thought process, and (ii) the ontological constructs that
realize the plausible patterns. A well-designed QR algorithm addresses the first issue.
However, the richness, validity and variety of the semantic annotations and relationships
in the ontologies that the QR algorithm uses to rewrite a query remain a challenge.


7      Conclusions and Future Work

   Healthcare is a knowledge-intensive domain, which typically suffers from incom-
plete data. To extend the knowledge coverage of medical knowledge-bases and enhance
patient health outcomes, machines are required to (i) capture and understand semantics
and relationships between data attributes, and (ii) leverage those semantics to extract
potential relationships existing in healthcare data (EHR, PHR, EMR, etc.)
   To this aim, we introduced the SeDan framework that supports automated clinical
decision support via semantics-based data analytics. The plausible reasoner integrates
plausible patterns with fine-grained biomedical ontologies. The reasoner infers plausible
solution(s) by transforming an initial query with no answer into an augmented union
of conjunctive queries. This flexible mechanism extends SPARQL queries with the
aim of overcoming the existing gaps in medical knowledge bases.
   From the theory development perspective, SeDan implements plausible patterns using
OWL constructs and SPARQL to provide principled means to represent and reason
with incompleteness. Our proposed plausible extension to OWL provides full-fledged
support for implementing plausible patterns within the SW. From an applied perspective,
due to the flexible graph-based data format capable of incorporating new relations, sup-
port for rich semantics and automatic DL-based reasoning, the SW technologies pro-
vide excellent support for PR to draw semantic inferences from large datasets.
   Future work consists of studying the efficiency of SeDan in answering the questions
from the latest BioASQ task using the Disease Ontology, DrugBank and Semantic
MEDLINE databases. This is the first step to verify the competency of SeDan in answering
real-world medical questions before using it in a real clinical environment.


Improving the performance of the QR algorithm (i.e., reduction phase) to guarantee
computational completeness and decidability of the reasoner will be the next step.
Acknowledgment: This research is supported by a NSERC Discovery Grant.

References
[1]    H. Mohammadhassanzadeh, W. Van Woensel, S. R. Abidi, and S. S. R. Abidi, “Semantics-based
       plausible reasoning to extend the knowledge coverage of medical knowledge bases for improved
       clinical decision support,” BioData Min., vol. 10, no. 1, p. 7, 2017.
[2]    A. Holzinger and I. Jurisica, “Knowledge discovery and data mining in biomedical informatics:
       The future is in integrative, interactive machine learning solutions,” Interact. Knowl. Discov. data
       Min. Biomed. informatics, pp. 1–18, 2014.
[3]    R. Almeida, U. Kaymak, and J. Sousa, “A new approach to dealing with missing values in data-
       driven fuzzy modeling,” Fuzzy Syst. (FUZZ), 2010 IEEE Int. Conf., 2010.
[4]    D. Walton, C. W. Tindale, and T. F. Gordon, “Applying Recent Argumentation Methods to Some
       Ancient Examples of Plausible Reasoning,” Argumentation, vol. 28, no. 1, pp. 85–119, Nov. 2014.
[5]    A. Collins and R. Michalski, “The logic of plausible reasoning: A core theory,” Cogn. Sci., vol. 13,
       pp. 1–49, 1989.
[6]    D. Ausín, D. López-de-Ipina, and F. Castanedo, “A probabilistic OWL reasoner for intelligent
       environments,” in Proceedings of the 10th International Conference on Uncertainty Reasoning for
       the Semantic Web, 2014, vol. 1259, pp. 1–12.
[7]    N. Al Haider, S. Abidi, W. Van Woensel, and S. S. Abidi, “Integrating existing large scale medical
       laboratory data into the semantic web framework,” in Big Data (Big Data), 2014 IEEE
       International Conference on, 2014.
[8]    T. Berners-Lee, M. Fischetti, and M. F. By-Dertouzos, Weaving the Web: The original design and
       ultimate destiny of the World Wide Web by its inventor. HarperInformation, 2000.
[9]    G. Stoilos, T. Venetis, and G. Stamou, “A Fuzzy Extension to the OWL 2 RL Ontology Language,”
       Comput. J., vol. 58, no. 11, pp. 2956–2971, 2015.
[10]   P. Han, W. Klein, and N. Arora, “Varieties of uncertainty in health care: a conceptual taxonomy,”
       Med. Decis. Mak., 2011.
[11]   M. Dimartino, A. Calì, and A. Poulovassilis, “Query Rewriting under Linear EL Knowledge
       Bases,” in International Conference on Web Reasoning and Rule Systems, 2016, pp. 61–67.
[12]   M. Virvou and K. Kabassi, “Adapting the human plausible reasoning theory to a graphical user
       interface,” IEEE Trans. Syst. Man, Cybern. A Syst. Humans, vol. 34, no. 4, pp. 546–563, 2004.
[13]   H. Pérez-Urbina and E. Rodríguez-Díaz, “Evaluation of query rewriting approaches for OWL 2,”
       Proc. of SSWS+ HPCSW, 2012.
[14]   S. Grimm and B. Motik, “Closed World Reasoning in the Semantic Web through Epistemic
       Operators,” OWLED, 2005.
[15]   M. Bienvenu, “Ontology-Mediated Query Answering: Harnessing Knowledge to Get More From
       Data,” in International Joint Conference on Artificial Intelligence, 2016.
[16]   T. Lukasiewicz and U. Straccia, “Managing uncertainty and vagueness in description logics for the
       semantic web,” Web Semant. Sci. Serv. Agents World Wide Web, vol. 6, no. 4, 2008.
[17]   H. Pérez-Urbina, B. Motik, and I. Horrocks, “A Comparison of Query Rewriting Techniques for
       DL-lite,” Descr. Logics, 2009.
[18]   O. Mohammed, “Semantic web system for differential diagnosis recommendations,” Lakehead
       University, 2012.
[19]   “The BioASQ Challenge.” [Online]. Available: http://www.bioasq.org/.
[20]   V. Law, C. Knox, Y. Djoumbou, and T. Jewison, “DrugBank 4.0: shedding new light on drug
       metabolism,” Nucleic Acids Res., vol. 42, no. D1, pp. D1091–D1097, 2013.
[21]   W. Kibbe, C. Arze, V. Felix, and E. Mitraka, “Disease Ontology 2015 update: an expanded and
       updated database of human diseases for linking biomedical knowledge through disease data,”
       Nucleic Acids Res., vol. 43, no. D1, pp. D1071–D1078, 2014.
[22]   T. Rindflesch, H. Kilicoglu, and M. Fiszman, “Semantic MEDLINE: An advanced information
       management application for biomedicine,” Inf. Serv. Use, vol. 31, no. 1–2, pp. 15–21, 2011.