=Paper= {{Paper |id=None |storemode=property |title=BIOQUERY-ASP: Querying Biomedical Ontologies using Answer Set Programming |pdfUrl=https://ceur-ws.org/Vol-799/paper8.pdf |volume=Vol-799 |dblpUrl=https://dblp.org/rec/conf/ruleml/ErdemEO11 }} ==BIOQUERY-ASP: Querying Biomedical Ontologies using Answer Set Programming== https://ceur-ws.org/Vol-799/paper8.pdf
B IO Q UERY-ASP: Querying Biomedical Ontologies using
              Answer Set Programming

                      Esra Erdem, Halit Erdogan, and Umut Oztok
      Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul, Turkey



       Abstract. We describe a software system, called B IO Q UERY-ASP, that finds an-
       swers and generates explanations to complex biomedical queries over the avail-
       able knowledge resources, such as, P HARM GKB, D RUG BANK, CTD, S IDER, B I -
       O G RID , using the computational methods/tools of Answer Set Programming.



1   Introduction
Recent advances in health and life sciences have led to generation of a large amount of
biomedical data. To facilitate access to its desired parts, such a big mass of data has been
represented in structured forms, like biomedical ontologies and databases. On the other
hand, representing these ontologies and databases in different forms, constructing them
independently from each other, and storing them at different locations have brought
about many challenges for answering queries about the knowledge represented in these
ontologies.
     One of the challenges for the users is to be able to represent a complex query in
a natural language, and get its answers in an understandable form. Another challenge
is to be able to answer complex queries that require appropriate integration of relevant
knowledge from different knowledge resources and/or that require auxiliary definitions,
such as, chains of drug-drug interactions, cliques of genes based on gene-gene relations,
or similarity/diversity of genes/drugs. Furthermore, once an answer is found for a com-
plex query, the experts may need further explanations about the answer.
     Consider, for instance, the following queries:
Q1 What are the genes that are targeted by the drug Epinephrine and that interact with
   the gene DLG4?
Q2 What are the genes that are targeted by all the drugs that belong to the category
   Hmg-coa reductase inhibitors?
Q3 What are the genes related to the gene ADRB1 via a gene-gene relation chain of
   length at most 3?
Q4 What are the 3 most similar genes that are targeted by the drug Epinephrine?
    Most of the existing biomedical querying systems (e.g, web services built over
the available knowledge resources) support keyword search but not complex queries
like Q1–Q4. Some of these complex queries, such as Q1 and Q2, can be represented
in a formal query language (e.g., SQL/SPARQL) and then answered using Semantic
Web technologies. However, queries, like Q3, that require auxiliary recursive defini-
tions (such as transitive closure) cannot be directly represented in these languages; and
2       Erdogan, Oztok, and Erdem

thus such queries cannot be answered directly using Semantic Web technologies. The
experts usually compute auxiliary relations externally, for instance, by enumerating all
drug-drug interaction chains or gene cliques, and then use these auxiliary relations to
represent and answer a query like Q3. Similarity/diversity queries, like Q4, cannot be
represented directly in these languages either, and require a sophisticated reasoning al-
gorithm. Also, none of the existing systems can provide informative explanations about
the answers, but point to related web pages of the knowledge resources available online.
    We have built a software system, called B IO Q UERY-ASP,1 that handles all these
challenges using Answer Set Programming (ASP) [9]. To address the first challenge,
we have developed a controlled natural language for biomedical queries about drug dis-
covery; this language is called B IO Q UERY-CNL [5]. For instance, the queries Q1–Q4
are in B IO Q UERY-CNL. Then we have built an intelligent user interface that allows
users to enter biomedical queries in B IO Q UERY-CNL and that presents the answers
(possibly with explanations or related links, if requested) in B IO Q UERY-CNL [6]. To
address the second challenge, we have developed a rule layer over biomedical ontolo-
gies and databases, that not only integrates the concepts in these knowledge resources
but also provides definitions of auxiliary concepts [1]. We have introduced an algorithm
to identify the relevant parts of the rule layer and the knowledge resources with respect
to the given query, and used automated reasoners of ASP to answer queries considering
these relevant parts [4]. To address the third challenge, we have developed an algorithm
to generate an explanation for a given answer, with respect to the query and the rel-
evant parts of the rule layer and the knowledge resources [10,4]. The overall system
architecture for B IO Q UERY-ASP is presented in Figure 1.
    We have shown the applicability of B IO Q UERY-ASP to answer queries over large
biomedical knowledge resources about genes, drugs and diseases, such as P HARM GKB,2
D RUG BANK,3 B IO G RID,4 CTD,5 and S IDER,6 using efficient solvers of ASP [4]. For
queries that are not concerned about similarity/diversity of genes/drugs, we have used
the ASP solver CLASP [7]. For similarity/diversity queries, we have utilized the on-
line methods of [2] for finding similar/diverse solutions, and thus used the ASP solver
CLASP - NK , a variant of CLASP .


2     Demonstration of B IO Q UERY-ASP by Examples
Let us demonstrate the use of B IO Q UERY-ASP with four examples: the queries Q1,
Q2, Q3 and Q4 presented in the introduction.

2.1   Representing a Query in B IO Q UERY-CNL
B IO Q UERY-ASP allows users to construct queries in the grammar of B IO Q UERY-CNL
(an extended version of the grammar introduced in our earlier work [5]), by providing
 1
   http://krr.sabanciuniv.edu/projects/BioQuery-ASP/
 2
   http://www.pharmgkb.org/
 3
   http://www.drugbank.ca/
 4
   http://thebiogrid.org/
 5
   http://ctd.mdibl.org/
 6
   http://sideeffects.embl.de/
                       B IO Q UERY-ASP: Querying Biomedical Ontologies using ASP            3

                              User Interface                         Databases/Ontologies


                       Query in B IO Q UERY-CNL


                            Query in ASP                            Rule Layer in ASP



                                                        Query Answering


                                               Relevant Part of the ASP program


                                                         ASP Solver


                                                            Answer



                                                      Explanation Generation


                                                 Shortest Explanation in ASP


                                               Explanation in B IO Q UERY-CNL



                        Fig. 1. System overview of B IO Q UERY-ASP.

them two options: by showing them some templates so that they can choose one from
among them (as shown in Figure 2 for the query Q1); or by guiding them to construct
the query by showing the possibilities with an auto-completion feature (as shown in
Figure 3 for the query Q1).


2.2    Transforming the Query in B IO Q UERY-CNL to ASP

Once a query is constructed in B IO Q UERY-CNL, we transform the query into ASP by
an extension of the algorithm introduced in our earlier work [5]. For instance, the query
Q1 is translated into the following ASP program:

what_be_genes(GN1) :- condition1(GN1), condition2(GN1).
condition1(GN) :- drug_gene("Epinephrine",GN).
condition2(GN) :- gene_gene(GN,"DLG4").
answer_exists :- what_be_genes(GN1).

   where condition1 and condition2 are invented relations, and gene gene and
drug gene are defined in the rule layer as follows:

drug_gene(D,G) :- drug_gene_pharmgkb(D,G).
drug_gene(D,G) :- drug_gene_ctd(D,G).

gene_gene(GN1,GN2) :- gene_gene_biogrid(GN1,GN2).
gene_gene(GN2,GN1) :- gene_gene(GN1,GN2).

      Similarly, the query Q2 is translated into the following ASP program:
4       Erdogan, Oztok, and Erdem




     Fig. 2. In B IO Q UERY-ASP a query can be constructed by making use of a template.

what_be_genes(GN1) :- not notcommon(GN1),
   gene_name(GN1), notcommon_exists.
notcommon(GN1) :- not drug_gene(D2,GN1),
   condition1(D2), gene_name(GN1).
notcommon_exists :- notcommon(X).
condition1(D) :- drug_category(D,"Hmg-coa reductase inhibitors").
answer_exists :- what_be_genes(GN).
    The query Q3 is translated into the following ASP program:
what_be_genes(GN) :- condition1(GN).
condition1(GN) :- gene_reachable_from(GN,L).
start_gene("ADRB1").
max_chain_length(3).
answer_exists :- what_be_genes(GN).
where gene reachable from is defined in the rule layer as follows:
                      B IO Q UERY-ASP: Querying Biomedical Ontologies using ASP         5




Fig. 3. In B IO Q UERY-ASP a query can be constructed by making use of the auto-completion
feature.

gene_reachable_from(X,1) :- gene_gene(X,Y), start_gene(Y).
gene_reachable_from(X,N+1) :- gene_gene(X,Z),
   gene_reachable_from(Z,N), 0 < N, N < L,
   max_chain_length(L).

    Note that unlike the rules for gene gene and drug gene that integrate knowledge
resources, the rules for gene reachable from define an auxiliary concept to be used
for deep reasoning.
    The query Q4 is translated into the following ASP program:

1{what_be_genes(GN) : condition1(GN)}1.
condition1(GN) :- drug_gene("Epinephrine", GN).
answer_exists :- what_be_genes(GN).
6       Erdogan, Oztok, and Erdem

2.3   Extracting Information from the Knowledge Resources using ASP
Some of the ASP solvers, such as DLVHEX [3], provide constructs to import external
theories that may be in different formats (e.g., ontologies in RDF(S)/OWL). For in-
stance, consider as an external theory a Drug Ontology described in RDF. All triples
from this theory can be exported using the external predicate &rdf:
triple_drug(X,Y,Z) :- &rdf["URI for Drug Ontology"](X,Y,Z).

Then the names of drugs can be extracted by DLVHEX using the rule:
drug_name(A) :- triple_drug(_,"drugproperties:name",A).

    Similarly, gene-gene interactions could be extracted from a Gene Ontology by DLVHEX
using the rules
gene_gene(G1,G2) :- triple_gene(X,"geneproperties:name",G1),
   triple_gene(X,"geneproperties:related_genes",B),
   triple_gene(B,Z,Y), Z!="rdf:type",
   triple_gene(Y,"geneproperties:name",G2).

    Some knowledge resources are provided as relational databases, or more often as a
set of triples (probably extracted from ontologies in RDF). In such cases, we introduce
special algorithms to transform the relations into ASP.

2.4   Answering the Queries using ASP
At this stage, the given biomedical query Q, the rule layer, and the information extracted
from the knowledge resources are all in ASP. Let us denote by ⇧ the union of the
rule layer and the information extracted from the knowledge resources. To be able to
answer the given query Q efficiently, B IO Q UERY-ASP extracts the relevant part of ⇧
with respect to Q [4], and then computes a model (called “an answer set” [8]) for the
relevant part (if exists) using a state-of-the-art ASP system, such as CLASP. After that,
B IO Q UERY-ASP extracts the answers to Q from the answer set, and presents them as
a list. For instance, the answers to the query Q3 is shown in Figure 4.
     For queries about similarity/diversity of genes, B IO Q UERY-ASP uses the answer
set solver CLASP - NK [2], an extension of CLASP that can compute similar/diverse an-
swer sets for an ASP program with respect to a given distance measure. The idea be-
hind CLASP - NK is to define the distance measure (as a C++ program) and modify the
search algorithm of CLASP accordingly in the style of a branch-and-bound algorithm.
B IO Q UERY-ASP considers the semantic and functional similarity of genes defined over
the gene ontology [11]; this measure can be computed by the software GOS EM S IM.

2.5   Generating Explanations for the Answers
Once an answer to a query (that does not involve aggregates as in Q3 and Q4) is com-
puted, B IO Q UERY-ASP can generate an explanation for it [4,10]. For instance, an ex-
planation for the answer “ADRB1” to the query Q1 is shown in Figure 5. If an explana-
tion cannot be found then related links to the knowledge resources are provided.
                       B IO Q UERY-ASP: Querying Biomedical Ontologies using ASP               7




                 Fig. 4. B IO Q UERY-ASP presents answers to a query as a list.

3   Conclusion
We have described the software system B IO Q UERY-ASP that finds answers and gen-
erates explanations to complex biomedical queries over the available knowledge re-
sources, such as, P HARM GKB, D RUG BANK, CTD, S IDER, B IO G RID, using the com-
putational methods/tools of Answer Set Programming. These complex biomedical queries
require appropriate integration of relevant knowledge from different knowledge re-
sources; auxiliary definitions, such as, chains of drug-drug interactions, cliques of genes
based on gene-gene relations, or similarity/diversity of genes/drugs; and further deep
reasoning, like finding similar/diverse genes/drugs. No existing biomedical query an-
swering systems (e.g., web services built over the available knowledge resources, that
answer queries by means of keyword search) can directly answer such queries, or can
generate explanations for answers. In that sense, B IO Q UERY-ASP is a novel biomedi-
cal query answering system that can be useful for experts.

Acknowledgments.
We are grateful to Yelda Erdem (Sanovel Pharmaceuticals) for her suggestions about
the biomedical queries related to drug discovery. This work has been supported by
TUBITAK Grant 108E229.

References
 1. Bodenreider, O., Coban, Z.H., Doganay, M.C., Erdem, E.: A preliminary report on answer-
    ing complex queries related to drug discovery using answer set programming. In: Proc. of
    ALPSWS (2008)
 2. Eiter, T., Erdem, E., Erdogan, H., Fink, M.: Finding similar or diverse solutions in answer
    set programming. In: Proc. of ICLP. pp. 342–356 (2009)
 3. Eiter, T., G.Ianni, R.Schindlauer, H.Tompits: Effective integration of declarative rules with
    external evaluations for Semantic-Web reasoning. In: Proc. of ESWC (2006)
8        Erdogan, Oztok, and Erdem

 4. Erdem, E., Erdem, Y., Erdogan, H., Oztok, U.: Finding answers and generating explanations
    for complex biomedical queries. In: Proc. of AAAI (2011)
 5. Erdem, E., Yeniterzi, R.: Transforming controlled natural language biomedical queries into
    answer set programs. In: Proc. of the Workshop on BioNLP. pp. 117–124 (2009)
 6. Erdogan, H., Oztok, U., Erdem, Y., Erdem, E.: Querying biomedical ontologies in natural
    language using answer set programming. In: Proc. of SWAT4LS (2010)
 7. Gebser, M., Kaufmann, B., Neumann, A., Schaub, T.: clasp: A Conflict-Driven Answer Set
    Solver. In: Proc. of LPNMR. pp. 260–265 (2007)
 8. Gelfond, M., Lifschitz, V.: Classical negation in logic programs and disjunctive databases.
    New Generation Computing 9, 365–385 (1991)
 9. Lifschitz, V.: What is answer set programming? In: Proc. of AAAI (2008)
10. Oztok, U., Erdem, E.: Generating explanations for complex biomedical queries. In: Proc. of
    AAAI (2011)
11. Wang, J.Z., Du, Z., Payattakool, R., Yu, S.P., Chen, C.F.: A new method to measure the
    semantic similarity of go terms. Bioinformatics 23, 1274–1281 (2007)




    Fig. 5. B IO Q UERY-ASP can generate an explanation for an answer to the given query.