<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A logical approach to working with biological databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicos Angelopoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Surgery and Cancer, Imperial College</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <history>
        <date date-type="accepted">
          <day>5</day>
          <month>6</month>
          <year>2015</year>
        </date>
      </history>
      <abstract>
        <p>It has been argued before that Prolog is a strong candidate for research and code development in bioinformatics and computational biology. This position has been based on both the intrinsic strengths of Prolog and recent advances in its technologies. Here we strengthen the case for the deployment and penetration of Prolog into bioinformatics by introducing bio_db, a comprehensive and extensible system for working with biological data. We focus on databases that translate between biological products and on databases of product-to-product interactions, the latter of which can be visualised as graphs. This library allows easy access to high-quality data in two formats: as Prolog fact files and as SQLite databases. On-demand downloading of prepacked data files in these two formats is supported on all operating system architectures, as is reconstruction from the latest data files of the curated databases. The methods used to deliver the data are transparent to the user, and the data are delivered in the familiar format of Prolog facts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Prolog's traditional playground is that of knowledge representation and AI
applications based on crisp logical inference and search. In addition to being a research tool
in these areas, Prolog implementations have been developing into full-fledged general-purpose
programming environments. These developments have started shaping a role
for logic programming in a variety of new areas.</p>
      <p>
        Bioinformatics has been the meeting point of a number of influences since
its emergence as a field of study. Lying at the intersection of biology, statistics and
computing, it has seen a multitude of languages, systems and paradigms
developed and used for bioinformatics research. One of the strongest
contenders in this field comes from the statistics community in the shape of the
R
        <xref ref-type="bibr" rid="ref15">(R Core Team, 2014)</xref>
        language and its Bioconductor
        <xref ref-type="bibr" rid="ref9">(Gentleman et al., 2004)</xref>
        bioinformatics suite. The strength of these statistical tools lies in providing a versatile
platform that can incorporate a menagerie of paradigms and programming styles.
      </p>
      <p>
        Bridges between Prolog systems and R exist in two forms: (a) running an R
executable session and communicating with it via the standard I/O streams, and
(b) connecting to an R shared library and exchanging data and invoking functions via
a C-language interface. The former approach is suitable for working with R
code that depends on the executable's running environment. An example of the
session-based approach is r_session
        <xref ref-type="bibr" rid="ref2 ref3 ref4">(Angelopoulos, 2013)</xref>
        . An example of the shared-library
approach is Real
        <xref ref-type="bibr" rid="ref2 ref3 ref4">(Angelopoulos et al., 2013)</xref>
        , which is suitable for communicating
large volumes of data between Prolog and R.
      </p>
      <p>
        Using an interface to R would be one way to access biological databases via
packages such as org.Hs.eg.db
        <xref ref-type="bibr" rid="ref5">(Carlson, 2014)</xref>
        . However, this approach would increase
reliance on R and add a further layer of complexity. Here, we take a logical
approach to incorporating biological knowledge. With the advances in modern Prolog
systems in database integration
        <xref ref-type="bibr" rid="ref22 ref4">(Canisius et al., 2013; Wielemaker, 2014)</xref>
        and
indexing technologies
        <xref ref-type="bibr" rid="ref12 ref15 ref16 ref3 ref4">(Santos Costa and Vaz, 2013; Morales and Hermenegildo, 2014)</xref>
        , working with big data within Prolog is set to become an important application area.
      </p>
      <p>
        In this paper we describe the capabilities and design structure of an extensible
library for working with and managing biological databases. Distinctive features
of the package include: on-demand downloading of prepacked databases, the ability to
download and reconstruct databases from primary sources, and a single-entry interface
for accessing databases served by two underlying mechanisms. Our library focuses
on Homo sapiens databases and uses high-quality curated databases. Furthermore,
it works on two Prolog systems, SWI-Prolog
        <xref ref-type="bibr" rid="ref24">(Wielemaker et al., 2012)</xref>
        and YAP
Prolog
        <xref ref-type="bibr" rid="ref6">(Costa et al., 2012)</xref>
        . Although our current implementation only supports
SQLite databases, owing to their zero-configuration approach, it can be easily extended
to other relational database systems. The intuitiveness of bio_db, along with its
relational design principles, makes it a natural way of handling biological databases
in logic programming.
      </p>
      <p>There are alternative ways to view this kind of data which depend on more
sophisticated technologies (Mungall, 2009; Vassiliadis et al., 2009). In contrast, the strengths of
our approach are its intuitiveness, its simplicity and the closeness of the
produced data to the way the data are stored in the source databases.</p>
      <p>The remainder of this paper is structured as follows. Section 2 presents the
datasets readily available and the main mechanisms for using them. Section 3
describes the facilities for building the available datasets ab initio and how to
incorporate new datasets. Section 4 shows some experimental results and example usage
of the available datasets. Finally, Section 5 presents our concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2 A logical approach to big biological datasets</title>
      <p>Data from biological experiments and data codifying biological knowledge have
increased sharply over the last decades, owing to the ever-increasing number of
high-throughput techniques and the explosion in the number of researchers working in
these areas. Here we concentrate on two main categories of databases, although
the methodologies employed can be readily applied to any database, biological or
otherwise.</p>
      <p>The first category of databases we consider is that of maps between biological products
and nomenclatures. A prime example in this area is the mapping of genes to unique
gene names. Because gene names have been assigned in a decentralised way, particularly in
the early years of biological research before standardisation efforts took place, each
gene is usually known by a number of different names. This is an example of a
many-to-one mapping, from synonyms to a unique gene name. Mapping proteins to
genes is also many-to-one, but in this case because a single gene can be transcribed
to a number of proteins. Many-to-many maps can be used to define membership
of multiple sets. Maps are conveniently and efficiently implemented as Prolog facts
of arity 2. The efficiency derives from first-argument indexing
        <xref ref-type="bibr" rid="ref1 ref21">(Warren, 1983; Aït-Kaci, 1991)</xref>
        . When bi-directional translation is required, fact databases that reverse
the order of the arguments are constructed.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Database</th>
              <th>Abbv.</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td colspan="3">Pairwise maps</td>
            </tr>
            <tr>
              <td>HGNC</td>
              <td>hgnc</td>
              <td>HUGO Gene Nomenclature Committee</td>
            </tr>
            <tr>
              <td>NCBI/entrez</td>
              <td>entz</td>
              <td>National Center for Biotechnology Information</td>
            </tr>
            <tr>
              <td>Uniprot</td>
              <td>unip</td>
              <td>Universal Protein Resource</td>
            </tr>
            <tr>
              <td>GO</td>
              <td>gont</td>
              <td>Gene Ontology</td>
            </tr>
            <tr>
              <td colspan="3">Interactions database</td>
            </tr>
            <tr>
              <td>String</td>
              <td>string</td>
              <td>protein-protein interactions</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-2-13">
        <p>
          A summary of the supported databases is shown in Table 1. Here we give a brief
description of the databases included in bio_db. HGNC
          <xref ref-type="bibr" rid="ref11">(Gray et al., 2015)</xref>
          is our
primary gene naming data source. It is a curated and well cross-referenced resource
that is held at the EBI. Each gene is assigned a unique incremental integer identifier,
and each current identifier is mapped to a unique symbol, which is the short name
for each gene. Examples of symbols are LMTK3, EGFR and BRCA1. We use
hgnc to refer to both the database and the unique integer identifier field of the
database. Symbols are shortened to symb. As can be seen in Figure 1, HGNC database
entries play a central role in bio_db. The HGNC identifier connects to protein and
gene resources, and the symbol connects to gene ontology terms and other naming
conventions (previously-known-as and synonyms). The data populating many of
the relations in Figure 1 come from the HGNC database, which contains curated and
submitted data from the other databases.
        </p>
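        <p>As a minimal sketch of this arity-2 representation, the block below assumes toy stand-ins for two map tables that contain only the LMTK3 pair quoted in Section 3; the full tables are supplied by the library. The helper hgnc_symbol/2 is hypothetical and not part of bio_db: it chooses whichever table has the known value in its first argument, so that first-argument indexing applies in both directions.</p>
        <preformat>
```prolog
% Toy stand-ins for two bio_db map tables; only the LMTK3 pair quoted in
% the text is included. The library itself provides the full tables.
map_hgnc_hgnc_symb(19295, 'LMTK3').
map_hgnc_symb_hgnc('LMTK3', 19295).

% hgnc_symbol(?Hgnc, ?Symb): hypothetical helper, not part of bio_db.
% It consults the table whose first argument is bound, so that the
% lookup can use first-argument indexing in either direction.
hgnc_symbol(Hgnc, Symb) :-
    (   nonvar(Hgnc)
    ->  map_hgnc_hgnc_symb(Hgnc, Symb)
    ;   map_hgnc_symb_hgnc(Symb, Hgnc)
    ).
```
        </preformat>
        <p>For example, the query hgnc_symbol(H, 'LMTK3') is answered from the reversed table and binds H to 19295.</p>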
        <p>
          The National Center for Biotechnology Information (NCBI) makes available a
large number of datasets
          <xref ref-type="bibr" rid="ref14">(NCBI Resource Coordinators, 2013)</xref>
          . Here we only
incorporate their unique gene identifier. This is often referred to as the gene ID and was for
many years the main way to uniquely refer to genes. Including it ensures that a
number of tools and services can be used via translation of any of the other protein
and gene fields to gene IDs (here shortened to entz, a reference to NCBI's
Entrez online tool, which uses gene identifiers as a gateway to its services).
        </p>
        <sec id="sec-2-13-2">
          <p>[Figure 1: Fields of the databases in bio_db (HGNC, Ensembl, NCBI/Entrez, UNIPROT, GO) and the relations connecting them. Vertices: HGNC, SYMBol, PREVious symbol, SYNOnym, ENTreZ, ENSGene, ENSProtein, UNIProtein, GONTerm, GONaMe.]</p>
          <p>
            Uniprot
            <xref ref-type="bibr" rid="ref19">(The UniProt Consortium, 2015)</xref>
            is a curated and well-established
database of proteins and related information. The relation between proteins and genes
is a many-to-one correspondence: each protein is transcribed from a single gene, but
each gene can be transcribed to more than one protein. Many biological databases
record information at the protein level, as this is the level at which physical
interactions take place. Uniprot contains two parts: a curated resource, where each protein
is known to be transcribed from a specific gene, and a non-curated part, where not all
information is complete. As our approach is gene-centric, we incorporate all those
proteins from both the curated and non-curated parts that have an association to a
gene.
          </p>
          <p>
            Gene ontology (GO)
            <xref ref-type="bibr" rid="ref18">(The Gene Ontology Consortium, 2000)</xref>
            provides a
controlled vocabulary to describe biological knowledge. It has three main sections:
biological process, molecular function and cellular component. The basic unit of
representation in GO is the GO term, and terms are connected in a web of referential relations.
Each term, in addition to its position relative to other terms, contains a number
of genes that are involved in the process characterised by the term. Here we
concentrate on this membership, which defines a many-to-many relation: each term
contains a number of genes, and each gene can potentially belong to a number of
terms.
          </p>
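          <p>The many-to-many membership can be queried from either side with ordinary list predicates. In this minimal sketch the facts are a toy subset of map_gont_gont_symb/2, drawn from the LMTK3 terms in Table 2 and from the GO:0010332 graph of Figure 3; term_genes/2 and gene_terms/2 are hypothetical helpers, not part of bio_db.</p>
          <preformat>
```prolog
% Toy subset of GO membership facts, map_gont_gont_symb(GoTerm, Symbol);
% the real table holds the full human annotation.
map_gont_gont_symb('GO:0004713', 'LMTK3').
map_gont_gont_symb('GO:0006468', 'LMTK3').
map_gont_gont_symb('GO:0018108', 'LMTK3').
map_gont_gont_symb('GO:0010332', 'TP53').
map_gont_gont_symb('GO:0010332', 'BRCA2').

% term_genes(+Term, -Symbols): sorted, duplicate-free members of Term.
term_genes(Term, Symbols) :-
    findall(S, map_gont_gont_symb(Term, S), Ss),
    sort(Ss, Symbols).

% gene_terms(+Symbol, -Terms): sorted, duplicate-free terms of Symbol.
gene_terms(Symbol, Terms) :-
    findall(T, map_gont_gont_symb(T, Symbol), Ts),
    sort(Ts, Terms).
```
          </preformat>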
          <p>[Figure 2: Number of entries (Population, 0 to 5e+06) per field (ensg, ensp, entz, gont, hgnc, prev, symb, syno, unip) for each source database (ense, gont, hgnc, ncbi, unip); a second panel shows the number of String database edges at the gene and protein levels.]</p>
          <p>
            String
            <xref ref-type="bibr" rid="ref17">(Szklarczyk et al., 2015)</xref>
            is a comprehensive protein-protein interaction
database that incorporates a large number of interactions across a large
number of species. Here we concentrate on the 4,850,628 interactions in String that
pertain to human proteins (Figure 2). When mapped to symbols, these form 1,936,162
interactions. For each protein-protein interaction the database collates information
from a variety of sources, such as experiments and algorithmic predictions, along
with publication information for papers that refer to specific links. In addition, an
overall integer score in the range (0, 1000) is provided: the closer this score is to 1000, the
more confident the curators are that this is a real physical interaction between the two
proteins. bio_db models interactions of proteins and genes as weighted graph edges,
using the overall score as the weight of each edge.
          </p>
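          <p>A minimal sketch of this weighted-edge representation, assuming a hypothetical edge/3 table whose node names (g1, g2, g3) and scores are placeholders rather than String data. The helper confident_edges/2 is likewise hypothetical; it keeps the edges whose score exceeds a threshold and returns them as Node1-Node2:Score terms.</p>
          <preformat>
```prolog
% Hypothetical weighted edges, edge(Node1, Node2, Score); the node names
% and scores are placeholders, not String data.
edge(g1, g2, 900).
edge(g1, g3, 400).
edge(g2, g3, 720).

% confident_edges(+Min, -Edges): all edges whose score is above Min,
% returned as Node1-Node2:Score terms.
confident_edges(Min, Edges) :-
    findall(A-B:W, ( edge(A, B, W), W > Min ), Edges).
```
          </preformat>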
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Data management</title>
      <p>In bio_db the native representation of the biological knowledge described above is
as Prolog facts. The library presents those facts to the programmer as a unifying
level of abstraction. Beneath this, there are two mechanisms via which the data are
delivered to the predicates: (a) Prolog fact files and (b) SQLite databases.</p>
      <p>3.1 Predicate naming</p>
      <p>An example of a map predicate is</p>
      <p>map_hgnc_hgnc_symb( Hgnc, Symb ).</p>
      <p>The predicate translates between HGNC identifiers and HGNC symbols. The
predicate name consists of four components, the first of which determines the type of data,
which in this case is a map. The second component, hgnc, corresponds to the source
database, and the third component, also hgnc, identifies the first argument of the
map as the unique identifier field of that database (here a positive integer
starting at 1, with no gaps). The last part of the predicate name corresponds to the
second argument, which here is the unique symbol assigned to a gene by HGNC.
In the current version of bio_db, all database and field tokens in map predicate names are four
characters long. The abbreviations for the database component are shown in the second
column of Table 1, whereas the abbreviations for the database fields are the
capitalised parts of the vertex names in Figure 1. The following interaction shows how
the predicate can be used to find the symbol of a gene given its HGNC identifier.</p>
      <p>?- map_hgnc_hgnc_symb( 19295, Symb ).</p>
      <p>Symb = 'LMTK3'.</p>
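      <p>The naming convention can be checked mechanically. The sketch below uses SWI-Prolog's atomic_list_concat/3 in its splitting mode; map_name_parts/5 and well_formed_map_name/1 are hypothetical helpers, not part of bio_db, and they only encode the four-component, four-character rule described above for map predicates.</p>
      <preformat>
```prolog
% map_name_parts(+Name, -Type, -Db, -From, -To): hypothetical helper that
% splits a predicate name such as map_hgnc_hgnc_symb at its underscores.
map_name_parts(Name, Type, Db, From, To) :-
    atomic_list_concat(Parts, '_', Name),   % split mode: Name is bound
    Parts = [Type, Db, From, To].

% well_formed_map_name(+Name): true if Name is a map predicate name whose
% database and field tokens are all four characters long.
well_formed_map_name(Name) :-
    map_name_parts(Name, map, Db, From, To),
    forall(member(Tok, [Db, From, To]), atom_length(Tok, 4)).
```
      </preformat>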
      <p>3.2 Data serving methods</p>
      <p>
        There are two mechanisms via which the library's data predicates can be stored
and served: as plain Prolog fact files, and via SQLite databases
as implemented in the proSQLite Prolog library
        <xref ref-type="bibr" rid="ref4">(Canisius et al., 2013)</xref>
        . The former
requires in-memory loading for serving, and thus needs more memory and loading time,
even when a particular interaction with the predicate does not require the whole
dataset. The benefit of Prolog facts is that they are
extremely fast, particularly when requests for data instantiate the first argument of
the call. In our experience memory itself is not a particular limitation, as computer
memory is readily available in bioinformatics settings and SWI-Prolog, along with
most modern Prolog systems, is well tuned to dealing with such data.
      </p>
      <p>The time taken to load everything into memory is a more severe limitation,
particularly in development settings where the data need to be loaded a number
of times in a short space of time. It might thus be desirable to use SQLite during
development and testing, and Prolog facts when large, time-consuming searches are
required. An additional consideration is that Prolog facts are
stored in plain-text files, which can be helpful when debugging. Switching between
the mechanisms for serving the data is done via a simple call to a predicate,
bio_db_interface( ?Interface ).</p>
      <p>All data predicates loaded after such a call will follow the interface method
dictated by Interface. The following example shows how the interface is switched
from the default, prolog, to prosqlite.</p>
      <p>?- debug( bio_db ).
true.
?- bio_db_interface( Iface ).
Iface = prolog.
?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
% Loading prolog db: .../maps/hgnc/map_hgnc_symb_hgnc.pl
Hgnc = 19295.
?- bio_db_interface( prosqlite ).
% Setting bio_db_interface prolog_flag, to: prosqlite
true.
?- map_hgnc_prev_symb( Prev, Symb ).
% Loading prosqlite db: .../map_hgnc_prev_symb.sqlite
Prev = 'A1BG-AS',
Symb = 'A1BG-AS1' ;
Prev = 'A1BGAS',
Symb = 'A1BG-AS1' ;
Prev = 'A1BG-AS',
Symb = 'A1BG-AS1'...</p>
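      <p>The log line above indicates that the current interface is held in a Prolog flag. A minimal sketch of that idea follows; the flag demo_db_interface and the predicates demo_interface/1, table_source/2 and backend_file/3 are hypothetical names, not bio_db's internals, and the sketch only resolves which file a table would be served from.</p>
      <preformat>
```prolog
% Hypothetical flag holding the serving interface; prolog is the default.
:- create_prolog_flag(demo_db_interface, prolog, [type(atom), keep(true)]).

% demo_interface(?Iface): read the interface when Iface is unbound,
% otherwise switch to it (only prolog and prosqlite are accepted).
demo_interface(Iface) :-
    (   var(Iface)
    ->  current_prolog_flag(demo_db_interface, Iface)
    ;   memberchk(Iface, [prolog, prosqlite])
    ->  set_prolog_flag(demo_db_interface, Iface)
    ;   domain_error(db_interface, Iface)
    ).

% table_source(+Table, -File): file a table loaded now would come from.
table_source(Table, File) :-
    current_prolog_flag(demo_db_interface, Iface),
    backend_file(Iface, Table, File).

backend_file(prolog,    Table, File) :- atom_concat(Table, '.pl', File).
backend_file(prosqlite, Table, File) :- atom_concat(Table, '.sqlite', File).
```
      </preformat>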
      <p>3.3 Downloading datasets</p>
      <p>The library comes with placeholder code for each supported database table. On
first call, the relevant data file is downloaded from the web server and consulted
on the fly after the placeholder code is removed. In each new interactive invocation,
hot-swapping and then consulting the relevant data file makes the data
available as facts. The facts are served transparently to the user by either of the two
technologies detailed above.</p>
      <p>The downloading of non-installed datasets occurs automatically and
transparently to the user. It is triggered by a call to the corresponding data predicate,
and the call itself is served within the same interaction, as demonstrated below.</p>
      <p>?- debug( bio_db ).
?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
% prolog DB:table hgnc:map_hgnc_symb_hgnc/2 is not installed,
do you want to download (Y/n) ?
% Trying to get: url_file(.../map_hgnc_symb_hgnc.pl,
.../hgnc/map_hgnc_symb_hgnc.pl)
% Loading prolog db: .../hgnc/map_hgnc_symb_hgnc.pl</p>
      <p>Hgnc = 19295.</p>
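      <p>The placeholder mechanism can be sketched with a dynamic predicate whose only initial clause loads the data, removes itself and re-runs the goal. In this minimal sketch demo_map/2 and load_rows/2 are hypothetical; load_rows/2 stands in for downloading and consulting the data file, and returns only the LMTK3 row from the example above.</p>
      <preformat>
```prolog
:- dynamic demo_map/2.

% Placeholder clause: on first call it removes itself, asserts the loaded
% rows as facts and re-runs the goal, which now sees only the facts.
demo_map(X, Y) :-
    retractall(demo_map(_, _)),
    load_rows(demo_map, Rows),
    forall(member(A-B, Rows), assertz(demo_map(A, B))),
    demo_map(X, Y).

% Stand-in for the downloaded data file (hypothetical).
load_rows(demo_map, ['LMTK3'-19295]).
```
      </preformat>
      <p>Under SWI-Prolog's logical update view, the running placeholder call is unaffected by its own retraction, while the recursive call sees the newly asserted facts; later calls go straight to the facts.</p>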
      <p>The data files are stored in a directory organised into maps and graphs, reflecting the
two main types of information supported. Within these two subdirectories, data are
organised by database of origin. The root of this file store defaults
to the data directory of the library, or can be set via an environment variable or by
using the set_prolog_flag/2 predicate.</p>
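      <p>A sketch of how such a file store location might be resolved. The environment variable DEMO_DB_DATA, the flag demo_db_data, the default directory name and the precedence order (environment variable, then flag, then default) are all assumptions for illustration; the maps/hgnc/ layout follows the paths shown in the logs above.</p>
      <preformat>
```prolog
% data_root(-Root): hypothetical resolution of the file store root.
% Assumed precedence: environment variable, then Prolog flag, then default.
data_root(Root) :-
    (   getenv('DEMO_DB_DATA', Root)
    ->  true
    ;   current_prolog_flag(demo_db_data, Root)
    ->  true
    ;   Root = data
    ).

% table_path(+Type, +Db, +Table, -Path): Type is maps or graphs, Db the
% database of origin, e.g. data/maps/hgnc/map_hgnc_symb_hgnc.pl
table_path(Type, Db, Table, Path) :-
    data_root(Root),
    atomic_list_concat([Root, Type, Db, Table], '/', Base),
    atom_concat(Base, '.pl', Path).
```
      </preformat>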
      <p>The default location for storing data files is at the level of an SWI-Prolog pack
located at pack(bio_db_repo). As an alternative to downloading each file piecemeal, users
can download all the data in a single step, as a pack, via</p>
      <p>?- pack_install( bio_db_repo ).</p>
      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr>
              <th>GO term</th>
              <th>GO name</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>GO:0003674</td>
              <td>molecular function</td>
            </tr>
            <tr>
              <td>GO:0004674</td>
              <td>protein serine/threonine kinase activity</td>
            </tr>
            <tr>
              <td>GO:0004713</td>
              <td>protein tyrosine kinase activity</td>
            </tr>
            <tr>
              <td>GO:0005524</td>
              <td>ATP binding</td>
            </tr>
            <tr>
              <td>GO:0005575</td>
              <td>cellular component</td>
            </tr>
            <tr>
              <td>GO:0006468</td>
              <td>protein phosphorylation</td>
            </tr>
            <tr>
              <td>GO:0010923</td>
              <td>negative regulation of phosphatase activity</td>
            </tr>
            <tr>
              <td>GO:0016021</td>
              <td>integral component of membrane</td>
            </tr>
            <tr>
              <td>GO:0018108</td>
              <td>peptidyl-tyrosine phosphorylation</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-3-2">
        <p>Each dataset contains a set of housekeeping facts that record, among other
things, the date on which the set was downloaded and built.</p>
        <p>map_hgnc_hgnc_symb_info(date, date(2015, 4, 28)).
map_hgnc_hgnc_symb_info(map_type, map_type(1, 1)).
map_hgnc_hgnc_symb_info(unique_lengths, c(43592, 43592, 43592)).
map_hgnc_hgnc_symb_info(header, row('HGNC ID', 'Approved Symbol')).</p>
        <p>3.4 Reconstruction and new datasets</p>
        <p>The Prolog scripts used to download and convert the data are included in the library
source code. The overall workflow is normally as follows: (a) download a remote
file to a local date-stamped file, (b) read the downloaded file, (c) produce bio_db
outputs, and (d) move or link files from the downloads directory to the loadables directory.
These scripts can be used to reconstruct the datasets at time points other than
those provided by bio_db_repo, thus affording more autonomy to users.</p>
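        <p>Steps (b) and (c) can be sketched as turning parsed rows into map facts plus a housekeeping fact. Here write_map/2 is hypothetical, rows are assumed to be already parsed into row(Key, Value) terms, and reading unique_lengths as (distinct first arguments, distinct second arguments, rows) is our assumption, chosen to be consistent with the c(43592, 43592, 43592) entry of the one-to-one HGNC map.</p>
        <preformat>
```prolog
% write_map(+Name, +Rows): print one Name(Key, Value) fact per row(Key, Value)
% term, then a Name_info(unique_lengths, c(Keys, Values, Rows)) fact.
% Hypothetical helper; the unique_lengths reading is an assumption.
write_map(Name, Rows) :-
    forall(member(row(Key, Value), Rows),
           ( Fact =.. [Name, Key, Value],
             portray_clause(Fact) )),
    findall(K, member(row(K, _), Rows), Ks),
    sort(Ks, UniqueKs),                 % sort/2 also removes duplicates
    length(UniqueKs, NKeys),
    findall(V, member(row(_, V), Rows), Vs),
    sort(Vs, UniqueVs),
    length(UniqueVs, NValues),
    length(Rows, NRows),
    atom_concat(Name, '_info', InfoName),
    Info =.. [InfoName, unique_lengths, c(NKeys, NValues, NRows)],
    portray_clause(Info).
```
        </preformat>
        <p>Redirecting the output to a file, for example with SWI-Prolog's with_output_to/2 or an opened stream, would give a consultable fact file for step (d).</p>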
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Examples</title>
      <p>
        Gene ontology terms are routinely used in the analysis of biological data,
particularly in the functional analysis of target lists. For instance, given a list of genes differentially
expressed in a set of microarray experiments, GO term over-representation analysis seeks
to identify GO terms in which members of the differential list are present in greater numbers
than expected by random selection
        <xref ref-type="bibr" rid="ref8">(Falcon and Gentleman, 2007)</xref>
        .
      </p>
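      <p>One common formulation of over-representation is the hypergeometric upper tail, as used for example by GOstats. The sketch below computes it with exact integer binomial coefficients; over_representation_p/5 and its argument order are hypothetical, and in practice the gene counts would come from the GO membership maps described in Section 2.</p>
      <preformat>
```prolog
% binomial(+N, +K, -C): C is N choose K, computed with exact integer
% arithmetic (every intermediate value is itself a binomial coefficient).
binomial(N, K, 0) :-
    K > N, !.
binomial(N, K, C) :-
    binomial_(1, K, N, 1, C).

binomial_(I, K, _, C, C) :-
    I > K, !.
binomial_(I, K, N, C0, C) :-
    C1 is C0 * (N - K + I) // I,
    I1 is I + 1,
    binomial_(I1, K, N, C1, C).

% over_representation_p(+Genome, +InTerm, +ListSize, +Hits, -P):
% probability of seeing at least Hits term members in a random list of
% ListSize genes drawn from Genome genes, InTerm of which are in the term.
over_representation_p(Genome, InTerm, ListSize, Hits, P) :-
    Outside is Genome - InTerm,
    Max is min(ListSize, InTerm),
    findall(T,
            ( between(Hits, Max, I),
              Rest is ListSize - I,
              binomial(InTerm, I, A),
              binomial(Outside, Rest, B),
              T is A * B ),
            Ts),
    sum_list(Ts, Num),
    binomial(Genome, ListSize, Den),
    P is Num / Den.
```
      </preformat>
      <p>For example, with 10 genes in total, 4 of them in the term and 2 of a 3-gene list falling in the term, over_representation_p(10, 4, 3, 2, P) gives P = 40/120, that is, about 0.333.</p>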
      <p>
        Here we will look into the GO terms of the LMTK3 tyrosine kinase
        <xref ref-type="bibr" rid="ref10">(Giamas
et al., 2011)</xref>
        . The following code shows how to produce the GO terms, their names
and their populations, which are shown in Table 2.
      </p>
      <p>[Figure 3: String interaction graph for the genes of GO term GO:0010332, minimum edge weight 500.]</p>
      <p>As a second example, we combine GO terms with String interactions. For a given
GO term we can construct a weighted graph reflecting the interactions in the
String database. This is built by first mapping an input GO term to the list of
symbols it contains, and then collecting all edges amongst these symbols that have
a weight exceeding a provided limit. Figure 3 shows such a
graph for term GO:0010332 with a minimum weight of 500.</p>
      <p>% go_term_graph(+GoTerm, +Min, -Graph): Graph lists the String edges
% Symb1-Symb2:W among the genes of GoTerm whose weight W exceeds Min.
go_term_graph( GoTerm, Min, Graph ) :-
    findall( Symb, map_gont_gont_symb(GoTerm,Symb), Symbs ),
    findall( Symb1-Symb2:W, (
            member(Symb1,Symbs),
            member(Symb2,Symbs),
            edge_string_hs_symb(Symb1,Symb2,W),
            Min &lt; W
        ),
        Graph ).</p>
      <p>?- go_term_graph( 'GO:0010332', 500, Graph ).</p>
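      <p>The returned Graph is a plain list of Symb1-Symb2:W terms, so standard list predicates can post-process it. As a sketch, the hypothetical edge_degrees/2 below counts how many retained edges touch each symbol, using msort/2 and clumped/2 from SWI-Prolog; whether the String table lists each pair once or in both orders is not stated here, so counts may be doubled.</p>
      <preformat>
```prolog
% edge_degrees(+Graph, -Degrees): hypothetical helper; Degrees is a sorted
% list of Symbol-Count pairs, one per symbol appearing in Graph, where
% Count is the number of Symb1-Symb2:W terms that mention the symbol.
edge_degrees(Graph, Degrees) :-
    findall(Node,
            ( member(A-B:_, Graph),
              ( Node = A ; Node = B ) ),
            Nodes),
    msort(Nodes, Sorted),          % keep duplicates so they can be counted
    clumped(Sorted, Degrees).      % [a,a,b] becomes [a-2,b-1]
```
      </preformat>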
      <p>4.1 Availability</p>
      <p>The software described in this paper is available as an easy-to-install library for the
SWI-Prolog system. Installation can be done within the system with a single call</p>
      <p>?- pack_install( bio_db ).</p>
      <p>This installs only the library source code, not the datasets. These are
downloaded on demand, transparently to the user, upon the first call to a
predicate.</p>
      <p>All datasets but one, which has been excluded due to its size, can also be installed
proactively with a single call,</p>
      <p>?- pack_install( bio_db_repo ).</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions</title>
      <p>We have argued that Prolog is a powerful language for building bioinformatics
pipelines, and that its role can be of crucial importance as biological data
increasingly need to be viewed as knowledge, both in the context of analysis and in
that of statistical inference or machine learning. Prolog's knowledge representation
credentials are highly relevant in this context.</p>
      <p>
        We presented a library that is easily installed from within SWI-Prolog
        <xref ref-type="bibr" rid="ref23">(Wielemaker et al., 2008)</xref>
        . This library presents a convenient and intuitive way of working
with biological data. All available data have been sourced from high-quality and,
wherever possible, curated databases. The emphasis of our approach is on providing
ease of use, via automatic downloading of datasets and code hot-swapping,
as well as flexibility, by decoupling data from code and transparently
downloading only the necessary datasets.
      </p>
      <p>
        Current work on the library includes extending it to other databases, particularly the
Reactome database
        <xref ref-type="bibr" rid="ref7">(Croft et al., 2014)</xref>
        , as well as to other database interfaces such
as ODBC. Prolog is well suited for research and code development in the areas
of bioinformatics and computational biology, and the code presented here can play a
strong role in promoting Prolog in these areas.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Hassan</given-names>
            <surname>Aït-Kaci</surname>
          </string-name>
          .
          <article-title>The WAM: A (real) tutorial. In Warren's Abstract Machine: A Tutorial Reconstruction</article-title>
          . MIT Press,
          <year>1991</year>
          .
          <source>Also Technical report 5</source>
          , DEC Paris Research Laboratory,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Nicos</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          . R session,
          <year>2013</year>
          . URL http://stoics.org.uk/~nicos/sware/r_session/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Nicos</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          , Vitor Santos Costa, Joao Azevedo, Jan Wielemaker, Rui Camacho, and
          <string-name>
            <given-names>Lodewyk</given-names>
            <surname>Wessels</surname>
          </string-name>
          .
          <article-title>Integrative functional statistics in logic programming</article-title>
          .
          <source>In Proc. of Practical Aspects of Declarative Languages</source>
          , volume
          <volume>7752</volume>
          <source>of LNCS</source>
          , pages
          <fpage>190</fpage>
          –
          <lpage>205</lpage>
          , Rome, Italy, Jan.
          <year>2013</year>
          . URL http://stoics.org.uk/~nicos/sware/real/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Sander</given-names>
            <surname>Canisius</surname>
          </string-name>
          , Nicos Angelopoulos, and Lodewyk Wessels.
          <article-title>ProSQLite: Prolog file based databases via an SQLite interface</article-title>
          .
          <source>In Proc. of Practical Aspects of Declarative Languages</source>
          , volume
          <volume>7752</volume>
          <source>of LNCS</source>
          , pages
          <fpage>222</fpage>
          –
          <lpage>227</lpage>
          , Rome, Italy, Jan.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Marc</given-names>
            <surname>Carlson</surname>
          </string-name>
          .
          <source>org.Hs.eg.db: Genome wide annotation for Human</source>
          ,
          <year>2014</year>
          .
          <source>R package version 2.14.0.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Vítor</given-names>
            <surname>Santos Costa</surname>
          </string-name>
          , Ricardo Rocha, and Luís Damas.
          <article-title>The YAP Prolog system</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>12</volume>
          :
          <fpage>5</fpage>
          –
          <lpage>34</lpage>
          ,
          <year>2012</year>
          . ISSN 1475-3081.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Croft</surname>
          </string-name>
          , Antonio Fabregat Mundo, Robin Haw, Marija Milacic, Joel Weiser, Guanming Wu, Michael Caudy, Phani Garapati, Marc Gillespie, Maulik R. Kamdar, Bijay Jassal, Steven Jupe, Lisa Matthews, Bruce May, Stanislav Palatnik, Karen Rothfels, Veronica Shamovsky, Heeyeon Song, Mark Williams, Ewan Birney, Henning Hermjakob, Lincoln Stein, and Peter D'Eustachio.
          <article-title>The Reactome pathway knowledgebase</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>42</volume>
          (
          <issue>D1</issue>
          ):
          <fpage>D472</fpage>
          –
          <lpage>D477</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Falcon</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Gentleman</surname>
          </string-name>
          .
          <article-title>Using GOstats to test gene lists for GO term association</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>257</fpage>
          –
          <lpage>258</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Gentleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vincent J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <surname>Douglas M. Bates</surname>
          </string-name>
          , and others. Bioconductor:
          <article-title>Open software development for computational biology and bioinformatics</article-title>
          .
          <source>Genome Biology</source>
          ,
          <volume>5</volume>
          :
          <fpage>R80</fpage>
          ,
          <year>2004</year>
          . URL http://genomebiology.com/2004/5/10/R80.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Giamas</surname>
          </string-name>
          , Aleksandra Filipovic, Jimmy Jacob, Walter Messier, Hua Zhang, Dongyun Yang, Wu Zhang, Belul Assefa Shifa, Andrew Photiou,
          <string-name>
            <given-names>Cathy</given-names>
            <surname>Tralau-Stewart</surname>
          </string-name>
          , Leandro Castellano,
          <string-name>
            <given-names>Andrew R</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R Charles</given-names>
            <surname>Coombes</surname>
          </string-name>
          , Ian O Ellis, Simak Ali,
          <string-name>
            <given-names>Heinz-Josef</given-names>
            <surname>Lenz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Justin</given-names>
            <surname>Stebbing</surname>
          </string-name>
          .
          <article-title>Kinome screening for regulators of the estrogen receptor identifies LMTK3 as a new therapeutic target in breast cancer</article-title>
          .
          <source>Nat Med</source>
          ,
          <volume>17</volume>
          :
          <fpage>715</fpage>
          –
          <lpage>719</lpage>
          ,
          <month>6</month>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>K.A.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.L.</given-names>
            <surname>Seal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.W.</given-names>
            <surname>Wright</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Bruford</surname>
          </string-name>
          .
          <article-title>Genenames.org: the HGNC resources in 2015</article-title>
          .
          <source>Nucleic Acids Res</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>J.F.</given-names>
            <surname>Morales</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hermenegildo</surname>
          </string-name>
          .
          <article-title>Towards pre-indexed terms</article-title>
          .
          In
          <source>Workshop on Implementation of Constraint and Logic Programming Systems and Logic-based Methods in Programming Environments 2014</source>
          , pages
          <fpage>79</fpage>
          –
          <lpage>92</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Chris</given-names>
            <surname>Mungall</surname>
          </string-name>
          .
          <article-title>Experiences using logic programming in bioinformatics</article-title>
          .
          In
          <source>Logic Programming</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>21</lpage>
          . Springer Berlin Heidelberg,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <collab>NCBI Resource Coordinators</collab>
          .
          <article-title>Database resources of the National Center for Biotechnology Information</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>41</volume>
          (Database issue):
          <fpage>D8</fpage>
          –
          <lpage>D20</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <collab>R Core Team</collab>
          .
          <source>R: A Language and Environment for Statistical Computing</source>
          . R Foundation for Statistical Computing, Vienna, Austria,
          <year>2014</year>
          . URL http://www.R-project.org/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
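          <string-name>
            <given-names>Vítor</given-names>
            <surname>Santos Costa</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Vaz</surname>
          </string-name>
          .
          <article-title>BigYAP: Exo-compilation meets UDI</article-title>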
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>13</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>799</fpage>
          –
          <lpage>813</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Damian</given-names>
            <surname>Szklarczyk</surname>
          </string-name>
          , Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-Cepas, Milan Simonovic, Alexander Roth, Alberto Santos,
          <string-name>
            <given-names>Kalliopi P.</given-names>
            <surname>Tsafou</surname>
          </string-name>
          , Michael Kuhn, Peer Bork,
          <string-name>
            <given-names>Lars J.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and Christian von Mering.
          <article-title>STRING v10: protein–protein interaction networks, integrated over the tree of life</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>43</volume>
          (
          <issue>D1</issue>
          ):
          <fpage>D447</fpage>
          –
          <lpage>D452</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <collab>The Gene Ontology Consortium</collab>
          .
          <article-title>Gene ontology: tool for the unification of biology</article-title>
          .
          <source>Nat. Genet.</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>25</fpage>
          –
          <lpage>29</lpage>
          ,
          <month>May</month>
          <year>2000</year>
          . URL http://www.geneontology.org.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>The UniProt Consortium</surname>
          </string-name>
          .
          <article-title>UniProt: a hub for protein information</article-title>
          .
          <source>Nucleic Acids Res.</source>
          , pages
          <fpage>D204</fpage>
          –
          <lpage>D212</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Vangelis</given-names>
            <surname>Vassiliadis</surname>
          </string-name>
          , Jan Wielemaker, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Mungall</surname>
          </string-name>
          .
          <article-title>Processing OWL2 ontologies using Thea: An application of logic programming</article-title>
          .
          <source>OWLED</source>
          ,
          <volume>529</volume>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>David H. D.</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <article-title>An abstract Prolog instruction set</article-title>
          .
          <source>Technical Report 309</source>
          , AI Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025,
          <month>Oct</month>
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          .
          <source>SWI-Prolog ODBC interface</source>
          ,
          <year>2014</year>
          . URL http://www.swi-prolog.org/pldoc/package/odbc.html.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , Zhisheng Huang, and Lourens van der Meij.
          <article-title>SWI-Prolog and the web</article-title>
          .
          <source>TPLP</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>363</fpage>
          –
          <lpage>392</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , Tom Schrijvers,
          <string-name>
            <given-names>Markus</given-names>
            <surname>Triska</surname>
          </string-name>
          , and Torbjorn Lager.
          <source>SWI-Prolog. Theory and Practice of Logic Programming</source>
          ,
          <volume>12</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>67</fpage>
          –
          <lpage>96</lpage>
          ,
          <year>2012</year>
          . ISSN 1471-0684.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>