<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Advances in integrating statistical inference</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicos Angelopoulos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samer Abdallah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Giamas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University College London, Gower Street</institution>
          ,
          <addr-line>London WC1E 6BT</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Surgery and Cancer, Division of Cancer, Imperial College London, Hammersmith Hospital Campus</institution>
          ,
          <addr-line>Du Cane Road, London W12 0NN</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>We present recent developments on the syntax of Real, a library for interfacing two Prolog systems to the statistical language R. We focus on the changes in Prolog syntax within SWI-Prolog that accommodate greater syntactic integration, enhanced user experience and improved features for web services. We recount the full syntax and functionality of Real and present sister packages which include Prolog code interfacing a number of common and useful tasks that can be delegated to R. We argue that Real is a powerful extension to logic programming, providing access to a popular statistical system that has complementary strengths in areas such as machine learning, statistical inference and visualisation. Furthermore, Real has a central role to play in the uptake of computational biology and bioinformatics as application areas for research in logic programming.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>is straightforward, with a basic grasp of R, to call its functions on Prolog data.
However, for users with no prior exposure to R there might still be a barrier. To
address this, and to increase the general usability of the library, a number
of sister packages have been developed. We highlight some of the predicates that
enable access to R code without any knowledge of R.</p>
      <p>
        Bioinformatics and computational biology have been central application areas
both in the inception of Real and in its recent advances. The sister libraries
we describe here have evolved in addressing real world bioinformatics tasks in
the context of a variety of projects
        <xref ref-type="bibr" rid="ref11 ref4 ref5 ref5">(Zhang et al., 2015; MacIntyre et al., 2015;
Stebbing et al., 2015)</xref>
        . The main thesis of this paper is that Prolog can play a
central role as a unifying platform for research in bioinformatics, taking advantage
of its strength in knowledge representation and reasoning, in combination
with recent advances in Real and in web programming
        <xref ref-type="bibr" rid="ref3 ref7 ref9">(Wielemaker et al., 2008;
Lager and Wielemaker, 2014)</xref>
        .
      </p>
      <p>2 Real</p>
      <p>Here we first present the innovations of Real 1.4 before we summarise its overall
syntax and usage with particular focus on new features.</p>
      <sec id="sec-1-1">
        <title>2.1 Innovations</title>
        <p>In terms of syntax, Real faced three major clashes between Prolog notation and
the syntax acceptable to R: the use of ‘.’ in R identifiers, the use of
double quotes (‘"’) to delimit strings, and the representation of terms of
arity 0, such as ‘foo()’. Previous versions of the library bypassed these clashes by
employing a number of indirect techniques that stayed as faithful
as possible to the original syntax. Briefly,
– operator ‘..’ was used to construct arity 2 terms that were converted behind the scenes
to a Prolog atom interpreted as an R identifier (the Prolog term
my..variable was translated to the R variable my.variable);
– operator + on non-numerical values was used to convert atoms and code
lists to strings (+foo → "foo");
– with the block operator ‘()’, newly introduced at the time, it was possible to
parse foo‘()’ as foo().</p>
        <p>
          With Real in mind, SWI-7
          <xref ref-type="bibr" rid="ref3 ref7">(Wielemaker, 2014)</xref>
          introduced syntax that
legalised all of the above constructs, as well as the implementation of lists as
primary data structures (as opposed to ./2 terms). Dots in atoms and the use of
double quotes are now controlled by global flags, the former being off by default
and the latter on. Real has been adapted to utilise the new changes in
a backwards compatible manner. All of the following are now valid Real syntax
mapping to the corresponding R constructs, provided that the appropriate global
flags have been enabled:
– func.foo(a,b,c)
– write.csv( "to file.csv", x )
– foo()
        </p>
        <p>Under the bonnet, list representations were additionally generalised to
accommodate the new data type. In the C interface, Real 1.4 also includes
improvements that allow it to be employed within a web service, with the
R-server thread being an arbitrary one. This is of particular interest because
R itself is single threaded.</p>
        <p>A final innovation at the syntactic level has been the introduction of ‘NA’
values in the interface. In R, NA values are placeholders for not available or unknown
values. Prolog does not internally support such values, but the interface
maps them, within arithmetic vectors and matrices, to '$NaN'.
When passing numeric data from Prolog to R, the empty atom ('') is also
translated to R's NA value, in addition to '$NaN'.</p>
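        <p>The Prolog-to-R direction of this mapping can be illustrated with a minimal sketch in plain Prolog. It is not Real's implementation, which works through the C interface: the helper names pl_to_r_token/2 and pl_vector_to_r/2 are hypothetical, and the sketch only renders the text of an R vector rather than talking to R.</p>
        <preformat><![CDATA[
```prolog
% Sketch only: pl_to_r_token/2 and pl_vector_to_r/2 are hypothetical helper
% names used for illustration; they are not part of Real's API.

% pl_to_r_token(+PlValue, -RToken)
% Map one Prolog value to the token standing for it in an R numeric vector.
pl_to_r_token('$NaN', 'NA') :- !.     % explicit not-available marker
pl_to_r_token('', 'NA') :- !.         % the empty atom is also read as NA
pl_to_r_token(Num, Num) :-            % ordinary numbers pass through
    number(Num).

% pl_vector_to_r(+Values, -RText)
% Render a Prolog list as the text of an R c(...) vector.
pl_vector_to_r(Values, RText) :-
    maplist(pl_to_r_token, Values, Tokens),
    atomic_list_concat(Tokens, ',', Body),
    format(atom(RText), 'c(~w)', [Body]).
```
]]></preformat>
        <p>Under these assumptions, the query pl_vector_to_r([1,'$NaN',2.5,''], T) binds T to 'c(1,NA,2.5,NA)', so both Prolog markers end up as the same R value.</p>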
        <p>Taken together, these innovations allow a tighter and smoother integration
of R code and enable Prolog programmers to tap into the wealth of statistical
functions implemented in R.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2.2 Communication with R</title>
        <p>The bulk of the communication with R is via a single predicate, &lt;-/2, which is
also defined as an infix operator. In R this symbol is an assignment operator.
Within Real it can be used to transfer data between R and Prolog, to apply
R functions to Prolog data in an in-line fashion, and to destructively assign
values to R variables. The modes are disambiguated by the form of the arguments
and can be summarised by:
+Rexpr &lt;- +Rexpr
-PlVar &lt;- +Rexpr
+Rexpr &lt;- +PlData</p>
        <p>When the LHS of the operator is an uninstantiated variable, the second mode
is assumed, where the value of Rexpr is passed to PlVar after it has been
evaluated in R. When the RHS is a c/n term or a list, the third mode is
applied and the data in the RHS is transferred to the LHS Rexpr (usually an R
variable).</p>
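        <p>The disambiguation rules just described can be sketched as a small classifier in plain Prolog. This is an illustration under stated assumptions, not Real's own code: real_mode/3, is_pl_data/1 and mode_of/2 are hypothetical names, the operator priority is assumed, and the order in which the two tests are tried (unbound LHS first) is our reading of the text.</p>
        <preformat><![CDATA[
```prolog
% Sketch only: real_mode/3, is_pl_data/1 and mode_of/2 are hypothetical;
% they mimic the mode selection described in the text without calling R.

:- op(950, xfx, <-).   % assumed priority, declared so the file loads alone

% real_mode(+Lhs, +Rhs, -Mode)
real_mode(Lhs, _Rhs, r_to_prolog) :-      % -PlVar <- +Rexpr
    var(Lhs), !.
real_mode(_Lhs, Rhs, prolog_to_r) :-      % +Rexpr <- +PlData
    is_pl_data(Rhs), !.
real_mode(_Lhs, _Rhs, r_to_r).            % +Rexpr <- +Rexpr

% is_pl_data(+Term): true for a list or a c/n term.
is_pl_data(Term) :-
    is_list(Term), !.
is_pl_data(Term) :-
    compound(Term),
    compound_name_arity(Term, c, _).

% mode_of(+Goal, -Mode): classify a whole Lhs <- Rhs term.
mode_of((Lhs <- Rhs), Mode) :-
    real_mode(Lhs, Rhs, Mode).
```
]]></preformat>
        <p>With these definitions, a term with an unbound left-hand side is classified as r_to_prolog, a term whose right-hand side is a list or a c/n term as prolog_to_r, and everything else as r_to_r.</p>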
        <p>The following examples show how to transfer Prolog data to R and back
(1), how to transfer Prolog data to R and get the result of applying a function to the
data in the new R variable (2), and how to apply an R function
to Prolog data without the use of an explicit R variable (3).</p>
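        <p>?- a &lt;- [1,2,3], A &lt;- a. (1)</p>
        <p>A = [1,2,3].</p>
        <p>?- a &lt;- [1,2,3], Mean &lt;- mean(a). (2)</p>
        <p>Mean = 2.0.</p>
        <p>?- Mean &lt;- mean([1,2,3]). (3)</p>
        <p>Mean = 2.0.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>The library’s main predicates</p>
          </caption>
          <table>
            <thead>
              <tr><th>Indicator</th><th>Operator</th><th>Symbol</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>r/1</td><td>&lt;-</td><td>←</td><td>evaluate R expression (no return value)</td></tr>
              <tr><td>r/2</td><td>&lt;-</td><td>←</td><td>main communication to R library</td></tr>
              <tr><td>r_new/1</td><td>&lt;&lt;-</td><td>↞</td><td>argument is a fresh R variable</td></tr>
              <tr><td>&lt;&lt;-/2</td><td>&lt;&lt;-</td><td>↞</td><td>r/2 but with error if R variable exists</td></tr>
              <tr><td>r_call/2</td><td>&lt;- C ++ O</td><td>← ++</td><td>r/1,2 with options (O)</td></tr>
              <tr><td>r_library/1</td><td></td><td></td><td>load R library in a hookable manner</td></tr>
              <tr><td>r_start/0</td><td></td><td></td><td>start the connection to R</td></tr>
              <tr><td>r_stop/0</td><td></td><td></td><td>stop the connection to R</td></tr>
              <tr><td>r_remove/1</td><td></td><td></td><td>remove R variable</td></tr>
              <tr><td>r_thread_loop/0</td><td></td><td></td><td>start an R thread server</td></tr>
              <tr><td>r_serve/0</td><td></td><td></td><td>serve all R expressions on queue thread</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-1-3">
        <title>2.3 Real’s predicates</title>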
        <p>Real 1.4 adopts the convention of a uniform prefix for all the library predicates.
The full list of Real’s predicates, along with the associated operators and brief
descriptions, is shown in Table 1. New additions include a hookable locator for
R libraries, web server support, intuitive syntax for non-destructive assignment
and a generic predicate for mixing Prolog and R options and directing output
to graphic devices.</p>
        <p>With the new predicate r_library/1 the user can load the standard R libraries
in their local installation. In addition, the predicate can be directed to
user-specified locations from which local, possibly modified, sources of such libraries are
loaded preferentially. This flexibility allows for (a) specific code that is known only
to Real to be loaded, thus leaving the remainder of the R installation intact, and (b)
user code that works with the distributed
version while gaining extra functionality when used with the altered sources.</p>
        <p>
          Real is inherently single threaded. To support the use of Real in
multithreaded applications, in particular in web servers built on SWI-Prolog’s HTTP
libraries
          <xref ref-type="bibr" rid="ref9">(Wielemaker et al., 2008)</xref>
          , Real 1.4 allows a single designated Real server
thread to be started, which then takes over the task of executing or evaluating
R commands or expressions. Then, when the &lt;-/1 and &lt;-/2 predicates are used
on any other thread, the requests are redirected to the Real server thread and
the results awaited. Communication is handled synchronously using SWI-Prolog
message queues.
        </p>
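        <p>The redirection pattern described above can be sketched with SWI-Prolog's thread and message queue primitives. This is a generic single-server sketch, not Real's r_thread_loop/0 or r_serve/0: every predicate name below is hypothetical, and eval_stub/2 merely evaluates an arithmetic expression in place of a call to R.</p>
        <preformat><![CDATA[
```prolog
% Sketch only: a generic "one server thread" pattern on SWI-Prolog message
% queues. All names are hypothetical; eval_stub/2 stands in for the code
% that would hand the expression to R.

% start_server(-Server): create the single thread that owns the evaluator.
start_server(Server) :-
    thread_create(serve_loop, Server, []).

% serve_loop: take requests from this thread's queue until told to stop.
serve_loop :-
    thread_get_message(Msg),
    (   Msg == stop
    ->  true
    ;   Msg = request(Client, Expr),
        eval_stub(Expr, Result),
        thread_send_message(Client, reply(Result)),
        serve_loop
    ).

% call_on_server(+Server, +Expr, -Result)
% Send a request from any thread and block until the reply arrives.
call_on_server(Server, Expr, Result) :-
    thread_self(Me),
    thread_send_message(Server, request(Me, Expr)),
    thread_get_message(reply(Result)).

% stop_server(+Server): ask the server to finish and wait for it.
stop_server(Server) :-
    thread_send_message(Server, stop),
    thread_join(Server, _).

% eval_stub(+Expr, -Result): placeholder evaluator (plain arithmetic).
eval_stub(Expr, Result) :-
    Result is Expr.
```
]]></preformat>
        <p>A client thread would run start_server(S), then call_on_server(S, 1+2, R) as often as needed, and finally stop_server(S). The client blocks on its own queue while it waits, which corresponds to the synchronous communication described in the text.</p>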
        <p>This system was implemented to support an application in the area of large
scale computational musicology, the Digital Music Laboratory, which is built on
SWI-Prolog’s semantic web server ClioPatria. Here, Real is used both for
general numerical computations and for the generation of high-quality scalable vector
graphics. In comparison with previous versions of the system, which used
Matlab’s engine API to communicate with a separate Matlab process, the lower
overhead of communicating with Real’s in-process embedded R yields much better
performance when numerous relatively small computations are required.</p>
        <p>As R supports destructive assignment, the programmer
might unwittingly overwrite variables already in the workspace. To provide
a visual cue that a variable is fresh in a specific context, we
introduced operators &lt;&lt;-/2 and &lt;&lt;-/1 and predicate r_new/1. The first ensures
that its first argument (an R variable) did not exist prior to assigning to it some
new values. The second removes its arguments from the R workspace and the
third fails if its argument is already a known R variable.</p>
        <p>Integral to the R language design and practice is the use of options that
control the details of function calls. These are Name=Value pairs of argument names and values,
which more often than not do not have to be present at invocation. When not
present, default values supplied by the function developers are used. Similar,
but not as widely used, is the use of lists of terms that control calls to Prolog
predicates. By convention an options list is placed at the last argument of a predicate
and commonly contains a number of arity 1 terms. Real now provides a
uniform way to marry the two conventions and a flexible way of handling options
addressed to Prolog predicates accessing R functions. In addition, a number of
standard tasks have been incorporated into a new interface predicate:</p>
        <p>&lt;- Func ++ Opts</p>
        <p>which can also be accessed as r_call(Func, Opts).</p>
        <p>Func is a compound term which is translated to an R function call and
Opts can be a combination of: (a) =/2 terms, which are added to Func, (b)
options controlling r_call/2’s own execution and (c) Prolog style options which
can influence the caller’s behaviour but are ignored in the R call. Some of the r_call/2
options are:
– rvar(Rvar): when given, the call becomes Rvar &lt;- Fcall
– rmv(Rmv=false): removes Rvar after the end of the call
– stem(Stem=real_plot): stem to use for output files
– outputs(Outs=false): a list of output devices
– debug(Dbg=false): sets debug(real) for the duration of the call
– fcall(FinCall): returns the term constructed after the =/2 additions
– post_call(Post): call this after the function call</p>
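        <p>How an options list for r_call/2 might be separated into the three kinds (a), (b) and (c) can be sketched in plain Prolog. The helpers split_call_options/4 and add_r_args/3 are hypothetical and are not taken from Real's sources; the names listed in own_option/1 are copied from the options enumerated in the text, and the sketch assumes SWI-Prolog 7 for compound_name_arguments/3.</p>
        <preformat><![CDATA[
```prolog
% Sketch only: split_call_options/4 and add_r_args/3 are hypothetical
% helpers illustrating the three kinds of option; this is not Real's code.

:- use_module(library(apply)).   % partition/4
:- use_module(library(lists)).   % append/3

% Option names that control the call itself (taken from the text).
own_option(rvar).
own_option(rmv).
own_option(stem).
own_option(outputs).
own_option(debug).
own_option(fcall).
own_option(post_call).

% split_call_options(+Opts, -RArgs, -Own, -Other)
% RArgs: Name=Value pairs for R; Own: call-control options;
% Other: Prolog style options that are ignored in the R call.
split_call_options(Opts, RArgs, Own, Other) :-
    partition(is_r_arg, Opts, RArgs, Rest),
    partition(is_own_option, Rest, Own, Other).

is_r_arg(_Name = _Value).

is_own_option(Opt) :-
    compound(Opt),
    compound_name_arity(Opt, Name, 1),
    own_option(Name).

% add_r_args(+Func, +RArgs, -FinCall)
% Append the Name=Value pairs to the arguments of the compound Func.
add_r_args(Func, RArgs, FinCall) :-
    compound_name_arguments(Func, Name, Args),
    append(Args, RArgs, AllArgs),
    compound_name_arguments(FinCall, Name, AllArgs).
```
]]></preformat>
        <p>For example, splitting [main=title, rvar(x), debug(true), colour(red)] yields [main=title] as the R arguments, [rvar(x), debug(true)] as the call-control options and [colour(red)] as the remaining Prolog options. Adding the R arguments to plot(x) then gives plot(x, main=title).</p>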
        <p>[Figure 3.1: bar plots drawn with gg_bar_plot/2 for groups a and b. Left: default rendering. Right: rendering with the main title, axis labels, legend title and fill colours set through options.]</p>
        <p>3.1 b_real</p>
        <p>b_real is a library based on Real which contains a collection of predicates that
aim to provide a Prolog based interface to a number of simple tasks. The target
audience is Prolog users who have no previous experience with R. The predicates
described here can use the basic functionality of the underlying R functions and
can adjust some of the behaviour entirely in Prolog, while allowing arbitrary
option passing to users with some familiarity with R.</p>
        <p>
          Bar plots are basic plots that can present comparative information in an
intuitive manner. Here we present a Prolog interface to ggplot2
          <xref ref-type="bibr" rid="ref6">(Wickham, 2009)</xref>
          .
In its most general form, predicate gg_bar_plot/2 displays a number of grouped
measurements such as, for instance, the CPU timings of a number of machine
learning algorithms run on a number of datasets. The following query produces
the plot in the LHS of Fig. 3.1.
        </p>
        <p>?- Pairs = [a-[1,2,3], b-[2,4,6]], gg_bar_plot(Pairs, []). (4)</p>
        <p>ggplot2 is a complex piece of software able to display many types of plots, while
gg_bar_plot/2 only accesses the bar plotting part. Within this, a number of plot
elements can be controlled with Prolog options passed in the second argument.
The following query changes elements such as the colour of the drawing pen
(black), the labels (x, y and main), the legend title and the fill colours, producing the
plot in the RHS of Fig. 3.1.</p>
        <p>?- Pairs = [a-[1,2,3], b-[2,4,6]],
Opts = [ geom_bar_draw_colour(black),
fill_colours(["skyblue2", "khaki2", "#FB9A99"]),
... (5)</p>
        <p>[Figure: heatmap of the mtcars variables displacement and horsepower.]</p>
        <p>Heatmap functions are ubiquitous in R. b_real provides a Prolog interface to
the aheatmap library. In addition to some simple option mapping, aheatmap/2
provides polymorphic support for the first argument, which can be a matrix R
variable or a Prolog representation of one. The following code uses the mtcars
example dataset, from which it plots a heatmap of two variables: hp (horsepower)
and disp (displacement).</p>
        <p>?- MtC &lt;- as.list(mtcars), memberchk(hp=HP, MtC),
memberchk(disp=Disp, MtC), x &lt;- [HP, Disp],
rownames(x) &lt;- c("horsepower", "displacement"),
&lt;- aheatmap(x). (6)</p>
        <p>3.2 wgraph</p>
        <p>R has a number of plotting functions for drawing graphs formed of nodes and
edges. Two of these are igraph() and qgraph(), the latter being based on the
former with some extra options and facilities for grouping nodes. The Prolog
pack wgraph provides a uniform Prolog interface to these R libraries. A plot
with the default renderings can be easily drawn from a list representing the
graph connections and the weights on the edges:</p>
        <p>?- G = [1-2:200, 2-3:400, 2-4:300],
wgraph_plot(G, []). (7)</p>
        <p>A set of Prolog options that control the choice of the drawing function and the basic
parameters of the graph, and which work irrespective of the drawing function,
can be provided in the second argument of wgraph_plot/2. In the following
example igraph() is passed the size of nodes to use, the degree at which the node
labels should be displayed and the distance of the label from the node edge. The
resulting graph is shown in the RHS of Fig. 3.</p>
        <p>?- G = [1-2:200, 2-3:400, 2-4:300],
Opts = [ plotter(igraph), label_distance(-1),
label_degree(2), node_size(4) ],
wgraph_plot(G, Opts). (8)</p>
        <p>[Figure 3: graphs drawn with wgraph_plot/2 from the edge list above.]</p>
      </sec>
      <sec id="sec-1-4">
        <title>3.3 Availability</title>
        <p>The three libraries discussed here (Real, b_real and wgraph) are available as
SWI-Prolog packages (listed at http://swi-prolog.org/pack/list), which can be installed easily from within SWI-Prolog. To
download and install Real the user needs to query with:</p>
        <p>?- pack_install(real). (9)</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>We presented a number of recent advances in Real and, in particular, have shown
how developments in Prolog syntax have made Real syntax blend naturally into
Prolog code. The resulting syntax provides a powerful platform for accessing the
extensive collection of freely available R code. As a consequence, Real can have a
strong positive influence on the penetration of Prolog into new application areas
such as bioinformatics and machine learning. With version 1.4 Real has reached a
new level of maturity, including facilities for using R in web servers. In addition,
we highlighted some predicates from two sister packages. As with Real itself,
these are freely available and can be easily installed via the SWI-Prolog package
manager. In the future we plan to work towards suggesting internal ways for
Prolog to work better with R, or more in line with it, on NA values and infinity.</p>
      <p>Real has been used in a number of projects in the area of bioinformatics and
has a steady stream of downloads via SWI-Prolog’s package manager. With the
enhanced level of integration, Real is becoming a powerful hybrid programming
language.</p>
      <p>Bibliography</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Nicos</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          , Vitor Santos Costa, Joao Azevedo, Jan Wielemaker, Rui Camacho, and
          <string-name>
            <given-names>Lodewyk</given-names>
            <surname>Wessels</surname>
          </string-name>
          .
          <article-title>Integrative functional statistics in logic programming</article-title>
          .
          <source>In Proc. of Practical Aspects of Declarative Languages</source>
          , volume
          <volume>7752</volume>
          <source>of LNCS</source>
          , pages
          <fpage>190</fpage>
          -
          <lpage>205</lpage>
          , Rome, Italy, Jan.
          <year>2013</year>
          . URL http://stoics.org.uk/~nicos/sware/real/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Vítor Santos</given-names>
            <surname>Costa</surname>
          </string-name>
          , Ricardo Rocha, and
          <string-name>
            <given-names>Luís</given-names>
            <surname>Damas</surname>
          </string-name>
          .
          <article-title>The YAP Prolog system</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>12</volume>
          :
          <fpage>5</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>2012</year>
          . ISSN 1475-3081.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Torbjorn</given-names>
            <surname>Lager</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          .
          <article-title>Pengines: Web logic programming made easy</article-title>
          .
          <source>In International Conference of Logic Programming</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>MacIntyre</surname>
          </string-name>
          , Manju Chandiramani, Yun S Lee, Lindsay Kindinger, Ann Smith, Nicos Angelopoulos, Benjamin C. Lehne, Shankari Arulkumaran, Richard Brown, Tiong Ghee Teoh, Elaine Holmes, Jeremy K. Nicholson, Julian Marchesi, and
          <string-name>
            <given-names>Phillip R.</given-names>
            <surname>Bennett</surname>
          </string-name>
          .
          <article-title>The vaginal microbiome during pregnancy and the postpartum period in a European population</article-title>
          .
          <source>Scientific Reports</source>
          ,
          <volume>5</volume>
          , Article number 8988,
          <year>2015</year>
          . URL http://www.nature.com/srep/2015/150311/srep08988/full/srep08988.html.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Justin</given-names>
            <surname>Stebbing</surname>
          </string-name>
          , Hua Zhang, Yichen Xu, Adam Sanit, Nicos Angelopoulos, and
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Giamas</surname>
          </string-name>
          .
          <article-title>Global mapping of tyrosine kinase signalling</article-title>
          .
          <source>Journal Title</source>
          ,
          <year>2015</year>
          . Accepted for publication.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Hadley</given-names>
            <surname>Wickham</surname>
          </string-name>
          .
          <article-title>ggplot2: elegant graphics for data analysis</article-title>
          . Springer New York,
          <year>2009</year>
          .
          <source>ISBN 978-0-387-98140-6</source>
          . URL http://had.co.nz/ggplot2/book.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          .
          <source>SWI-Prolog ODBC interface</source>
          ,
          <year>2014</year>
          . URL http://www.swi-prolog.org/pldoc/package/odbc.html.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vítor Santos</given-names>
            <surname>Costa</surname>
          </string-name>
          .
          <article-title>On the portability of Prolog applications</article-title>
          .
          <source>In Practical Aspects of Declarative Languages</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , Zhisheng Huang, and Lourens van der Meij.
          <article-title>SWI-Prolog and the web</article-title>
          .
          <source>TPLP</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>363</fpage>
          -
          <lpage>392</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , Tom Schrijvers,
          <string-name>
            <given-names>Markus</given-names>
            <surname>Triska</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Torbjörn</given-names>
            <surname>Lager</surname>
          </string-name>
          .
          <article-title>SWI-Prolog</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>12</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>67</fpage>
          -
          <lpage>96</lpage>
          ,
          <year>2012</year>
          . ISSN 1471-0684.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Hua</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Nicos Angelopoulos, Yichen Xu, Arnhild Grothey, Joao Nunes, Justin Stebbing, and
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Giamas</surname>
          </string-name>
          .
          <article-title>Proteomic profile of KSR1-regulated signaling in response to genotoxic agents in breast cancer</article-title>
          .
          <source>Breast Cancer Research and Treatment</source>
          ,
          <year>2015</year>
          . URL http://link.springer.com/article/10.1007/s10549-015-3443-y.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>