Systems and Learning Algorithms for Probabilistic Logical Knowledge Bases

Giuseppe Cota
Dipartimento di Ingegneria – University of Ferrara
Via Saragat 1, I-44122, Ferrara, Italy
giuseppe.cota@unife.it

Abstract. In real world domains information is often uncertain, hence it is of foremost importance to be able to model uncertainty and to reason over it. In this paper we present tools and learning systems under development for probabilistic structured data. Four systems are considered and an overview of the related issues and of future work is given. The first system is cplint on SWISH, a web application that allows the user to write Probabilistic Logic Programs and compute the probability of queries with just a web browser. Then two distributed structure learning algorithms are illustrated: SEMPRE (“distributed Structure lEarning by MaPREduce”) and LEAPMR (“LEArning Probabilistic description logics by MapReduce”); the former learns new clauses of Probabilistic Logic Programs, the latter operates in the context of Probabilistic Description Logics. The last system taken into account is SML-Bench, developed by the AKSW research group of Leipzig, a benchmarking tool for structured data that has been extended to deal with algorithms for probabilistic structured data.

Keywords: Probabilistic Structured Data, Probabilistic Logic Programming, Probabilistic Description Logics, Structure Learning, MapReduce

1 Introduction

Representing uncertain information and being able to reason over it is of foremost importance for real world applications. In the last decades several semantics were proposed to represent uncertainty; one of the most prominent approaches for representing probabilistic information in Logic Programming is the distribution semantics [15]. This semantics is at the basis of many languages, such as Independent Choice Logic, PRISM, Logic Programs with Annotated Disjunctions (LPADs) and ProbLog.

In [3] the authors proposed an application of the distribution semantics to Description Logics (DLs) and called the resulting semantics DISPONTE (“DIstribution Semantics for Probabilistic ONTologiEs”).

With these two semantics it is possible to build algorithms and applications for Probabilistic Logic Programming and Probabilistic Description Logics. Here we discuss the current development of tools and learning systems for probabilistic logical knowledge bases (KBs) that follow either the distribution semantics or DISPONTE.

We proceed as follows. Section 2 provides a brief introduction to the syntax and the semantics on which the discussed systems are based. Section 3 presents cplint on SWISH, a web application that allows the user to write Probabilistic Logic Programs and compute the probability of queries with a web browser. Section 4 illustrates SEMPRE and LEAPMR, two distributed structure learning algorithms. Section 5 briefly describes the changes made to SML-Bench, a benchmarking tool developed by the AKSW research group of Leipzig, in order to deal with algorithms for probabilistic structured data. Finally, Section 6 discusses future work and draws conclusions.

2 Syntax and Distribution Semantics

LPADs [16] consist of a finite set of annotated disjunctive clauses $C_i$ of the form

$h_{i1} : \Pi_{i1}\ ;\ \ldots\ ;\ h_{in_i} : \Pi_{in_i} \;{:}{-}\; b_{i1}, \ldots, b_{im_i}.$

Here, $b_{i1}, \ldots, b_{im_i}$ are logical literals which form the body of $C_i$, denoted by $body(C_i)$, while $h_{i1}, \ldots, h_{in_i}$ are logical atoms and $\Pi_{i1}, \ldots, \Pi_{in_i}$ are real numbers in the interval $[0, 1]$ such that $\sum_{k=1}^{n_i} \Pi_{ik} \leq 1$. Note that if $n_i = 1$ and $\Pi_{i1} = 1$ the clause corresponds to a non-disjunctive clause. Otherwise, if $\sum_{k=1}^{n_i} \Pi_{ik} < 1$, the head of the annotated disjunctive clause implicitly contains an extra atom $null$ that does not appear in the body of any clause and whose annotation is $1 - \sum_{k=1}^{n_i} \Pi_{ik}$.

Given an LPAD $P$, the grounding $ground(P)$ is obtained by replacing variables with terms from the Herbrand universe in all possible ways. If $P$ does not contain function symbols and is finite, $ground(P)$ is finite as well. $ground(P)$ is still an LPAD from which we can obtain a normal logic program by selecting one head atom for each ground clause. In this way we obtain a so-called world, to which we can assign a probability by multiplying the probabilities of all the head atoms chosen. We thus get a probability distribution over worlds, from which we can define a probability distribution over the truth values of a ground atom: the probability of an atom q being true is the sum of the probabilities of the worlds where q is true (we assume that all the worlds have a two-valued well-founded model).
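As a concrete illustration, consider the standard two-coin example often used to present the distribution semantics (the program below is only illustrative and is not taken from the systems described in this paper):

  heads(Coin) : 0.5 ; tails(Coin) : 0.5 :- toss(Coin).
  toss(coin1).
  toss(coin2).

The grounding contains two annotated disjunctive clauses, one for each coin, each offering two head atoms. A world is obtained by selecting one head atom per ground clause, so there are 2 × 2 = 4 worlds, each with probability 0.5 · 0.5 = 0.25. The query heads(coin1) is true in the two worlds in which the first ground clause selects heads(coin1), hence its probability is 0.25 + 0.25 = 0.5.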
Description Logics (DLs) are a family of logic-based knowledge representation formalisms which are of particular interest for representing ontologies and for the Semantic Web. For an extensive introduction to DLs we refer to [2].

DISPONTE [3], like the distribution semantics, defines a probability distribution over regular knowledge bases (also called worlds). A probabilistic knowledge base is a set of certain axioms and probabilistic axioms. Certain axioms take the form of regular DL axioms. Probabilistic axioms take the form p :: E, where p is a real number in [0, 1] and E is a DL axiom. To create a world, we decide whether or not to include each probabilistic axiom, and we multiply the probabilities of the choices made to compute the probability of the world. The probability of a query is then obtained from the joint probability of the worlds and the query by marginalization.

3 cplint on SWISH

To reach a wider audience and popularize Probabilistic Logic Programming we developed cplint on SWISH [13]. This is a web application for running the Probabilistic Logic Programming system cplint [10] with just a web browser: the algorithms run on a server and the user can post queries and see the results in the browser. The application is available at http://cplint.lamping.unife.it. The system has recently been extended with algorithms for computing conditional probabilities with exact inference, rejection sampling and Metropolis-Hastings sampling. Moreover, the system now allows hybrid programs, i.e., programs where some of the random variables are continuous. To perform inference on such programs, likelihood weighting is used, which also makes it possible to have evidence on continuous variables. cplint on SWISH also offers the possibility of sampling arguments of goals, a kind of inference rarely considered but useful especially when the arguments are continuous variables. Finally, cplint on SWISH offers the possibility of graphing the results, for example by drawing the distribution of the sampled continuous arguments of goals.
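To give an idea of the interaction, the coin program of Section 2 can be written in the editor roughly as follows; the directives are those documented for cplint's exact inference module PITA, and the exact syntax may differ slightly across versions:

  :- use_module(library(pita)).
  :- pita.
  :- begin_lpad.
  heads(Coin) : 0.5 ; tails(Coin) : 0.5 :- toss(Coin).
  toss(coin1).
  toss(coin2).
  :- end_lpad.

The probability of a query is then computed by submitting a goal such as prob(heads(coin1), Prob) in the query pane of the browser.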
4 Distributed Structure Learning Systems

In order to reduce the learning time, we tried to distribute it by means of a MapReduce approach. Two algorithms have been proposed: SEMPRE and LEAPMR. The former learns new clauses for Probabilistic Logic Programs that follow the distribution semantics [15], the latter learns new axioms for Probabilistic Description Logics that follow DISPONTE [3].

SEMPRE [14] parallelizes three operations of the structure learning algorithm SLIPCOVER [5] by employing n workers: one master and n − 1 slaves. All the workers initially receive all the input data. The first parallelized operation is the scoring of clause refinements: when the revisions for a clause are generated, the master process splits them evenly into n subsets. Then SEMPRE enters the Map phase, where each worker scores a set of refinements and returns them together with their log-likelihood (LL). Scoring is performed using (serial) EMBLEM [4], which is run over a theory containing only one clause. Once the master has received all the sets of scored refinements from the workers, it enters the Reduce phase, where it updates the beam of promising clauses and the sets of target and background clauses (TC and BC respectively). The second parallelized operation is parameter learning for the theories: each clause from TC is tentatively added to the theory, which in the end contains all the clauses that improved its LL (search in the space of theories). During this phase a MapReduce version of EMBLEM, called EMBLEMMR, is used. The third parallelized operation is the final parameter optimization for the theory including also the background clauses: all the background clauses are added to the theory previously learned and the parameters of the resulting theory are learned by means of EMBLEMMR.

SEMPRE was tested on seven real world datasets: Hepatitis, Mutagenesis, UWCSE, Carcinogenesis, IMDB, HIV and WebKB. The speedup is always larger than 1 and grows with the number of workers, except for HIV and IMDB, where there is a slight decrease with 16 and 32 workers due to the overhead; however, these two datasets are the smallest and the least in need of a parallel solution.

LEAPMR [6] is an evolution of the LEAP system [12], which performs structure and parameter learning of probabilistic ontologies under DISPONTE. While LEAP exploits EDGE [11], LEAPMR was adapted to exploit EDGEMR [7]: EDGE is a system for learning the parameters of a DISPONTE KB and EDGEMR is its distributed version. In order to learn an ontology, LEAPMR first searches for good candidate probabilistic subsumption axioms by means of CELOE [9], then it performs a greedy search in the space of theories, using EDGEMR to learn the parameters and to evaluate the theories with the log-likelihood as heuristic. LEAPMR takes as input the knowledge base K and the configuration settings for CELOE and EDGEMR, then generates a set of candidate axioms by exploiting CELOE and the sets of positive and negative examples (concept membership axioms) for EDGEMR. LEAPMR then adds to K one probabilistic subsumption axiom at a time. After each addition, EDGEMR is run on the extended KB to compute the LL of the data and the parameters. If the LL is better than the current best, the new axiom is kept in the knowledge base and the parameters of the probabilistic axioms are updated; otherwise the axiom is removed from the ontology and the previous parameters are restored. The final theory is the union of the initial ontology and the probabilistic axioms learned.
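The greedy search performed by LEAPMR can be summarized with the following Prolog-style sketch. It is purely illustrative: the predicates add_axiom/3 and edge_mr/3 are hypothetical placeholders for "extend the KB with a candidate probabilistic subsumption axiom" and "run distributed parameter learning, returning the updated KB and its LL", and the sketch does not correspond to the actual implementation.

  % greedy_search(+KB0, +CandidateAxioms, +BestLL0, -KB)
  % BestLL0 is initialized with the LL of the input KB.
  greedy_search(KB, [], _BestLL, KB).
  greedy_search(KB0, [Axiom|Rest], BestLL0, KB) :-
      add_axiom(KB0, Axiom, KB1),        % tentatively add the candidate axiom
      edge_mr(KB1, KB2, LL),             % distributed parameter learning + LL
      (   LL > BestLL0
      ->  greedy_search(KB2, Rest, LL, KB)        % keep the axiom and the new parameters
      ;   greedy_search(KB0, Rest, BestLL0, KB)   % discard it and restore the previous KB
      ).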
In order to test how much the exploitation of EDGEMR can improve the performance of LEAPMR, we ran a preliminary experiment on the Moral KB, which qualitatively simulates moral reasoning. The experiments were performed on a cluster of Linux machines. For each experiment, 2 candidate probabilistic axioms were generated by using CELOE and a maximum of 3 explanations per query was set for EDGEMR. The obtained speedup is significant even if sublinear, showing that a certain amount of overhead (the resources, and thereby the time, spent on MPI communication) is present.

5 SML-Bench

When a new learning system is under development, a lot of time is spent on its evaluation. To compare a new system with existing ones, one has to learn how to use the other systems (usually through a command line interface), write a number of scripts, manually collect the results, and so on. In extreme cases (especially when no suitable datasets are available) the setup of the experimental session can be more time consuming than the actual development of the new algorithm or system. The AKSW research group of Leipzig is currently developing SML-Bench (https://github.com/AKSW/SML-Bench), a benchmarking tool that eases the testing and comparison of learning systems for structured data. Unfortunately, SML-Bench currently supports only non-probabilistic algorithms and provides evaluation measures that are not suitable for probabilistic learners. AUCROC (“Area Under the Receiver Operating Characteristic curve”) and AUCPR (“Area Under the Precision-Recall curve”) are measures widely used for the evaluation of probabilistic and scoring classifiers. We extended SML-Bench to support these measures. Moreover, we added the probabilistic structure learning systems SLIPCOVER and LEAP to the benchmark. We are currently evaluating LEAP by means of this benchmarking tool.

6 Conclusions and Future Work

LEAP has been integrated into DL-Learner [8] and is part of release 1.3. In this way we extended the framework to the field of Probabilistic Description Logics. As a next step we plan to integrate LEAPMR into DL-Learner and to adapt SML-Bench so that it can use distributed systems such as LEAPMR and SEMPRE.

The main problem of structure learning algorithms is that they often scale poorly, which makes handling Big Data problematic. Several solutions could be adopted. In order to fit a dataset in main memory, distributed reasoning approaches could be used [1]. Reducing the knowledge base by removing the irrelevant parts is another way to reduce the reasoning time. For LEAPMR we are currently working on distributing both the structure and the parameter learning of probabilistic knowledge bases by exploiting EDGEMR also when building class expressions. We would like to distribute the scoring function used to evaluate the obtained refinements: in this function EDGEMR takes as input a KB containing only the individuals and the class expression to test. Finally, the class expressions found are sorted according to the LL returned by EDGEMR and their initial probabilities are the probabilities learned during the execution of EDGEMR. Currently LEAP and LEAPMR support only supervised learning; we plan to add semi-supervised or unsupervised learning. Another line of research is to adapt LEAPMR to exploit Apache Spark and to run the queries on GPUs.
References

1. B. Ahmadi, K. Kersting, M. Mladenov, and S. Natarajan. Exploiting symmetries for scaling loopy belief propagation and relational training. Mach. Learn., 92(1):91–132, 2013.
2. F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, New York, NY, USA, 2003.
3. E. Bellodi, E. Lamma, F. Riguzzi, and S. Albani. A distribution semantics for probabilistic ontologies. In URSW-2011, volume 778, pages 75–86, Aachen, Germany, 2011. Sun SITE Central Europe.
4. E. Bellodi and F. Riguzzi. Expectation Maximization over Binary Decision Diagrams for probabilistic logic programs. Intell. Data Anal., 17(2):343–363, 2013.
5. E. Bellodi and F. Riguzzi. Structure learning of probabilistic logic programs by searching the clause space. Theor. Pract. Log. Prog., 15(2):169–212, 2015.
6. G. Cota, R. Zese, E. Bellodi, E. Lamma, and F. Riguzzi. Structure learning with distributed parameter learning for probabilistic ontologies. In J. Hollmen and P. Papapetrou, editors, Doctoral Consortium of ECMLPKDD 2015, pages 75–84, 2015.
7. G. Cota, R. Zese, E. Bellodi, F. Riguzzi, and E. Lamma. Distributed parameter learning for probabilistic ontologies. In K. Inoue, H. Ohwada, and A. Yamamoto, editors, 25th International Conference on Inductive Logic Programming (ILP 2015), 2015.
8. J. Lehmann. DL-Learner: learning concepts in description logics. J. Mach. Learn. Res., 10:2639–2642, 2009.
9. J. Lehmann, S. Auer, L. Bühmann, and S. Tramp. Class expression learning for ontology engineering. Web Semantics: Science, Services and Agents on the World Wide Web, 9(1):71–81, 2011.
10. F. Riguzzi. A top down interpreter for LPAD and CP-logic. In Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence, volume 4733 of LNAI, pages 109–120. Springer, 2007.
11. F. Riguzzi, E. Bellodi, E. Lamma, and R. Zese. Learning the parameters of probabilistic description logics. In Inductive Logic Programming Late Breaking Papers, volume 1187 of CEUR Workshop Proceedings, pages 46–51. Sun SITE Central Europe, 2014.
12. F. Riguzzi, E. Bellodi, E. Lamma, R. Zese, and G. Cota. Learning probabilistic description logics. In F. Bobillo, R. N. Carvalho, P. C. Costa, C. d’Amato, N. Fanizzi, K. B. Laskey, K. J. Laskey, T. Lukasiewicz, M. Nickles, and M. Pool, editors, Uncertainty Reasoning for the Semantic Web III, LNCS, pages 63–78. Springer International Publishing, Berlin, Heidelberg, 2014.
13. F. Riguzzi, E. Bellodi, E. Lamma, R. Zese, and G. Cota. Probabilistic logic programming on the web. Software Pract. and Exper., 2015.
14. F. Riguzzi, E. Bellodi, R. Zese, G. Cota, and E. Lamma. Structure learning of probabilistic logic programs by MapReduce. In K. Inoue, H. Ohwada, and A. Yamamoto, editors, 25th International Conference on Inductive Logic Programming (ILP 2015), 2015.
15. T. Sato. A statistical learning method for logic programs with distribution semantics. In L. Sterling, editor, ICLP-95, pages 715–729, Cambridge, Massachusetts, 1995. MIT Press.
16. J. Vennekens, S. Verbaeten, and M. Bruynooghe. Logic programs with annotated disjunctions. In 20th International Conference on Logic Programming, volume 3131 of LNCS, pages 195–209, Berlin Heidelberg, Germany, 2004. Springer.