<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Probabilistic Logical Modelling of Quantum and Geometrically-Inspired IR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabrizio Smeraldi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Martinez-Alvarez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ingo Frommholz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Roelleke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing Science, University of Glasgow</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Electronic Engineering and Computer Science, Queen Mary University of London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Information Retrieval approaches can mostly be classed into probabilistic, geometric or logic-based. Recently, a new unifying framework for IR has emerged that integrates a probabilistic description within a geometric framework, namely vectors in Hilbert spaces. The geometric model leads naturally to a predicate logic over linear subspaces, also known as quantum logic. In this paper we show the relation between this model and classic concepts such as the Generalised Vector Space Model, highlighting similarities and differences. We also show how some fundamental components of quantum-based IR can be modelled in a descriptive way using a well-established tool, i.e. Probabilistic Datalog.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Three main branches of IR are probabilistic, geometric and logic-based IR. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
discusses the relationship between these branches, showing that geometric
approaches can have a probabilistic or logic-based interpretation, as is known
from quantum probabilities and quantum logics. Subsequent work discusses the
prospect of such an interpretation for context-based or interactive IR [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
specific retrieval tasks like diversity and novelty [
        <xref ref-type="bibr" rid="ref5 ref7">7, 5</xref>
        ]. On the other hand, logic-based
approaches combine concepts from databases and IR and offer advanced means
to flexibly create sophisticated retrieval functions and to support structured queries.
Combining logic-based approaches with geometric ones is thus a straightforward
step, which has been started in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In this paper, we contribute a first step towards extending a well-established
logic-based framework (Probabilistic Datalog) with concepts from quantum
mechanics. The framework has been used for modelling several IR tasks such as
Information Extraction, and it has been reported that it produces programs that are
easy to understand, debug and modify [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. After introducing the main geometrical concepts from quantum
mechanics, we show how a well-known geometric retrieval approach, the
generalised vector space model (GVSM), relates to quantum probabilities and the total
probability. We then explain how geometric concepts (e.g. Euclidean
normalisation) can be realised in probabilistic Datalog. In particular, we address how
traditional maximum-likelihood estimates (L1-normalisation) and Euclidean
estimates (L2-normalisation) are expressed and related in PDatalog. The main
technical contribution of this paper is the probabilistic logical modelling of the
mathematical concepts of GIR, and the theorems and proofs showing the
correctness of the PDatalog programs that model L1 and L2 probabilities.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Geometric IR (GIR)</title>
      <p>2.1 Background</p>
      <p>
        Quantum logic, i.e. logic on Hilbert spaces, allows us to cast information retrieval
in a geometric context. A Hilbert space is a vector space endowed with an inner
product; in the finite-dimensional case, we can think of the Euclidean space.
In quantum logic, predicates are represented by linear subspaces. If V and W
are two subspaces (predicates), conjunction is given by their intersection (also a
subspace) and alternation by the span of their union [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Negation is represented
by the orthogonal complement. With this in mind, projections and orthogonality become
important notions, and a special notation (the bra-ket notation) is introduced
to facilitate computation.
      </p>
      <p>2.2 Notation and computation</p>
      <p>Given a Hilbert space H, vectors are denoted by Greek letters in an angle
bracket, e.g. |ψ⟩. This is called a "ket". The corresponding element of the dual space is
denoted by ⟨ψ|, a "bra". The inner product of two vectors |φ⟩ and |ψ⟩ is written
⟨φ|ψ⟩, the aptly named "bracket" of the two vectors.</p>
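      <p>
        In the finite-dimensional real case, brackets reduce to ordinary dot products. A minimal sketch of this (the two vectors are our own toy values, not from the paper):
      </p>
      <p>
```python
import numpy as np

# Toy kets in R^2 (assumed values for illustration).
psi = np.array([1.0, 0.0])
phi = np.array([0.6, 0.8])

# The bra ⟨phi| acts as a row vector; the bracket ⟨phi|psi⟩ is the dot product.
bracket = float(np.dot(phi, psi))
print(bracket)  # 0.6
```
      </p>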
      <p>In Euclidean spaces the scalar product establishes a natural correspondence
between the bra and ket spaces as follows:

⟨φ_K | ψ⟩ := K(φ, ψ)   (1)

where K(·, ·) is the scalar product and φ is a (ket) vector in H. This
correspondence is invariant up to a rotation of the basis. We can therefore use the scalar
product in the space to compute brackets, essentially thinking of |ψ⟩ as a column
vector, ⟨ψ| as a row vector, and ⟨φ|ψ⟩ as a scalar product (the observables H
introduced below would then be symmetric matrices).</p>
      <p>2.3 Representing probabilities</p>
      <p>
        As we have seen, a subspace identifies a predicate. We can represent
probabilities by a state vector normalised to one, so that the square of the norm of its
projection onto the subspace represents the probability of the predicate being
true [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It is natural, in the light of what we have seen above, to use the Euclidean
normalisation ⟨ψ|ψ⟩ = 1. Remembering the analogy with the dot product above,
this is expressed in components as follows: let {|e_i⟩} be the basis vectors; then

⟨ψ|ψ⟩ = Σ_i ⟨ψ|e_i⟩ ⟨e_i|ψ⟩ = Σ_i |⟨e_i|ψ⟩|² = Σ_i ψ_i²   (2)

where by ψ_i we indicate the component of ψ along e_i. The projection of a state
|ψ⟩ onto |e_i⟩ is obtained by applying the projection operator |e_i⟩⟨e_i| to |ψ⟩, which
following the mechanics of the notation gives

(|e_i⟩⟨e_i|) |ψ⟩ = (⟨e_i|ψ⟩) |e_i⟩ = ψ_i |e_i⟩   (3)

If |ψ⟩ is normalised, it is immediate that ψ_i² can be interpreted as a probability.
      </p>
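      <p>
        The normalisation and projection mechanics above can be checked numerically. A sketch with an assumed two-component state:
      </p>
      <p>
```python
import numpy as np

# A state vector with Euclidean norm 1 (components 0.6 and 0.8 are assumptions).
psi = np.array([0.6, 0.8])
print(round(float(np.dot(psi, psi)), 6))  # ⟨psi|psi⟩ = 1.0

# The probability of the predicate spanned by basis vector e_i is psi_i^2,
# the squared norm of the projection psi_i e_i.
probs = psi ** 2
print(round(float(probs.sum()), 6))  # the probabilities sum to 1.0
```
      </p>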
      <p>2.4 Representing retrieval</p>
      <p>Documents are represented as vectors in a Hilbert space. In a two-dimensional
space with basis vectors |drive⟩ and |school⟩, a document about driving schools
might be seen as a normalised coherent mixture of the basis states, taken for
instance with equal weight:

|ψ⟩ = (√2/2) |drive⟩ + (√2/2) |school⟩   (4)

|ψ⟩ is a so-called superposition of the two states |drive⟩ and |school⟩. This differs
from the case in which we do not know if the document is purely about driving or
purely about schools. In quantum mechanics, such a condition is called a mixed
state, which is represented by a density operator

ρ = (1/2) |drive⟩⟨drive| + (1/2) |school⟩⟨school|   (5)

This is the analogue of classical density matrices; see Equation 19.
These two alternative descriptions seem similar if we are interested, for instance,
in the probability that the document is about driving:

|⟨drive|ψ⟩|² = |(√2/2) ⟨drive|drive⟩ + (√2/2) ⟨drive|school⟩|² = 1/2   (6)

because |drive⟩ and |school⟩ are orthogonal. Similarly, abbreviating |drive⟩ and
|school⟩ as |drv⟩ and |sch⟩,

Tr(ρ |drv⟩⟨drv|) = ⟨drv| ρ |drv⟩ = (1/2) ⟨drv|drv⟩⟨drv|drv⟩ + (1/2) ⟨drv|sch⟩⟨sch|drv⟩ = 1/2   (7)

where Tr denotes the trace function, the sum of the diagonal elements of a
matrix. However, the superposition of Equation 4 expresses the extent to which
the document partakes of both concepts |drive⟩ and |school⟩. This is made
evident by the following example. Suppose we are interested in finding documents
about driving schools. We can then define the following observable:

H = |drive⟩⟨school| + |school⟩⟨drive|   (8)

In matrix notation it is represented by a symmetric matrix (one of the Pauli matrices):

H = [ 0 1 ; 1 0 ]   (9)

It is easy to see that the average value of the observable H on |ψ⟩ is as follows:

⟨ψ| H |ψ⟩ = (√2/2)(⟨drv| + ⟨sch|) (|drv⟩⟨sch| + |sch⟩⟨drv|) (√2/2)(|drv⟩ + |sch⟩)   (10)
          = (1/2)(⟨drv| + ⟨sch|)(|sch⟩ + |drv⟩) = 1   (11)

Actually one can show that H has eigenvalues +1 and -1, and that the
corresponding eigenvectors are, respectively, |ψ⟩ as in Equation 4 and

|φ⟩ = (√2/2) |drive⟩ − (√2/2) |school⟩   (12)

Now if we interpret the components ψ_i of the vectors as TF-IDF frequencies,
then we obtain:

ψ_i = TF(t_i, ψ) · IDF(t_i)   (13)

Hereby, t_i is the term corresponding to dimension i, TF(t_i, ψ) is a frequency
component, and IDF(t_i) is a measure to reflect the inverse document frequency
(e.g. IDF(t_i) = log((N_D − n_D(t_i)) / n_D(t_i)), where n_D(t_i) is the number of
documents in which t_i occurs, and N_D is the total number of documents). Note
that the vector component is negative in the case n_D(t_i) > N_D/2.
Of course, any other frequency-based or probabilistically motivated measure can be
chosen as a vector component.</p>
      <p>We can see that Span(|ψ⟩) represents the subspace on which the frequencies
of the terms are positively correlated, while Span(|φ⟩) is the subspace on which
they are negatively correlated.</p>
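      <p>
        The contrast between the superposition of Equation 4 and the mixed state of Equation 5 can be traced numerically. A sketch in numpy (the Tr(ρH) value at the end is our own complementary check, not a computation from the paper):
      </p>
      <p>
```python
import numpy as np

s = np.sqrt(2) / 2
drive = np.array([1.0, 0.0])
school = np.array([0.0, 1.0])

# Superposition |psi⟩ = s|drive⟩ + s|school⟩ (Equation 4)
psi = s * drive + s * school
p_pure = np.dot(drive, psi) ** 2  # |⟨drive|psi⟩|^2

# Mixed state rho (Equation 5)
rho = 0.5 * np.outer(drive, drive) + 0.5 * np.outer(school, school)
p_mixed = np.trace(rho @ np.outer(drive, drive))

# Both give probability 1/2 for "about driving" ...
print(round(float(p_pure), 6), round(float(p_mixed), 6))  # 0.5 0.5

# ... but the observable H (Equation 8) separates them:
H = np.outer(drive, school) + np.outer(school, drive)
print(round(float(psi @ H @ psi), 6))      # 1.0: psi is an eigenvector of H
print(round(float(np.trace(rho @ H)), 6))  # 0.0: the mixed state shows no correlation
```
      </p>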
    </sec>
    <sec id="sec-3">
      <title>Relationships between GIR and Traditional Concepts</title>
      <p>This section outlines the relationships between "Geometric IR" and traditional
concepts, namely the GVSM (Section 3.1) and the total probability (Section 3.2).</p>
      <p>3.1 Generalised Vector-Space Model (GVSM)</p>
      <p>The scalar product of two vectors can be re-written using the identity matrix as
the intermediate between the vectors:

d^T q := d^T I q   (14)

where by T we indicate transposition. More generally, the GVSM is based on the
idea of using a matrix G (a term-times-term matrix) between document and query:

d^T G q   (15)

The matrix G may be used to associate semantically related terms. For example,
setting g_12 = 1 leads to the following equation:

d^T G q = d_1 g_11 q_1 + d_1 g_12 q_2 + d_2 g_22 q_2   (16)

Now the first term (e.g. the term "dog") is related to the second term (e.g. the
term "animal"), i.e. a query containing "animal" will retrieve documents that
contain "dog". Here, g_21 may be unequal to g_12 if we wish to model a
generalisation of terms. For synonyms, e.g. "classification" and "categorisation", we
have g_ij = g_ji, i.e. the matrix G is symmetric for synonyms.</p>
      <p>In a real-valued Hilbert space, a symmetric matrix G is a Hermitian operator
and corresponds exactly to the observable H we introduced in Equation 8.</p>
      <p>3.2 Total Probability</p>
      <p>The total probability theorem is as follows:

P(q|d) = Σ_{t∈T} P(q|t) P(t|d)   (17)

Here, q and d are events, and T is a set of disjoint events. In the context of IR,
let q be a query, d be a document, and t be a term. Using Bayes' theorem for
P(t|d), the theorem can be rewritten:

P(q|d) = (1/P(d)) Σ_t P(q|t) P(d|t) P(t)   (18)

This form relates the total probability to the GVSM, as is demonstrated in the
following. Define

d := (P(d|t_1), ..., P(d|t_n))
q := (P(q|t_1), ..., P(q|t_n))
G := diag(P(t_1), ..., P(t_n))   (19)

so that P(q|d) = (1/P(d)) · q^T G d. Matrix G thus defined is a representation of a
document vector d in terms of probabilities of disjoint events. In the quantum
framework, disjoint events are orthogonal subspaces; thus G corresponds to the
density matrix introduced in Equation 5. We note that there is no classical
analogue of the coherent quantum superposition introduced in Equation 4.</p>
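      <p>
        Both relationships can be verified numerically. A sketch with assumed toy probabilities: the Bayes-rewritten total probability (Equation 18) coincides with the GVSM form q^T G d / P(d) for the diagonal G of Equation 19, and a non-diagonal G relates distinct terms as in Equation 16:
      </p>
      <p>
```python
import numpy as np

# Assumed toy distributions over two disjoint terms t1, t2.
p_t = np.array([0.5, 0.5])           # P(t)
p_q_t = np.array([0.8, 0.2])         # P(q|t)
p_d_t = np.array([0.4, 0.6])         # P(d|t)
p_d = float(np.dot(p_d_t, p_t))      # P(d) = sum_t P(d|t) P(t)

# Equation 18: P(q|d) = (1/P(d)) sum_t P(q|t) P(d|t) P(t)
direct = float(np.sum(p_q_t * p_d_t * p_t)) / p_d

# GVSM form with the diagonal G of Equation 19
G = np.diag(p_t)
gvsm = float(p_q_t @ G @ p_d_t) / p_d
print(np.isclose(direct, gvsm))  # True

# A non-diagonal G with g12 = 1 (Equation 16) relates term 1 to term 2:
d_vec = np.array([1.0, 0.0])   # document containing only term 1 ("dog")
q_vec = np.array([0.0, 1.0])   # query containing only term 2 ("animal")
G2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])
print(float(d_vec @ np.eye(2) @ q_vec), float(d_vec @ G2 @ q_vec))  # 0.0 1.0
```
      </p>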
    </sec>
    <sec id="sec-4">
      <title>Probabilistic Datalog: A Language for Probabilistic Logical Modelling</title>
      <p>
        Probabilistic Logical Modelling (PLM) is a modelling approach that is based on
probability theory and logic. In principle, PLM is a theoretical framework
composed of possible-world semantics (logic and probability theory). Probabilistic
extensions of standard languages (e.g. Probabilistic Datalog [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and probabilistic
SQL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) are instances of probabilistic modelling languages. We present a brief
overview of Probabilistic Datalog and show its application to the probabilistic
logical modelling of the GVSM, the total probability, and TF-IDF.
Probabilistic Datalog (PDatalog) combines Datalog (a query language used in
deductive databases) and probability theory [
        <xref ref-type="bibr" rid="ref3 ref9">3, 9</xref>
        ]. It was extended to improve its
expressiveness and scalability for modelling ranking models [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In addition, it
is a flexible platform that has been used as an intermediate processing layer for
semantic/terminological logics in different IR tasks such as ad-hoc retrieval [
        <xref ref-type="bibr" rid="ref4 ref6">4,
6</xref>
        ] or annotated document retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It allows Bayesian goals and subgoals
(for the modelling of probability estimation and conditional probabilities) and
assumptions like SUM, PROD, SQRT or ROOT for combining the probabilities of
tuples. For example, given information about grades for a given degree,
P(grade|person) can be inferred with the model shown in Figure 1.
      </p>
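      <p>
        The inference of Figure 1 can be traced in plain Python. This is a sketch of the intended semantics, not of the PDatalog engine itself, using the facts from the listing (we write "art" consistently for the degree name):
      </p>
      <p>
```python
from collections import defaultdict

# Extensional evidence: grade(Person, Grade, Degree)
grade = [("john", "B", "art"), ("anna", "A", "art"),
         ("mary", "B", "maths"), ("peter", "B", "maths"), ("paul", "C", "maths")]

# Bayesian assumption: P(grade|degree) by normalising counts per degree.
counts = defaultdict(float)
totals = defaultdict(float)
for _, g, deg in grade:
    counts[(g, deg)] += 1.0
    totals[deg] += 1.0
p_grade_degree = {k: v / totals[k[1]] for k, v in counts.items()}

# 0.5 register(mr_x, maths); 0.5 register(mr_x, art)
p_degree_person = {("maths", "mr_x"): 0.5, ("art", "mr_x"): 0.5}

# Total probability: P(grade|person) = sum_degree P(grade|degree) P(degree|person)
p_grade_person = defaultdict(float)
for (g, deg), p1 in p_grade_degree.items():
    for (deg2, person), p2 in p_degree_person.items():
        if deg == deg2:
            p_grade_person[(g, person)] += p1 * p2
print(p_grade_person[("A", "mr_x")])  # 0.25 = P(A|art) * P(art|mr_x) = 0.5 * 0.5
```
      </p>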
      <p>4.1 Probabilistic Logical Modelling of the GVSM</p>
      <p>
        The modelling of the total probability is illustrated in Figure 3: P(term|doc)
and P(query|term) are obtained using the Bayesian assumption of PDatalog.

# Extensional evidence:
grade(john, "B", art); grade(anna, "A", art);
grade(mary, "B", maths); grade(peter, "B", maths); grade(paul, "C", maths);

?- p_grade_degree(Grade, Degree);
# 0.5 ("B", art); 0.5 ("A", art); 0.667 ("B", maths); 0.333 ("C", maths)

# For a person Mr. X that has joined both art and maths, what is the probability
# of "A", i.e. P(grade|person)?
0.5 register(mr_x, maths); 0.5 register(mr_x, art);

# P(degree|person)
p_degree_person(Degree, Person) :- register(Person, Degree) | (Person);

# P(grade|person): using the total probability
p_grade_person SUM(Grade, Person) :-
  p_grade_degree(Grade, Degree) &amp; p_degree_person(Degree, Person);

# Lower and upper triangles:
g(sailing, boats); # For a query with "sailing" retrieve docs containing "boats"
g(boats, sailing); # For a query with "boats" ...

# P(query|doc)
p_q_d SUM(DocId, QueryId) :- p_q_t(Term, QueryId) &amp; p_t_d(Term, DocId);

For "pidf(T)", the probability estimation is based on idf(t)/maxidf, and this
value between 0 and 1 has a probabilistic semantics (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]): namely, the
probability of occurring (P(t occurs)) is the probability of being not informative in
maxidf trials, where the probability of being informative is
P(t informs) := idf(t)/maxidf.
The details of the meaning of the TF- and IDF-based probabilities are beyond the
focus of this paper; what is important is that the TF- and IDF-based probabilities
described in the PDatalog program have a proper probabilistic semantics, and
this leads to a probabilistic interpretation of the TF-IDF score.
      </p>
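      <p>
        The join-and-aggregate semantics of the p_q_d rule can be sketched in Python (the relation contents are assumed toy values):
      </p>
      <p>
```python
from collections import defaultdict

# Assumed toy relations with tuple probabilities.
p_q_t = {("sailing", "q1"): 0.7, ("boats", "q1"): 0.3}   # P(query|term)
p_t_d = {("sailing", "d1"): 0.5, ("boats", "d2"): 1.0}   # P(term|doc)

# p_q_d SUM(DocId, QueryId) :- p_q_t(Term, QueryId) &amp; p_t_d(Term, DocId);
# join on the shared Term variable, multiply, and SUM over non-distinct tuples.
p_q_d = defaultdict(float)
for (t1, q), pq in p_q_t.items():
    for (t2, d), pd in p_t_d.items():
        if t1 == t2:
            p_q_d[(d, q)] += pq * pd
print(dict(p_q_d))  # {('d1', 'q1'): 0.35, ('d2', 'q1'): 0.3}
```
      </p>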
      <p>The rule for \w qterm(T,Q)" models IDF-based query term weighting. This is
followed by a normalisation. The normalised tuple probabilities are then used
for obtaining a probabilistic TF-IDF-based score in \retrieve(D,Q)".
In summary, this example illustrating the probabilistic logical modelling of
TFIDF highlighted that TF-based and IDF-based probabilities are combined to
obtain a probabilistic TF-IDF-based score.
5</p>
    </sec>
    <sec id="sec-6">
      <title>Probabilistic Logical Modelling of Geometric IR</title>
      <p>Geometric IR can be viewed as a perspective for IR where the modelling of
documents and queries is based on vectors. Quantum-inspired IR may be viewed
as a modelling approach that combines geometric IR and probability theory.
Essentially, the vector components are probabilities, and the combination of
vectors (and/or matrices) yields probabilities.</p>
      <p>The following sections present the modelling of GIR. Each section is related to
the respective GIR section in which the mathematical foundations were reviewed.
As pointed out above, a central property of Quantum-inspired IR is related to
the Euclidean norm, also referred to as the L2 norm.</p>
      <p>The Euclidean (L2) norm is the square root of the sum of the squared vector
components, while the L1 norm is simply the sum over the vector components:

L2(x) := √(Σ_i x_i²)   (20)

L1(x) := Σ_i x_i   (21)

With respect to probabilistic logical modelling, the L1 norm corresponds to the
assumption 'DISJOINT' (which corresponds to the maximum-likelihood (ML)
estimate), and the L2 norm is covered by the assumption 'EUCLIDEAN'.

# Euclidean P(t|d); based on the L2 norm.
# P_L2(t|d) = P_L1(t|d) / sqrt( sum_{t' in d} square(P_L1(t'|d)) )
p_L2_t_d(T, D) :- p_L1_t_d(T, D) | EUCLIDEAN(D);

# Example:
# L2(doc2) = sqrt(5) = sqrt(2^2 + 1^2)
# 0.894 (sailing, doc2)  # 2 / sqrt(5)
# 0.447 (boats, doc2)    # 1 / sqrt(5)
</p>
      <p>Theorem 1. The rule for "p_L1_t_d" is correct, i.e. the tuple probabilities in
"p_L1_t_d" correspond to ML-probabilities of the form n/N, where n is the sum
of the tuple probabilities in "term(t,d)", and N is the sum of the tuple probabilities
in "term(*,d)", i.e. the sum over the document's tuples.</p>
      <p>Proof. The L1-based probability P_L1(t|d) is modelled in the rule for relation
"p_L1_t_d". The rule body generates a probabilistic relation in which each tuple
probability (from relation "term") is divided by the evidence probability, i.e. the
sum of the tuple probabilities of the tuples that share the same evidence key.
Here, "(D)" is the evidence key, i.e. the document id constitutes the evidence key.
Therefore, the tuple probabilities generated by the rule body have the semantics
P_term((t,d)) / Σ_{t'∈d} P_term((t',d)).</p>
      <p>The aggregation assumption in the rule head, i.e. SUM in p_t_d SUM(T,D),
aggregates the tuple probabilities of non-distinct tuples.</p>
      <p>For a non-probabilistic relation "term", "p_t_d" is the normalised within-document
term frequency, i.e. n_L(t,d) / Σ_{t'∈d} n_L(t',d), where n_L(t,d) is the total
occurrence of t in d (also denoted as tf_d := n_L(t,d)).</p>
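      <p>
        The ML semantics of Theorem 1 can be traced in Python. A sketch where the relation term holds plain occurrences (toy data: doc2 contains "sailing" twice and "boats" once, as in the program comment above):
      </p>
      <p>
```python
from collections import defaultdict

# term(Term, Doc): occurrences, one tuple per occurrence.
term = [("sailing", "doc2"), ("sailing", "doc2"), ("boats", "doc2")]

# P_L1(t|d): per-document counts divided by the per-document total
# (the document id is the evidence key).
n = defaultdict(float)
N = defaultdict(float)
for t, d in term:
    n[(t, d)] += 1.0
    N[d] += 1.0
p_L1_t_d = {(t, d): v / N[d] for (t, d), v in n.items()}

print(round(p_L1_t_d[("sailing", "doc2")], 3))  # 0.667 = 2/3
print(round(p_L1_t_d[("boats", "doc2")], 3))    # 0.333 = 1/3
```
      </p>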
      <p>Theorem 2. The rule for "p_L2_t_d" is correct, i.e. the tuple probabilities in
"p_L2_t_d" correspond to the probabilities as required for GIR.</p>
      <p>Proof. Let x_i be the vector component for the i-th dimension. The Euclidean
normalisation is:

x_i / √(Σ_j x_j²)

Let P_L1(t) := x_t / Σ_{t'} x_{t'} be the L1-based probability; according to
Theorem 1, we find this probability in relation "p_L1_t_d".</p>
      <p>The norm EUCLIDEAN in the subgoal of "p_L2_t_d" forms the sum of the
squares of the probabilities that share the same evidence key. Then, for each
tuple, the tuple probability is computed as follows:

P_L2(t|d) = P_L1(t|d) / √(Σ_{t'∈d} (P_L1(t'|d))²)

Given the computation of P_L1(t|d), the following equation holds:

P_L1(t|d) / √(Σ_{t'∈d} (P_L1(t'|d))²) = x_t / √(Σ_{t'} x_{t'}²)   (22)

Thus, the tuple probabilities in relation "p_L2_t_d" are correct in the sense that
they are based on the Euclidean normalisation.</p>
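      <p>
        Equation 22 can be verified numerically for the doc2 example (the counts 2 and 1 are taken from the program comment above):
      </p>
      <p>
```python
import math

# Raw counts x for doc2, and the L1-based probabilities 2/3 and 1/3.
x = {"sailing": 2.0, "boats": 1.0}
total = sum(x.values())
p_L1 = {t: v / total for t, v in x.items()}

# EUCLIDEAN: divide each P_L1 by the root of the sum of their squares.
norm = math.sqrt(sum(p * p for p in p_L1.values()))
p_L2 = {t: p / norm for t, p in p_L1.items()}

# Equation 22: identical to normalising the raw counts by sqrt(sum x^2) = sqrt(5).
print(round(p_L2["sailing"], 3))                # 0.894
print(round(p_L2["boats"], 3))                  # 0.447
print(round(x["sailing"] / math.sqrt(5.0), 3))  # 0.894
```
      </p>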
      <p>5.2 Modelling retrieval (GIR 2.4)</p>
      <p>The following PDatalog program illustrates the modelling of the GIR-based
approach, where L2-norm-based probabilities are combined in the rule body.
The program shows some rules to illustrate the modelling of various retrieval
models. The rules shown underline that the models share the same pattern: a
query representation is joined (matched) with a document representation, and
the evidence from this match is aggregated.

# Geometric IR:
gir_retrieve SUM(D, Q) :- p_L2_t_q_idf(T, Q) &amp; p_L2_t_d_idf(T, D);

# TF-IDF:
tf_idf_retrieve SUM(D, Q) :- tf_q(T, Q) &amp; pidf(T) &amp; tf_d(T, D);

# VSM:
vec_q(T, Q) :- tf_q(T, Q) &amp; pidf(T);
vec_d(T, D) :- tf_d(T, D) &amp; pidf(T);
vsm_retrieve SUM(D, Q) :- vec_q(T, Q) &amp; vec_d(T, D);

# GVSM:
gvsm_retrieve SUM(D, Q) :- vec_q(T_q, Q) &amp; g(T_q, T_d) &amp; vec_d(T_d, D);
</p>
      <p>This paper reviewed the basic concepts of a geometric and probabilistic approach
to IR. In essence, vector and matrix components correspond to probabilities, and
multiplications in the underlying vector spaces generate probabilities.
This paper has three main contributions. Firstly, section 3 relates the terminology
of "Geometric IR" to traditional concepts such as the generalised vector space
model (GVSM) and the total probability theorem, whereby we also underline
the relationship between the GVSM and the total probability.</p>
      <p>Secondly, section 4 reviews Probabilistic Datalog and shows the probabilistic
logical modelling of some standard models (TF-IDF, GVSM). Thirdly, section 5
added the modelling of selected concepts of "Geometric IR" (Euclidean-based
probability estimation, and the modelling of retrieval models to outline the
dualities between IR models).</p>
      <p>This paper advocates probabilistic logic (PDatalog and descriptive modelling in
general) as a potential platform to model IR. The overall finding of this paper is
that the expressiveness of PDatalog is sufficient to model traditional IR models
and concepts of "Geometric IR".</p>
      <p>Future work will include investigating the quality and scalability of the shown
approach. Given that PDatalog scales for TF-IDF, VSM, PIN, and language
modelling, the hypothesis is that Euclidean-based normalisations and other
concepts of GIR can be transferred to large scale. Having discussed Euclidean
probability estimation in this work, we will add other components of quantum
mechanics, like projection, state vector ensembles (mixed states) and compound
systems expressed as tensor spaces, together with dynamics like state changes
due to observation, to probabilistic Datalog.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We acknowledge the support of the UK EPSRC Project EP/F015984/1
Renaissance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>I.</given-names>
            <surname>Frommholz</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          .
          <article-title>Probabilistic, object-oriented logics for annotationbased retrieval in digital libraries</article-title>
          .
          <source>In Proceedings of Joint Conference on Digital Libraries (JCDL'06)</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>I.</given-names>
            <surname>Frommholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <article-title>Supporting Polyrepresentation in a Quantum-inspired Geometrical Retrieval Framework</article-title>
          .
          <source>In Proceedings of IIiX 2010</source>
          , pages
          <fpage>115</fpage>
          -
          <lpage>124</lpage>
          ,
          New Brunswick, Aug.
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          .
          <article-title>Probabilistic Datalog - a logic for powerful retrieval methods</article-title>
          .
          <source>In Proceedings of the 18th ACM SIGIR Conference on Research and development in information retrieval (SIGIR'95)</source>
          , pages
          <fpage>282</fpage>
          -
          <lpage>290</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Meghini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Straccia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Thanos</surname>
          </string-name>
          .
          <article-title>A model of information retrieval based on a terminological logic</article-title>
          .
          <source>In ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>298</fpage>
          -
          <lpage>307</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Melucci</surname>
          </string-name>
          .
          <article-title>A basis for information retrieval in context</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>H.</given-names>
            <surname>Nottelmann</surname>
          </string-name>
          . PIRE:
          <article-title>An Extensible IR Engine Based on Probabilistic Datalog</article-title>
          .
          <source>In Proceedings of the European Conference on Information Retrieval (ECIR'05)</source>
          , pages
          <fpage>260</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Frommholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. Van</given-names>
            <surname>Rijsbergen</surname>
          </string-name>
          .
          <article-title>What can Quantum Theory Bring to Information Retrieval?</article-title>
          <source>In Proc. 19th International Conference on Information and Knowledge Management</source>
          , pages
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          ,
          Oct.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T.</given-names>
            <surname>Roelleke</surname>
          </string-name>
          .
          <article-title>A frequency-based and a Poisson-based probability of being informative</article-title>
          .
          <source>In ACM SIGIR</source>
          , pages
          <fpage>227</fpage>
          -
          <lpage>234</lpage>
          , Toronto, Canada,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Roelleke</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          .
          <article-title>Information retrieval with probabilistic Datalog</article-title>
          . In F. Crestani,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          , and C. J. van Rijsbergen, editors,
          <source>Uncertainty and Logics - Advanced models for the representation and retrieval of information</source>
          . Kluwer Academic Publishers,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. T. Roelleke,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Azzam</surname>
          </string-name>
          .
          <article-title>Modelling retrieval models in a probabilistic relational algebra with a new operator: The relational Bayes</article-title>
          .
          <source>VLDB Journal</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>37</lpage>
          ,
          <year>January 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>I.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          .
          <article-title>QQL : A DB&amp;IR Query Language</article-title>
          .
          <source>The VLDB journal</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <article-title>Declarative information extraction using datalog with embedded extraction predicates</article-title>
          .
          <source>In VLDB '07: International conference on Very large data bases</source>
          , pages
          <fpage>1033</fpage>
          -
          <lpage>1044</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <source>The Geometry of Information Retrieval</source>
          . Cambridge University Press, New York, NY, USA,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>