<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling Emergent Associations of Nominal Compounds: Ongoing Research and Preliminary Results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimo Melucci</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurianne Sitbon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Queensland University of Technology</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a roadmap to deliver tools able to model and predict the behaviour of neologisms in the form of nominal compounds. Quite often these compounds yield meanings (in our case, word associations) that are not related to the components of the compounds taken individually. Classical probabilities cannot handle this effect, so a framework based on quantum probability is proposed to model the phenomenon.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Most people agree that the combination pet tree is likely a bonsai and that a pet human is probably a slave, although neither of these combinations exists in the English language: there is no connection between pet and the interpretations of the compounds, since a bonsai is a quite rare type of tree and a slave is a rather rare type of human. Words that are connected to a combination but not to its single constituent words are termed "emergent associates". This capacity that humans have to share a creative understanding of novel combinations has been at the centre of several studies in psychology and cognitive science trying to understand the processes leading to their interpretation. In particular, there is evidence that some novel combinations yield associates that can be predicted neither from the associates of the words taken separately nor from a collection of documents containing both words together.</p>
      <p>
        Information Retrieval (IR) systems can deal with combinations only if they occur in the documents, but they still largely ignore novel combinations and emergent associates. In IR, novel combinations not only can be encountered within queries and documents, but are also essential to capture the intentions of authors or users. Multilingual users may express queries using novel combinations, assuming a consensus on their interpretation, either when they do not know the correct words (paraphrasing) or when they literally translate from their native language a concept expressed as a combination. For example, a Chinese user may search for documents relevant to hippopotamus after literally translating it as river horse, and lobster would be literally translated as dragon shrimp. This strategy can work in human-to-human communication, but automatic systems are to date not able to infer such new meanings, as early work on cross-lingual information retrieval using dictionary-based translation has shown [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Additionally, native speakers of a language may voluntarily create novel combinations that cannot be classically interpreted (e.g., from a combination of the associates of the words taken individually) in order to emphasize the importance of the emerging associates and attract the attention of the reader, for the same purposes as neologisms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
News headlines are typical examples of such strategies, with headlines such as Oklahoma Surprise: Islam as an Election Issue or Brother's transplant gift carries unbearable cost, as well as video titles such as fish wish or a Wallmart adventure that attracted a lot of attention on a single day. More generally, it was also found "that no less than 39% of the neologisms had idiomatic meanings at their very birth" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The previous examples of usage of novel combinations reflect the main problems (among others) that QONTEXT (a project funded by the European Union Seventh Framework Programme, FP7/2007-2013, under grant agreement N. 247590) is striving to address: (i) can we predict whether a combination will yield an emergent associate? (ii) can we predict what the meaning of a novel combination will be?</p>
    </sec>
    <sec id="sec-2">
      <title>Research Question and Contribution</title>
      <p>
        The key to solving these problems is to find a representation that allows for the modelling of the emergence of unexpected associates from novel combinations. Classical representations of words and their meaning are based either on semantic networks (formal relations) or on semantic spaces (based on co-occurrences). Composition is then expressed as a function of the neighbours or the dimensions of each word taken individually within a single space [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which does not allow for the modelling of emergent associates. QONTEXT will explore the suitability of quantum probability to model the combined representations as well as its ability to act as a predictive model.
      </p>
      <p>
        Quantum Theory (QT) has previously been shown to be the only way of going beyond the classical interpretation of compositionality to solve the problem of typicality of combinations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This problem is classically known as the guppy effect, because a guppy is found to be a typical example neither of pet nor of fish, yet is highly typical of "pet fish". The goal is not only to model existing combinations and examples, but also to derive a computational model that can interpret novel combinations well enough to generate query expansion terms or accurate translations. To this end, a quantum probability space has to be defined. In this paper, it is shown that a single quantum probability space in one vector space can be defined.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Words, Vectors and Probability</title>
      <p>
        In the classical probabilistic model, events (e.g., associates or combinations) are represented as sets and the probability measure is based on a set measure, e.g., set cardinality. In contrast, in quantum probability, events are represented as subspaces and probabilities are generated by density matrices (in general, density operators), the simplest case being vectors. Suppose that the vectors |w⟩ and |b⟩ represent an event and a density, respectively (the Dirac notation is used in QT; ⟨x| is the transpose of |x⟩). The probability that w occurs given b is provided by the quantum probability rule, that is, |⟨w|b⟩|² = |a_wb|², where a_wb is termed the amplitude and |⟨x|x⟩|² = 1 for every |x⟩. In general, the probability measure is given by the trace of the product between the density matrix, which corresponds to the probability distribution, and the matrix representing an event. A quantum probability space is then given by a set of subspaces and a density matrix [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
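      <p>As an illustration of the rule above, the following sketch (a hypothetical numerical example, not taken from the paper) computes a quantum probability both as a squared inner product and via the trace rule:</p>
      <preformat>
```python
import numpy as np

# Event vector |w> and density vector |b> (unit norm); values are illustrative.
w = np.array([1.0, 0.0])                 # event |w>
b = np.array([1.0, 1.0]) / np.sqrt(2.0)  # density |b>

# Quantum probability rule for pure states: P(w|b) = |<w|b>|^2.
p = abs(np.dot(w, b)) ** 2

# General trace rule: P = tr(rho E), with rho = |b><b| the density matrix
# and E = |w><w| the projector representing the event.
rho = np.outer(b, b)
E = np.outer(w, w)
p_trace = float(np.trace(rho @ E))

print(p, p_trace)  # both equal 0.5
```
      </preformat>
      <p>For pure states the two computations coincide; the trace rule additionally covers mixed densities.</p>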
    </sec>
    <sec id="sec-4">
      <title>Combination and Quantum Probability</title>
      <p>Suppose that b = b1b2 is a combination (e.g., pet tree) and v is an associate (e.g., slave or bonsai). Using classical probability, these events are sets and b ⊆ b1 ∩ b2 (if b occurs, then both b1 and b2 occur, but the converse does not hold). Suppose also that b1 and b2 are quite common words, so that their probability is not negligible. Moreover,</p>
      <p>P(v, b1) ≥ P(v, b1, b2) ≥ P(v, b)</p>
      <p>P(v, b2) ≥ P(v, b1, b2) ≥ P(v, b)</p>
      <p>If v (e.g., bonsai) rarely occurs with b1 (e.g., pet) or with b2 (e.g., tree), then it is also rare within b and thus cannot be an emergent associate; hence emergent associates cannot be detected by classical probability.</p>
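      <p>The classical bound can be checked on a toy document collection (the "documents" below are invented for illustration):</p>
      <preformat>
```python
# Toy check of the classical bound P(v, b1, b2) <= min(P(v, b1), P(v, b2)):
# an associate that rarely co-occurs with each constituent cannot be
# frequent with the combination.
docs = [
    {"pet", "dog"},
    {"pet", "cat"},
    {"tree", "oak"},
    {"pet", "tree", "bonsai"},
    {"tree", "pine"},
    {"dog", "cat"},
]

def P(*words):
    """Empirical probability that all given words occur in a document."""
    return sum(all(w in d for w in words) for d in docs) / len(docs)

joint = P("bonsai", "pet", "tree")
assert joint <= min(P("bonsai", "pet"), P("bonsai", "tree"))
print(joint)  # 1/6
```
      </preformat>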
      <p>Using quantum probability, instead, the combination is represented as a superposition of its constituents,</p>
      <p>|b⟩ = a_1b |b_1⟩ + a_2b |b_2⟩, with |a_1b|² + |a_2b|² = 1 and ⟨b_1|b_2⟩ = 0,</p>
      <p>and the probability of the associate is</p>
      <p>P(v|b) = |a_1b|² P(v|b_1) + |a_2b|² P(v|b_2) + I (1)</p>
      <p>where the sum of the first two terms of the right-hand side is the classical probability of v given b, and</p>
      <p>I = 2 |a_1b| |a_1v| |a_2b| |a_2v| cos θ (2)</p>
      <p>is the interference term. The conditions for emergence can be studied using (1) and (2). Since −1 ≤ I ≤ +1, (1) can be lower or higher than the classical probability and can therefore be used for predicting emergent associates. The estimation of I is thus crucial. In particular, whereas the |a_ij|²'s are empirical probabilities, the estimation of cos θ is the most difficult step, mainly because of the interpretation of θ, the angle of the complex number a_1b a_1v a_2b a_2v; further investigation is then needed. The vector formalism allows us to build the quantum probability space. Suppose that the empirical probabilities |a_ij|² are estimated for n associates w_1, …, w_n with respect to the b_i's, i = 1, 2, and that the amplitudes a_ij are arranged in a 2 × n matrix:</p>
      <p>(|w_1⟩ … |w_n⟩) = (|b_1⟩ |b_2⟩) (a_11 … a_1n ; a_21 … a_2n)</p>
      <p>Through the SVD of this 2 × n matrix, the |w_j⟩'s and the |b_i⟩'s can be built, thus obtaining the space and the probability |⟨b_1|b_2⟩|², which measures the distance between the constituent words of the combination b.</p>
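      <p>The interplay of the two equations can be sketched numerically; the amplitudes below are invented for illustration, and θ is set to 0 (fully constructive interference):</p>
      <preformat>
```python
import numpy as np

# Sketch of Eqs. (1)-(2) with invented amplitudes: an associate v that is
# rare for each constituent can become more probable for the combination
# when the interference term I is positive.
a1b = a2b = np.sqrt(0.5)        # equal superposition, |a1b|^2 + |a2b|^2 = 1
p_v_b1, p_v_b2 = 0.01, 0.02     # classical P(v|b1), P(v|b2): v is rare
a1v, a2v = np.sqrt(p_v_b1), np.sqrt(p_v_b2)

classical = a1b**2 * p_v_b1 + a2b**2 * p_v_b2      # first two terms of (1)
theta = 0.0                                        # cos(theta) = 1
I = 2 * a1b * a1v * a2b * a2v * np.cos(theta)      # Eq. (2)
p_quantum = classical + I                          # Eq. (1)

print(classical, p_quantum)  # 0.015 vs ~0.029: v emerges under interference
```
      </preformat>
      <p>With θ near π the same amplitudes suppress the associate instead, which is why estimating cos θ is the crucial step noted above.</p>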
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks and Future Work</title>
      <p>A single representation of the events is obtained so that the empirical probabilities can be reproduced by the quantum probability rule. The modelling results are still quite preliminary; however, we are confident that the ongoing research may provide a useful theoretical framework for detecting emergent associates.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kwok</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Evaluation of an English-Chinese cross-lingual retrieval experiment</article-title>
          .
          <source>In: Working Notes of AAAI-97 Spring Symposiums on Cross-Language Text and Speech Retrieval</source>
          . (
          <year>1997</year>
          )
          <fpage>110</fpage>
          -
          <lpage>114</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lehrer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Understanding trendy neologisms</article-title>
          .
          <source>Italian Journal of Linguistics</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          :
          <article-title>New words in the mind: Concept-formation and entrenchment of neologisms</article-title>
          .
          <source>Anglia-Zeitschrift Fur Englische Philologie</source>
          <volume>126</volume>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Composition in distributional models of semantics</article-title>
          .
          <source>Cognitive Science</source>
          <volume>34</volume>
          (
          <year>2010</year>
          )
          <fpage>1388</fpage>
          -
          <lpage>1429</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Aerts</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabora</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A theory of concepts and their combinations II: A Hilbert space representation</article-title>
          .
          <source>Kybernetes</source>
          <volume>34</volume>
          (
          <year>2004</year>
          )
          <fpage>192</fpage>
          -
          <lpage>221</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Melucci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Rijsbergen</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>Quantum mechanics and information retrieval: a probabilistic overview</article-title>
          .
          <source>In: Advanced Topics in Information Retrieval</source>
          . Springer (Forthcoming)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>