<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Field Mapping for Product Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aman Berhane Ghirmatsion</string-name>
          <email>ab.ghirmatsion@stud.uis.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krisztian Balog</string-name>
          <email>krisztian.balog@uis.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Stavanger</institution>
          ,
          <addr-line>Stavanger</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our participation in the product search task of the CLEF 2015 LL4IR Lab. Working within a generative language modeling framework, we represent products as semi-structured documents. Our focus is on establishing a probabilistic mapping from query terms to document fields. We present and experimentally compare three alternatives. Our results show that term-specific mapping is beneficial. We also find evidence suggesting that estimating field mapping priors based on historical clicks outperforms the setting where the priors are uniformly distributed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Online shopping has become a common practice and one of the most popular activities
people perform on the Internet. Its success can be credited to ease and convenience,
a large selection of products, 24/7 availability, and low prices, to name just a few of the
benefits. As with any other website or online service, effective information retrieval
systems are indispensable for e-commerce sites. Webshops need to offer search
functionality that makes their products easy to find. Providing customers with a
positive shopping experience can increase sales and is thereby essential to the success of
the business. Users are accustomed to the “single search box” paradigm and expect the
search engine to understand their information need. In this paper we address the task of
ad-hoc product search, defined as follows: given a keyword query, return a ranked list
of products from a product catalog that are relevant to the query.</p>
      <p>
        Products are described by a number of attributes, such as name, brand, description,
categories, and so on. These attributes have a specific semantic meaning (one that may
even be known a priori), although the amount of text associated with each is typically
rather small. Recognizing, for each query term, which product attribute (if any) it
targets is therefore expected to improve retrieval performance. Our objective
in this work is to develop, implement, and evaluate methods that establish a mapping
from individual query terms to specific product fields. We represent products as
semistructured documents (following the predominant approach to entity retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) and
employ the Probabilistic Retrieval Model for Semistructured Data (PRMS) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a model
that is known to perform well under conditions where the collection is homogeneous
and fields have distinctive term distributions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our setting meets both of these conditions. Given
the inherent uncertainty, the term-field mapping is represented as a probability
distribution over fields. We decompose its estimation into a term-field probability component
and a field prior component, and propose three specific instantiations of the PRMS model.
We evaluate our approaches and report results using the LL4IR platform.
      </p>
      <p>
        We address the task of ranking products (from a product catalog) in response to a (short)
keyword query. Here we briefly describe which parts of the living labs API we used in our
participation, and how. For a detailed description of the task and
setup, we refer to the LL4IR Lab overview paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>For each query, the living labs platform makes a set of candidate products
available via the doclist API endpoint. For each of these products, the details can be
requested via the doc API endpoint, which provides a product description in the form
of key-value pairs. We built a single Lucene index from all unique products made
available to us (i.e., products from all doclists), with the fields shown in Table 1. Note that
the contents field is a catch-all field that does not come from the API; it is the concatenation
of all field content associated with the product, created at indexing time. In addition,
we used the historical API endpoint to obtain the aggregated click-through rate
(CTR) for the training queries.</p>
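      <p>The catch-all contents field can be derived at indexing time from the key-value pairs returned by the doc endpoint. The sketch below only builds the field dictionary that would be handed to an indexer (it does not call Lucene). The function name and the field names in the usage line are hypothetical; the actual fields are those listed in Table 1.</p>
      <preformat>
```python
from typing import Dict


def build_index_fields(product: Dict[str, str]) -> Dict[str, str]:
    """Map a product's key-value pairs to index fields plus a catch-all field.

    The keys are whatever the doc endpoint returns (the names used below are
    hypothetical). The 'contents' field does not come from the API: it is
    built here by concatenating all non-empty field values.
    """
    # Keep only fields that carry some text.
    fields = {key: value.strip() for key, value in product.items()
              if value and value.strip()}
    # Catch-all field; sorted keys make the concatenation order deterministic.
    fields["contents"] = " ".join(fields[key] for key in sorted(fields))
    return fields


# Usage with a hypothetical product record.
doc = build_index_fields({"title": "Duplo train set", "brand": "Lego", "description": ""})
print(doc["contents"])  # Lego Duplo train set
```
      </preformat>
      <p>Sorting the keys is only a convenience for reproducibility; since the contents field is treated as a bag of words, the concatenation order does not affect a unigram language model.</p>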
      <p>We produce rankings offline using the retrieval framework and field-mapping
methods described in Sections 3 and 4. Once uploaded to the API, these are interleaved with
and compared against the site’s production ranking system. The numbers reported in
Section 5 are obtained via the outcome API endpoint.
</p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval Framework</title>
      <p>
        We base our approach on a generative language modeling framework. Language
models provide a transparent and effective means for incorporating structural cues. They are
especially appropriate under circumstances where training data is scarce. Our goal with
this section is merely to present a brief introduction to the general framework and
notation we use, thereby making the paper self-contained; for a more detailed description
of these models we refer the reader to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Standard Language Modeling Approach</title>
        <p>The standard language modeling approach works as follows. Given a query $q$ and a
document $d$, the relevance of the document with regard to the query can be expressed
as the conditional probability $P(d|q)$. Using Bayes’ theorem, $P(d|q)$ is rewritten as
shown in Eq. 1 below. Since $P(q)$ is identical for all candidate documents, it can
be safely discarded. Thus, the posterior probability $P(d|q)$ is proportional to the product of the
query generation probability $P(q|d)$ and the document prior probability $P(d)$:</p>
        <p>$$P(d|q) = \frac{P(q|d)\,P(d)}{P(q)} \propto P(q|d)\,P(d). \tag{1}$$</p>
        <p>Assuming uniform document priors, the ranking of documents is then proportional to
$P(q|d)$. The simplest way to estimate this probability is to assume a
bag-of-words document representation. We let $\theta_d$ be a unigram document language model,
where $P(t|\theta_d)$ expresses the probability of term $t$ given the document. Documents are
ranked according to the probability that query $q$ is observed during repeated
random sampling from the model of document $d$ (where $n(t,q)$ is the number of times
$t$ occurs in $q$):</p>
        <p>$$P(q|\theta_d) = \prod_{t \in q} P(t|\theta_d)^{n(t,q)}. \tag{2}$$</p>
        <p>
To estimate $P(t|\theta_d)$, we use a linear combination of maximum-likelihood document and
collection language models (i.e., we employ Jelinek-Mercer smoothing):</p>
        <p>$$P(t|\theta_d) = (1-\lambda)\,P_{ML}(t|d) + \lambda\,P_{ML}(t|C), \tag{3}$$</p>
        <p>where $P_{ML}(t|d)$ and $P_{ML}(t|C)$ are the relative frequencies of term $t$ in the document and in the
collection, respectively, and $\lambda$ is the smoothing parameter.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Mixture of Language Models</title>
        <p>
          The standard language modeling approach treats the document as “flat text” and is not
able to make use of the various document fields. An extension called Mixture of Language
Models (MLM) was proposed by Ogilvie and Callan [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], where a separate language model
is estimated for each document field, and these field language models are then
combined into a single document-level representation. Under this approach, the document
language model is taken to be:
        </p>
        <p>$$P(t|\theta_d) = \sum_{f \in F} \mu_f\,P(t|\theta_{d_f}), \tag{4}$$</p>
        <p>where $f$ is a field from the set of available fields $F$, $\mu_f$ is the relative importance of
the field such that $\sum_{f \in F} \mu_f = 1$, and $\theta_{d_f}$ is a field-specific language model. The field
language model can be estimated the same way as the document language model, except
that term occurrences are considered only within the given field (this is indicated by the
$f$ subscript):</p>
        <p>$$P(t|\theta_{d_f}) = (1-\lambda_f)\,P_{ML}(t|d_f) + \lambda_f\,P_{ML}(t|C_f). \tag{5}$$</p>
        <p>
          Ogilvie and Callan [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] suggest setting the field weights proportional to the length of the
fields or to their individual retrieval performance. We also use the latter approach in one
of our methods. We set the smoothing parameter to be the same for all fields: $\lambda_f = 0.1$.
        </p>
        <p>
          Instead of using a fixed field weight that is the same for all query terms, Kim et al.
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] propose the use of mapping probabilities. In their approach, called the Probabilistic
Retrieval Model for Semistructured Data (PRMS), $\mu_f$ in Eq. 4 is replaced with $P(f|t)$:
the probability of term $t$ being mapped to field $f$. The estimation of the document
language model then becomes:
        </p>
        <p>$$P(t|\theta_d) = \sum_{f \in F} P(f|t)\,P(t|\theta_{d_f}). \tag{6}$$</p>
        <p>For estimating the mapping probabilities, we again make use of Bayes’ theorem:</p>
        <p>$$P(f|t) = \frac{P(t|f)\,P(f)}{P(t)} = \frac{P(t|f)\,P(f)}{\sum_{f' \in F} P(t|f')\,P(f')}, \tag{7}$$</p>
        <p>where $P(t|f)$ can be conveniently estimated by $P_{ML}(t|C_f)$, and $P(f)$ is a prior
mapping probability. We present multiple alternatives for setting the components of
Eq. 7, $P(t|f)$ and $P(f)$, in the next section.</p>
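        <p>The scoring described in this section (Eqs. 2, 4, 5, and 6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for our runs: documents are bags of words per field, the field weights are passed in as a function weight(t, f) (a fixed $\mu_f$ for MLM, $P(f|t)$ for PRMS), and all function names and toy documents are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List

# A document maps each field name to a bag of words (term counts).
Doc = Dict[str, Counter]


def collection_model(docs: List[Doc]) -> Dict[str, Counter]:
    """Field-level collection statistics C_f: term counts summed over all documents."""
    coll: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        for field, counts in doc.items():
            coll[field].update(counts)
    return dict(coll)


def field_lm(t: str, d_f: Counter, c_f: Counter, lam: float = 0.1) -> float:
    """Eq. 5: Jelinek-Mercer smoothed field model P(t|theta_{d_f})."""
    p_doc = d_f[t] / sum(d_f.values()) if d_f else 0.0   # P_ML(t|d_f)
    p_coll = c_f[t] / sum(c_f.values()) if c_f else 0.0  # P_ML(t|C_f)
    return (1 - lam) * p_doc + lam * p_coll


def log_query_likelihood(query: List[str], doc: Doc, coll: Dict[str, Counter],
                         weight: Callable[[str, str], float], lam: float = 0.1) -> float:
    """log P(q|theta_d) as in Eq. 2, with P(t|theta_d) = sum_f weight(t, f) * P(t|theta_{d_f}).

    weight(t, f) is a fixed mu_f for MLM (Eq. 4) or P(f|t) for PRMS (Eq. 6).
    """
    score = 0.0
    for t, n in Counter(query).items():  # n = n(t, q)
        p = sum(weight(t, f) * field_lm(t, doc.get(f, Counter()), coll[f], lam)
                for f in coll)
        # With lam > 0 and documents drawn from the collection, p is either positive
        # for every document or zero for every document, so skipping zeros keeps the ranking.
        if p > 0:
            score += n * math.log(p)
    return score


docs = [
    {"title": Counter(["red", "shoe"]), "brand": Counter(["nike"])},
    {"title": Counter(["blue", "shoe"]), "brand": Counter(["adidas"])},
]
coll = collection_model(docs)
equal_weights = lambda t, f: 1.0 / len(coll)  # MLM with mu_f = 1/|F|
scores = [log_query_likelihood(["nike", "shoe"], d, coll, equal_weights) for d in docs]
```
        </preformat>
        <p>Scoring in log space preserves the ranking given by Eq. 2 while avoiding numerical underflow for longer queries. Swapping equal_weights for a function returning $P(f|t)$ turns the same code from MLM into PRMS.</p>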
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Estimating Mapping Probabilities</title>
      <p>We present three specific instantiations of the PRMS model, shown in Table 2, that
differ only in the estimation of the mapping probability given in Eq. 7. This formula
has two components that need to be defined: the term-field probability $P(t|f)$ and the
field prior $P(f)$. Note that MLM can be seen as a special case of PRMS,
where $P(t|f) = 1/|F|$ and $P(f) = \mu_f$.</p>
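      <p>Substituting these two settings into Eq. 7 makes the special case explicit: the uniform term-field probability cancels and, since the field weights sum to one, the mapping reduces to the fixed field weight for every term:</p>
      <preformat>
```latex
P(f|t) = \frac{\frac{1}{|F|}\,\mu_f}{\sum_{f' \in F} \frac{1}{|F|}\,\mu_{f'}}
       = \frac{\mu_f}{\sum_{f' \in F} \mu_{f'}}
       = \mu_f .
```
      </preformat>
      <p>Hence Eq. 6 coincides with Eq. 4 whenever the term-field probability is uniform.</p>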
      <p>
        The probability of a term occurring in a given field, $P(t|f)$, can be taken to be uniform
(i.e., $1/|F|$) or calculated from the term counts in field $f$ across the whole collection,
$P_{ML}(t|C_f)$, as is done in the original PRMS approach [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
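      <p>Eq. 7, with either choice of $P(t|f)$ and either a uniform or a given field prior, can be sketched as follows. The function name, the dictionary-based collection statistics, and the fallback to the prior for a term that occurs in no field are our own assumptions for illustration.</p>
      <preformat>
```python
from collections import Counter
from typing import Dict, Optional


def mapping_probs(t: str, coll: Dict[str, Counter],
                  prior: Optional[Dict[str, float]] = None,
                  term_specific: bool = True) -> Dict[str, float]:
    """Eq. 7: P(f|t) = P(t|f) P(f) / sum_f' P(t|f') P(f').

    term_specific=True uses P(t|f) = P_ML(t|C_f); False uses P(t|f) = 1/|F|.
    prior=None uses a uniform field prior P(f) = 1/|F|.
    """
    fields = list(coll)
    if prior is None:
        prior = {f: 1.0 / len(fields) for f in fields}

    def p_t_given_f(f: str) -> float:
        if not term_specific:
            return 1.0 / len(fields)
        total = sum(coll[f].values())
        return coll[f][t] / total if total else 0.0

    joint = {f: p_t_given_f(f) * prior[f] for f in fields}
    z = sum(joint.values())
    if z == 0:
        # Assumption: fall back to the (normalized) prior for terms unseen in every field.
        z_prior = sum(prior.values())
        return {f: prior[f] / z_prior for f in fields}
    return {f: joint[f] / z for f in fields}


# Hypothetical field-level term counts C_f.
coll = {"title": Counter({"shoe": 3, "red": 1}), "brand": Counter({"nike": 2})}
print(mapping_probs("nike", coll))  # {'title': 0.0, 'brand': 1.0}
```
      </preformat>
      <p>With term_specific=False the uniform $P(t|f)$ cancels and the function returns the normalized prior for every term, which is the MLM special case noted above.</p>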
      <p>For the field prior, we consider a uniform and a non-uniform setting. In the case of
the latter, we capitalize on the fact that we have access to training data that can be
used for evaluating the retrieval performance of each field individually. Specifically, for each
training query we rank products by $P(q|\theta_{d_f})$, i.e., using the language model of field $f$ alone,
and measure retrieval performance in terms of
NDCG, averaged over all training queries (denoted as $\mathrm{NDCG}_f$). The gain of a
document is set to its historical CTR (as provided by the historical feedback API endpoint).
$P(f)$ is then set proportional to $\mathrm{NDCG}_f$ as follows:</p>
      <p>$$P(f) = \frac{\mathrm{NDCG}_f}{\sum_{f' \in F} \mathrm{NDCG}_{f'}}. \tag{8}$$</p>
      <p>We report on two sets of experiments: traditional, test-collection-based offline results on
the training queries, and online results collected via the living labs platform on the test
queries.</p>
      <p>For the offline experiments, we use historical CTR as ground truth. For binary relevance (MAP and MRR), we
consider each product with a CTR of at least 0.001 as relevant; for graded relevance (NDCG),
we use CTR as the gain value. Note that for Methods 1 and 3 we train and test on
the same set of queries. The numbers reported here are therefore only meant to show how well
the models can be fitted to the data.</p>
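      <p>Both the field prior of Eq. 8 and the offline evaluation rely on CTR-based judgments. A minimal sketch, assuming the common log2-discounted form of DCG (the exact NDCG variant is not stated here) and illustrating the binary threshold with reciprocal rank; the function names and CTR values are hypothetical.</p>
      <preformat>
```python
import math
from typing import Dict, List


def ndcg(ranking: List[str], ctr: Dict[str, float]) -> float:
    """NDCG of a ranking, using each product's historical CTR as its gain.

    Assumption: standard log2 discount; the ideal DCG is computed over the
    same number of positions as the ranking.
    """
    def dcg(gains: List[float]) -> float:
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    ideal = sorted(ctr.values(), reverse=True)[: len(ranking)]
    idcg = dcg(ideal)
    return dcg([ctr.get(p, 0.0) for p in ranking]) / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranking: List[str], ctr: Dict[str, float],
                    threshold: float = 0.001) -> float:
    """Binary relevance: a product is relevant if its CTR is at least the threshold."""
    for rank, p in enumerate(ranking, start=1):
        if ctr.get(p, 0.0) >= threshold:
            return 1.0 / rank
    return 0.0


def field_priors(ndcg_per_field: Dict[str, float]) -> Dict[str, float]:
    """Eq. 8: P(f) = NDCG_f / sum_f' NDCG_f'."""
    z = sum(ndcg_per_field.values())
    return {f: v / z for f, v in ndcg_per_field.items()}


ctr = {"a": 0.2, "b": 0.0005, "c": 0.05}  # hypothetical CTR values
print(reciprocal_rank(["b", "a", "c"], ctr))  # 0.5
```
      </preformat>
      <p>In this setting, the NDCG_f values passed to field_priors would be the averages over training queries of ndcg computed on rankings produced by a single field’s language model.</p>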
      <p>Table 4 presents the results. We find that all three methods perform very similarly on
all three metrics. While they achieve virtually perfect MRR (i.e., the first ranked result
is always a relevant one), there is room for improvement in terms of MAP and NDCG.
Our CTR-based ground truth is likely to be biased by the site’s existing ranking.
Therefore, a comparison against the production system would reveal more about the
differences between our methods; this is exactly what follows next.</p>
      <p>We submitted three rankings corresponding to Methods 1–3, as shown in Table 5. The
main evaluation metric is outcome, which is defined as:</p>
      <p>$$\mathrm{outcome} = \frac{\#\mathrm{wins}}{\#\mathrm{wins} + \#\mathrm{losses}}, \tag{9}$$</p>
      <p>where the numbers of wins and losses are measured against the site’s production ranking
system. Additionally, we report the total number of impressions and the ratio of
wins, losses, and ties.</p>
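      <p>Eq. 9 and the accompanying ratios amount to simple arithmetic over interleaving counts. The sketch below assumes the win, loss, and tie ratios are taken over all impressions and treats the outcome as undefined when no impression was decided; the function name and counts are hypothetical and are not results from our runs.</p>
      <preformat>
```python
from typing import Dict


def interleaving_summary(wins: int, losses: int, ties: int) -> Dict[str, float]:
    """Outcome (Eq. 9) plus win/loss/tie ratios over all impressions."""
    impressions = wins + losses + ties
    if impressions == 0:
        raise ValueError("no impressions recorded")
    decided = wins + losses
    return {
        # Ties do not count towards the outcome; NaN if nothing was decided.
        "outcome": wins / decided if decided else float("nan"),
        "win_ratio": wins / impressions,
        "loss_ratio": losses / impressions,
        "tie_ratio": ties / impressions,
    }


# Hypothetical counts.
print(interleaving_summary(wins=20, losses=30, ties=150)["outcome"])  # 0.4
```
      </preformat>
      <p>Because ties are excluded from Eq. 9, a method can have a high tie ratio and still obtain an outcome well below 0.5, which is why the tie and loss counts are reported alongside it.</p>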
      <p>
        Before discussing our observations, it is important to mention that the way
interleaving with the production system is performed has changed from Round #1 to Round
#2, to deal correctly with unavailable products (as explained in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). It is therefore
believed that the Round #2 results are more reliable. Nevertheless, there are some open
questions, such as how many impressions are needed before one can draw firm
conclusions, and whether the results obtained in Round #2 are an accurate reflection of the
performance of our methods.
      </p>
      <p>It is immediately apparent that the Round #2 numbers are higher, due to the
aforementioned change to interleaving. Outcomes for Round #1 are above the (corrected)
expected outcome of 0.28, while for Round #2 they are below the expected outcome
of 0.5. In neither round were we able to outperform the organizers’ baseline (with
an outcome of 0.4691 and 0.5284 for Rounds #1 and #2, respectively), which shows
that there is considerable room for improvement.</p>
      <p>Concerning the comparison of Methods 1–3, we make the following observations.
All methods received about the same number of impressions; the relative difference
between them is within 10%. In over 70% of the cases there is a tie between the
experimental and production rankings; this is the same for all three methods. Method 1 is
clearly the worst performing of the three. This is expected, as this method is
essentially MLM, which performs the field mapping independently of the query terms. As
for Method 2 vs. 3, the results are mixed. In Round #1 Method 2 performed slightly
better (+4% over Method 3), while in Round #2 Method 3 came first (+9% over Method
2). We further observe that Method 3 has the most ties with the production system and
also the fewest losses against it. Based on these results, we
conclude that term-specific mapping is beneficial (Method 1 vs. Methods 2
and 3). There is also evidence suggesting that non-uniform field priors are preferable and
that historical CTR offers a simple and intuitive way of setting them (Method 2 vs. 3).</p>
      <p>In our analysis, we ask the following question: How different are the mappings
created by the different methods? We answer this both qualitatively and quantitatively.
For the former, Table 6 shows the estimated mapping probabilities for a number of
training queries; we can see that these are indeed different. As for the latter, Table 7 presents
Kendall’s rank correlation coefficient ($\tau$); the numbers are averages computed over
all queries (both training and test). We observe that the correlation between all three
methods is high and that Methods 2 and 3 are more similar to each other than they are
to Method 1. The high level of similarity ($\tau = 0.95$) between Methods 2 and 3 explains why
it is so difficult to declare a winner. Our findings concerning Method 2 vs. 3 (uniform
vs. non-uniform field priors) should therefore be taken with a grain of salt.</p>
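      <p>One plausible way to obtain such τ values, assuming they compare the field orderings induced by two methods’ mapping probabilities $P(f|t)$ for the same query term and are then averaged, is Kendall’s tau-a over all field pairs. This is a sketch only: the exact pairing and tie handling behind Table 7 are not specified, and the function names and probabilities are hypothetical.</p>
      <preformat>
```python
from itertools import combinations
from typing import Dict, List, Tuple


def kendall_tau(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Kendall's tau-a between two scorings of the same fields (tied pairs count as neither)."""
    fields = sorted(x)
    n_pairs = len(fields) * (len(fields) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two fields")
    concordant = discordant = 0
    for a, b in combinations(fields, 2):
        s = (x[a] - x[b]) * (y[a] - y[b])
        if s > 0:
            concordant += 1
        elif s != 0:  # negative product: the two methods order this pair differently
            discordant += 1
    return (concordant - discordant) / n_pairs


def mean_tau(pairs: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Average tau over (method A mapping, method B mapping) pairs, e.g. one per query."""
    return sum(kendall_tau(x, y) for x, y in pairs) / len(pairs)


# Hypothetical P(f|t) values of two methods for one query term.
m2 = {"title": 0.5, "brand": 0.3, "category": 0.2}
m3 = {"title": 0.6, "brand": 0.1, "category": 0.3}
print(kendall_tau(m2, m3))  # 0.3333333333333333
```
      </preformat>
      <p>Tau-a is used here for simplicity; with many tied probabilities, tau-b, which corrects the denominator for ties, would be the more common choice.</p>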
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We have described our participation in the product search task of the CLEF 2015 LL4IR
Lab. Our focus has been on various field mapping approaches in a language
modeling framework, which we compared experimentally. We have shown that using
term-specific mapping has a positive effect on retrieval performance. We have also presented
evidence suggesting that estimating field mapping priors based on historical clicks
outperforms the setting where the priors are uniformly distributed. While we were able to
observe interesting differences between our methods, further work is needed
to outperform the site’s production ranking system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          .
          <article-title>Semistructured data search</article-title>
          . In N. Ferro, editor,
          <source>Bridging Between Information Retrieval and Databases</source>
          , volume
          <volume>8173</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>74</fpage>
          -
          <lpage>96</lpage>
          . Springer Berlin Heidelberg,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xue</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A probabilistic retrieval model for semistructured data</article-title>
          .
          <source>In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR '09</source>
          , pages
          <fpage>228</fpage>
          -
          <lpage>239</lpage>
          . Springer-Verlag,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ogilvie</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Combining document representations for known-item search</article-title>
          .
          <source>In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>143</fpage>
          -
          <lpage>150</lpage>
          . ACM,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          .
          <article-title>Extended overview of the living labs for information retrieval evaluation (LL4IR) CLEF lab 2015</article-title>
          .
          <source>In CLEF 2015 Labs and Workshops, Notebook Papers</source>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          .
          <article-title>Overview of the living labs for information retrieval evaluation (LL4IR) CLEF Lab 2015</article-title>
          . In Sixth International Conference of the CLEF Association, CLEF'15
          , volume
          <volume>9283</volume>
          of Lecture Notes in Computer Science (LNCS). Springer Berlin Heidelberg,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>