<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning-to-Rank Target Types for Entity-Bearing Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Darío Garigliotti</string-name>
          <email>dario.garigliotti@uis.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krisztian Balog</string-name>
          <email>krisztian.balog@uis.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Stavanger</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper revisits the learning-to-rank approach we proposed for automatically identifying the target entity types of queries [6]. After presenting our contributions and results, we draw on the learned lessons and encountered challenges to identify directions for future enhancements.</p>
        <p>A significant portion of information needs in web search target entities [11]. Entities, such as people, organizations, or locations, are natural units for organizing information and for providing direct answers. A characteristic property of entities is that they are typed, where types are typically organized in a hierarchical structure, i.e., a type taxonomy. Previous work has shown that entity retrieval performance can be significantly improved when a query is complemented with explicit target type information, see, e.g., [1, 8, 10]. Recently, Garigliotti and Balog [5] have conducted a systematic evaluation of dimensions of type information.</p>
        <p>Identifying and exploiting target type information falls within the broad area of query understanding, which, according to Croft et al. [4], refers to the process of “identifying the underlying intent of the queries, based on a particular representation.” In a realistic Web search scenario, automatically detected target types spare the user the cognitive effort of providing type information for her query. Furthermore, they can be used as facets for filtering the results.</p>
        <p>Motivated by the above reasons, our main objective is to generate target type annotations of queries automatically. Following the hierarchical target type identification task proposed in [2], we wish to identify the most specific target types for a query, from a given type taxonomy, such that they are sufficient to cover all relevant results. We introduced a relaxation to the task definition by allowing a query to have multiple target types (or none).</p>
        <p>One main contribution of this work is a test collection we built for the revised hierarchical target type identification task. We used the DBpedia ontology (version 2015-10) as our type taxonomy and collected relevance labels via crowdsourcing for the 485 queries in the DBpedia-Entity entity ranking collection [3]. We noted that none of the elements of our approach is specific to this taxonomy, and our methods could be applied on top of any type taxonomy.</p>
        <p>As our second main contribution, we developed a learning-to-rank (LTR) approach with a rich set of features, including term-based, linguistic, and distributional similarity, as well as taxonomic features. We compared our LTR method against two competitive baselines from the literature. One approach is an entity-centric model [2, 9, 12], which first ranks the entities based on their relevance to the query, then looks at what types the top-ranked entities have. Alternatively, a type-centric model presented in [2] ranks direct term-based representations (pseudo type description documents), built for each type, by aggregating descriptions of entities</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>LEARNER’17, October 1, 2017, Amsterdam, The Netherlands.
Copyright ©2017 for this paper by its authors. Copying permitted for private and
academic purposes.</p>
      <p>
of that type. These baseline models also fit, respectively, the late
fusion and early fusion design patterns for object retrieval [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Our experiments were performed using Nordlys [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a toolkit for
entity-oriented and semantic search. We employed the Random
Forest algorithm for regression as the supervised ranking method.
We found that our supervised learning approach significantly and
substantially outperforms all baseline methods. We also analyzed
the discriminative power of our features by
sorting them according to their information gain. This showed the
effectiveness of textual similarity features, enriched with
distributional semantic representations, measured between the query and
the type label. Furthermore, we observed the robustness of our
proposed method, as it succeeds in automatically detecting target
types for a wide variety of queries.
      </p>
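      <p>To make the entity-centric (late fusion) baseline mentioned above more concrete, the following is a minimal sketch rather than the implementation evaluated in the paper. It assumes that a ranked list of entities with retrieval scores is already available, and that each entity spreads its score uniformly over its types. The function name, the uniform weighting, the cutoff k, and the toy data are all illustrative assumptions.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def rank_types_entity_centric(ranked_entities, entity_types, k=10):
    """Late-fusion sketch: score each type from the top-k retrieved entities.

    ranked_entities: list of (entity_id, retrieval_score) pairs, best first.
    entity_types: dict mapping entity_id to a list of type labels.
    """
    type_scores = defaultdict(float)
    for entity_id, score in ranked_entities[:k]:
        types = entity_types.get(entity_id, ())
        if not types:
            continue  # an untyped entity contributes nothing
        # Assumption: uniform entity-type association weight.
        weight = 1.0 / len(types)
        for type_label in types:
            type_scores[type_label] += score * weight
    # Highest aggregated score first; ties broken alphabetically.
    return sorted(type_scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
toy_ranking = [("e1", 0.9), ("e2", 0.6), ("e3", 0.3)]
toy_types = {"e1": ["Person", "Athlete"], "e2": ["Athlete"], "e3": ["Place"]}
ranked_types = rank_types_entity_centric(toy_ranking, toy_types, k=2)
```
]]></preformat>
      <p>With the cutoff set to two, only the first two toy entities contribute, so the type shared by both accumulates the largest score. A type-centric (early fusion) variant would instead build one pseudo document per type and rank those documents directly against the query.</p>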
      <p>In the forward-looking spirit of the workshop, we identify three
main themes for future development. Some of these are closely tied
to the target type identification task, while others concern
learning-based approaches in general. A first challenge is the generalization
of the proposed approach to other type systems. So far, the
suitability of our learning-to-rank target type detector has been demonstrated
only using training data manually labeled with target types from
the DBpedia Ontology. It remains an open question whether these
particular features are applicable to other type systems. Prior to
that, an issue to be addressed is how to ease the acquisition of type
labels for training. While manual annotation can be performed via
crowdsourcing, with very large type systems it becomes practically
infeasible (selecting a single target type, or a handful of them, from several
hundred options would create cognitive overload). One possible
strategy would be an instance of knowledge
transfer from the available test collection of DBpedia target types.</p>
      <p>As part of a more general line of discussion, a second theme
concerns how training data is obtained. It remains an open
question whether more training data helps. If it does,
the acquisition of high-quality labeled data becomes a
bottleneck. Automatically obtaining target labels by weak supervision,
thereby avoiding human annotation effort, may be a plausible way
to alleviate this challenge.</p>
      <p>Finally, we turn our attention to more recent developments with
an increasing impact in many problem domains within information
retrieval, specifically, the use of features obtained by representation
learning via deep artificial neural networks. Our approach exploits
semantic similarities purely based on pre-trained word embeddings.
What we do here, using distinguished neural features in an LTR
approach, has become a noticeable trend in recent years. The question
that arises is whether it is possible to embrace an alternative, fully
neural learning approach. We emphasize that this question applies
beyond our specific task of target type identification. Indeed, it
likely extends to a whole range of tasks where the current state
of the art is constituted by a learning-to-rank approach, using a
manually engineered set of features.</p>
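      <p>As an illustration of what an embedding-based similarity feature between a query and a type label might look like, the sketch below averages the word vectors of each text and takes the cosine of the two centroids. This is a hedged example, not the exact feature definition used in our experiments: the helper names, the whitespace tokenization, the zero fallback for out-of-vocabulary text, and the two-dimensional toy vectors are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import math


def centroid(tokens, embeddings):
    """Average the vectors of tokens that have an embedding; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_type_similarity(query, type_label, embeddings):
    """Cosine between the centroid of the query and that of the type label."""
    q_vec = centroid(query.lower().split(), embeddings)
    t_vec = centroid(type_label.lower().split(), embeddings)
    if q_vec is None or t_vec is None:
        return 0.0  # assumption: no evidence when a text is out of vocabulary
    return cosine(q_vec, t_vec)


# Toy 2-dimensional "embeddings", invented for illustration only.
toy_embeddings = {
    "football": [1.0, 0.0],
    "players": [0.8, 0.6],
    "athlete": [0.9, 0.3],
    "river": [0.0, 1.0],
}
sim_athlete = query_type_similarity("football players", "Athlete", toy_embeddings)
sim_river = query_type_similarity("football players", "River", toy_embeddings)
```
]]></preformat>
      <p>In a feature-based LTR setting, a value such as this would be one entry in the feature vector of a query-type pair, next to term-based and taxonomic features. A fully neural approach would instead learn the representation and the ranking function jointly.</p>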
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Bron</surname>
          </string-name>
          , and Maarten De Rijke.
          <year>2011</year>
          .
          <article-title>Query Modeling for Entity Search Based on Terms, Categories, and Examples</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>29</volume>
          ,
          <issue>4</issue>
          (
          <year>2011</year>
          ),
          <fpage>22:1</fpage>
          -
          <lpage>22:31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Neumayer</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Hierarchical Target Type Identification for Entity-oriented Queries</article-title>
          .
          <source>In Proc. of CIKM</source>
          .
          <fpage>2391</fpage>
          -
          <lpage>2394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Neumayer</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Test Collection for Entity Search in DBpedia</article-title>
          .
          <source>In Proc. of SIGIR</source>
          .
          <fpage>737</fpage>
          -
          <lpage>740</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          , Michael Bendersky,
          <string-name>
            <given-names>Hang</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gu</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Query Representation and Understanding Workshop</article-title>
          . In SIGIR Forum.
          <fpage>48</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
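          [5]
          <string-name>
            <given-names>Darío</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>On Type-Aware Entity Retrieval</article-title>
          .
          <source>In Proc. of ICTIR</source>
          .
          <fpage>27</fpage>
          -
          <lpage>34</lpage>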
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Darío</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Faegheh</given-names>
            <surname>Hasibi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Target Type Identification for Entity-Bearing Queries</article-title>
          .
          <source>In Proc. of SIGIR</source>
          .
          <fpage>845</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Faegheh</given-names>
            <surname>Hasibi</surname>
          </string-name>
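          ,
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Darío</given-names>
            <surname>Garigliotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shuo</given-names>
            <surname>Zhang</surname>
          </string-name>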
          .
          <year>2017</year>
          .
          <article-title>Nordlys: A Toolkit for Entity-Oriented and Semantic Search</article-title>
          .
          <source>In Proc. of SIGIR</source>
          .
          <fpage>1289</fpage>
          -
          <lpage>1292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Rianne</given-names>
            <surname>Kaptein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Exploiting the Category Structure of Wikipedia for Entity Ranking</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          (
          <year>2013</year>
          ),
          <fpage>111</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Rianne</given-names>
            <surname>Kaptein</surname>
          </string-name>
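          ,
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Arjen P.</given-names>
            <surname>De Vries</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>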
          .
          <year>2010</year>
          .
          <article-title>Entity Ranking Using Wikipedia as a Pivot</article-title>
          .
          <source>In Proc. of CIKM</source>
          .
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Jovan</given-names>
            <surname>Pehcevski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James A.</given-names>
            <surname>Thom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anne-Marie</given-names>
            <surname>Vercoustre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Naumovski</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Entity Ranking in Wikipedia: Utilising Categories, Links and Topic Difficulty Prediction</article-title>
          .
          <source>Information Retrieval</source>
          <volume>13</volume>
          ,
          <issue>5</issue>
          (
          <year>2010</year>
          ),
          <fpage>568</fpage>
          -
          <lpage>600</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pound</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Ad-hoc Object Retrieval in the Web of Data</article-title>
          .
          <source>In Proc. of WWW</source>
          .
          <fpage>771</fpage>
          -
          <lpage>780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Vallet</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Inferring the Most Important Types of a Query: a Semantic Approach</article-title>
          .
          <source>In Proc. of SIGIR</source>
          .
          <fpage>857</fpage>
          -
          <lpage>858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Shuo</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Krisztian</given-names>
            <surname>Balog</surname>
          </string-name>
          .
          <article-title>Design Patterns for Fusion-Based Object Retrieval</article-title>
          .
          <source>In Proc. of ECIR</source>
          .
          <fpage>684</fpage>
          -
          <lpage>690</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>