<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Webis at the CLEF 2017 Dynamic Search Lab</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthias Hagen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Kiesel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milad Alshomary</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Stein</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We briefly describe our approach to the query suggestion task at the CLEF 2017 Dynamic Search Lab. The general research idea of our contribution is to evaluate query suggestions in the form of keyqueries for clicked documents. A keyquery for a document set D is a query that returns the documents from D among the top-k ranks. Our query suggestion approach derives keyqueries for pairs of documents previously clicked by the user. The assumption is that the not-yet-clicked documents in the top results of the keyqueries could also be interesting to the user. The keyquery suggestions thus focus on retrieving more documents similar to the ones already clicked. Another reasonable approach might instead suggest queries that cover aspects of a user's information need different from the ones in the already seen documents. However, in this year's contribution to the Dynamic Search Lab, we explore the utility of the “similar” suggestions that keyqueries probably produce.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Approach</title>
      <p>
A query q is a keyquery for a document set D iff q returns the documents from D
in its top-k ranks, q has at least l results, and no subquery of q has the previous
two properties [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The parameters k and l control the specificity and generality
of a keyquery (typically, k is small and l is on the order of 10 or 100),
while the last property ensures minimality in a set-theoretic sense. We have
successfully applied keyqueries to generate dynamic taxonomies in the context
of digital libraries [
        <xref ref-type="bibr" rid="ref10 ref5">5,10</xref>
        ], to identify similar web pages [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], to support scholarly
search for related work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and for document clustering and labeling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
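      <p>The three defining properties can be checked mechanically. The following minimal Python sketch only illustrates the definition under stated assumptions; it is not the implementation behind the submitted runs. The callable search is a hypothetical stand-in for a retrieval system that returns the complete ranked list of document identifiers for a tuple of query terms, and subqueries are assumed to be the proper, non-empty subsets of the query's terms.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def satisfies(terms, docs, search, k, l):
    """Check the first two properties: docs in the top-k and at least l results."""
    results = search(terms)  # hypothetical: full ranked list of document ids
    return len(results) >= l and set(docs).issubset(results[:k])


def is_keyquery(terms, docs, search, k, l):
    """Return True iff `terms` is a keyquery for `docs` with parameters k and l."""
    terms = tuple(terms)
    if not satisfies(terms, docs, search, k, l):
        return False
    # Minimality: no proper, non-empty subquery may satisfy both properties.
    for size in range(1, len(terms)):
        for subquery in combinations(terms, size):
            if satisfies(subquery, docs, search, k, l):
                return False
    return True
```
]]></preformat>
      <p>The exhaustive subquery check is exponential in the number of query terms, so this sketch is only practical for short queries.</p>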
      <p>The research question addressed in our Dynamic Search Lab contribution
is how useful query suggestions in the form of keyqueries for clicked
documents are. We conjecture that such queries represent different formulations of
information needs closely related to the one that made the user click.</p>
      <p>
        The Dynamic Search Lab data contains 26 topics [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], each with one query
submitted by a user in a search session and the results shown during the whole
session, some of which are marked as clicked by the user. For exactly these clicked
documents, we derive keyqueries as query suggestions. For topic i with query qi, we identify
the set Di of documents that were clicked in the result lists of the session, that
are contained in ClueWeb12 Part B (the collection behind the Dynamic
Search API), and that have a spam ranking of at least 30 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (i.e., at least 30% of
the ClueWeb12 documents are more spammy). Thus, Di consists of documents that can be
assumed to be somewhat relevant to the topic's query qi, that are in the document collection of the
Dynamic Search API, and that are not among the most spammy.
      </p>
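      <p>The selection of Di amounts to a simple filter over the clicked documents. The sketch below is illustrative only: the function name, the set collection_ids, and the mapping spam_rank are hypothetical inputs, and two details are assumptions not stated above, namely that documents without a known spam ranking are dropped and that repeated clicks on the same document are kept only once, in click order.</p>
      <preformat><![CDATA[
```python
def select_clicked_documents(clicked_ids, collection_ids, spam_rank, min_spam_rank=30):
    """Return clicked documents that are in the collection and not too spammy."""
    selected = []
    for doc_id in clicked_ids:
        rank = spam_rank.get(doc_id)  # assumption: unknown rank means "drop"
        if (
            doc_id in collection_ids
            and rank is not None
            and rank >= min_spam_rank
            and doc_id not in selected  # assumption: keep repeated clicks once
        ):
            selected.append(doc_id)
    return selected
```
]]></preformat>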
      <p>
        For the documents in every Di, we try to derive five keyqueries as query
suggestions. We first extract the main content from the documents in Di by
keeping all sentences with at least one English stop word from text paragraphs of
at least 400 characters [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The clicked documents are arranged in pairs (first
and second clicked document, third and fourth clicked document, etc.). For each
such pair, the contents are concatenated and the top-10 keyphrases (not longer
than three words) are extracted using the head noun extractor [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We also filter
out keyphrases that are too “specific,” namely those that return fewer than 100 results
from the Dynamic Search API. From the remaining keyphrases of the first pair
of clicked documents, we derive a keyquery cover [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] setting k = 50 (i.e., the
two clicked documents are returned in the top-50 results) and l = 100 (i.e., the
query needs to return at least 100 results).
      </p>
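      <p>The preprocessing steps described above can be sketched as follows. This is a simplified illustration under several assumptions: the stop word list is a tiny placeholder, sentences are split with a naive regular expression, a last unpaired click is simply dropped, and result_count is a hypothetical callable standing in for a lookup of the number of results against the Dynamic Search API. The keyphrase extraction itself and the keyquery cover computation are not sketched here.</p>
      <preformat><![CDATA[
```python
import re

# Tiny illustrative stop word list (assumption); a real list would be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"})


def main_content(paragraphs, min_chars=400):
    """Keep sentences with a stop word from paragraphs of at least min_chars."""
    kept = []
    for paragraph in paragraphs:
        if len(paragraph) < min_chars:
            continue
        # Naive sentence splitting at whitespace after ., !, or ? (assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            if words & STOP_WORDS:
                kept.append(sentence)
    return " ".join(kept)


def click_pairs(clicked):
    """Pair first/second, third/fourth, ... click; an unpaired last click is dropped."""
    return [tuple(clicked[i:i + 2]) for i in range(0, len(clicked) - 1, 2)]


def filter_keyphrases(phrases, result_count, max_words=3, min_results=100):
    """Drop keyphrases that are longer than max_words or too specific."""
    return [
        phrase
        for phrase in phrases
        if len(phrase.split()) <= max_words and result_count(phrase) >= min_results
    ]
```
]]></preformat>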
      <p>If the keyquery cover of the first pair already contains five keyqueries, we
stop. Otherwise, we compute a keyquery cover for the second pair, and so on. Using
this approach, we generate a total of 66 query suggestions for 19 topics. Of
the remaining seven topics, one does not have any clicks, and for the other six no
keyquery cover could be computed for any pair of clicked documents.</p>
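      <p>The pair-by-pair procedure can be illustrated with a short loop. In this sketch, keyquery_cover is a hypothetical callable that returns the list of keyqueries in the cover computed for one pair of clicked documents; ignoring a keyquery that was already derived from an earlier pair and truncating the result to five suggestions are assumptions of the sketch.</p>
      <preformat><![CDATA[
```python
def collect_suggestions(pairs, keyquery_cover, max_suggestions=5):
    """Accumulate keyqueries pair by pair until max_suggestions are available."""
    suggestions = []
    for pair in pairs:
        for query in keyquery_cover(pair):  # hypothetical cover computation
            if query not in suggestions:    # assumption: skip repeated queries
                suggestions.append(query)
        if len(suggestions) >= max_suggestions:
            break  # enough suggestions, later pairs are not processed
    return suggestions[:max_suggestions]
```
]]></preformat>
      <p>A topic for which no pair yields a keyquery cover simply ends up with an empty suggestion list.</p>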
      <p>To produce the ranked list for a topic, we retrieve the top-10 results of each of the at
most five derived keyqueries from the Dynamic Search API and merge
them in a round-robin fashion: first the first ranks of the queries, then the second ranks, and so on.
A result that is already in the merged list is replaced by the next
result from the same keyquery.</p>
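      <p>The merging scheme can be sketched as an interleaving of the per-query result lists. The code below reflects one reading of the replacement rule, namely that a keyquery whose next result is already in the merged list contributes its next unseen result in the same round; the function name is hypothetical, and document identifiers are assumed never to be None.</p>
      <preformat><![CDATA[
```python
def merge_round_robin(result_lists):
    """Interleave ranked lists rank by rank, skipping already merged documents."""
    iterators = [iter(results) for results in result_lists]
    merged, seen = [], set()
    while iterators:
        still_active = []
        for iterator in iterators:
            # Advance this query's list to its next result not yet merged.
            doc = next((d for d in iterator if d not in seen), None)
            if doc is not None:
                merged.append(doc)
                seen.add(doc)
                still_active.append(iterator)
        iterators = still_active  # exhausted lists drop out of later rounds
    return merged
```
]]></preformat>
      <p>For example, merging the lists [a, b, c] and [a, d, e] takes a from the first list, skips the duplicate a in the second list in favor of d, and continues with b, e, and c.</p>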
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornacchia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Using noun phrase heads to extract document keyphrases</article-title>
          .
          <source>AI</source>
          <year>2000</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cormack</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smucker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient and effective spam filtering and re-ranking for large web datasets</article-title>
          .
          <source>Information Retrieval</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>441</fpage>
          -
          <lpage>465</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Busse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Keyqueries for clustering and labeling</article-title>
          .
          <source>AIRS</source>
          <year>2016</year>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>From keywords to keyqueries: Content descriptors for the web</article-title>
          .
          <source>SIGIR</source>
          <year>2013</year>
          , pp.
          <fpage>981</fpage>
          -
          <lpage>984</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Völske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Dynamic taxonomy composition via keyqueries</article-title>
          .
          <source>JCDL</source>
          <year>2014</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komlossy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Supporting scholarly search with keyqueries</article-title>
          .
          <source>ECIR</source>
          <year>2016</year>
          , pp.
          <fpage>507</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Supporting more-like-this information needs: Finding similar web content in different scenarios</article-title>
          .
          <source>CLEF</source>
          <year>2014</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF Dynamic Search Evaluation Lab 2017</article-title>
          .
          <source>CLEF</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucks</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A large-scale analysis of the mnemonic password advice</article-title>
          .
          <source>NDSS</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Völske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A keyquery-based classification system for CORE</article-title>
          .
          <source>WOSP</source>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>