<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Query Suggestion by Concept Instantiation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jack Wei Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kenny Q. Zhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haixun Wang</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT-Lab, Shanghai Jiao Tong University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>A class of search queries which contain abstract concepts are studied in this paper. These queries cannot be correctly interpreted by traditional keyword-based search engines. This paper presents a simple framework that detects and instantiates the abstract concepts by their concrete entities or meanings to produce alternate queries that yield better search results. 3</p>
      </abstract>
      <kwd-group>
        <kwd>Query Suggestion</kwd>
        <kwd>Concept-based Queries</kwd>
        <kwd>Search Logs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The quality of search results largely depend on the specificity of the keywords
they put in search engine, for modern search engine are mostly keyword-based.
When user queries do not general good results, search engine may suggest a list
of alternate queries for the user to select and re-submit.</p>
      <p>
        Traditionally, the function of query suggestion is implemented by
keywordbased methods and partially depends on the information from search log [
        <xref ref-type="bibr" rid="ref1 ref10 ref4 ref5">1,
4, 5, 10</xref>
        ], which includes click-through data, session data, etc. With the rapid
development of Internet, researchers are increasingly interested in semantic view
of the web and do a lot of work on semantic relation extraction from search log
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and concept search [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ]. There are also other attempts on sematic query
suggestion without search log [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This paper focuses on a class of difficult queries which include abstract
concepts, e.g., “hurricane in US state”. The likely intention of this query is to look
for a specific instance of a hurricane that happened to a US state, e.g., Katrina
in Louisiana. Here both hurricane and state are used as abstract concepts which
contain many specific entities. The reason for such an abstract query may be
because the user has forgotten the name of the hurricane or the state.</p>
      <p>Search engines today return pages about these abstract concepts, and that
usually means a list of huricanes in the US history for the above example (see
Figure 1). It takes the user at least a few more clicks and scanning through
the pages to find the information needed. Our objectives is to return a list of
suggested queries such as: Katrina in Lousiana, Sandy in Connecticut, Dolly in
Texas.
3 Kenny Q. Zhu (corresponding author) is partially supported by NSFC grants
61100050 and 61033002.</p>
      <p>
        To this end, our main approach utilizes a comprehensive, probabilistic
taxonomy and a search log with access statistics (click-through rate, etc.). The
probabilistic taxonomy, called Probase[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], provides large number of concepts and their
instances in is-a relations (e.g. katrina isa hurricane) as well as the typicality of
an instance belonging to a concept. For example, a robin is a typical bird, but
an ostrich is not. Given a user query, we use this taxonomy to detect high level
concepts in it and translate them into likely instances to produce a new query,
and then return the queries from the query log which are closest to the newly
transformed query. Next, we present the prototype system in some detail and
some preliminary experimental results.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Prototype System</title>
      <p>
        Our system is divided into two parts, an offline part which creates an index on
all historical search queries from the search log and an online runtime system
which retrieves a number of relevant queries to an input concept-based query
and ranks them according to a scoring function. The architecture of the system
is shown in Figure 2. Next, we briefly discuss the two key components in the
architecture.
First, we parse each historical query in the search log and identify all noun-phrase
terms which exist as entities in Probase. Entities are those terms which appear
as the instance in at least one is-a pair in the taxonomy, e.g., katrina, louisina,
etc. Then, we conceptualize these instances into their most likely concepts by
considering the neighboring instances in the same query, using the technique
by Song et al.[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. For example, query “katrina victims from texas” maybe
conceptualized into “[hurricane] victims from [state]”. Finally, we build an index
of the search log using the concepts as keys.
2.2
      </p>
      <p>Rank Candidates
Given a query, our system first parses the query and recognizes the concepts
in it and then fetch a number of historical queries which are indexed by these
concepts as candidate suggestions. Ranking the candidate is the main challenge
in this work. We apply a hybrid method to calculate the ranking score. This score
takes three factors into consideration, i.e. semantic similarity, context similarity
and quality of suggestion query.</p>
      <p>The context similarity, CScore, is measured by the edit distance between
the input query and the candidate in terms of the number of insertion, deletion
or replacement of words. Matched instances in the candidate and the matched
concepts in the input queries which represent semantic information are removed
before calculating the edit distance.</p>
      <p>The semantic similarity between a candidate query and an input query is
the combined distance between the instances in the candidate and the
corresponding concept in the input. The distance between an instance and a concept
is measured by the typicality score between these two terms in Probase.
Because typicality score is a value between 0 and 1 and can be dense in some range
(around 0.01 in practice), we take logistic function on the typical value to expand
the range into something more distinguishable. We also use a scalar to make it
linear comparable with CScore according to statistics. That is,</p>
      <p>SScore(t) =
(</p>
      <p>1
1 + e
t + 1)
where t is the typicality value and is the factor to locate the dense range and
is a scalar factor. Both factors are learned from some training examples.</p>
      <p>We also utilize the click-through rate from the search log, in order to measure
the quality or effectiveness of candidate suggestion. In this paper, we use a simple
measure defined as</p>
      <p>Num clicks(q)
QScore(q) =</p>
      <p>Frequency(q)
where q is a candidate suggestion query. Besides the click-through rate, the
quality score can be extended to include other information from search log.</p>
      <p>Currently, the overall score (the lower, the better) is defined as:</p>
      <p>OverallScore = [CScore + SScore] + e QScore
We take 1/10 of a 6-month worth of Bing search log as our experimental data
source. To evaluate the system, we arbitrarily select 10 concept-based queries
related to the events happening during the 6-month time span. All suggested
queries from the system are given to 3 human judges who would grade the
suggestion on the scale of 1-5, 1 being the least helpful and 5 being the most
helpful. The averaged normalized precision score of of these results with different
size of search log and the suggestions of two example queries are shown in Figure
3. The results show that having more search log is helpful but the effect saturates
at some point. Complete data set as well as evaluation results can be found at
http://adapt.seiee.sjtu.edu.cn/~jack/query/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurtado</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendoza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Query recommendation using query logs in search engines</article-title>
          .
          <source>In: EDBT 2004 Workshops</source>
          . pp.
          <volume>588</volume>
          {
          <fpage>596</fpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiberi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Extracting semantic relations from query logs</article-title>
          .
          <source>In: Proceedings of ACM SIGKDD</source>
          . pp.
          <volume>76</volume>
          {
          <fpage>85</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumdar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Query suggestions in the absence of query logs</article-title>
          .
          <source>In: SIGIR</source>
          . pp.
          <volume>795</volume>
          {
          <issue>804</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Context-aware query suggestion by mining click-through and session data</article-title>
          .
          <source>In: Proceedings of the 14th ACM SIGKDD</source>
          . pp.
          <volume>875</volume>
          {
          <fpage>883</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dupret</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendoza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Recommending better queries from click-through data</article-title>
          .
          <source>In: String Processing and Information Retrieval</source>
          . pp.
          <volume>41</volume>
          {
          <fpage>44</fpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharkevich</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaihrayeu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Concept search: Semantics enabled syntactic search</article-title>
          . Semantic Search p.
          <volume>109</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Short text conceptualization using a probabilistic knowledgebase</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Concept-based web search</article-title>
          .
          <source>In: Conceptual Modeling</source>
          , pp.
          <volume>449</volume>
          {
          <fpage>462</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Probase: A probabilistic taxonomy for text understanding</article-title>
          .
          <source>In: Proceedings of ACM SIGMOD</source>
          . pp.
          <volume>481</volume>
          {
          <fpage>492</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nasraoui</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Mining search engine query logs for query recommendation</article-title>
          .
          <source>In: Proceedings of the 15th WWW</source>
          . pp.
          <volume>1039</volume>
          {
          <fpage>1040</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>