<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Web Information Retrieval for Complex Not-Informational Intents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Debora Donato</string-name>
          <email>debora@yahoo-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yahoo! Labs</institution>
          ,
          <addr-line>Sunnyvale, CA</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The World Wide Web has shown a remarkable capacity to renew itself, not only by adapting to fulfill users' changing needs but also by fostering new types of needs and requirements. For this reason, classical web information retrieval models, developed around the concept of a query, no longer seem adequate to satisfy complex and transactional needs, for which the final goal is to accomplish a task rather than to find information. A transactional need is not satisfied by showing the user a list of documents, but by reducing the total time from the moment the user issues the query to the moment the transaction is accomplished. Better support for complex queries can be obtained through careful User Intent Analysis. In this paper, we present some of the most promising lines of research that currently investigate intents and goals by focusing on all the activity related to intent satisfaction rather than on a single query.</p>
      </abstract>
      <kwd-group>
        <kwd>Web Information Retrieval</kwd>
        <kwd>Search Engines</kwd>
        <kwd>Users' intents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Since their first appearance, web search engines have developed their retrieval
models taking a single query as input and producing as output a list of pointers to
documents relevant to the query. This retrieval paradigm was motivated by
the informational nature of the original Web, defined as the collection of
hyperlinked documents accessible through the Internet.</p>
      <p>In the last decade, we have observed a progressive shift of many
human activities from real life to the online world. Web sites are virtual places
where people socialize, chat, play and perform a wide range of activities such as
bank transactions, shopping, event/travel booking and even voting. Although
informational queries are still the most frequent, a growing share of the queries
issued to search engines is motivated by transactional intents.</p>
      <p>As a result of the process described above, users' intents have become more
complex, with the side effect that a single query can hardly capture and
express all the possible facets of a user's need. As an example, let's consider the
sequence of actions performed by a user who wants to buy an apartment. Given the
complexity of the task, the user is likely to submit a set of semantically related
queries over a long temporal window; she will click on many links
in the result set of each submitted query in order to compare different
offers, browse photos, and search for public services (such as buses and schools) in
the neighborhood of each apartment that has captured her interest. In
this scenario, a single query cannot express all the different but related aspects
behind the intent of "buying an apartment".</p>
      <p>It is therefore becoming urgent to study all the activities related to
user satisfaction, in order to model user behavior and to understand which
"patterns" are more likely to lead users to success.</p>
    </sec>
    <sec id="sec-2">
      <title>User Intent Definition</title>
      <p>
        User intent modeling has been a topic of interest for the last few years, but to
the best of our knowledge, no prior work attempts to formalize the definition
of intent. Most previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] presents automatic
methods to classify query intents as informational, navigational, or transactional.
According to the taxonomy introduced by Broder [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a query is considered i)
informational if the need behind the query is to find the document(s) that contain
the desired information; ii) navigational if the intent is to find a particular web
site; iii) transactional if the intent is to perform some Web-mediated activity.
      </p>
      <p>However, partitioning the set of all queries into these three broad
categories offers little insight that can be leveraged to better
support users in their search activity.</p>
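      <p>From a qualitative point of view, an intent I is composed of:
        <list list-type="bullet">
          <list-item><p>the object(s) O of the intent;</p></list-item>
          <list-item><p>the verb V, i.e., the action that the user wants to perform on the object;</p></list-item>
          <list-item><p>a set of parameters P, i.e., the inputs for the action.</p></list-item>
        </list>
      </p>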
      <p>As an example, consider the transactional query "ticket from Rome to Milan".
In this case O = ticket, V = booking/buying/purchasing, and P = {Rome, Milan}.</p>
      <p>As in the query just considered, the action is often implicit.
Informational queries are a particular case, in which the implicit action is always
find/read. Intents such as purchasing a house result in a set of related queries
and hence in a set of objects and verbs.</p>
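      <p>The (O, V, P) decomposition can be captured by a small record type. The following Python sketch is illustrative only. The class <monospace>Intent</monospace>, its field names, and the helper <monospace>informational</monospace> are hypothetical. V is stored as a list of candidate actions because the action is usually implicit in the query.</p>
      <code language="python" xml:space="preserve">
```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Qualitative intent I = (O, V, P); all names are illustrative."""
    objects: list[str]                                    # O: object(s) of the intent
    verbs: list[str]                                      # V: candidate actions (often implicit)
    parameters: list[str] = field(default_factory=list)   # P: inputs for the action


# The transactional query "ticket from Rome to Milan".
ticket = Intent(
    objects=["ticket"],
    verbs=["book", "buy", "purchase"],
    parameters=["Rome", "Milan"],
)


def informational(obj: str) -> Intent:
    """Informational intent: the implicit action is always find/read."""
    return Intent(objects=[obj], verbs=["find", "read"])
```
      </code>
      <p>Under this representation, a complex intent such as buying an apartment would correspond to a collection of such records, one per related query.</p>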
      <p>Search engine users submit queries to address information needs. The
expression physical session denotes all the activity of a user interacting with
a search engine, delimited by periods of inactivity longer than a given threshold (often 30 minutes). Within
a single physical session, users may perform many tasks. Each task or information need
then results in a subsequence of queries, called a logical session.</p>
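      <p>Splitting a query log into physical sessions with an inactivity timeout is straightforward. The following is a minimal Python sketch. It assumes a single user's log given as (timestamp, query) pairs and the 30-minute threshold mentioned above. The function name <monospace>physical_sessions</monospace> is hypothetical. Segmenting each physical session further into logical sessions is the harder problem and is not attempted here.</p>
      <code language="python" xml:space="preserve">
```python
from datetime import datetime, timedelta

# Inactivity threshold that closes a physical session (30 minutes, as in the text).
TIMEOUT = timedelta(minutes=30)


def physical_sessions(log: list[tuple[datetime, str]]) -> list[list[str]]:
    """Split one user's (timestamp, query) log into physical sessions.

    A new session starts whenever the gap since the previous query exceeds TIMEOUT.
    """
    sessions: list[list[str]] = []
    last_time = None
    for ts, query in sorted(log):            # process queries in time order
        if last_time is None or ts - last_time > TIMEOUT:
            sessions.append([])              # first query or gap too long: new session
        sessions[-1].append(query)
        last_time = ts
    return sessions
```
      </code>
      <p>For example, consider queries issued at 9:00, 9:05 and 11:00. The function returns two sessions: the first two queries together, and the last one alone.</p>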
      <p>
        Jones and Klinkner [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] break tasks into two groups: (1) goals, which consist
of atomic information needs, and (2) missions, which consist of one or more
goals. A typical mission is planning a trip, whose individual
goals are "booking flight tickets", "booking hotels" and "compiling a list of points
of interest". In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the authors introduce a method of automatically segmenting
both goals and missions that also allows for interleaved tasks, which they found
to occur in 17% of tasks. Boldi et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describe the creation of query-flow graphs
from query logs and show how they can be used to automatically identify chains
of queries forming search tasks. Automatically detecting the set of queries that
belong to the same task is a fundamental step for improving query suggestions or
for a better choice of bidding terms for advertising. Radlinski and Joachims [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
exploit tasks, or query chains, to improve a document ranking function learned from implicit feedback.
      </p>
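      <p>A simplified query-flow graph in the spirit of Boldi et al. [<xref ref-type="bibr" rid="ref2">2</xref>] can be sketched as follows. This is not their model. The original assigns each edge a chaining probability. In this hypothetical Python sketch, edge weights are simply raw relative transition frequencies, which is a simplifying assumption. A chain is extracted by greedily following the heaviest outgoing edge while its weight stays above a threshold.</p>
      <code language="python" xml:space="preserve">
```python
from collections import Counter, defaultdict


def query_flow_graph(sessions: list[list[str]]) -> dict[str, dict[str, float]]:
    """Simplified query-flow graph: an edge q -> q2 means q2 immediately followed q
    in some session. Weights are relative transition frequencies out of q (an
    assumption; the original model uses chaining probabilities)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for q, q2 in zip(session, session[1:]):
            if q != q2:                      # ignore immediate resubmissions
                counts[q][q2] += 1
    graph: dict[str, dict[str, float]] = {}
    for q, successors in counts.items():
        total = sum(successors.values())
        graph[q] = {q2: n / total for q2, n in successors.items()}
    return graph


def query_chain(graph: dict[str, dict[str, float]], start: str,
                threshold: float = 0.5) -> list[str]:
    """Greedily follow the heaviest outgoing edge while its weight is at least
    `threshold`, stopping at a query with no successors or on a cycle."""
    path, seen = [start], {start}
    current = start
    while current in graph:
        nxt, weight = max(graph[current].items(), key=lambda kv: kv[1])
        if threshold > weight or nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path
```
      </code>
      <p>As an example, take the sessions [["nyc flights", "nyc hotels", "nyc museums"], ["nyc flights", "nyc hotels"], ["nyc flights", "cheap cars"]]. The edge from "nyc flights" to "nyc hotels" gets weight 2/3. Starting from "nyc flights", <monospace>query_chain</monospace> returns the three NYC queries in order.</p>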
    </sec>
    <sec id="sec-3">
      <title>Inferring User Intents</title>
      <p>
        It is generally believed that inferring users' intents is difficult because
users do not express themselves clearly in their queries. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], contrary to this general belief, the authors argue that users are capable of
articulating their intents through queries. This claim was confirmed by a preliminary study
revealing that in more than 78% of the cases users' queries were indicative of
their intents. The real intent of the user was inferred from the set of all the activities
and interactions related to intent satisfaction. The authors propose a principled
way to study the problem in the context of user goals [
        <xref ref-type="bibr" rid="ref2 ref8">8, 2</xref>
        ]. The terms goal
and intent are used interchangeably here, with the understanding that goals,
consisting of a single query or multiple queries, represent atomic
needs. The authors address two different, though related, problems: determining
whether the user was able to articulate her search goal in a query, and identifying
the query that expresses that intent. Both problems were modeled using a
combination of behavioral, contextual and lexical features. The proposed models
achieve 69% AUC on categorizing multi-query goals and 62% AUC on
single-query goals. Furthermore, the task of identifying the query that evinces the intent
reaches 81% AUC. These are very promising results given
the highly challenging nature of the problem.
      </p>
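      <p>AUC is the area under the ROC curve. It equals the probability that a classifier scores a randomly chosen positive example higher than a randomly chosen negative one. A value of 0.5 therefore corresponds to random guessing, and 1.0 to a perfect ranking. The following minimal Python sketch computes it by pairwise comparison, counting ties as one half. The function name <monospace>auc</monospace> is illustrative. The numbers in the example below are invented and unrelated to the results of [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      <code language="python" xml:space="preserve">
```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic).

    labels are 1 for positive and 0 for negative examples; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0                  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5                  # tie
    return wins / (len(pos) * len(neg))
```
      </code>
      <p>For example, <monospace>auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])</monospace> returns 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic, so a rank-based computation is preferable for large samples.</p>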
    </sec>
    <sec id="sec-4">
      <title>Supporting Complex Intents</title>
      <p>
        As already stressed, users' needs sometimes cannot be answered immediately
by search results, simply because these needs are too complex: they involve
multiple aspects that are not covered by a single web page and hence cannot
be expressed by a single query. Topics in domains such as education, travel or
health often require users to browse many different pages in order to accomplish
the task they have in mind. Donato et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] refer to this type of complex
activity as "research missions". Research missions account for 10% of users' sessions
and more than 25% of all query volume, as verified by a manual analysis
conducted by Yahoo! editors. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] it was shown that such missions
can be automatically identified on the fly, as the user interacts with the search
engine, through careful runtime analysis of query flows and query sessions. The
on-the-fly automatic identification of research missions has been implemented in
Search Pad, a Yahoo! application meant to help users keep track of the results
they have consulted. Its novelty, however, is that, unlike previous note-taking
products, it is automatically triggered only when the system decides, with a fair
level of confidence, that the user is undertaking a research mission and is thus in
the right context for gathering notes. The analysis presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is one of the
first examples of a session-aware methodology in which user intent modeling is
conducted by changing the level of granularity of the analysis, from an isolated
query to a list of queries pertaining to the same research mission, so as to better
reflect a certain type of information need.
      </p>
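      <p>The actual features and model used in [<xref ref-type="bibr" rid="ref4">4</xref>] to decide when to trigger Search Pad are not detailed here. Purely as an illustration of a confidence-thresholded, session-aware trigger, the following hypothetical Python sketch fires when a session is long enough and its consecutive queries share enough terms. The function names, the Jaccard term overlap and both thresholds are assumptions, not the method of [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <code language="python" xml:space="preserve">
```python
def term_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lower-cased term sets of two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta.union(tb)
    return len(ta.intersection(tb)) / len(union) if union else 0.0


def looks_like_research_mission(session: list[str], min_queries: int = 4,
                                min_overlap: float = 0.2) -> bool:
    """Hypothetical trigger: fire only when the session is long enough and its
    consecutive queries are, on average, topically related."""
    if min_queries > len(session):
        return False                         # too little evidence so far
    overlaps = [term_overlap(a, b) for a, b in zip(session, session[1:])]
    if not overlaps:
        return False                         # a single query carries no flow information
    confidence = sum(overlaps) / len(overlaps)
    return confidence >= min_overlap
```
      </code>
      <p>Because such a trigger fires only after several related queries, it trades early detection for precision. This is consistent with offering note taking only when the system is fairly confident.</p>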
    </sec>
    <sec id="sec-5">
      <title>Supporting Transactional Intents</title>
      <p>
        Transactional queries are characterized by distinctive elements that differentiate
them from navigational and informational ones. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], these distinctive elements
were analyzed and used to develop a template-based methodology with the
objective of directly supporting transactional queries and speeding up task
accomplishment. The methodology matches n-grams of lemmatized query terms
against hierarchical dictionaries such as WordNet and Wikipedia. Matched n-grams
are then substituted with their categories in order to generate a set of candidate
"templates". The authors propose a probabilistic model to estimate the
likelihood of each template being generated by transactional queries, and select the
most likely ones to represent that transactional intent. Such a methodology can
be seen as a first step toward changing the current "informational"
business model of web search engines. The main objective is to infer from
the template the category to which the task belongs, and to use the template to
extract the information necessary to finalize the transaction. The query "tickets
from NY to LA" clearly belongs to the travel booking category. All the queries
that match the pattern "tickets from &lt;city&gt; to &lt;city&gt;" can safely be added to
the same category. Such a pattern determines which application
must be triggered for the booking process, but in order to finalize the
transaction, the application needs two auxiliary inputs, i.e., the origin (from
&lt;city&gt;) and the destination (to &lt;city&gt;). A comprehensive experimental study was
conducted over eight different categories with a clear transactional intent, ranging
from ticket booking and restaurant reservation to software or music download.
The patterns were evaluated against a sample of queries randomly drawn from
eight months of Yahoo! query logs. The results demonstrate
that the methodology automatically detects transactional queries and
assigns them to the correct transactional category with a precision ranging from
0.7 to 0.98, depending on the category of interest.
      </p>
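      <p>The template pipeline can be illustrated on the running example. The following Python sketch is a simplification under several stated assumptions:</p>
      <list list-type="bullet">
        <list-item><p>a tiny hand-written dictionary, <monospace>CATEGORIES</monospace> (hypothetical), stands in for WordNet and Wikipedia lookups;</p></list-item>
        <list-item><p>lemmatization is reduced to lower-casing;</p></list-item>
        <list-item><p>slots are written as [city] rather than &lt;city&gt;;</p></list-item>
        <list-item><p>a plain relative-frequency estimate over labeled queries stands in for the probabilistic model of [<xref ref-type="bibr" rid="ref1">1</xref>].</p></list-item>
      </list>
      <p>All function names are illustrative.</p>
      <code language="python" xml:space="preserve">
```python
from collections import Counter

# Hypothetical toy dictionary standing in for WordNet/Wikipedia category lookups.
CATEGORIES = {
    "ny": "city", "la": "city", "new york": "city", "los angeles": "city",
    "rome": "city", "milan": "city",
}
MAX_N = 2  # longest n-gram looked up in the dictionary


def to_template(query: str) -> tuple[str, list[str]]:
    """Replace dictionary n-grams (longest match first) with [category] slots.

    Returns the template and the matched surface strings (slot fillers) in order.
    Lemmatization is reduced to lower-casing in this sketch.
    """
    tokens = query.lower().split()
    out: list[str] = []
    fillers: list[str] = []
    i = 0
    while len(tokens) > i:
        for n in range(MAX_N, 0, -1):          # try the longest n-gram first
            if i + n > len(tokens):
                continue
            gram = " ".join(tokens[i:i + n])
            if gram in CATEGORIES:
                out.append("[" + CATEGORIES[gram] + "]")
                fillers.append(gram)
                i += n
                break
        else:
            out.append(tokens[i])              # no match: keep the literal token
            i += 1
    return " ".join(out), fillers


def template_scores(labeled: list[tuple[str, bool]]) -> dict[str, float]:
    """Relative-frequency estimate of P(transactional | template) from labeled
    queries (a stand-in for the probabilistic model described in the text)."""
    seen: Counter = Counter()
    transactional: Counter = Counter()
    for query, is_transactional in labeled:
        template, _ = to_template(query)
        seen[template] += 1
        if is_transactional:
            transactional[template] += 1
    return {t: transactional[t] / seen[t] for t in seen}


def booking_inputs(query: str):
    """For the travel-booking template, return the two auxiliary inputs, else None."""
    template, fillers = to_template(query)
    if template != "tickets from [city] to [city]":
        return None
    return {"origin": fillers[0], "destination": fillers[1]}
```
      </code>
      <p>For instance, <monospace>to_template("tickets from NY to LA")</monospace> returns the template <monospace>tickets from [city] to [city]</monospace> with the fillers <monospace>["ny", "la"]</monospace>. The function <monospace>booking_inputs</monospace> then maps these fillers to the origin and the destination.</p>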
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this short paper we presented some of the new lines of research conducted
by the User Intent Analysis Group at Yahoo! Labs, which has focused on
understanding and modeling user intents. The common denominator of most of
the described models is a session-aware methodology that changes
the level of granularity of intent modeling, from an isolated query to a list of
queries pertaining to the same mission. This methodology is general, and
we strongly believe that it is likely to play a fundamental role in the near future
in many online tasks, such as detecting mission similarity or predicting goal
success, and in offline tasks, such as partitioning user activity into topics or profiling
user behavior.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Aashkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Donmez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          .
          <article-title>Automatic rule extraction to identify transactional queries</article-title>
          .
          <source>Submitted for publication</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Boldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vigna</surname>
          </string-name>
          .
          <article-title>The query-flow graph: model and applications</article-title>
          .
          <source>In CIKM'08: Proceedings of the Information and Knowledge Management Conference</source>
          , pages
          <fpage>609</fpage>
          -
          <lpage>618</lpage>
          , October
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Broder</surname>
          </string-name>
          .
          <article-title>A taxonomy of web search</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>36</volume>
          (
          <issue>2</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Maarek</surname>
          </string-name>
          .
          <article-title>Do you want to take notes? Identifying research missions in Yahoo! search pad</article-title>
          .
          <source>In WWW '10: Proceedings of the 19th International Conference on World Wide Web</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>330</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Donmez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Feild</surname>
          </string-name>
          .
          <article-title>Users are not lazy: Exploiting activity of articulate users to infer search intents</article-title>
          . Submitted for publication,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>de Moura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cristo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Silva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. S.</given-names>
            <surname>da Silva</surname>
          </string-name>
          .
          <article-title>Exploring features for the automatic identification of user goals in web search</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>46</volume>
          (
          <issue>2</issue>
          ):
          <fpage>131</fpage>
          -
          <lpage>142</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Booth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          .
          <article-title>Determining the informational, navigational, and transactional intent of web queries</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>44</volume>
          :
          <fpage>1251</fpage>
          -
          <lpage>1266</lpage>
          , May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Klinkner</surname>
          </string-name>
          .
          <article-title>Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs</article-title>
          .
          <source>In CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management</source>
          , pages
          <fpage>699</fpage>
          -
          <lpage>708</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>I.-H.</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <article-title>Transactional query identification in Web search</article-title>
          .
          <source>In AIRS '05: Proceedings of Asian Information Retrieval Symposium</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>U.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cho</surname>
          </string-name>
          .
          <article-title>Automatic identification of user goals in web search</article-title>
          .
          <source>In WWW '05: Proceedings of the 14th International Conference on World Wide Web</source>
          , pages
          <fpage>391</fpage>
          -
          <lpage>400</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Query chains: learning to rank from implicit feedback</article-title>
          .
          <source>In KDD'05: Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge discovery in data mining</source>
          , pages
          <fpage>239</fpage>
          -
          <lpage>248</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rose</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Levinson</surname>
          </string-name>
          .
          <article-title>Understanding user goals in web search</article-title>
          .
          <source>In WWW '04: Proceedings of the 13th International Conference on World Wide Web</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>19</lpage>
          , New York, NY, USA,
          <year>2004</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>