Different Degrees of Explicitness in Intentional Artifacts:
      Studying User Goals in a Large Search Query Log

             Markus Strohmaier                                     Peter Prettenhofer
Graz University of Technology and Know-Center                         Know-Center
    Inffeldgasse 21a, 8010 Graz, AUSTRIA                 Inffeldgasse 21a, 8010 Graz, AUSTRIA
          markus.strohmaier@tugraz.at                            pprett@know-center.at
                                            Mathias Lux
                                        Klagenfurt University
                        Universitätsstraße 65-67, 9020 Klagenfurt, AUSTRIA
                                       mlux@itec.uni-klu.ac.at
ABSTRACT
On the web, search engines represent a primary instrument through which users exercise their intent. Understanding the specific goals users express in search queries could improve our theoretical knowledge about strategies for search goal formulation and search behavior, and could equip search engine providers with better descriptions of users’ information needs. However, the degree to which goals are explicitly expressed in search queries can be suspected to exhibit considerable variety, which poses a series of challenges for researchers and search engine providers. This paper introduces a novel perspective on analyzing user goals in search query logs by proposing to study different degrees of intentional explicitness. To explore the implications of this perspective, we studied two different degrees of explicitness of user goals in the AOL search query log containing more than 20 million queries. Our results suggest that different degrees of intentional explicitness represent an orthogonal dimension to existing search query categories and that understanding these different degrees is essential for effective search. The overall contribution of this paper is the elaboration of a set of theoretical arguments and empirical evidence that makes a strong case for further studies of different degrees of intentional explicitness in search query logs.

Author Keywords
Web search, user goals, query log analysis, AOL search database

ACM Classification Keywords
H3.3: Information storage and retrieval: Information search and retrieval, H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
Studying users’ goals on the web in general, and in web search in particular, has recently received increasing attention from scientists as well as industry [13,16,22]. While industry has a strong interest in learning more about user goals in order to provide better search results, enable more targeted ad campaigns or increase click-through rates, the research community aims to develop a profound theoretical understanding about the different types of goals users have on the web [4], how users express their goals [25], how goals can be identified automatically and how goal-orientation can be used to facilitate human-computer interaction [8].

The enormous power that search engines such as Google, Yahoo and Microsoft Live have today was described by John Battelle in 2003 with the notion of so-called “databases of intentions”¹. This notion refers to the fact that user goals, something sensitive and private for users for a very long time, have become explicit and, to a certain extent, public with the advent of powerful search engines on the web. John Battelle describes databases of intentions as “the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. […]. This information represents […] a place holder for the intentions of humankind - a massive database of desires, needs, wants, and likes that can be discovered, subpoenaed, archived, tracked, and exploited to all sorts of ends. Such a beast has never before existed in the history of culture […].”

¹ http://battellemedia.com/archives/000063.php, last accessed Nov 21, 2007


      © 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes.
                 Re-publication of material from this volume requires permission by the copyright owners.
What has received only little attention so far is that the intentions represented in such “databases of intentions” can be suspected to exhibit considerable variety with respect to their degree of explicitness. While some goals contained in search queries might be very explicit, other queries might contain more implicit goals, which would mean that they are more difficult to recognize by, for example, an external observer. To give an example: in terms of intentional explicitness, the query “car miami” differs significantly from the query “buy a used car in Miami”.

While this observation appears rather intuitive, to the best of our knowledge there is no research effort comprehensively studying different degrees of intentional explicitness in search query logs, although the implications seem profound: different degrees of intentional explicitness could put significant constraints on the general analyzability and ultimately the overall utility of so-called databases of intentions, and they could put an upper bound on the level of service that search engines can provide. As a result, studying different degrees of intentional explicitness in search queries appears relevant on at least two different levels:

    •    On a theoretical level, better understanding different degrees of intentional explicitness in search queries could increase our knowledge about the levels of abstraction users employ when searching, and could equip us with better distinctions and tools for studying, for example, the way users refine or generalize goals during search.
    •    On a practical level, understanding different degrees of intentional explicitness in search queries could improve the ability of search engine vendors to better tailor their search results to specific users and to link search queries at different levels of explicitness.

However, understanding the degree of explicitness of user goals in search queries poses significant research and technical challenges: First and foremost, all goals contained in search query logs are of hypothetical nature in the sense that verification is extremely hard, if not impossible. Most query logs that are available to researchers have been anonymized, and even if information about the users were available, contacting users and verifying hypothetical goals would be costly or hardly feasible due to geographical, time and other constraints. We refer to this problem as the goal verification problem, which is extremely hard to overcome in research on search query log analysis. Second, query logs represent huge text corpora in terms of size, which renders manual elicitation of goals by experts practically impossible. We refer to this problem as the goal elicitation problem. Furthermore, query logs represent a fundamentally different text corpus to mine goals from, compared to other corpora that have been studied from an intentional perspective, such as interview transcripts or organizational guidelines: The length of search queries is significantly shorter, the words used in search queries do not necessarily appear in lexica, and the text is not necessarily represented as natural language text but in some artificial language, such as an arbitrary concatenation of terms that users suspect to yield fruitful and relevant search results (such as “car miami”). We refer to this problem as the linguistic artificiality problem.

While solving all of these problems in their entirety is well beyond the scope of this work, in this paper we aim to 1) increase our understanding about the notion of different degrees of explicitness in intentional artifacts theoretically, and 2) explore related challenges, potentials, and implications empirically. For that purpose, we have adopted selected concepts from the body of literature related to the notion of goals in different research areas and conducted an exploratory study of a large search query log: the AOL search database released in 2006.

WHAT ARE GOALS? DEFINITION AND RELATED WORK
To establish a theoretical understanding about the fundamental constructs we work with, we introduce the following definitions based on related work in a series of different, but related research areas. The most central concept in our paper is the concept of a goal, which we define as “a condition or state of affairs in the world that some agent would like to achieve or avoid. How the goal is to be achieved or avoided is typically not specified, allowing alternatives to be considered” (based on [21]). An intentional artifact is an electronic artifact produced by users or user behaviour that contains recognizable “traces of intent”, i.e. traces of users’ goals and intentions expressed in different degrees of explicitness. The degree to which these traces can be recognized as goals by some independent observer depends on the artifact’s degree of intentional explicitness. In this paper, we assume that search query logs at large represent intentional artifacts, meaning that they contain such traces of intent at different levels of explicitness. Examples of search queries exhibiting different degrees of intentional explicitness are shown in Figure 1.

 car, car Miami, car Miami dealer, buy a car in Miami, buy a used car in Miami, get loan to buy a used car in Miami

 Figure 1. Queries with different degrees of explicitness

The notion of goals has been used by researchers in different areas to represent and frame the desires and needs users have when interacting with software. In the following, we will discuss selected research relevant to our work.

The Notion of Goals in Human Computer Interaction
Researchers have focused on studying user intentions long before the current popularity of search engines, query log analysis and the web in general. In the broader human-computer interaction (HCI) context, Norman’s theory of
action [19], for example, describes the inherent gap between a person’s goals and intentions and a system’s capabilities, features and structures. Norman’s research has implicitly acknowledged the existence of different degrees of explicitness in users’ goals by highlighting that user goals are often not well specified, opportunistic, ill-formed and vague and therefore hard to capture, identify and represent. Any attempt to study goals in a web search context must be suspected to face similar, if not the same, challenges. Other work in HCI identifies basic types of so-called Goal-Effect Problems, i.e. problems that characterize system performance from an intentional perspective. In their paper [23] the authors distinguish between (I) missing cues for goal construction, where a system does not suggest appropriate goals; (II) misleading cues for goal construction, where a system suggests irrelevant goals; (III) missing cues for goal elimination, where a system does not eliminate completed goals; and (IV) misleading cues for goal elimination, where a system eliminates incomplete goals. Translated to a web search context, these distinctions highlight some of the implications of search queries expressed on different levels of intentional explicitness. Further work in HCI, such as the work of [12] on the Lumiere project, focuses particularly on studying intentional artifacts with a low degree of explicitness.

The Notion of Goals in Requirements Engineering
Goal-Oriented Requirements Engineering (GORE) conceptualizes software development as a process that aims to satisfy a series of stakeholder goals. The corresponding research community distinguishes between different types of goals, such as achieve and cease goals, which are said to generate behavior; maintain and avoid goals, which are said to restrict behaviors; and optimize goals, which are said to compare behaviors [21]. The distinction between goals and softgoals in GORE can be seen as an indicator for the plausibility of studying different degrees of explicitness in goals. While, for example, in the i* framework [29] a goal has a clear-cut criterion, a softgoal describes a goal for which there is no such clear-cut criterion to be used for deciding whether it is satisfied or not.

The Notion of Goals in Web Search
On the web, search represents a primary instrument through which users exercise their intent. This allows search engines to have a tremendous corpus of intentional artifacts at their disposal. This observation has led scientists to focus on studying user intentions in search query logs. In 2002, Broder [4] introduced a high-level categorization of web search intent, distinguishing between navigational, informational and transactional queries. Based on this early work, Rose and Levinson [22] refined this categorization into a hierarchical taxonomy including more fine-grained categories, such as entertainment or advice seeking. In 2004, the authors of [16] presented an automatic approach that aims to tell navigational and informational goals apart based on analyzing two parameters: user-click behavior and anchor-link distribution. Baeza-Yates et al. apply supervised and unsupervised learning techniques to study users’ goals in search query logs [2]. Faaborg [8] has presented a prototype for goal-oriented browsing and Liu et al. [17] have presented a prototype for goal-oriented search based on intentional concepts retrieved from the ConceptNet commonsense knowledge base.

While state-of-the-art research offers a set of useful categories, techniques and prototypes, we consider the degree of intentional explicitness to be orthogonal to existing intentional categories of search queries. In other words, we assume that within each intentional category (such as informational or transactional queries), goals can be expressed in different degrees of intentional explicitness. Broder, for example, makes a similar point in his 2002 paper, by mentioning that “many informational queries are extremely wide, for instance cars or San Francisco, while some are narrow, for instance normocytic anemia, Scoville heat units”. Our work in this paper is motivated by a desire to characterize different degrees of intentional explicitness in search query logs, and to identify implications for the process of search. Our own previous work explored how users express their goals during search [25].

Further related work has acknowledged this problem to some extent: in the paper of [22], for example, a tool that aims to support experts in categorizing search queries into goal categories is presented. While different degrees of intentional explicitness were not in the explicit focus of this work, the development of the tool can be interpreted as an early recognition of the problems that researchers face with different degrees of intentional explicitness in search queries.

DEGREES OF EXPLICITNESS IN INTENTIONAL ARTIFACTS
In a web search context, we conceptualize the degrees of explicitness in intentional artifacts to represent a broad, continuous spectrum. On one end of this spectrum, we would have queries that describe the users’ intent completely and precisely, with nothing to add from an intentional perspective. On the other end of the spectrum we would have queries that do not describe user intent at all, such as blank queries.

For reasons of simplicity, in this paper we propose to distinguish (at a high, dichotomous level) between two degrees of intentional queries only: explicit and implicit intentional queries. This allows us to study whether a distinction between implicit and explicit intentional queries is reasonable in a web search context in the first place, and whether it yields interesting insights or implications. Given that we can identify interesting differences between different degrees of intentional explicitness, it could be interesting to conduct research on more refined definitions and more fine-grained degree distinctions in the future.

With these arguments in mind, we introduce the following idealized definitions of explicit and implicit intentional queries. An explicit intentional query is a query that can be
related to a specific goal in a recognizable, unambiguous way. Recognizable refers to what [15] defines as “trivial to identify” by a subject within a given attention span. On a more practical level, this idealized definition is related to what other researchers have characterized as “better queries”, or queries that have “more precise goals” (R. Baeza-Yates at the “Future of Web Search” workshop 2006, Barcelona). Examples of explicit intentional queries, i.e. queries that have more precise goals, would be “buy a car”, “maximize adsense revenue” or “how to get revenge on neighbor within limits of law”. While these queries can still be refined and elaborated, they are less ambiguous in the sense that a user searching for “how to get revenge on neighbor within limits of law” is unlikely to have the true goal of “buy a nice gift for neighbor”. We define an implicit intentional query as a query where it is difficult or extremely hard to elicit some specific goal from the intentional artifact. Examples include blank queries, or queries such as “car” or “travel”, which embody user goals on a very general level. Queries on this kind of level are likely to require further refinement in order to yield useful search results. Interestingly, a significant proportion of queries today are of length 1 or 2 (as is evident in, for example, the AOL search database set [20]).

Distinguishing between these two broad types of queries is important for several reasons: First, explicit (“better”) intentional queries could be used to disambiguate or refine implicit intentional queries. For example: a search engine might be able to refine the implicit intentional query “car shop” with the explicit intentional queries “shop for a car”, “repair a car”, “find a car shop” or “buy a car for shopping” with the help of user interaction. Second, we have found anecdotal evidence that some users organize their search in a way that can be understood as a traversal of goal graphs [25], including iterative goal refinement and generalization. This suggests that switching between more explicit and more implicit intentional queries during search is a natural cognitive activity for at least some users. Third, our own recent research has indicated that only 1.69% to 3.01% of queries have a high degree of intentional explicitness [25]. While this percentage is rather small, we do not know whether users prefer to search via implicit intentional queries, or whether users have simply adapted to the non-intentional mode in which Google, Yahoo and other search engines operate today (cf. “bag-of-word principle”). Our research is driven by a desire to understand whether explicit intentional queries have the potential to narrow the cognitive gap between a user’s goals and the queries she uses. We are interested in the implications of distinguishing between explicit and implicit intentional queries and in learning more about the explicit goals users have on the web, with the long-term vision of enabling users to express their goals in search more accurately (towards “better queries” in Baeza-Yates’ diction).

This is in contrast to some past work in information retrieval, for example in the area of query expansion, where the purpose of query expansion is to make the user query resemble more closely the documents it is expected to retrieve [26]. Our interest is rather the opposite: Because the precision with which users describe their goals in search queries puts an upper bound on the level of service search engines can provide, our long-term interest is to make search queries resemble more closely the intentions users have (moving towards more explicit intentional queries). This could help to narrow the “gulf of execution” for users, and could help computer scientists and search engine vendors to work with more accurate descriptions of users’ intent, something search engine vendors are desperate to achieve today [10]. While some researchers have already attempted to address similar issues [1], our particular focus lies in exploring different degrees of intentional explicitness in large search query logs rather than ambiguity of queries in general.

AN EXPLORATORY STUDY
Equipped with a theoretical understanding about explicit and implicit intentional queries, we are now interested in empirically studying these different types of queries “in the wild”. In an exploratory study, we aim to identify and better understand explicit intentional queries in the AOL search database, a large search query log database released in 2006. We want to explore whether there are differences between explicit and implicit intentional queries with respect to, for example, the number of users issuing these types of queries or the type of URLs clicked as a result. Furthermore, we are interested in learning whether there are certain words that indicate the presence of explicit intentional queries, which could represent a relevant finding for future research efforts.

Although our preliminary distinction between explicit and implicit intentional queries equips us with an intuitive criterion for classification, a sharper measure is needed to separate explicit from implicit intentional queries on an operational level. To simplify classification, we distinguish between explicit and implicit intentional queries based on the following arbitrary criteria: A) whether a query contains at least one verb and B) whether the goal elicited from the intentional artifact conforms to our definition of a goal. Note that for other or more refined degrees of intentional explicitness, different criteria might be used. We now use our previous example of queries to illustrate the implications of our particular distinction in Figure 2, where queries in bold represent explicit intentional queries according to our classification criteria.

 Car, car Miami, car Miami dealer, buy a car in Miami, buy a used car in Miami, get loan to buy a used car in Miami

 Figure 2. Distinguishing different degrees of explicitness

While our example might imply that the degree of explicitness correlates with query length only, it does not necessarily. Although the query “buying a car in the 1920’s”
contains a verb, it does not conform to our definition of a goal and would therefore not be considered to represent an explicit intentional query. Our criteria thus allow us to distinguish between “buy a car” or “sell a car” (explicit) and “car dealer ads” (implicit). We are aware of the implications of this simplification, and we discuss them in the “Threats to validity” section at the end of this paper.

We investigated explicit and implicit intentional queries in the AOL search database. In addition to the AOL data, several other web search logs are available [13]. We used the AOL search database because it provides a very large dataset including comprehensive information about anonymous user IDs, time stamps, search queries, and click-through events. It contains ~20 million search queries collected by AOL from 657,426 unique user IDs between March 1, 2006 and May 31, 2006. To our knowledge, the AOL search database is also the most recent very large corpus of search queries publicly available (2006)². Because applying our definition of explicit and implicit intentional queries manually to the AOL dataset with more than 20 million queries is infeasible (cf. the goal elicitation problem), we have developed an experimental classification approach based on a training set of queries that was used for machine learning of syntactical features of explicit intentional queries. However, coming up with an automatic classifier that excels on precision and recall measures would be well beyond the scope of this paper. Instead, our approach focuses on providing us with a reasonable subset of the AOL query dataset that contains a significantly higher proportion of explicit intentional queries than the entire dataset. Therefore, the goals of our experimental classification approach are more modest: it should enable us to gain a better understanding about explicit and implicit intentional queries and aid us in coupling our intuitions with empirical data. Focusing on better classification approaches could represent a promising line of future research. In the next section, we will describe some technical details of our approach.

An Experimental Classification Approach
Before using the dataset for our analysis, we sanitized it with respect to undesirable properties such as empty queries. The data representation of an entry resulting from our sanitation process has the following form: {UserID, query, timestamp, (ItemRank, URL)*}. Taking this data representation as an input, our experimental classification approach consists of two parts: part-of-speech (POS) tagging and supervised learning of syntactical goal features.

Part of Speech Tagging
Our classification approach is based on the simplified assumption that explicit intentional queries can be distinguished from implicit intentional queries by the occurrence of certain part-of-speech patterns. For this purpose the experimental setup incorporated a fast and reasonably accurate bigram part-of-speech tagger trained on a sample of the Penn Treebank corpus. We have focused on tagging queries with query length > 2 only, because of the inherent ambiguity of shorter queries, and the resulting difficulty of recognizing goals. We favored a bigram tagger over more powerful approaches such as transformation-based taggers and Hidden Markov Model taggers due to efficiency issues, the lack of contextual information and the rather naive (artificial) linguistic nature of search queries (cf. the linguistic artificiality problem). The tag set of the Penn Treebank corpus consists of 45 word classes [14]. The reason for choosing this particular tag set is the fact that we are mainly interested in identifying verbs and verb-noun combinations. For our purpose, we don’t need the finer-grained word classes provided by e.g. the tag set of the Brown corpus or C7. Table 1 shows a sample of word classes of the Penn Treebank tag set.

    Tag     Description            Example
    NN      Noun, sing. or mass    car
    VB      Verb, base form        eat
    VBG     Verb, gerund           eating
    VBZ     Verb, 3sg pres         eats
    JJ      Adjective              yellow
    WRB     Wh-adverb              how, where
    TO      “to”                   to

    Table 1. A sample of Penn Treebank tags (from [14])

The vocabulary size of the corpus is an estimated 13,500 words, which is rather small compared to the expected vocabulary size of the dataset (cf. the linguistic artificiality problem). To address this problem, we have chosen a suffix tagger as a backoff strategy for the bigram tagger. The part-of-speech tagging functionality we used was provided by the natural language toolkit NLTK [18].

Supervised Learning of Goal Features
Our classification approach is similar to those reported in [5,9,11]. However, we use part-of-speech n-grams instead of word n-grams as features. In our experimentation we
                                                                  used binary features based on fixed size trigrams.
                                                                  Furthermore, we introduced markers ($ $) at the beginning
                                                                  and the end of a query to take the query boundary part-of-
                                                                  speeches into account. Thus, the query "buying/VBG a/DT
2
  Because the AOL search database was retracted from              car/NN" would be composed of the following trigrams:
AOL shortly after releasing it, we obtained a copy from a                 $ $ VBG, $ VBG DT, VBG DT NN, DT NN $, NN $ $
secondary source: http://www.gregsadetsky.com/aol-data/
last accessed on July 15th, 2007.
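The boundary-padded trigram features described for the query
"buying/VBG a/DT car/NN" can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the
authors' implementation: the function names pos_trigrams and
binary_features are hypothetical, and the input is assumed to be a
list of Penn Treebank tags that a POS tagger has already produced.

```python
from typing import List, Sequence, Tuple

BOUNDARY = "$"  # marker the text uses for the start and end of a query

Trigram = Tuple[str, ...]


def pos_trigrams(tags: Sequence[str]) -> List[Trigram]:
    """Return the boundary-padded POS trigrams of one tagged query."""
    # Two markers on each side, so the first and last tag each appear
    # in a trigram together with the query boundary.
    padded = [BOUNDARY, BOUNDARY] + list(tags) + [BOUNDARY, BOUNDARY]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def binary_features(tags: Sequence[str],
                    vocabulary: Sequence[Trigram]) -> List[int]:
    """Map a query to 0/1 indicators over a fixed list of trigrams."""
    present = set(pos_trigrams(tags))
    return [1 if trigram in present else 0 for trigram in vocabulary]


# Tags of the example query "buying/VBG a/DT car/NN".
example = pos_trigrams(["VBG", "DT", "NN"])
print(", ".join(" ".join(t) for t in example))
# prints: $ $ VBG, $ VBG DT, VBG DT NN, DT NN $, NN $ $
```

Each query then becomes a binary vector over a fixed trigram
vocabulary (hypothetical here), which is the kind of input a
standard classifier expects.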
To obtain a training set, we drew a uniform random sample
from the set of queries which contain at least one verb3.
Two of the authors labeled instances in the sample
consensually based on whether the queries conform to our
definition of goals introduced earlier. This resulted in a
training set consisting of 98 instances, 59 positives and 39
negatives. While this training set is not necessarily
representative of the set of all queries under investigation,
it yielded sufficient results given the exploratory nature of
our research.

3 1,598,612 out of 20,494,002 queries contained at least one
verb according to the outcome of our part-of-speech tagging
process.

We trained a naive Bayesian classifier [7] on the feature
vectors described above using 10-fold cross-validation. In
order to increase the performance of our classifier we
applied a chi-squared feature selection algorithm to our
training set [24]. The best results, based on 10-fold
cross-validation, were achieved by reducing the feature
space to the 20 most predictive features. Table 2 shows the
most predictive features according to the feature selection.

                 $ $ NN           $ $ VBG
                 $ WRB TO         WRB TO VB
                 $ NN NN          $ VBG DT
                 VBG DT NN        $ VBG NN
                 $ $ VBZ          JJ NN $
                 $ VBG IN         VBG IN NN
                 $ VB NN          TO VB VBN

Table 2. Most predictive features based on chi-squared feature
                           selection

The purpose of our classification technique is to provide us
with a more condensed set of queries, ideally containing a
higher proportion of explicit intentional queries than the
entire dataset, that would allow us to study explicit
intentional queries in greater detail. More sophisticated
linguistic techniques such as selectional preference [3]
might be more adequate if the goal were classification with
a stronger focus on precision and recall measures. For all
feature selection and classification tasks, we used the
WEKA toolkit [27] in our work.
In the next section, we present the results of applying our
experimental classification approach to the AOL search
database.

STUDY RESULTS
Results of Experimental Classification
Applying our technique resulted in a condensed set of
279,260 queries. We will refer to this set of queries from
here on as the “condensed dataset”. The condensed dataset
contains a higher proportion of explicit intentional queries
than the entire dataset. The difference is significant: While
the share of explicit intentional queries in the entire dataset
has been estimated to lie between 1.69% and 3.01%, in the
condensed dataset we estimate this ratio (based on a sample
containing 500 random queries from this set) to lie in a 95%
confidence interval of 49.6% to 58.4%. This allows us to
compare whether there are interesting differences in query
sets that contain a large as opposed to a very small
proportion of explicit intentional queries.

                                Entire Dataset             Condensed Dataset
Queries                         20,494,002                 279,260
Explicit Intentional Queries    346,349 - 616,869          138,513 - 163,089
Implicit Intentional Queries    19,877,133 - 20,147,653    116,172 - 140,747
Explicit Intentional Queries,   1.69% - 3.01%              49.6% - 58.4%
  95% confidence interval
Users                           657,426                    94,487

      Table 3. Statistical overview of the condensed dataset

Table 3 gives an overview of some statistics of our
condensed dataset. It also shows that the condensed dataset
captures only part of the explicit intentional queries
estimated in the entire dataset. However, the dataset
provides a subset of queries with a significantly higher
proportion of explicit intentional queries, which is sufficient
for the kind of exploratory research questions we are
interested in.

          Correctly Classified Intentional Queries
          “buying groceries online”
          “how to get revenge on neighbor within limits of law”
          “helping children handle death of a loved one”
          “cleaning the ak-47”
          “coughing up blood”
          “dealing with the guilt of cheating”

        Table 4. Examples of correctly classified queries

In addition to the statistical analysis, we want to give a
qualitative account of the type of queries our technique
classified correctly and incorrectly in the condensed dataset.
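As a rough check on the interval reported for the condensed dataset
in Table 3, the following sketch computes a normal-approximation
(Wald) 95% confidence interval for a proportion. Two points are
assumptions rather than statements from the paper: the interval
method itself, and the count of 270 explicit intentional queries
among the 500 sampled ones, which is inferred from the midpoint
(54%) of the reported interval. The name wald_interval is
hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_interval(successes: int, n: int, z: float = Z_95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumption: 270 of the 500 sampled queries were labeled explicit
# intentional (the text reports only the resulting interval).
low, high = wald_interval(270, 500)
print(f"{low:.1%} - {high:.1%}")
# prints: 49.6% - 58.4%
```

Under these assumptions the result agrees with the 49.6% - 58.4%
range in Table 3; the width of about 4.4 percentage points on either
side follows from the sample size of 500.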
Examples of correctly classified queries in the condensed
dataset are depicted in Table 4. These queries all represent
goals that contain at least one verb and conform to our
definition of goals. In addition, the set of correctly
classified explicit intentional queries does not belong to a
single query category (such as the ones identified in
previous research [10]), but spans several of them. The query
“buying groceries online”, for example, can be categorized as
a transactional query, while “helping children handle death
of a loved one” can be categorized as an informational query.
This observation, together with the observation that implicit
intentional queries do not belong to a single category either,
illustrates that the degree of intentional explicitness
represents an orthogonal view to existing categories in
query log analysis. Another particularly interesting query is
the instance “coughing up blood”. Although conforming to
our definition of a goal, it represents a rather different kind
of goal compared to the other goals identified in the
condensed dataset: it represents an avoid goal of a user,
describing a state which the user presumably tries to change
(presumably a medical symptom). Automatically
distinguishing between achieve and avoid goals appears to
be an interesting research question and a non-trivial
research challenge. The other goals in our table represent
achieve goals in the sense that a user can be reasonably
suspected to pursue the goal which is represented in the
query (within the limitations of the goal verification
problem).
Examples of incorrectly classified queries are especially
interesting, as they show some of the limitations of our
experimental classification approach:

          Incorrectly Classified Intentional Queries
          “saving privat ryan”
          “driving school Illinois”
          “stem cell transplant”
          “founding fathers temple”
          “recovering the satellites lyrics”

      Table 5. Examples of incorrectly classified queries

The small sample of queries listed in Table 5 gives a good
overview of the challenges of identifying explicit
intentional queries: “Saving private ryan”, for example, is a
popular Hollywood movie starring Tom Hanks, which
makes it unlikely that the user issuing the query has the
goal of actually saving a Private named Ryan. “Driving
school Illinois” probably refers to some school where people
can learn to drive, rather than the goal of driving to school
in Illinois. “stem cell transplant” is very likely not a goal
either. The incorrect classification is likely the result of
imperfections in the part-of-speech tagging step.
Finally, we observed a significant proportion of queries that
appear goal-oriented, but have the term “lyrics” as a pre- or
postfix, such as “recovering the satellites lyrics” (a song
performed by the Counting Crows). Utilizing domain
knowledge (such as an Amazon API to detect movie or
book titles) can represent one way of dealing with this
kind of query.

Results of Comparing the two Datasets
We also investigated whether the most popular websites
(i.e. websites that have been selected by users as a result of
their search) in our condensed dataset differ from the most
popular websites in the entire search query log. If this
were the case, it would make a strong argument for the
development of more advanced algorithms and techniques
that have higher precision in distinguishing between
different degrees of intentional explicitness in search
queries.

[Bar chart. Y-axis: 0 to 3500. Legend: Explicit intentional
queries, Confidence interval, Implicit intentional queries.
X-axis websites: http://www.amazon.com, http://www.ehow.com,
http://en.wikipedia.org, http://www.geocities.com,
http://experts.about.com, http://www.imdb.com,
http://www.hgtv.com, http://www.findarticles.com,
http://www.answers.com, http://www.superpages.com,
http://www.nextag.com, http://www.bizrate.com,
http://cgi.ebay.com, http://www.faqfarm.com,
http://www.43things.com, http://www.medhelp.org]

      Figure 3. Top 16 websites in the condensed dataset

The histogram in Figure 3 lists the top 16 websites that have
been clicked by users in the condensed dataset, including
websites such as amazon.com, ehow.com, en.wikipedia.org,
geocities.com, medhelp.org and others.
We have taken a random sample from each set of queries
associated with a URL listed in Figure 3 and evaluated it
with respect to correctly and incorrectly classified queries.
We calculated the 95% confidence interval of the error rate
to give an estimate (middle part of each bar in Figure 3).
This kind of analysis revealed interesting differences: The
websites that have the highest proportion of correctly
classified explicit intentional queries among the top 16
websites are websites that can be considered to be very
goal-centric: 43things.com (a website encouraging users to
share their goals in life), ehow.com (a website on how to
accomplish a broad variety of tasks and goals), hgtv.com (a
home improvement website), faqfarm.com (a question
answering website), and medhelp.org (a medical information
website). Medhelp.org is a particularly interesting result, as
a large proportion of the correctly classified explicit
intentional queries are queries describing medical symptoms
(“coughing up blood”), which we defined as avoid goals.
The websites with a higher proportion of incorrectly
classified explicit intentional queries are, interestingly,
websites that are less goal-centric, such as imdb.com (a movie
database, many queries were movie or series titles like
“saving private ryan”, “bowling for columbine” or “meet
joe black”), superpages.com (a directory website), followed
by bizrate.com (a comparison shopping site, many queries
for goods such as “marble fitted table cloth” or “fencing for
pools”), answers.com (an online dictionary and
encyclopedia, many queries focusing on definitions such as
“meaning of centimeter” or “define alamo war”) and
en.wikipedia.org (an online encyclopedia).
Amazon.com in particular, the website associated with the
highest number of queries in the condensed set, was
difficult to interpret. Book titles often contain goals, and
it is hard to judge whether a user is searching for
the specific book or using a goal as search query (e.g.
“organizing your life” might be a search for the book “The
Complete Idiot's Guide to Organizing Your Life”, which
can be found at amazon.com). Geocities, a hosting
company for a variety of websites, has a similar fraction of
intentional queries, and is very broad regarding the range of
topics identified in the queries.
In the following, we compare the entire and the condensed
dataset with respect to whether they differ in the set of
websites users select as a result of issuing queries.

[Bar chart. Y-axis: 0 to 400000. Legend: click events.
X-axis websites: http://www.google.com, http://www.myspace.com,
http://www.yahoo.com, http://en.wikipedia.org,
http://www.amazon.com, http://www.imdb.com,
http://www.mapquest.com, http://www.ebay.com,
http://mail.yahoo.com, http://www.bankofamerica.com,
http://www.geocities.com, http://www.hotmail.com,
http://www.ask.com, http://profile.myspace.com,
http://www.tripadvisor.com]

hand, and goal-oriented websites and resources on the
other.

Results of Analyzing the Condensed Dataset
Beyond comparative analysis, we were interested in the
distribution of verbs in our condensed dataset.

            Figure 5. Verb frequency histogram

The histogram in Figure 5 lists the most frequent verbs (in
their stemmed word form) in our dataset. The top 10
stemmed verbs in the condensed dataset are make, get, buy,
wed, is, find, live, play, use, write. While this list is
interesting from a goal-oriented perspective and largely
reasonable, it also highlights some of the limitations of our
simplified approach. For example, “wed” is the result of
mistakenly POS-tagging “wedding” as VBG rather than the
result of the verb “wed” occurring in the dataset very often
(as we were able to confirm by evaluating occurrences of wed
vs. wedding in the dataset). Another question we were
interested in is whether a minority of users is responsible for
issuing explicit intentional queries, or whether a larger set

                                                                                                                                                                                                                                                                                                                                         http://www.bizrate.com


                                                                                                                                                                                                                                                                                                                                                                                                                              http://www.msn.com


                                                                                                                                                                                                                                                                                                                                                                                                                                                   of users issues such queries. This would have implications
                                                                                                                                                                                                                                                                                                                                                                                                                                                   for the broader relevance of different degrees of intentional
                                                                                                                                                                                                                                                                                                                                                                                                                                                   explicitness in search queries.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                 10000


         Figure 4. Top 16 websites in the entire dataset                                                                                                                                                                                                                                                                                                                                                                                                         1000

Figure 4 shows the top 16 websites clicked by users in the
entire search result set. The results differ significantly
from the top 16 in the condensed dataset. Goal-centric
websites are especially affected by our experimental
classification approach, for example 43things.com
(moving from rank #388 in the entire dataset up to rank #15
in the condensed set), ehow.com (from #64 up to #2),
hgtv.com (from #97 up to #7), and medhelp.org (from #104
up to #16). The difference between the popularity of websites
found in the condensed vs. the entire dataset, together with
the observation of goal-centric websites surfacing in the
condensed dataset, leads us to hypothesize that there is a
correlation between explicit intentional queries on one

[Figure 6: rank/frequency plot. x-axis: Rank (1, 10, 100, 1000, 10000); y-axis: Frequency (1, 10, 100, 1000, 10000)]

Figure 6. Number of queries per user: rank/frequency plot

In Figure 6 above, users are ranked based on their
number of queries in the condensed set; only the
first 5000 ranks are shown. Frequency corresponds to the
number of queries. While the share of explicit intentional
queries in the AOL search query log has been estimated to
lie between 1.69% and 3.01% [25], the proportion of users in
our condensed dataset is significantly higher: 14.37% of the
users from the entire dataset appear in the condensed
dataset as well. As the data points approximately follow a
line on a log-log scale, the rank frequency distribution
appears to represent a power law, a distribution that is
often found in systems that contain traces of social
activities or interactions.

THREATS TO VALIDITY
In the following, we describe threats to validity according
to [28]:

Construct validity: The constructs we intended to
investigate in our study are explicit and implicit intentional
queries. Being aware of a broad spectrum of different
degrees of explicitness of goals in search queries, we have
introduced a simplified distinction for practical purposes.
While this distinction enabled us to explore the relevance of
different degrees of explicitness, it might be an
oversimplification of the underlying phenomenon.
However, by defining different degrees of intentional
explicitness as a continuous spectrum, we point towards more
elaborate future approaches. In addition, relying on
part-of-speech tagging and involving expert judgment to
distinguish between explicit and implicit intentional queries
also puts certain limitations on the generality of our
approach. By providing a definition for goals, we aimed to
objectify our process to a certain extent.

Internal validity: The experts involved in labeling the
training set of queries were two of the authors of this paper,
which might introduce bias into our results. We tried to
mitigate this bias by requiring the experts to reach
consensus on the judgment made, and by involving more
than one expert. The decision to exclude shorter queries
(n≤2) prevents us from making statements about a large part
of the AOL dataset (~60%). However, our decision was
motivated by the inherent difficulty of correctly
part-of-speech tagging one- or two-word English queries, and
by the fact that search engine vendors report increasing
average query length over the past years (see footnote 4).

4 http://blogs.zdnet.com/micro-markets/index.php?p=27,
last accessed Nov 21, 2007

External validity: While we are referring to established
theories and definitions on goals from different research
areas including human-computer interaction, goal-oriented
requirements engineering and search query analysis, our
work is biased towards the data available in the AOL search
dataset (2006). Investigating other search query logs with
respect to different degrees of intentional explicitness is
something we are interested in.

Reliability: We have documented and described our
experimental classification approach, and built on existing
toolkits such as the WEKA toolkit [27], so that reproducing
our results is possible within the given limits.

OUTLOOK
In future work, it would be interesting to identify more
fine-grained degrees of intentional explicitness and more
precise criteria for distinguishing between them. Mining
relations between explicit and implicit intentional queries
would be another interesting stream of research, as this
could allow search engines to interactively support goal
refinement or goal generalization activities. We have
identified a number of seemingly suitable web corpora, such
as 43things.com, ehow.com, medhelp.org and others, that
could be used in related future research efforts. Another
promising field of future work seems to be the development
of more precise classification approaches. In order to
advance in this direction, approaches could, for example,
take context or domain knowledge into account to increase
the quality of classification (e.g. eliminating movie titles or
queries related to song lyrics). Categorization of explicit
intentional queries into taxonomies of human goals [6]
would be another interesting endeavor that could yield
fruitful insights into the goals users pursue on the web.
Investigating how our results translate to other contexts,
such as the 43things.com website (a website that
encourages users to share their goals), is another stream of
future research we are interested in.

SUMMARY & CONCLUSIONS
This paper introduced a novel perspective on analyzing
search query logs: different degrees of intentional
explicitness. We have argued that these degrees represent a
continuous dimension, and we have shown by example that
they are orthogonal to existing query categories, such as
transactional or informational queries. In an effort to make
this novel dimension amenable to analysis, we have
introduced two simplified degrees of intentional
explicitness, and applied them to the AOL search database.
Our analysis demonstrated that our concepts are reasonable
in principle, and highlighted a series of opportunities and
challenges when studying different degrees of intentional
explicitness in search query logs. Learning about different
degrees can be considered essential for leveraging the full
analytical potential of “databases of intentions” and for
understanding their limitations. In addition, considering
different degrees of intentional explicitness appears critical
for search engine vendors to better assess the level of
service they can or should provide for different user
queries. We have presented a theoretical elaboration of
different degrees of intentional explicitness and preliminary
empirical evidence that these concepts are reasonable in
principle. More robust techniques to understand a search
query’s degree of intentional explicitness could have a
significant impact on narrowing the cognitive gap between
a user’s goals and the query she formulates. Finally, our
findings could have a broader impact on web search
research, as well as behavioral and social studies of
motivation on the web.

ACKNOWLEDGMENTS
We thank Anwar Us Saeed for providing support in
implementing parts of the experimental classification
approach and Mark Kröll for very helpful comments and
criticism. The research of this contribution is funded in part
by the Austrian Competence Center program Kplus.

REFERENCES
1. Allan, J., & Raghavan, H., Using part-of-speech patterns
   to reduce query ambiguity. Proc. SIGIR Conference on
   Research and Development in Information Retrieval,
   New York, NY, USA, ACM Press (2002), 307-314.
2. Baeza-Yates, R., Calderón-Benavides, L., & González-Caro, C.,
   The Intention Behind Web Queries, Proc. SPIRE 2006,
   Springer (2006), 98-109.
3. Beitzel, S. M., Jensen, E. C., Frieder, O., Grossman, D.,
   Lewis, D. D., Chowdhury, A., Kolcz, A., Automatic
   web query classification using labeled and unlabeled
   training data, Proc. SIGIR 2005, New York, NY, USA,
   ACM Press (2005), 581-582.
4. Broder, A., A taxonomy of web search, SIGIR Forum
   36(2), (2002), 3-10.
5. Cavnar, W. B., Trenkle, J. M., N-gram-based text
   categorization, Proc. SDAIR 1994, 161-175.
6. Chulef, A. S., Read, S. J., & Walsh, D. A., A hierarchical
   taxonomy of human goals, Motivation and Emotion 25(3),
   (2001), 191-232.
7. Domingos, P., Pazzani, M. J., On the optimality of the
   simple Bayesian classifier under zero-one loss, Machine
   Learning, vol. 29, no. 2-3 (1997), 103-130.
8. Faaborg, A. & Lieberman, H., A goal-oriented web
   browser, Proc. CHI 2006, ACM Press (2006), 751-760.
9. Fürnkranz, J., A study using n-gram features for text
   categorization, Tech rep., Austrian Institute for Artificial
   Intelligence (1998).
10. Greene, K., The future of search.
    http://www.technologyreview.com/Biztech/19050/, last
    accessed on July 18th, 2007, MIT Technology Review,
    July 16 (2007).
11. Grobelnik, M., Mladenic, D., Efficient text
    categorization, ECML-98 Workshop on Text Mining,
    Chemnitz, Germany (1998).
12. Horvitz, E., Breese, J., Heckerman, D., Hovel, D.,
    Rommelse, K., The Lumiere project: Bayesian user
    modeling for inferring the goals and needs of software
    users, Proc. UAI 1998, (1998), 256-265.
13. Jansen, B. & Spink, A., How are we searching the
    World Wide Web? A comparison of nine search engine
    transaction logs, Information Processing and
    Management 42(1), (2006), 248-263.
14. Jurafsky, D., Martin, J. H., Speech and Language
    Processing: An introduction to natural language
    processing, Computational Linguistics and Speech
    Recognition (International Edition), Prentice Hall
    (2000).
15. Kirsh, D., When is information explicitly represented?,
    UBC Press (1990), 340-365.
16. Lee, U., Liu, Z., & Cho, J., Automatic identification of
    user goals in Web search. Proc. WWW '05, New York,
    NY, USA, ACM Press (2005), 391-400.
17. Liu, H., Lieberman, H., & Selker, T., GOOSE: A
    goal-oriented search engine with commonsense, Proc. AH
    2002, Springer-Verlag, London, UK (2002), 253-263.
18. Loper, E., Bird, S., NLTK: The Natural Language
    Toolkit, (2002).
19. Norman, D., The design of everyday things, (1988).
20. Pass, G., Chowdhury, A., Torgeson, C., A picture of
    search, Proc. InfoScale 2006, Hong Kong, ACM Press
    (2006).
21. Regev, G. & Wegmann, A., Where do goals come from:
    the underlying principles of goal-oriented requirements
    engineering, Proc. RE 2005, Washington, DC, USA,
    IEEE Computer Society (2005), 253-362.
22. Rose, D., Levinson, D., Understanding user goals in
    web search, Proc. WWW 2004, New York, USA (2004).
23. Ryu, H. & Monk, A., Analysing interaction problems
    with cyclic interaction theory: Low-level interaction
    walkthrough, PsychNology Journal 2(3), (2004), 304-330.
24. Sebastiani, F., Machine learning in automated text
    categorization, ACM Computing Surveys, vol. 34, no. 1,
    (2002), 1-47.
25. Strohmaier, M., Lux, M., Granitzer, M., Scheir, P.,
    Liaskos, S., & Yu, E., How do users express goals on the
    web? - An exploration of intentional structures in web
    search, in 'We Know'07 International Workshop on
    Collaborative Knowledge Management for Web
    Information Systems, in conjunction with WISE'07,
    Nancy, France, (2007).
26. Strzalkowski, T. & Carballo, J., Natural language
    information retrieval: TREC-5 report, in Text REtrieval
    Conference, (1998), 164-173.
27. Witten, I. H., Frank, E., Data mining: practical machine
    learning tools and techniques, Morgan Kaufmann Series
    in Data Management Systems, 2nd edn. Morgan
    Kaufmann, (2005).
28. Yin, R. K., Case study research: design and methods
    (Applied Social Research Methods), SAGE
    Publications, (2002).
29. Yu, E., Modelling strategic relationships for process
    reengineering, PhD thesis, Department of Computer
    Science, University of Toronto, (1995).
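
As a rough illustration of the rank/frequency analysis discussed for Figure 6, the following Python sketch ranks users by their number of queries and estimates the slope of the resulting curve on a log-log scale. The toy query log, the function names rank_frequency and loglog_slope, and the use of an ordinary least-squares fit are assumptions made for illustration only; the paper itself reports only that the data points approximately follow a line. A slope fitted this way is a quick visual-style check, not a rigorous test for a power law.

```python
import math
from collections import Counter


def rank_frequency(user_ids):
    """Return [(rank, frequency)], where rank 1 is the user with the most queries.

    `user_ids` holds one entry per query, naming the user who issued it.
    """
    counts = Counter(user_ids)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(points):
    """Least-squares slope of log10(frequency) against log10(rank).

    Needs at least two points with different ranks.
    """
    xs = [math.log10(rank) for rank, _ in points]
    ys = [math.log10(freq) for _, freq in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy log (not AOL data): frequencies 12, 6, 4, 3 for ranks 1..4,
# i.e. frequency = 12 / rank, an exact power law with exponent -1.
toy_log = ["u1"] * 12 + ["u2"] * 6 + ["u3"] * 4 + ["u4"] * 3

points = rank_frequency(toy_log)
slope = loglog_slope(points)
print(points)
print(round(slope, 3))
```

On real query logs the points would scatter around the fitted line, so the slope should be read together with the plot rather than on its own.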