Different Degrees of Explicitness in Intentional Artifacts: Studying User Goals in a Large Search Query Log

Markus Strohmaier
Graz University of Technology and Know-Center
Inffeldgasse 21a, 8010 Graz, AUSTRIA
markus.strohmaier@tugraz.at

Peter Prettenhofer
Know-Center
Inffeldgasse 21a, 8010 Graz, AUSTRIA
pprett@know-center.at

Mathias Lux
Klagenfurt University
Universitätsstraße 65-67, 9020 Klagenfurt, AUSTRIA
mlux@itec.uni-klu.ac.at

ABSTRACT
On the web, search engines represent a primary instrument through which users exercise their intent. Understanding the specific goals users express in search queries could improve our theoretical knowledge about strategies for search goal formulation and search behavior, and could equip search engine providers with better descriptions of users’ information needs. However, the degree to which goals are explicitly expressed in search queries can be suspected to exhibit considerable variety, which poses a series of challenges for researchers and search engine providers. This paper introduces a novel perspective on analyzing user goals in search query logs by proposing to study different degrees of intentional explicitness. To explore the implications of this perspective, we studied two different degrees of explicitness of user goals in the AOL search query log containing more than 20 million queries. Our results suggest that different degrees of intentional explicitness represent an orthogonal dimension to existing search query categories and that understanding these different degrees is essential for effective search. The overall contribution of this paper is the elaboration of a set of theoretical arguments and empirical evidence that makes a strong case for further studies of different degrees of intentional explicitness in search query logs.

Author Keywords
Web search, user goals, query log analysis, AOL search database

ACM Classification Keywords
H3.3: Information storage and retrieval: Information search and retrieval, H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
Studying users’ goals on the web in general and in web search in particular has recently received increasing attention from scientists as well as industry [13,16,22]. While industry has a strong interest in learning more about user goals in order to provide better search results, enable more targeted ad campaigns or increase click-through rates, the research community aims to develop a profound theoretical understanding about the different types of goals users have on the web [4], how users express their goals [25], how goals can be identified automatically and how goal-orientation can be used to facilitate human-computer interaction [8].

The enormous power that search engines, such as Google, Yahoo and Microsoft Live, have today was described by John Battelle in 2003 with the notion of so-called “databases of intentions”¹. This notion refers to the fact that user goals, something sensitive and private for users for a very long time, have become explicit and, to a certain extent, public with the advent of powerful search engines on the web. John Battelle describes databases of intentions as “the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. […]. This information represents […] a place holder for the intentions of humankind - a massive database of desires, needs, wants, and likes that can be discovered, subpoenaed, archived, tracked, and exploited to all sorts of ends. Such a beast has never before existed in the history of culture […].”

¹ http://battellemedia.com/archives/000063.php, last accessed Nov 21, 2007

© 2008 for the individual papers by the papers' authors.
Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners.

What has received only little attention so far is that the intentions represented in such “databases of intentions” can be suspected to exhibit considerable variety with respect to their degree of explicitness. While some goals contained in search queries might be very explicit, other queries might contain more implicit goals, which would mean that they are more difficult to recognize by, for example, an external observer. To give an example: in terms of intentional explicitness, the query “car miami” differs significantly from the query “buy a used car in Miami”.

While this observation appears rather intuitive, to the best of our knowledge there is no research effort comprehensively studying different degrees of intentional explicitness in search query logs, although the implications seem profound: different degrees of intentional explicitness could put significant constraints on the general analyzability and ultimately the overall utility of so-called databases of intentions, and they could put an upper bound on the level of service that search engines can provide. As a result, studying different degrees of intentional explicitness in search queries appears relevant on at least two different levels:

• On a theoretical level, better understanding different degrees of intentional explicitness in search queries could increase our knowledge about the levels of abstraction users employ when searching, and could equip us with better distinctions and tools for studying, for example, the way users refine or generalize goals during search.
• On a practical level, understanding different degrees of intentional explicitness in search queries could improve the ability of search engine vendors to better tailor their search results to specific users and to link search queries at different levels of explicitness.

However, understanding the degree of explicitness of user goals in search queries poses significant research and technical challenges: First and foremost, all goals contained in search query logs are of hypothetical nature in the sense that verification is extremely hard, if not impossible. Most query logs that are available to researchers have been anonymized, and even if information about the users were available, contacting users and verifying hypothetical goals would be costly or hardly feasible due to geographical, time and other constraints. We refer to this problem as the goal verification problem, which is extremely hard to overcome in research on search query log analysis. Second, query logs represent huge text corpora in terms of size, which renders manual elicitation of goals by experts practically impossible. We refer to this problem as the goal elicitation problem. Furthermore, query logs represent a fundamentally different text corpus to mine goals from, compared to other corpora that have been studied from an intentional perspective, such as interview transcripts or organizational guidelines: The length of search queries is significantly shorter, the words used in search queries do not necessarily appear in lexica, and the text is not necessarily represented as natural language text but in some artificial language, such as an arbitrary concatenation of terms that users suspect to yield fruitful and relevant search results (such as “car miami”). We refer to this problem as the linguistic artificiality problem.

While solving all of these problems in their entirety is well beyond the scope of this work, in this paper we aim to 1) increase our understanding about the notion of different degrees of explicitness in intentional artifacts theoretically, and 2) explore related challenges, potentials, and implications empirically. For that purpose, we have adopted selected concepts from the body of literature related to the notion of goals in different research areas and conducted an exploratory study of a large search query log: the AOL search database released in 2006.

WHAT ARE GOALS? DEFINITION AND RELATED WORK
To establish a theoretical understanding about the fundamental constructs we work with, we introduce the following definitions based on related work in a series of different, but related research areas. The most central concept in our paper is the concept of a goal, which we define as “a condition or state of affairs in the world that some agent would like to achieve or avoid. How the goal is to be achieved or avoided is typically not specified, allowing alternatives to be considered” (based on [21]). An intentional artifact is an electronic artifact produced by users or user behaviour that contains recognizable “traces of intent”, i.e. traces of users’ goals and intentions expressed in different degrees of explicitness. The degree to which these traces can be recognized as goals by some independent observer depends on the artifact’s degree of intentional explicitness. In this paper, we assume that search query logs at large represent intentional artifacts, meaning that they contain such traces of intent at different levels of explicitness. Examples of search queries exhibiting different degrees of intentional explicitness are shown in Figure 1.

car, car Miami, car Miami dealer, buy a car in Miami, buy a used car in Miami, get loan to buy a used car in Miami

Figure 1. Queries with different degrees of explicitness

The notion of goals has been used by researchers in different areas to represent and frame the desires and needs users have when interacting with software. In the following, we will discuss selected research relevant to our work.

The Notion of Goals in Human Computer Interaction
Researchers have focused on studying user intentions long before the current popularity of search engines, query log analysis and the web in general. In the broader human-computer interaction (HCI) context, Norman’s theory of action [19], for example, describes the inherent gap between a person’s goals and intentions and a system’s capabilities, features and structures. Norman’s research has implicitly acknowledged the existence of different degrees of explicitness in users’ goals by highlighting that user goals are often not well specified, opportunistic, ill-formed and vague and therefore hard to capture, identify and represent. Any attempt at studying goals in a web search context must be suspected to face similar, if not the same, challenges. Other work in HCI identifies basic types of so-called Goal-Effect Problems, i.e. problems that characterize system performance from an intentional perspective. In their paper [23] the authors distinguish between (I) missing cues for goal construction, where a system does not suggest appropriate goals, (II) misleading cues for goal construction, where a system suggests irrelevant goals, (III) missing cues for goal elimination, where a system does not eliminate completed goals, and (IV) misleading cues for goal elimination, where a system does eliminate incomplete goals. Translated to a web search context, these distinctions highlight some of the implications of search queries expressed on different levels of intentional explicitness. Further work in HCI, such as the work of [12] on the Lumiere project, focuses particularly on studying intentional artifacts with a low degree of explicitness.

The Notion of Goals in Requirements Engineering
Goal Oriented Requirements Engineering (GORE) conceptualizes software development as a process that aims to satisfy a series of stakeholder goals. The corresponding research community distinguishes between different types of goals such as: achieve and cease goals, which are said to generate behavior, maintain and avoid goals, which are said to restrict behaviors, as well as optimize goals, which are said to compare behaviors [21]. The distinction between goals and softgoals in GORE can be seen as an indicator for the plausibility of studying different degrees of explicitness in goals. While, for example, in the i* framework [29] a goal has a clear-cut criterion, a softgoal describes a goal for which there is no such clear-cut criterion to be used for deciding whether it is satisfied or not.

The Notion of Goals in Web Search
On the web, search represents a primary instrument through which users exercise their intent. This allows search engines to have a tremendous corpus of intentional artifacts at their disposal. This observation has led scientists to focus on studying user intentions in search query logs. In 2002, Broder [4] introduced a high level categorization of web search intent, distinguishing between navigational, informational and transactional queries. Based on this early work, Rose and Levinson [22] refined this categorization into a hierarchical taxonomy including more fine grained categories, such as entertainment or advice seeking. In 2004, the authors of [16] presented an automatic approach that aims to tell navigational and informational goals apart based on analyzing two parameters: user-click behavior and anchor-link distribution. Baeza-Yates et al apply supervised and unsupervised learning techniques to study users’ goals in search query logs [2]. Faaborg [8] has presented a prototype for goal-oriented browsing and Liu et al [17] have presented a prototype for goal-oriented search based on intentional concepts retrieved from the ConceptNet commonsense knowledge base.

While state-of-the-art research offers a set of useful categories, techniques and prototypes, we consider the degree of intentional explicitness to be orthogonal to existing intentional categories of search queries. In other words, we assume that within each intentional category (such as informational or transactional queries), goals can be expressed in different degrees of intentional explicitness. Broder, for example, makes a similar point in his 2002 paper, by mentioning that “many informational queries are extremely wide, for instance cars or San Francisco, while some are narrow, for instance normocytic anemia, Scoville heat units”. Our work in this paper is motivated by a desire to characterize different degrees of intentional explicitness in search query logs, and to identify implications for the process of search. Our own previous work explored how users express their goals during search [25].

Further related work has acknowledged this problem to some extent: in the paper of [22], for example, a tool that aims to support experts in categorizing search queries into goal categories is presented. While different degrees of intentional explicitness were not in the explicit focus of this work, the development of the tool can be interpreted as an early recognition of the problems that researchers face with different degrees of intentional explicitness in search queries.

DEGREES OF EXPLICITNESS IN INTENTIONAL ARTIFACTS
In a web search context, we conceptualize the degrees of explicitness in intentional artifacts to represent a broad, continuous spectrum. On one end of this spectrum, we would have queries that describe the users’ intent completely and precisely, with nothing to add from an intentional perspective. On the other end of the spectrum we would have queries that do not describe user intent at all, such as blank queries.

For reasons of simplicity, in this paper we propose to distinguish, at a high, dichotomous level, between two degrees of intentional queries only: explicit and implicit intentional queries. This allows us to study whether a distinction between implicit and explicit intentional queries is reasonable in a web search context in the first place, and whether it yields interesting insights or implications. Given that we can identify interesting differences between different degrees of intentional explicitness, it could be interesting to conduct research on more refined definitions and more fine grained degree distinctions in the future.

With these arguments in mind, we introduce the following idealized definitions of explicit and implicit intentional queries. An explicit intentional query is a query that can be related to a specific goal in a recognizable, unambiguous way. Recognizable refers to what [15] defines as “trivial to identify” by a subject within a given attention span. On a more practical level, this idealized definition is related to what other researchers have characterized as “better queries”, or queries that have “more precise goals” (R. Baeza-Yates at the “Future of Web Search” workshop 2006, Barcelona). Examples of explicit intentional queries, i.e. queries that have more precise goals, would be “buy a car”, “maximize adsense revenue” or “how to get revenge on neighbor within limits of law”. While these queries can still be refined and elaborated, they are more unambiguous in a sense that a user searching for “how to get revenge on neighbor within limits of law” is unlikely to have the true goal of “buy a nice gift for neighbor”. We define an implicit intentional query as a query where it is difficult or extremely hard to elicit some specific goal from the intentional artifact. Examples include blank queries, or queries such as “car” or “travel”, which embody user goals on a very general level. Queries on this kind of level are likely to require further refinement in order to yield useful search results. Interestingly, a significant proportion of queries today are of length 1 or 2 (as is evident in, for example, the AOL search database set [20]).

Distinguishing between these two broad types of queries is important for several reasons: First, explicit (“better”) intentional queries could be used to disambiguate or refine implicit intentional queries. For example: a search engine might be able to refine the implicit intentional query “car shop” with the explicit intentional queries “shop for a car”, “repair a car”, “find a car shop” or “buy a car for shopping” with the help of user interaction. Second, we have found anecdotal evidence that some users organize their search in a way that can be understood as a traversal of goal graphs [25], including iterative goal refinement and generalization. This suggests that switching between more explicit and more implicit intentional queries during search is a natural cognitive activity for at least some users. Third, our own recent research has indicated that only 1.69% to 3.01% of queries have a high degree of intentional explicitness [25]. While this percentage is rather small, we do not know whether users prefer to search via implicit intentional queries, or whether users have simply adapted to the non-intentional mode in which Google, Yahoo and other search engines operate today (cf. “bag-of-word principle”). Our research is driven by a desire to understand whether explicit intentional queries have the potential to narrow the cognitive gap between a user’s goals and the queries she uses. We are interested in the implications of distinguishing between explicit and implicit intentional queries and in learning more about the explicit goals users have on the web, with the long term vision of enabling users to more accurately express their goals in search (towards “better queries” in Baeza-Yates’ diction).

This is in contrast to some past work in information retrieval, for example in the area of query expansion, where the purpose of query expansion is to make the user query resemble more closely the documents it is expected to retrieve [26]. Our interest is rather the opposite: Because the precision with which users describe their goals in search queries puts an upper bound on the level of service search engines can provide, our long term interest is to make search queries resemble more closely the intentions users have (moving towards more explicit intentional queries). This could help to narrow the “gulf of execution” for users, and could help computer scientists and search engine vendors to work with more accurate descriptions of users’ intent, something search engine vendors are desperate to achieve today [10]. While some researchers have already attempted to address similar issues [1], our particular focus lies in exploring different degrees of intentional explicitness in large search query logs rather than ambiguity of queries in general.

AN EXPLORATORY STUDY
Equipped with a theoretical understanding about explicit and implicit intentional queries, we are now interested in empirically studying these different types of queries “in the wild”. In an exploratory study, we aim to identify and better understand explicit intentional queries in the AOL search database, a large search query log database released in 2006. We want to explore whether there are differences between explicit and implicit intentional queries with respect to, for example, the number of users issuing these types of queries or the type of URLs clicked as a result. Furthermore, we were interested in learning whether there are certain words that indicate the presence of explicit intentional queries, which could represent a relevant finding for future research efforts.

Although our preliminary distinction between explicit and implicit intentional queries equips us with an intuitive criterion for classification, a sharper measure is needed to separate explicit from implicit intentional queries on an operational level. To simplify classification, we distinguish between explicit and implicit intentional queries based on the following arbitrary criteria: A) whether a query contains at least one verb and B) whether the goal elicited from the intentional artifact conforms to our definition of a goal. Note that for other or more refined degrees of intentional explicitness, different criteria might be used. We now use our previous example of queries to illustrate the implications of our particular distinction in Figure 2, where queries in bold represent explicit intentional queries according to our classification criteria.

Car, car Miami, car Miami dealer, buy a car in Miami, buy a used car in Miami, get loan to buy a used car in Miami

Figure 2. Distinguishing different degrees of explicitness

While our example might imply that the degree of explicitness correlates with query length only, it does not necessarily. Although the query “buying a car in the 1920’s” contains a verb, it does not conform to our definition of a goal and would therefore not be considered to represent an explicit intentional query. Our criteria thus allow us to distinguish between “buy a car” or “sell a car” (explicit) and “car dealer ads” (implicit). We are aware of the implications of this simplification, and we discuss them in the “Threats to validity” section at the end of this paper.

We investigated explicit and implicit intentional queries in the AOL search database. In addition to the AOL data, several other web search logs are available [13]. We used the AOL search database because it provides a very large dataset including comprehensive information about anonymous user IDs, time stamps, search queries, and click-through events. It contains ~ 20 million search queries collected from 657,426 unique user IDs between March 1, 2006 and May 31, 2006 by AOL. To our knowledge, the AOL search database is also the most recent very large corpus of search queries publicly available (2006)².

Because applying our definition of explicit and implicit intentional queries manually to the AOL dataset with more than 20 million queries is infeasible (cf. the goal elicitation problem), we have developed an experimental classification approach based on a training set of queries that was used for machine learning syntactical features of explicit intentional queries. However, coming up with an automatic classifier that excels on precision and recall measures would be well beyond the scope of this paper. Instead, our approach focuses on providing us with a reasonable subset of the AOL query dataset that contains a significantly higher proportion of explicit intentional queries than the entire dataset. Therefore, the goals of our experimental classification approach are more modest: it should enable us to gain a better understanding about explicit and implicit intentional queries and aid us in coupling our intuitions with empirical data. Focusing on better classification approaches could represent a promising line of future research. In the next section, we will describe some technical details of our approach.

An Experimental Classification Approach
Before using the dataset for our analysis, we sanitized it with respect to undesirable properties such as empty queries. The data representation of an entry resulting from our sanitation process has the following form: {UserID, query, timestamp, (ItemRank, URL)*}. Taking this data representation as an input, our experimental classification approach consists of two parts: part-of-speech (POS) tagging and supervised learning of syntactical goal features.

Part of Speech Tagging
Our classification approach is based on the simplified assumption that explicit intentional queries can be distinguished from implicit intentional queries by the occurrence of certain part-of-speech patterns. For this purpose the experimental setup incorporated a fast and reasonably accurate bigram part-of-speech tagger trained on a sample of the Penn Treebank corpus. We have focused on tagging queries with query length > 2 only, because of the inherent ambiguity of shorter queries, and the resulting difficulty of recognizing goals. We favored a bigram tagger over more powerful approaches such as transformation-based taggers and Hidden Markov Model taggers due to efficiency issues, the lack of contextual information and the rather naive (artificial) linguistic nature of search queries (cf. the linguistic artificiality problem). The tag set of the Penn Treebank corpus consists of 45 word classes [14]. The reason for choosing this particular tag set is the fact that we are mainly interested in identifying verbs and verb noun combinations. For our purpose, we don’t need the finer grained word classes provided by e.g. the tag set of the Brown corpus or C7. Table 1 shows a sample of word classes of the Penn Treebank tag set.

Tag   Description           Example
NN    Noun, sing. or mass   car
VB    Verb, base form       eat
VBG   Verb, gerund          eating
VBZ   Verb, 3sg pres        eats
JJ    Adjective             yellow
WRB   Wh-adverb             how, where
TO    “to”                  to

Table 1. A sample of Penn Treebank tags (from [14])

The vocabulary size of the corpus is an estimated number of 13,500 words, which is rather small compared to the expected vocabulary size of the dataset (cf. the linguistic artificiality problem). To address this problem, we have chosen a suffix tagger as a back off strategy for the bigram tagger. The part-of-speech tagging functionality we used was provided by the natural language toolkit NLTK [18].

Supervised Learning of Goal Features
Our classification approach is similar to those reported in [5,9,11]. However, we use part-of-speech n-grams instead of word n-grams as features.
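The tagging setup from the Part of Speech Tagging section, a bigram tagger that backs off to a suffix tagger for unseen words, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the NLTK implementation used in the study: the toy training sentences, the three-character suffix length and the default tag NN are hypothetical choices made only for this example, and the function names `train` and `tag` are ours.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data; the study trained on a Penn Treebank sample.
TRAIN = [
    [("buying", "VBG"), ("a", "DT"), ("car", "NN")],
    [("selling", "VBG"), ("a", "DT"), ("house", "NN")],
    [("how", "WRB"), ("to", "TO"), ("buy", "VB"), ("a", "DT"), ("car", "NN")],
    [("car", "NN"), ("dealer", "NN")],
]


def train(sentences, suffix_len=3):
    """Collect tag counts for the bigram model and for the suffix back-off."""
    bigram = defaultdict(Counter)  # (previous tag, word) -> tag counts
    suffix = defaultdict(Counter)  # last suffix_len characters -> tag counts
    for sentence in sentences:
        prev = "$"  # marker for the start of a query
        for word, tag_ in sentence:
            bigram[(prev, word)][tag_] += 1
            suffix[word[-suffix_len:]][tag_] += 1
            prev = tag_
    return bigram, suffix


def tag(words, bigram, suffix, suffix_len=3, default="NN"):
    """Tag words left to right: bigram context first, then suffix, then default."""
    tagged, prev = [], "$"
    for word in words:
        if (prev, word) in bigram:
            best = bigram[(prev, word)].most_common(1)[0][0]
        elif word[-suffix_len:] in suffix:
            # Back-off: the word was not seen in this context, use its suffix.
            best = suffix[word[-suffix_len:]].most_common(1)[0][0]
        else:
            best = default  # assumed fallback tag for completely unknown words
        tagged.append((word, best))
        prev = best
    return tagged


if __name__ == "__main__":
    bigram, suffix = train(TRAIN)
    print(tag("cleaning a boat".split(), bigram, suffix))
    # [('cleaning', 'VBG'), ('a', 'DT'), ('boat', 'NN')]
```

In this sketch the unseen word “cleaning” is tagged VBG only because of its “ing” suffix, which is the kind of case a suffix back-off is meant to cover when the training vocabulary is small relative to the query vocabulary.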
In our experimentation we used binary features based on fixed size trigrams. Furthermore, we introduced markers ($ $) at the beginning and the end of a query to take the parts of speech at the query boundaries into account. Thus, the query "buying/VBG a/DT car/NN" would be composed of the following trigrams:

$ $ VBG, $ VBG DT, VBG DT NN, DT NN $, NN $ $

² Because the AOL search database was retracted from AOL shortly after releasing it, we obtained a copy from a secondary source: http://www.gregsadetsky.com/aol-data/ last accessed on July 15th, 2007.

To obtain a training set, we drew a uniform random sample from the set of queries which contain at least one verb³. Two of the authors labeled instances in the sample consensually based on whether the queries conform to our definition of goals introduced earlier. This resulted in a training set consisting of 98 instances, 59 positives and 39 negatives. While this training set is not necessarily representative for the set of all queries under investigation, it yielded sufficient results given the exploratory nature of our research.

We trained a naive Bayesian classifier [7] on the feature vectors described above using 10-fold cross-validation. In order to increase the performance of our classifier we applied a chi-squared feature selection algorithm to our training set [24]. The best results, based on 10-fold cross-validation, were achieved by reducing the feature space to the 20 most predictive features. Table 2 shows the most predictive features according to the feature selection.

$ $ NN       $ $ VBG
$ WRB TO     WRB TO VB
$ NN NN      $ VBG DT
VBG DT NN    $ VBG NN
$ $ VBZ      JJ NN $
$ VBG IN     VBG IN NN
$ VB NN      TO VB VBN

Table 2. Most predictive features based on chi-squared feature selection

The purpose of our classification technique is to provide us with a more condensed set of queries (ideally containing a higher proportion of explicit intentional queries than the entire dataset) that would allow us to study explicit intentional queries in greater detail. More sophisticated linguistic techniques such as selectional preference [3] might be more adequate if the goal were classification with a stronger focus on precision and recall measures. For all feature selection and classification tasks, we used the WEKA toolkit [27] in our work.

In the next section, we present the results of applying our experimental classification approach to the AOL search database.

³ 1,598,612 out of 20,494,002 queries contained at least one verb according to the outcome of our part-of-speech tagging process.

STUDY RESULTS

Results of Experimental Classification
Applying our technique resulted in a condensed set of queries containing 279,260 queries. We will refer to this set of queries from here on as the “condensed dataset”. The condensed dataset contains a higher proportion of explicit intentional queries than the entire dataset. The difference is significant: While the share of explicit intentional queries in the entire dataset has been estimated to lie between 1.69% and 3.01%, in the condensed dataset we estimate this ratio (based on a sample containing 500 random queries from this set) to lie in a 95% confidence interval of 49.6% to 58.4%. This allows us to compare whether there are interesting differences in query sets that contain a large as opposed to a very small proportion of explicit intentional queries.

                                 Entire Dataset             Condensed Dataset
Queries                          20,494,002                 279,260
Explicit Intentional Queries     346,349 - 616,869          138,513 - 163,089
Implicit Intentional Queries     19,877,133 - 20,147,653    116,172 - 140,747
Explicit Intentional Queries,    1.69% - 3.01%              49.6% - 58.4%
95% confidence interval
Users                            657,426                    94,487

Table 3. Statistical overview of the condensed dataset

Table 3 gives an overview of some statistics of our condensed dataset. It also shows that the condensed dataset captures only part of the explicit intentional queries estimated in the entire dataset. However, the dataset provides a subset of queries with a significantly higher proportion of explicit intentional queries, which is sufficient for the kind of exploratory research questions we are interested in.

Correctly Classified Intentional Queries
“buying groceries online”
“how to get revenge on neighbor within limits of law”
“helping children handle death of a loved one”
“cleaning the ak-47”
“coughing up blood”
“dealing with the guilt of cheating”

Table 4. Examples of correctly classified queries

In addition to the statistical analysis, we want to give a qualitative account of the type of queries our technique classified correctly and incorrectly in the condensed dataset.

Examples of correctly classified queries in the condensed dataset are depicted in Table 4. These queries all represent goals that contain at least one verb and conform to our definition of goals. In addition, the set of correctly classified explicit intentional queries does not belong to a single query category (such as the ones identified in previous research [10]), but spans several of them. “buying groceries online” for example can be categorized as a transactional query, while “helping children handle death of a loved one” can be categorized as an informational query. This observation, together with the observation that implicit intentional queries do not belong to a single category either, illustrates that the degree of intentional explicitness represents an orthogonal view to existing categories in query log analysis. Another particularly interesting query is the instance “coughing up blood”. Although conforming to our definition of a goal, it represents a rather different kind of goal compared to the other goals identified in the condensed dataset: it represents an avoid goal of a user, describing a state which the user presumably tries to change (presumably a medical symptom). Automatically distinguishing between achieve and avoid goals appears to be an interesting research question and a non-trivial research challenge. The other goals in our table represent achieve goals in a sense that a user can be reasonably suspected to pursue the goal which is represented in the query (within the limitations of the goal verification problem).

Examples of incorrectly classified queries are especially interesting, as they show some of the limitations of our experimental classification approach:

Incorrectly Classified Intentional Queries
“saving privat ryan”
“driving school Illinois”
“stem cell transplant”
“founding fathers temple”
“recovering the satellites lyrics”

Table 5. Examples of incorrectly classified queries

The small sample of queries listed in Table 5 gives a good overview of the challenges of identifying explicit intentional queries: “Saving private ryan”, for example, is a popular Hollywood movie starring Tom Hanks, which makes it unlikely that the user issuing the query has the goal of actually saving a Private named Ryan. “Driving school Illinois” probably refers to some school where people can learn to drive, rather than the goal of driving to school in Illinois. “stem cell transplant” is very likely not a goal either. […] knowledge (such as an Amazon API to detect movie or book titles) can represent one way of dealing with this kind of query.

Results of Comparing the two Datasets
We also investigated whether the most popular websites (i.e. websites that have been selected by users as a result of their search) in our condensed dataset differ from the most popular websites in the entire search query log. If this were the case, it would make a strong argument for the development of more advanced algorithms and techniques that have higher precision in distinguishing between different degrees of intentional explicitness in search queries.

[Figure 3: bar chart with one bar per website; vertical axis from 0 to 3500; each bar is divided into explicit intentional queries, confidence interval, and implicit intentional queries. Websites shown: http://www.amazon.com, http://www.findarticles.com, http://www.ehow.com, http://www.hgtv.com, http://www.answers.com, http://www.nextag.com, http://www.43things.com, http://www.medhelp.org, http://www.superpages.com, http://www.bizrate.com, http://www.geocities.com, http://www.imdb.com, http://www.faqfarm.com, http://experts.about.com, http://en.wikipedia.org, http://cgi.ebay.com]

Figure 3. Top 16 websites in the condensed dataset

The histogram in Figure 3 lists the top 16 websites that have been clicked by users in the condensed dataset, including websites such as amazon.com, ehow.com, en.wikipedia.org, geocities.com, medhelp.org and others.

We have taken a random sample from each set of queries associated with a URL listed in Figure 3 and evaluated it with respect to correctly and incorrectly classified queries. We calculated the 95% confidence interval of the error rate to give an estimate (middle part of each bar in Figure 3). This kind of analysis revealed interesting differences: The websites that have the highest proportion of correctly classified explicit intentional queries among the top 16 websites are websites that can be considered to be very goal-centric: 43things.com (a website encouraging users to share their goals in life), ehow.com (a website on how to accomplish a broad variety of tasks and goals), hgtv.com (a home improvement website), faqfarm.com (a question answering website), and medhelp.org (a medical information website). Medhelp.org is a particularly interesting result, as a large proportion of the correctly classified explicit intentional
The incorrect classification is likely the result of queries are queries describing medical symptoms (“coughing imperfections on the part-of-speech tagging part. up blood”), which we defined as avoid goals. Finally, we observed a significant proportion of queries that The websites with a higher proportion of incorrectly appear goal-oriented, but have the term “lyrics” as a pre- or classified explicit intentional queries are interestingly postfix, such as “recovering the satellites lyrics” (a song websites that are less goal centric such as imdb.com (a movie performed by the Counting Crows). Utilizing domain database, many queries were movie or series titles like hand, and goal-oriented websites and resources on the “saving private ryan”, “bowling for columbine” or “meet other. joe black”), superpages.com (a directory website), followed by bizrate.com (a comparison shopping site, many queries Results of Analyzing the Condensed Dataset for goods such as “marble fitted table cloth” or “fencing for Beyond comparative analysis, we were interested in the pools”), answers.com (an online dictionary and distribution of verbs in our condensed dataset. encyclopedia, many queries focusing on definitions such as “meaning of centimeter” or “define alamo war”) and en.wikipedia.org (an online encyclopedia). Especially amazon.com – the website associated with the highest number of queries in the condensed set – was difficult to interpret. Book titles often contain goals in their titles and it is hard to judge whether a user is searching for the specific book or using a goal as search query (e.g. “organizing your life” might be a search for the book “The Complete Idiot's Guide to Organizing Your Life”, which can be found at amazon.com). Geocities, which is a hosting company for a variety of web sites has a similar fraction of intentional queries, and is very broad regarding the range of topics identified in the queries. 
In the following, we compare the entire and the condensed Fi Figure 5. Verb frequency histogram dataset with respect to whether they differ in the set of websites users select as a result of issuing queries. The histogram in Figure 5 lists the most frequent verbs (in their stemmed word form) in our dataset. The top 10 stemmed verbs in the condensed dataset are make, get, buy, 400000 wed, is, find, live, play, use, write. While this list is interesting 350000 click events from a goal-oriented perspective and largely reasonable, it 300000 also highlights some of the limitations of our simplified 250000 approach, for example “wed” is the result of mistakenly 200000 POS-tagging “wedding” as VBG rather than the result of the 150000 verb “wed” occurring in the dataset very often (as we were 100000 able to confirm by evaluating occurrences of wed vs. 50000 wedding in the dataset). Another question we were 0 interested is whether a minority of users is responsible for issuing explicit intentional queries, or whether a larger set http://www.tripadvisor.com http://www.bankofamerica.com http://profile.myspace.com http://en.wikipedia.org http://www.amazon.com http://www.imdb.com http://www.google.com http://www.myspace.com http://www.yahoo.com http://www.mapquest.com http://www.ebay.com http://mail.yahoo.com http://www.geocities.com http://www.hotmail.com http://www.ask.com http://www.bizrate.com http://www.msn.com of users issues such queries. This would have implications for the broader relevance of different degrees of intentional explicitness in search queries. 10000 Figure 4. Top 16 websites in the entire dataset 1000 In figure 4, we can see the list of top 16 websites that have been clicked by users in the entire search result set. The Frequency 100 results differ significantly from the top 16 in the condensed dataset. 
Especially goal centric websites are affected by our experimental classification approach, such as 43things.com 10 (moving from rank #388 in the entire dataset up to rank #15 in the condensed set), ehow.com (from #64 up to #2), hgtv.com (from #97 up to #7), and medhelp.org (from #104 1 1 10 100 1000 10000 up to #16). The difference between popularity of websites Rank found in the condensed vs. the entire dataset and the observation of goal-centric websites surfacing in the Figure 6. Number of queries per user: rank/frequency plot condensed dataset leads us to hypothesize that there is a In the above figure 6, users are ranked based on their correlation between explicit intentional queries on one number of queries in the condensed set, whereas only the first 5000 ranks are shown. Frequency corresponds to the Reliability: We have documented and described our number of queries. While the absolute number of explicit experimental classification approach, and built on existing intentional queries in the AOL search query log has been toolkits such as the WEKA toolkit [27], so that reproducing estimated to lie between 1.69% and 3.01% [25], the our results is possible within the given limits. proportion of users in our condensed dataset is significantly higher: 14.37% of the users from the entire dataset appear OUTLOOK in the condensed dataset as well. As the data points In future work, it would be interesting to identify more fine- approximately follow a line on a logarithmic scale, the rank grained degrees of intentional explicitness and more precise frequency distribution appears to represent a power law - a criteria for distinguishing between them. Mining relations distribution that is often found in systems that contain between explicit and implicit intentional queries would be traces of social activities or interactions. 
another interesting stream of research, as this could allow for search engines to interactively support goal refinement THREATS TO VALIDITY or goal generalization activities. We have identified a In the following, we describe threats to validity according number of seemingly suitable web corpora, such as to [28]: 43things.com, ehow.com, medhelp.org and others, that could be used in related future research efforts. Another Construct validity: The constructs we intended to promising field of future work seems to be the development investigate in our study are explicit and implicit intentional of more precise classification approaches. In order to queries. Being aware of a broad spectrum of different advance in this direction, approaches could, for example, degrees of explicitness of goals in search queries, we have take context or domain knowledge into account to increase introduced a simplified distinction for practical purposes. the quality of classification (e.g. eliminating movie titles or While this distinction enabled us to explore the relevance of queries related to song lyrics). Categorization of explicit different degrees of explicitness, it might be an intentional queries into taxonomies of human goals [6] oversimplification of the underlying phenomenon. would be another interesting endeavor that could yield However, by defining different degrees of intentional fruitful insights into the goals users pursue on the web. explicitness as a continuous spectrum we hint towards more Investigating how our results translate to other contexts, elaborated future approaches. In addition, relying on part- such as the 43things.com website – a website that of-speech tagging and involving expert judgment to encourages users to share their goals - is another stream of distinguish between explicit and implicit intentional queries future research we are interested in. also puts certain limitations on the generality of our approach. 
By providing a definition for goals we aimed to SUMMARY & CONCLUSIONS objectify our process to a certain extent. This paper introduced a novel perspective on analyzing Internal validity: The experts involved in labeling the search query logs: different degrees of intentional training set of queries were two of the authors of this paper, explicitness. We have argued that these degrees represent a which might introduce a potential bias to our results. We continuous dimension, and we have shown by example that tried to mitigate this bias by requiring the experts to reach they are orthogonal to existing query categories, such as consensus on the judgment made, and by involving more transactional or informational queries. In an effort to make than one expert. The decision to exclude shorter queries this novel dimension amenable to analysis, we have (n≤2) prohibits us to make statements about a large part of introduced two simplified degrees of intentional the AOL dataset (~60%). However, our decision was explicitness, and applied it to the AOL search database. Our motivated by the inherent difficulty of part-of-speech analysis demonstrated the principle reasonability of our tagging one or two word English queries correctly, and by concepts, and highlighted a series of potentials and the fact that search engine vendors report increasing challenges when studying different degrees of intentional average query length over the past years4. explicitness in search query logs. Learning about different degrees can be considered essential for leveraging the full External validity: While we are referring to established analytical potential of “databases of intentions” - and for theories and definitions on goals from different research understanding their limitations. 
In addition, considering areas including human-computer interaction, goal-oriented different degrees of intentional explicitness appears critical requirements engineering and search query analysis, our for search engine vendors to better assess the level of work is biased towards the data available in the AOL search service they can or should provide for different user dataset (2006). Investigating other search query logs with queries. We have presented a theoretical elaboration of respect to different degrees of intentional explicitness is different degrees of intentional explicitness and preliminary something we are interested in. empirical evidence for the principle reasonability of these concepts. More robust techniques to understand a search query’s degree of intentional explicitness could have a 4 significant impact on narrowing the cognitive gap between http://blogs.zdnet.com/micro-markets/index.php?p=27, a user’s goals and the query she formulates. Finally, our last accessed Nov 21, 2007 findings could have a broader impact on web search research, as well as behavioral and social studies of 14. Jurafsky, D., Martin, J. H., Speech and Language motivation on the web. Processing: An introduction to natural language processing, Computational Linguistics and Speech ACKNOWLEDGMENTS Recognition (International Edition), Prentice Hall We thank Anwar Us Saeed for providing support in (2000). implementing parts of the experimental classification 15. Kirsh, D., When is information explicitly represented?, approach and Mark Kröll for very helpful comments and UBC Press (1990), 340-365. criticism. The research of this contribution is funded in part by the Austrian Competence Center program Kplus. 16. Lee, U., Liu U., & Cho J., Automatic identification of user goals in Web search. Proc. WWW '05, New York, REFERENCES NY, USA, ACM Press (2005), 391—400. 1. Allan, J., & Raghavan, H., Using part-of-speech patterns 17. Liu, H.; Lieberman, H. 
& Selker, T., GOOSE: A goal- to reduce query ambiguity. Proc. SIGIR Conference on oriented search engine with wommonsense, Proc. AH Research and Development in Information Retrieval, 2002, Springer-Verlag, London, UK (2002), 253-263. New York, NY, USA, ACM Press (2002), 307--314. 18. Loper, E.. Bird, S., NLTK: The Natural Language 2. Baeza-Yates, R.; Calderón-Benavides, L. & González- Toolkit, (2002). Caro, C., The Intention Behind Web Queries, Proc. 19. Norman, D., The design of everyday things, (1988). SPIRE 2006, Springer (2006), 98-109. 20. Pass, G., Chowdhury, A., Torgeson, C., A picture of 3. Beitzel, S. M., Jensen, E. C., Frieder, O., Grossman, D., search, Proc. InfoScale 2006, Hong Kong, ACM Press Lewis, D. D., Chowdhury, A., Kolcz, A., Automatic (2006). web query classification using labeled and unlabeled training data, Proc. SIGIR 2005,. New York, NY, USA, 21. Regev, G. & Wegmann, A., Where do goals come from: ACM Press (2005), 581-582. the underlying principles of goal-oriented requirements engineering, Proc. RE 2005, Washington, DC, USA, 4. Broder, A., A taxonomy of web search, SIGIR Forum IEEE Computer Society (2005), 253-362. 36(2), (2002), 3-10 22. Rose, D., Levinson, D., Understanding user goals in 5. Cavnar, W. B., Trenkle, J. M., N-gram-based text web search, Proc. WWW 2004, New York, USA (2004). categorization, Proc. SDAIR 1994, 161-175. 23. Ryu, H. & Monk, A., Analysing interaction problems 6. Chulef, A. S.; Read, S. J. & Walsh, D. A., A hierarchical with cyclic interaction theory: Low-level interaction taxonomy of human goals, Motivation and Emotion 25 walkthrough, PsychNology Journal 2(3), (2004), 304- (3), (2001), 191-232. 330. 7. Domingos, P., Pazzani, M. J., On the optimality of the 24. Sebastiani, F., Machine learning in automated text simple bayesian classifier under zero-one loss, Machine categorization, ACM Computing Surveys , vol. 34, no. 1, Learning, vol. 29, no. 2-3 (1997), 103-130 (2002), 1-47. 8. Faaborg, A. 
& Lieberman, H., A goal-oriented web 25. Strohmaier, M.; Lux, M.; Granitzer, M.; Scheir, P.; browser, Proc. CHI 2006, ACM Press (2006), 751-760. Liaskos, S. & Yu, E., How do users express goals on the 9. Fürnkranz, J., A study using n-gram features for text web? - An exploration of intentional structures in web categorization, Tech rep., Austrian Institute for Artificial search, in 'We Know'07 International Workshop on Intelligence (1998). Collaborative Knowledge Management for Web 10. Greene, K., The future of search. Information Systems, in conjunction with WISE'07, http://www.technologyreview.com/Biztech/19050/, last Nancy, France, (2007). accessed on July 18th, 2007, MIT Technology Review, 26. Strzalkowski, T. & Carballo, J., Natural language July 16 (2007). information retrieval: TREC-5 report, in Text REtrieval 11. Grobelnik, M., Mladenic, D., Efficient text Conference, (1998), 164-173. categorization, ECML-98 Workshop on Text Mining, 27. Witten, I. H., Frank, E., Data mining: practical machine Chemnitz, Germany (1998). learning tools and techniques, Morgan Kaufmann Series 12. Horvitz, E.; Breese, J.; Heckerman, D.; Hovel, D., in Data Management Systems, 2nd edn. Morgan Rommelse, K., The Lumiere project: Bayesian user Kaufmann, (2005). modeling for inferring the goals and needs of software 28. Yin, R. K., Case study research: design and methods users, Proc. UAI 1998, (1998), 256-265. (Applied Social Research Methods), SAGE 13. Jansen, B. & Spink, A., How are we searching the Publications, (2002). World Wide Web? A comparison of nine search engine 29. Yu, E., Modelling strategic relationships for process transaction logs, Information Processing and reengineering, PhD thesis, Department of Computer Management 42(1), (2006), 248-263. Science, University of Toronto, (1995).
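As a supplementary illustration of the chi-squared feature selection over part-of-speech trigrams reported in Table 2, the following is a minimal sketch in plain Python. It is not the WEKA-based pipeline used in this work. The padding scheme (two leading and one trailing "$" boundary markers), the function names, and the six hand-tagged toy queries are assumptions made purely for illustration; the tags were assigned by hand and are not taken from the AOL data or from any tagger output.

```python
from collections import Counter


def pos_trigrams(tags):
    """Return the set of POS trigrams of one query, padded with '$'.

    Assumed padding: two leading and one trailing boundary marker, which
    yields features shaped like '$ $ VBG' or 'JJ NN $'.
    """
    padded = ["$", "$"] + list(tags) + ["$"]
    return {" ".join(padded[i:i + 3]) for i in range(len(padded) - 2)}


def chi_squared(a, b, c, d):
    """Chi-squared statistic of a 2x2 contingency table.

    a: explicit queries containing the feature
    b: implicit queries containing the feature
    c: explicit queries without the feature
    d: implicit queries without the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_features(examples):
    """Rank trigram features by chi-squared score, highest first.

    examples: list of (tag_sequence, is_explicit) pairs.
    """
    n_pos = sum(1 for _, label in examples if label)
    n_neg = len(examples) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for tags, label in examples:
        (pos_counts if label else neg_counts).update(pos_trigrams(tags))
    scores = {}
    for feature in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[feature], neg_counts[feature]
        scores[feature] = chi_squared(a, b, n_pos - a, n_neg - b)
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, hand-tagged toy training set (True = explicit intentional).
EXAMPLES = [
    (["VBG", "NNS", "RB"], True),       # "buying groceries online"
    (["WRB", "TO", "VB", "NN"], True),  # "how to get revenge"
    (["VBG", "DT", "NN"], True),        # "cleaning the ak-47"
    (["NN", "NN", "NN"], False),        # "stem cell transplant"
    (["NN", "NN", "NN"], False),        # "weather forecast chicago"
    (["NN", "NNS"], False),             # "amazon books"
]

if __name__ == "__main__":
    for feature, score in rank_features(EXAMPLES)[:5]:
        print(f"{feature:12s} {score:.2f}")
```

Keeping only the top-k entries of the ranking would correspond to reducing the feature space as described in the text (k = 20 there). On a toy set this small the scores are only meant to show the mechanics, not to reproduce Table 2.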