<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Different Degrees of Explicitness in Intentional Artifacts: Studying User Goals in a Large Search Query Log</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Markus</forename><surname>Strohmaier</surname></persName>
							<email>markus.strohmaier@tugraz.at</email>
							<affiliation key="aff0">
								<orgName type="institution">Graz University of Technology and Know-Center</orgName>
								<address>
									<addrLine>Inffeldgasse 21a</addrLine>
									<postCode>8010</postCode>
									<settlement>Graz</settlement>
									<country key="AT">AUSTRIA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Peter</forename><surname>Prettenhofer</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Know-Center Inffeldgasse 21a</addrLine>
									<postCode>8010</postCode>
									<settlement>Graz</settlement>
									<country key="AT">AUSTRIA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mathias</forename><surname>Lux</surname></persName>
							<email>mlux@itec.uni-klu.ac.at</email>
							<affiliation key="aff2">
								<orgName type="institution">Klagenfurt University</orgName>
								<address>
									<addrLine>Universitätsstraße 65-67</addrLine>
									<postCode>9020</postCode>
									<settlement>Klagenfurt</settlement>
									<country key="AT">AUSTRIA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Different Degrees of Explicitness in Intentional Artifacts: Studying User Goals in a Large Search Query Log</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">AC617182BE79CCB7C8E0DF3E5A4BC096</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T01:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Web search</term>
					<term>user goals</term>
					<term>query log analysis</term>
					<term>AOL search database</term>
					<term>H3.3: Information storage and retrieval: Information search and retrieval</term>
					<term>H5.m. Information interfaces and presentation (e.g. HCI): Miscellaneous</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>On the web, search engines represent a primary instrument through which users exercise their intent. Understanding the specific goals users express in search queries could improve our theoretical knowledge about strategies for search goal formulation and search behavior, and could equip search engine providers with better descriptions of users' information needs. However, the degree to which goals are explicitly expressed in search queries can be suspected to exhibit considerable variety, which poses a series of challenges for researchers and search engine providers. This paper introduces a novel perspective on analyzing user goals in search query logs by proposing to study different degrees of intentional explicitness. To explore the implications of this perspective, we studied two different degrees of explicitness of user goals in the AOL search query log containing more than 20 million queries. Our results suggest that different degrees of intentional explicitness represent an orthogonal dimension to existing search query categories and that understanding these different degrees is essential for effective search. The overall contribution of this paper is the elaboration of a set of theoretical arguments and empirical evidence that makes a strong case for further studies of different degrees of intentional explicitness in search query logs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>The study of users' goals on the web in general, and in web search in particular, has recently received increasing attention from researchers and industry alike <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b21">22]</ref>. While industry has a strong interest in learning more about user goals in order to provide better search results, enable more targeted ad campaigns or increase click-through rates, the research community aims to develop a profound theoretical understanding of the different types of goals users have on the web <ref type="bibr" target="#b3">[4]</ref>, how users express their goals <ref type="bibr" target="#b24">[25]</ref>, how goals can be identified automatically and how goal orientation can be used to facilitate human-computer interaction <ref type="bibr" target="#b7">[8]</ref>.</p><p>The enormous power that search engines such as Google, Yahoo and Microsoft Live have today was described by John Battelle in 2003 with the notion of so-called "databases of intentions"<ref type="foot" target="#foot_0">1</ref>. This notion refers to the fact that user goals, long something sensitive and private, have become explicit and, to a certain extent, public with the advent of powerful search engines on the web. Battelle describes databases of intentions as "the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. […] This information represents […] a place holder for the intentions of humankind - a massive database of desires, needs, wants, and likes that can be discovered, subpoenaed, archived, tracked, and exploited to all sorts of ends. 
Such a beast has never before existed in the history of culture […]."</p><p>What has received little attention so far is that the intentions represented in such "databases of intentions" can be expected to vary considerably in their degree of explicitness. While some goals contained in search queries might be very explicit, other queries might contain more implicit goals, which are more difficult for, say, an external observer to recognize. To give an example: in terms of intentional explicitness, the query "car miami" differs significantly from the query "buy a used car in Miami".</p><p>While this observation appears rather intuitive, to the best of our knowledge no research effort has comprehensively studied different degrees of intentional explicitness in search query logs, although the implications seem profound: different degrees of intentional explicitness could put significant constraints on the general analyzability and ultimately the overall utility of so-called databases of intentions, and they could put an upper bound on the level of service that search engines can provide. 
As a result, studying different degrees of intentional explicitness in search queries appears relevant on at least two different levels:</p><p>• On a theoretical level, a better understanding of different degrees of intentional explicitness in search queries could increase our knowledge about the levels of abstraction users employ when searching, and could equip us with better distinctions and tools for studying, for example, the way users refine or generalize goals during search.</p><p>• On a practical level, understanding different degrees of intentional explicitness in search queries could help search engine vendors tailor their search results to specific users and link search queries at different levels of explicitness.</p><p>However, understanding the degree of explicitness of user goals in search queries poses significant research and technical challenges. First and foremost, all goals contained in search query logs are hypothetical in nature, in the sense that verification is extremely hard, if not impossible. Most query logs that are available to researchers have been anonymized, and even if information about the users were available, contacting them to verify hypothetical goals would be costly or hardly feasible due to geographical, time and other constraints. We refer to this problem as the goal verification problem, which is extremely hard to overcome in research on search query log analysis. Second, query logs are huge text corpora, which renders manual elicitation of goals by experts practically impossible. We refer to this problem as the goal elicitation problem. 
Furthermore, query logs represent a fundamentally different text corpus to mine goals from, compared to other corpora that have been studied from an intentional perspective, such as interview transcripts or organizational guidelines: search queries are significantly shorter, the words used in search queries do not necessarily appear in lexica, and the text is not necessarily natural language but may be some artificial language, such as an arbitrary concatenation of terms that users expect to yield fruitful and relevant search results (such as "car miami"). We refer to this problem as the linguistic artificiality problem.</p><p>While solving all of these problems in their entirety is well beyond the scope of this work, in this paper we aim to 1) increase our theoretical understanding of the notion of different degrees of explicitness in intentional artifacts, and 2) explore related challenges, potentials, and implications empirically. For that purpose, we have adopted selected concepts from the body of literature related to the notion of goals in different research areas and conducted an exploratory study of a large search query log: the AOL search database released in 2006.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>WHAT ARE GOALS? DEFINITION AND RELATED WORK</head><p>To establish a theoretical understanding of the fundamental constructs we work with, we introduce the following definitions, drawing on related work in several different but related research areas. The most central concept in our paper is that of a goal, which we define as "a condition or state of affairs in the world that some agent would like to achieve or avoid. How the goal is to be achieved or avoided is typically not specified, allowing alternatives to be considered" (based on <ref type="bibr" target="#b20">[21]</ref>). An intentional artifact is an electronic artifact, produced by users or user behaviour, that contains recognizable "traces of intent", i.e. traces of users' goals and intentions expressed in different degrees of explicitness. The degree to which these traces can be recognized as goals by an independent observer depends on the artifact's degree of intentional explicitness. In this paper, we assume that search query logs at large represent intentional artifacts, meaning that they contain such traces of intent at different levels of explicitness. Examples of search queries exhibiting different degrees of intentional explicitness are shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>The notion of goals has been used by researchers in different areas to represent and frame the desires and needs users have when interacting with software. In the following, we discuss selected research relevant to our work.</p><p>Norman's theory of action <ref type="bibr" target="#b18">[19]</ref>, for example, describes the inherent gap between a person's goals and intentions and a system's capabilities, features and structures. 
Norman's research has implicitly acknowledged the existence of different degrees of explicitness in users' goals by highlighting that user goals are often not well specified, opportunistic, ill-formed and vague, and therefore hard to capture, identify and represent. Any attempt to study goals in a web search context must be expected to face similar, if not the same, challenges. Other work in HCI identifies basic types of so-called Goal-Effect Problems, i.e. problems that characterize system performance from an intentional perspective. In their paper <ref type="bibr" target="#b22">[23]</ref>, the authors distinguish between (I) Missing cues for goal construction, where a system does not suggest appropriate goals, (II) Misleading cues for goal construction, where a system suggests irrelevant goals, (III) Missing cues for goal elimination, where a system does not eliminate completed goals, and (IV) Misleading cues for goal elimination, where a system eliminates incomplete goals. Translated to a web search context, these distinctions highlight some of the implications of search queries expressed on different levels of intentional explicitness. Further work in HCI, such as the work of <ref type="bibr" target="#b11">[12]</ref> on the Lumiere project, focuses particularly on studying intentional artifacts with a low degree of explicitness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Notion of Goals in Requirements Engineering</head><p>Goal Oriented Requirements Engineering (GORE) conceptualizes software development as a process that aims to satisfy a series of stakeholder goals. The corresponding research community distinguishes between different types of goals: achieve and cease goals, which are said to generate behavior; maintain and avoid goals, which are said to restrict behavior; and optimize goals, which are said to compare behaviors <ref type="bibr" target="#b20">[21]</ref>. The distinction between goals and softgoals in GORE can be seen as an indicator of the plausibility of studying different degrees of explicitness in goals. In the i* framework <ref type="bibr" target="#b28">[29]</ref>, for example, a goal has a clear-cut criterion for deciding whether it is satisfied or not, whereas a softgoal is a goal for which no such clear-cut criterion exists.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Notion of Goals in Web Search</head><p>On the web, search represents a primary instrument through which users exercise their intent. This gives search engines a tremendous corpus of intentional artifacts at their disposal, an observation that has led scientists to study user intentions in search query logs. In 2002, Broder <ref type="bibr" target="#b3">[4]</ref> introduced a high-level categorization of web search intent, distinguishing between navigational, informational and transactional queries. Based on this early work, Rose and Levinson <ref type="bibr" target="#b21">[22]</ref> refined this categorization into a hierarchical taxonomy including more fine-grained categories, such as entertainment or advice seeking. In 2004, the authors of <ref type="bibr" target="#b15">[16]</ref> presented an automatic approach that aims to tell navigational and informational goals apart by analyzing two parameters: user-click behavior and anchor-link distribution. Baeza-Yates et al. apply supervised and unsupervised learning techniques to study users' goals in search query logs <ref type="bibr" target="#b1">[2]</ref>. Faaborg <ref type="bibr" target="#b7">[8]</ref> has presented a prototype for goal-oriented browsing, and Liu et al. <ref type="bibr" target="#b16">[17]</ref> have presented a prototype for goal-oriented search based on intentional concepts retrieved from the ConceptNet commonsense knowledge base.</p><p>While state-of-the-art research offers a set of useful categories, techniques and prototypes, we consider the degree of intentional explicitness to be orthogonal to existing intentional categories of search queries. In other words, we assume that within each intentional category (such as informational or transactional queries), goals can be expressed in different degrees of intentional explicitness. 
Broder, for example, makes a similar point in his 2002 paper by mentioning that "many informational queries are extremely wide, for instance cars or San Francisco, while some are narrow, for instance normocytic anemia, Scoville heat units". Our work in this paper is motivated by a desire to characterize different degrees of intentional explicitness in search query logs and to identify implications for the process of search. Our own previous work explored how users express their goals during search <ref type="bibr" target="#b24">[25]</ref>.</p><p>Further related work has acknowledged this problem to some extent: the paper of <ref type="bibr" target="#b21">[22]</ref>, for example, presents a tool that aims to support experts in categorizing search queries into goal categories. While different degrees of intentional explicitness were not the explicit focus of that work, the development of the tool can be interpreted as an early recognition of the problems that researchers face with different degrees of intentional explicitness in search queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DEGREES OF EXPLICITNESS IN INTENTIONAL ARTIFACTS</head><p>In a web search context, we conceptualize the degrees of explicitness in intentional artifacts to represent a broad, continuous spectrum. On one end of this spectrum, we would have queries that describe the users' intent completely and precisely, with nothing to add from an intentional perspective. On the other end of the spectrum we would have queries that do not describe user intent at all, such as blank queries.</p><p>For reasons of simplicity, in this paper we propose to distinguish -at a high, dichotomous level -between two degrees of intentional queries only: explicit and implicit intentional queries. This allows us to study whether a distinction between implicit and explicit intentional queries is reasonable in a web search context in the first place, and whether it yields interesting insights or implications. Given that we can identify interesting differences between different degrees of intentional explicitness, it could be interesting to conduct research on more refined definitions and more fine grained degree distinctions in the future. With these arguments in mind, we introduce the following idealized definitions of explicit and implicit intentional query. An explicit intentional query is a query that can be related to a specific goal in a recognizable, unambiguous way. Recognizable refers to what <ref type="bibr" target="#b14">[15]</ref> defines as "trivial to identify" by a subject within a given attention span. On a more practical level, this idealized definition is related to what other researchers have characterized as "better queries", or queries that have "more precise goals" (R. Baeza-Yates at the "Future of Web Search" workshop 2006, Barcelona). Examples of explicit intentional queries, i.e. queries that have more precise goals, would be "buy a car", "maximize adsense revenue" or "how to get revenge on neighbor within limits of law". 
While these queries can still be refined and elaborated, they are less ambiguous in the sense that a user searching for "how to get revenge on neighbor within limits of law" is unlikely to have the true goal of "buy a nice gift for neighbor". We define an implicit intentional query as a query where it is difficult or extremely hard to elicit some specific goal from the intentional artifact. Examples include blank queries, or queries such as "car" or "travel", which embody user goals on a very general level. Queries on this kind of level are likely to require further refinement in order to yield useful search results. Interestingly, a significant proportion of queries today are of length 1 or 2 (as is evident in, for example, the AOL search database set <ref type="bibr" target="#b19">[20]</ref>).</p><p>Distinguishing between these two broad types of queries is important for several reasons. First, explicit ("better") intentional queries could be used to disambiguate or refine implicit intentional queries. For example, a search engine might be able to refine the implicit intentional query "car shop" with the explicit intentional queries "shop for a car", "repair a car", "find a car shop" or "buy a car for shopping" with the help of user interaction. Second, we have found anecdotal evidence that some users organize their search in a way that can be understood as a traversal of goal graphs <ref type="bibr" target="#b24">[25]</ref>, including iterative goal refinement and generalization. This suggests that switching between more explicit and more implicit intentional queries during search is a natural cognitive activity for at least some users. Third, our own recent research has indicated that only 1.69% to 3.01% of queries have a high degree of intentional explicitness <ref type="bibr" target="#b24">[25]</ref>. 
While this percentage is rather small, we do not know whether users prefer to search via implicit intentional queries, or whether users have simply adapted to the non-intentional mode in which Google, Yahoo and other search engines operate today (cf. the "bag-of-words principle"). Our research is driven by a desire to understand whether explicit intentional queries have the potential to narrow the cognitive gap between a user's goals and the queries she uses. We are interested in the implications of distinguishing between explicit and implicit intentional queries and in learning more about the explicit goals users have on the web, with the long-term vision of enabling users to express their goals in search more accurately (towards "better queries" in Baeza-Yates' diction). This is in contrast to some past work in information retrieval, for example in the area of query expansion, where the purpose is to make the user query resemble more closely the documents it is expected to retrieve <ref type="bibr" target="#b25">[26]</ref>. Our interest is rather the opposite: because the precision with which users describe their goals in search queries puts an upper bound on the level of service search engines can provide, our long-term interest is to make search queries resemble more closely the intentions users have (moving towards more explicit intentional queries). This could help to narrow the "gulf of execution" for users, and could help computer scientists and search engine vendors to work with more accurate descriptions of users' intent, something search engine vendors are desperate to achieve today <ref type="bibr" target="#b9">[10]</ref>. While some researchers have already attempted to address similar issues <ref type="bibr" target="#b0">[1]</ref>, our particular focus lies in exploring different degrees of intentional explicitness in large search query logs rather than ambiguity of queries in general.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AN EXPLORATORY STUDY</head><p>Equipped with a theoretical understanding of explicit and implicit intentional queries, we are now interested in empirically studying these different types of queries "in the wild". In an exploratory study, we aim to identify and better understand explicit intentional queries in the AOL search database, a large search query log database released in 2006. We want to explore whether there are differences between explicit and implicit intentional queries with respect to, for example, the number of users issuing these types of queries or the type of URLs clicked as a result. Furthermore, we are interested in learning whether there are certain words that indicate the presence of explicit intentional queries, which could represent a relevant finding for future research efforts.</p><p>Although our preliminary distinction between explicit and implicit intentional queries equips us with an intuitive criterion for classification, a sharper measure is needed to separate explicit from implicit intentional queries on an operational level. To simplify classification, we distinguish between explicit and implicit intentional queries based on the following arbitrary criteria: A) whether a query contains at least one verb, and B) whether the goal elicited from the intentional artifact conforms to our definition of a goal. Note that for other or more refined degrees of intentional explicitness, different criteria might be used. We use our previous example queries to illustrate the implications of our particular distinction in Figure <ref type="figure" target="#fig_1">2</ref>, where queries in bold represent explicit intentional queries according to our classification criteria. While our example might imply that the degree of explicitness correlates with query length only, this is not necessarily the case. 
Although the query "buying a car in the 1920's" contains a verb, it does not conform to our definition of a goal and would therefore not be considered an explicit intentional query. Our criteria thus allow us to distinguish between "buy a car" or "sell a car" (explicit) and "car dealer ads" (implicit). We are aware of the implications of this simplification, and we discuss them in the "Threats to validity" section at the end of this paper.</p><p>We investigated explicit and implicit intentional queries in the AOL search database. In addition to the AOL data, several other web search logs are available <ref type="bibr" target="#b12">[13]</ref>. We used the AOL search database because it provides a very large dataset including comprehensive information about anonymous user IDs, time stamps, search queries, and click-through events. It contains about 20 million search queries collected by AOL from 657,426 unique user IDs between March 1, 2006 and May 31, 2006. To our knowledge, the AOL search database is also the most recent very large corpus of search queries publicly available (2006)<ref type="foot">2</ref>. Because applying our definition of explicit and implicit intentional queries manually to the AOL dataset with more than 20 million queries is infeasible (cf. the goal elicitation problem), we have developed an experimental classification approach based on a training set of queries that was used for machine learning of syntactical features of explicit intentional queries. However, coming up with an automatic classifier that excels on precision and recall measures would be well beyond the scope of this paper. Instead, our approach focuses on providing us with a reasonable subset of the AOL query dataset that contains a significantly higher proportion of explicit intentional queries than the entire dataset. 
Therefore, the goals of our experimental classification approach are more modest: it should enable us to gain a better understanding about explicit and implicit intentional queries and aid us in coupling our intuitions with empirical data. Focusing on better classification approaches could represent a promising line of future research. In the next section, we will describe some technical details of our approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>An Experimental Classification Approach</head><p>Before using the dataset for our analysis, we sanitized it with respect to undesirable properties such as empty queries. The data representation of an entry resulting from our sanitization process has the following form: {UserID, query, timestamp, (ItemRank, URL)*}. Taking this data representation as an input, our experimental classification approach consists of two parts: part-of-speech (POS) tagging and supervised learning of syntactical goal features.<note place="foot" n="2">Because the AOL search database was retracted by AOL shortly after its release, we obtained a copy from a secondary source: http://www.gregsadetsky.com/aol-data/ (last accessed on July 15th, 2007).</note></p></div>
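<div xmlns="http://www.tei-c.org/ns/1.0"><p>The entry representation described above can be illustrated with a minimal sketch. It assumes the tab-separated column layout of the public AOL release (AnonID, Query, QueryTime, ItemRank, ClickURL); the function name, the sample rows and the rule of dropping only empty queries are illustrative assumptions, not the sanitization code used in the study.</p>
<p><![CDATA[
```python
import csv
import io


def load_entries(lines):
    """Group raw log rows into {user, query, timestamp, clicks} entries.

    Assumption: rows are tab-separated with the header
    AnonID, Query, QueryTime, ItemRank, ClickURL.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = {}
    for row in reader:
        query = (row["Query"] or "").strip()
        if not query:  # drop empty queries (the only rule sketched here)
            continue
        key = (row["AnonID"], query, row["QueryTime"])
        entry = entries.setdefault(
            key,
            {"user": key[0], "query": query, "timestamp": key[2], "clicks": []},
        )
        # A query row may carry zero or one click; repeated rows add more clicks.
        if row.get("ItemRank") and row.get("ClickURL"):
            entry["clicks"].append((int(row["ItemRank"]), row["ClickURL"]))
    return list(entries.values())


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "42\tbuy a used car in miami\t2006-03-01 10:00:00\t1\thttp://example.com/a\n"
    "42\tbuy a used car in miami\t2006-03-01 10:00:00\t3\thttp://example.com/b\n"
    "42\t\t2006-03-01 10:05:00\t\t\n"
    "77\tcar miami\t2006-03-02 09:30:00\t\t\n"
)

entries = load_entries(io.StringIO(SAMPLE))
for entry in entries:
    print(entry["user"], entry["query"], entry["clicks"])
```
]]></p>
<p>Grouping on (user, query, timestamp) is one plausible way to obtain the repeated (ItemRank, URL)* part of an entry; other groupings, for example by session, would be equally compatible with the description.</p></div>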
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Part of Speech Tagging</head><p>Our classification approach is based on the simplifying assumption that explicit intentional queries can be distinguished from implicit intentional queries by the occurrence of certain part-of-speech patterns. For this purpose, the experimental setup incorporated a fast and reasonably accurate bigram part-of-speech tagger trained on a sample of the Penn Treebank corpus. We have focused on tagging queries with query length &gt; 2 only, because of the inherent ambiguity of shorter queries and the resulting difficulty of recognizing goals. We favored a bigram tagger over more powerful approaches such as transformation-based taggers and Hidden Markov Model taggers due to efficiency issues, the lack of contextual information and the rather naive (artificial) linguistic nature of search queries (cf. the linguistic artificiality problem). The tag set of the Penn Treebank corpus consists of 45 word classes <ref type="bibr" target="#b13">[14]</ref>. The reason for choosing this particular tag set is that we are mainly interested in identifying verbs and verb-noun combinations. For our purpose, we do not need the finer-grained word classes provided by, e.g., the tag set of the Brown corpus or C7. Table <ref type="table" target="#tab_1">1</ref>  The vocabulary size of the corpus is an estimated 13,500 words, which is rather small compared to the expected vocabulary size of the dataset (cf. the linguistic artificiality problem). To address this problem, we have chosen a suffix tagger as a back-off strategy for the bigram tagger. The part-of-speech tagging functionality we used was provided by the natural language toolkit NLTK <ref type="bibr" target="#b17">[18]</ref>.</p></div>
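<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the back-off idea concrete, the following is a toy sketch in plain Python rather than the NLTK setup used in the study. It assumes a bigram context of (previous tag, current word), a three-character suffix table as the back-off, and NN as the final default; the training sentences are invented for illustration and are not drawn from the Penn Treebank.</p>
<p><![CDATA[
```python
from collections import Counter, defaultdict


def train(tagged_sentences, suffix_len=3):
    """Learn the most frequent tag per bigram context and per word suffix."""
    bigram = defaultdict(Counter)  # (previous tag, word) -> tag counts
    suffix = defaultdict(Counter)  # word suffix -> tag counts
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            word = word.lower()
            bigram[(prev, word)][tag] += 1
            suffix[word[-suffix_len:]][tag] += 1
            prev = tag

    def best(table):
        return {key: counts.most_common(1)[0][0] for key, counts in table.items()}

    return best(bigram), best(suffix)


def tag(words, model, suffix_len=3, default="NN"):
    """Tag with the bigram table, backing off to suffixes, then a default."""
    bigram, suffix = model
    prev, tagged = "<s>", []
    for word in words:
        w = word.lower()
        t = bigram.get((prev, w)) or suffix.get(w[-suffix_len:]) or default
        tagged.append((word, t))
        prev = t
    return tagged


def worth_tagging(query):
    """Only queries longer than two terms are tagged, as described above."""
    return len(query.split()) > 2


# Invented toy training data using Penn Treebank tag names.
TRAIN = [
    [("buying", "VBG"), ("a", "DT"), ("car", "NN")],
    [("selling", "VBG"), ("the", "DT"), ("house", "NN")],
    [("cleaning", "VBG"), ("a", "DT"), ("rifle", "NN")],
]

model = train(TRAIN)
tagged = tag("renting a car".split(), model)
print(tagged)
```
]]></p>
<p>Here the unseen word "renting" is tagged through its "ing" suffix, which is the situation the suffix back-off is meant to handle when the query vocabulary is far larger than the training vocabulary.</p></div>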
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Supervised Learning of Goal Features</head><p>Our classification approach is similar to those reported in <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b10">11]</ref>. However, we use part-of-speech n-grams instead of word n-grams as features. In our experimentation we used binary features based on fixed-size trigrams. Furthermore, we introduced markers ($ $) at the beginning and the end of a query to take the part-of-speech tags at the query boundaries into account. Thus, the query "buying/VBG a/DT car/NN" would be decomposed into trigrams over its tag sequence, including the boundary markers.</p><p>We trained a naive Bayes classifier <ref type="bibr" target="#b6">[7]</ref> on the feature vectors described above using 10-fold cross-validation. In order to increase the performance of our classifier, we applied a chi-squared feature selection algorithm to our training set <ref type="bibr" target="#b23">[24]</ref>. The best results, based on 10-fold cross-validation, were achieved by reducing the feature space to the 20 most predictive features. Table <ref type="table">2</ref> shows the most predictive features according to the feature selection.</p></div>
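<div xmlns="http://www.tei-c.org/ns/1.0"><p>The trigram features described above can be sketched as follows. The sketch assumes that two "$" markers are added on each side of the tag sequence, which is one reading of the "($ $)" notation; with a single marker per side the first and last trigram below would not be generated. The function names and the small feature vocabulary are hypothetical.</p>
<p><![CDATA[
```python
def pos_trigrams(tags, pad="$", n=3):
    """Return fixed-size tag n-grams with boundary markers on both sides.

    Assumption: n - 1 markers are added at each boundary.
    """
    padded = [pad] * (n - 1) + list(tags) + [pad] * (n - 1)
    return [" ".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def binary_features(tags, vocabulary):
    """Map a tagged query to a 0/1 vector over a fixed trigram vocabulary."""
    present = set(pos_trigrams(tags))
    return [1 if gram in present else 0 for gram in vocabulary]


# Tag sequence of the query "buying/VBG a/DT car/NN".
trigrams = pos_trigrams(["VBG", "DT", "NN"])
print(trigrams)

# Hypothetical feature vocabulary, for illustration only.
vocabulary = ["$ $ VBG", "VBG DT NN", "$ $ NN"]
vector = binary_features(["VBG", "DT", "NN"], vocabulary)
print(vector)
```
]]></p>
<p>Because the features are built from tags rather than words, two queries with different vocabulary but the same syntactic shape, such as "buying a car" and "selling the house", map to the same feature vector.</p></div>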
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2. Most predictive features based on chi-squared feature selection</head><p>The purpose of our classification technique is to provide us with a more condensed set of queries, ideally containing a higher proportion of explicit intentional queries than the entire dataset, that would allow us to study explicit intentional queries in greater detail. More sophisticated linguistic techniques such as selectional preference <ref type="bibr" target="#b2">[3]</ref> might be more adequate if the goal were classification with a stronger focus on precision and recall measures. For all feature selection and classification tasks, we used the WEKA toolkit <ref type="bibr" target="#b26">[27]</ref>.</p><p>In the next section, we present the results of applying our experimental classification approach to the AOL search database.</p></div>
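<div xmlns="http://www.tei-c.org/ns/1.0"><p>Chi-squared feature selection over binary features can be sketched with the textbook statistic for a 2x2 contingency table. This is an illustration under that assumption, not a reproduction of the WEKA implementation used in the study; the feature names and training rows are invented.</p>
<p><![CDATA[
```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 table.

    a: explicit queries with the feature     b: implicit queries with it
    c: explicit queries without the feature  d: implicit queries without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_features(rows, labels, names, k=20):
    """Rank binary features by chi-squared score and keep the best k."""
    scores = []
    for j, name in enumerate(names):
        a = sum(1 for r, y in zip(rows, labels) if r[j] and y)
        b = sum(1 for r, y in zip(rows, labels) if r[j] and not y)
        c = sum(1 for r, y in zip(rows, labels) if not r[j] and y)
        d = sum(1 for r, y in zip(rows, labels) if not r[j] and not y)
        scores.append((chi_squared(a, b, c, d), name))
    scores.sort(key=lambda item: -item[0])
    return scores[:k]


# Invented toy data: label 1 marks an explicit intentional query.
names = ["$ $ VBG", "NN NN $"]
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]

ranked = top_features(rows, labels, names, k=1)
print(ranked)
```
]]></p>
<p>In this toy data the first feature separates the two classes perfectly and the second is independent of the class, so only the first survives the selection.</p></div>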
<div xmlns="http://www.tei-c.org/ns/1.0"><head>STUDY RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results of Experimental Classification</head><p>Applying our technique resulted in a condensed set of 279,260 queries. We will refer to this set of queries from here on as the "condensed dataset". The condensed dataset contains a higher proportion of explicit intentional queries than the entire dataset. The difference is significant: while the proportion of explicit intentional queries in the entire dataset has been estimated to lie between 1.69% and 3.01%, in the condensed dataset we estimate this proportion (based on a sample of 500 random queries from this set) to lie between 49.6% and 58.4% (95% confidence interval). This allows us to examine whether there are interesting differences between query sets that contain a large as opposed to a very small proportion of explicit intentional queries. Table <ref type="table" target="#tab_4">3</ref> gives an overview of some statistics of our condensed dataset. It also shows that the condensed dataset captures only part of the explicit intentional queries estimated in the entire dataset. However, the dataset provides a subset of queries with a significantly higher proportion of explicit intentional queries, which is sufficient for the kind of exploratory research questions we are interested in.</p><p>Correctly Classified Intentional Queries</p><p>"buying groceries online"</p><p>"how to get revenge on neighbor within limits of law"</p><p>"helping children handle death of a loved one"</p><p>"cleaning the ak-47"</p><p>"coughing up blood"</p><p>"dealing with the guilt of cheating"</p></div>
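<div xmlns="http://www.tei-c.org/ns/1.0"><p>An interval such as the one reported for the sample of 500 queries can be computed with the normal approximation for a proportion. The sketch below is an assumption about the procedure, and the count of 270 explicit intentional queries in the sample is likewise an assumption, chosen because it is consistent with the reported bounds; neither is stated in the text above.</p>
```python
import math


def proportion_confidence_interval(successes, sample_size, z=1.96):
    """Normal-approximation confidence interval for a proportion (z=1.96 for 95%)."""
    p_hat = successes / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - margin, p_hat + margin


# Assumed count: 270 of the 500 sampled queries judged explicit intentional.
low, high = proportion_confidence_interval(270, 500)
print(f"{low:.1%} to {high:.1%}")  # prints "49.6% to 58.4%"
```
<p>The width of the interval shrinks with the square root of the sample size, so halving it would require roughly four times as many manually judged queries.</p></div>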
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4. Examples of correctly classified queries</head><p>In addition to the statistical analysis, we want to give a qualitative account of the type of queries our technique classified correctly and incorrectly in the condensed dataset.</p><p>Examples of correctly classified queries in the condensed dataset are depicted in Table <ref type="table">4</ref>. These queries all represent goals that contain at least one verb and conform to our definition of goals. In addition, the correctly classified explicit intentional queries do not belong to a single query category (such as the ones identified in previous research <ref type="bibr" target="#b9">[10]</ref>), but span several of them. "buying groceries online", for example, can be categorized as a transactional query, while "helping children handle death of a loved one" can be categorized as an informational query. This observation, together with the observation that implicit intentional queries do not belong to a single category either, illustrates that the degree of intentional explicitness represents a view orthogonal to existing categories in query log analysis. Another particularly interesting query is "coughing up blood". Although it conforms to our definition of a goal, it represents a rather different kind of goal compared to the other goals identified in the condensed dataset: it represents an avoid goal of a user, describing a state (presumably a medical symptom) which the user tries to change. Automatically distinguishing between achieve and avoid goals appears to be an interesting research question and a non-trivial research challenge. 
The other goals in our table represent achieve goals in the sense that a user can reasonably be assumed to pursue the goal which is represented in the query (within the limitations of the goal verification problem).</p><p>Examples of incorrectly classified queries are especially interesting, as they show some of the limitations of our experimental classification approach:</p><p>Incorrectly Classified Intentional Queries "saving privat ryan" "driving school Illinois" "stem cell transplant" "founding fathers temple" "recovering the satellites lyrics"</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 5. Examples of incorrectly classified queries</head><p>The small sample of queries listed in Table <ref type="table">5</ref> gives a good overview of the challenges of identifying explicit intentional queries: "Saving private ryan", for example, is a popular Hollywood movie starring Tom Hanks, which makes it unlikely that the user issuing the query has the goal of actually saving a Private named Ryan. "Driving school Illinois" probably refers to some school where people can learn to drive, rather than the goal of driving to school in Illinois. "stem cell transplant" is very likely not a goal either. The incorrect classification is likely the result of imperfections in part-of-speech tagging.</p><p>Finally, we observed a significant proportion of queries that appear goal-oriented, but have the term "lyrics" as a prefix or postfix, such as "recovering the satellites lyrics" (a song performed by the Counting Crows). Utilizing domain knowledge (such as an Amazon API to detect movie or book titles) is one way of dealing with such queries.</p></div>
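<div xmlns="http://www.tei-c.org/ns/1.0"><p>A very simple post-filter for the "lyrics" case could look as follows. This is a hypothetical heuristic sketched for illustration and was not part of our experimental classification approach; the function names are assumptions, and title detection for movies or books would require external domain knowledge that is not modeled here.</p>
```python
def has_lyrics_affix(query, marker="lyrics"):
    """True if a multi-word query starts or ends with the marker word."""
    words = query.lower().split()
    return len(words) > 1 and marker in (words[0], words[-1])


def drop_lyrics_queries(queries):
    """Keep only the queries that do not carry the marker as prefix or postfix."""
    return [query for query in queries if not has_lyrics_affix(query)]


candidates = ["recovering the satellites lyrics", "buying groceries online"]
print(drop_lyrics_queries(candidates))  # prints ['buying groceries online']
```
<p>Matching on whole words rather than substrings avoids discarding queries that merely contain the marker inside another word.</p></div>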
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results of Comparing the two Datasets</head><p>We also investigated whether the most popular websites (i.e. websites that have been selected by users as a result of their search) in our condensed dataset differ from the most popular websites in the entire search query log. If this were the case, it would make a strong argument for the development of more advanced algorithms and techniques that have higher precision in distinguishing between different degrees of intentional explicitness in search queries. The histogram in Figure <ref type="figure" target="#fig_3">3</ref> lists the top 16 websites that have been clicked by users in the condensed dataset, including websites such as amazon.com, ehow.com, en.wikipedia.org, geocities.com, medhelp.org and others.</p><p>We have taken a random sample from each set of queries associated with a URL listed in Figure <ref type="figure" target="#fig_3">3</ref> and evaluated it with respect to correctly and incorrectly classified queries. We calculated the 95% confidence interval of the error rate to give an estimate (middle part of each bar in Figure <ref type="figure" target="#fig_3">3</ref>). This kind of analysis revealed interesting differences: The websites with the highest proportion of correctly classified explicit intentional queries among the top 16 websites are websites that can be considered very goal-centric: 43things.com (a website encouraging users to share their goals in life), ehow.com (a website on how to accomplish a broad variety of tasks and goals), hgtv.com (a home improvement website), faqfarm.com (a question answering website), and medhelp.org (a medical information website). 
Medhelp.org is a particularly interesting result, as a large proportion of the correctly classified explicit intentional queries are queries describing medical symptoms ("coughing up blood"), which we defined as avoid goals.</p><p>The websites with a higher proportion of incorrectly classified explicit intentional queries are, interestingly, websites that are less goal-centric, such as imdb.com (a movie database, many queries were movie or series titles like "saving private ryan", "bowling for columbine" or "meet joe black"), superpages.com (a directory website), followed by bizrate.com (a comparison shopping site, many queries for goods such as "marble fitted table cloth" or "fencing for pools"), answers.com (an online dictionary and encyclopedia, many queries focusing on definitions such as "meaning of centimeter" or "define alamo war") and en.wikipedia.org (an online encyclopedia).</p><p>In particular amazon.com, the website associated with the highest number of queries in the condensed set, was difficult to interpret. Books often contain goals in their titles, and it is hard to judge whether a user is searching for the specific book or using a goal as search query (e.g. "organizing your life" might be a search for the book "The Complete Idiot's Guide to Organizing Your Life", which can be found at amazon.com). Geocities, which is a hosting company for a variety of websites, has a similar fraction of intentional queries and is very broad regarding the range of topics identified in the queries.</p><p>In the following, we compare the entire and the condensed dataset with respect to whether they differ in the set of websites users select as a result of issuing queries. In Figure <ref type="figure" target="#fig_4">4</ref>, we can see the list of top 16 websites that have been clicked by users in the entire search result set. The results differ significantly from the top 16 in the condensed dataset. 
Goal-centric websites in particular are affected by our experimental classification approach, such as 43things.com (moving from rank #388 in the entire dataset up to rank #15 in the condensed set), ehow.com (from #64 up to #2), hgtv.com (from #97 up to #7), and medhelp.org (from #104 up to #16). The difference between popularity of websites found in the condensed vs. the entire dataset and the observation of goal-centric websites surfacing in the condensed dataset leads us to hypothesize that there is a correlation between explicit intentional queries on one hand, and goal-oriented websites and resources on the other.</p></div>
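<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rank comparison between the two datasets can be sketched as follows. The sketch assumes that each dataset is available as a list of clicked URLs (one entry per click); this input format and the names site_ranks and rank_shifts are assumptions made for illustration.</p>
```python
from collections import Counter
from urllib.parse import urlparse


def site_ranks(clicked_urls):
    """Map each host to its 1-based popularity rank by number of clicks."""
    counts = Counter(urlparse(url).netloc for url in clicked_urls)
    ordered = [host for host, _ in counts.most_common()]
    return {host: rank for rank, host in enumerate(ordered, start=1)}


def rank_shifts(entire_urls, condensed_urls):
    """For hosts present in both datasets, return (rank in entire, rank in condensed)."""
    entire = site_ranks(entire_urls)
    condensed = site_ranks(condensed_urls)
    return {host: (entire[host], condensed[host])
            for host in condensed if host in entire}
```
<p>A host whose second number is much smaller than its first, as reported above for ehow.com or 43things.com, has surfaced in the condensed dataset. Hosts with equal click counts are ranked in an arbitrary but deterministic order in this sketch.</p></div>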
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results of Analyzing the Condensed Dataset</head><p>Beyond comparative analysis, we were interested in the distribution of verbs in our condensed dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 5. Verb frequency histogram</head><p>The histogram in Figure <ref type="figure">5</ref> lists the most frequent verbs (in their stemmed word form) in our dataset. The top 10 stemmed verbs in the condensed dataset are make, get, buy, wed, is, find, live, play, use, write. While this list is interesting from a goal-oriented perspective and largely reasonable, it also highlights some of the limitations of our simplified approach. For example, "wed" is the result of mistakenly POS-tagging "wedding" as VBG rather than the result of the verb "wed" occurring in the dataset very often (as we were able to confirm by evaluating occurrences of wed vs. wedding in the dataset).</p><p>Another question we were interested in is whether a minority of users is responsible for issuing explicit intentional queries, or whether a larger set of users issues such queries. This would have implications for the broader relevance of different degrees of intentional explicitness in search queries. In Figure <ref type="figure" target="#fig_5">6</ref>, users are ranked based on their number of queries in the condensed set, and only the first 5000 ranks are shown. Frequency corresponds to the number of queries. While the proportion of explicit intentional queries in the AOL search query log has been estimated to lie between 1.69% and 3.01% <ref type="bibr" target="#b24">[25]</ref>, the proportion of users in our condensed dataset is significantly higher: 14.37% of the users from the entire dataset appear in the condensed dataset as well. As the data points approximately follow a line on a logarithmic scale, the rank frequency distribution appears to represent a power law, a distribution that is often found in systems that contain traces of social activities or interactions.</p></div>
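<div xmlns="http://www.tei-c.org/ns/1.0"><p>Whether a rank/frequency distribution is roughly linear on logarithmic scales can be checked with a least-squares fit of log(frequency) against log(rank). The following sketch is an illustration only: the function name loglog_slope is hypothetical, the input is assumed to be a list of positive per-user query counts with at least two entries, and a good linear fit is suggestive of a power law rather than proof of one.</p>
```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    ordered = sorted(frequencies, reverse=True)  # rank 1 is the most active user
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(frequency) for frequency in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data that follows frequency = 1000 / rank gives a slope close to -1.
synthetic = [1000 / rank for rank in range(1, 51)]
print(round(loglog_slope(synthetic), 3))
```
<p>The negative of the fitted slope is an estimate of the exponent of the rank/frequency relation under the stated assumptions.</p></div>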
<div xmlns="http://www.tei-c.org/ns/1.0"><head>THREATS TO VALIDITY</head><p>In the following, we describe threats to validity according to <ref type="bibr" target="#b27">[28]</ref>:</p><p>Construct validity: The constructs we intended to investigate in our study are explicit and implicit intentional queries. Being aware of a broad spectrum of different degrees of explicitness of goals in search queries, we have introduced a simplified distinction for practical purposes. While this distinction enabled us to explore the relevance of different degrees of explicitness, it might be an oversimplification of the underlying phenomenon. However, by defining different degrees of intentional explicitness as a continuous spectrum we hint towards more elaborate future approaches. In addition, relying on part-of-speech tagging and involving expert judgment to distinguish between explicit and implicit intentional queries also puts certain limitations on the generality of our approach. By providing a definition for goals we aimed to objectify our process to a certain extent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Internal validity:</head><p>The experts involved in labeling the training set of queries were two of the authors of this paper, which might introduce a potential bias to our results. We tried to mitigate this bias by requiring the experts to reach consensus on the judgment made, and by involving more than one expert. The decision to exclude shorter queries (n≤2) prevents us from making statements about a large part of the AOL dataset (~60%). However, our decision was motivated by the inherent difficulty of correctly part-of-speech tagging English queries of one or two words, and by the fact that search engine vendors report increasing average query length over the past years<note place="foot" n="4">http://blogs.zdnet.com/micro-markets/index.php?p=27, last accessed Nov 21, 2007</note>.</p><p>External validity: While we are referring to established theories and definitions on goals from different research areas including human-computer interaction, goal-oriented requirements engineering and search query analysis, our work is biased towards the data available in the AOL search dataset (2006). Investigating other search query logs with respect to different degrees of intentional explicitness is something we are interested in.</p><p>Reliability: We have documented and described our experimental classification approach, and built on existing toolkits such as the WEKA toolkit <ref type="bibr" target="#b26">[27]</ref>, so that reproducing our results is possible within the given limits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>OUTLOOK</head><p>In future work, it would be interesting to identify more fine-grained degrees of intentional explicitness and more precise criteria for distinguishing between them. Mining relations between explicit and implicit intentional queries would be another interesting stream of research, as this could allow search engines to interactively support goal refinement or goal generalization activities. We have identified a number of seemingly suitable web corpora, such as 43things.com, ehow.com, medhelp.org and others, that could be used in related future research efforts. Another promising field of future work seems to be the development of more precise classification approaches. In order to advance in this direction, approaches could, for example, take context or domain knowledge into account to increase the quality of classification (e.g. eliminating movie titles or queries related to song lyrics). Categorization of explicit intentional queries into taxonomies of human goals <ref type="bibr" target="#b5">[6]</ref> would be another interesting endeavor that could yield fruitful insights into the goals users pursue on the web. Investigating how our results translate to other contexts, such as the 43things.com website, which encourages users to share their goals, is another stream of future research we are interested in.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SUMMARY &amp; CONCLUSIONS</head><p>This paper introduced a novel perspective on analyzing search query logs: different degrees of intentional explicitness. We have argued that these degrees represent a continuous dimension, and we have shown by example that they are orthogonal to existing query categories, such as transactional or informational queries. In an effort to make this novel dimension amenable to analysis, we have introduced two simplified degrees of intentional explicitness and applied them to the AOL search database. Our analysis demonstrated that our concepts are reasonable in principle, and highlighted a series of potentials and challenges when studying different degrees of intentional explicitness in search query logs. Learning about different degrees can be considered essential for leveraging the full analytical potential of "databases of intentions", and for understanding their limitations. In addition, considering different degrees of intentional explicitness appears critical for search engine vendors to better assess the level of service they can or should provide for different user queries. We have presented a theoretical elaboration of different degrees of intentional explicitness and preliminary empirical evidence that these concepts are reasonable in principle. More robust techniques to understand a search query's degree of intentional explicitness could have a significant impact on narrowing the cognitive gap between a user's goals and the query she formulates. Finally, our findings could have a broader impact on web search research, as well as behavioral and social studies of motivation on the web.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 .</head><label>1</label><figDesc>Figure 1. Queries with different degrees of explicitness</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 .</head><label>2</label><figDesc>Figure 2. Distinguishing different degrees of explicitness</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 .</head><label>3</label><figDesc>Figure 3. Top 16 websites in the condensed dataset</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 .</head><label>4</label><figDesc>Figure 4. Top 16 websites in the entire dataset</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 .</head><label>6</label><figDesc>Figure 6. Number of queries per user: rank/frequency plot</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 . A sample of Penn Treebank tags (from [14])</head><label>1</label><figDesc>shows a sample of word classes of the Penn Treebank tag set.</figDesc><table><row><cell>Tag</cell><cell>Description</cell><cell>Example</cell></row><row><cell>NN</cell><cell>Noun, sing. or mass</cell><cell>car</cell></row><row><cell>VB</cell><cell>Verb, base form</cell><cell>eat</cell></row><row><cell>VBG</cell><cell>Verb, gerund</cell><cell>eating</cell></row><row><cell>VBZ</cell><cell>Verb, 3sg pres</cell><cell>eats</cell></row><row><cell>JJ</cell><cell>Adjective</cell><cell>yellow</cell></row><row><cell>WRB</cell><cell>Wh-adverb</cell><cell>how, where</cell></row><row><cell>TO</cell><cell>"to"</cell><cell>to</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>To obtain a training set, we drew a uniform random sample from the set of queries which contain at least one verb (see footnote 3). Two of the authors labeled instances in the sample consensually based on whether the queries conform to our definition of goals introduced earlier. This resulted in a training set consisting of 98 instances, 59 positives and 39 negatives. While this training set is not necessarily representative for the set of all queries under investigation, it yielded sufficient results given the exploratory nature of our research.</figDesc><table /><note>$ $ VBG, $ VBG DT, VBG DT NN, DT NN $, NN $ $</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 . Statistical overview of the condensed dataset</head><label>3</label><figDesc></figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://battellemedia.com/archives/000063.php, last accessed Nov 21, 2007</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1">© 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">1,598,612 out of 20,494,002 queries contained at least one verb according to the outcome of our part-of-speech tagging process.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_3">Websites listed in Figure 3 (condensed dataset): http://www.amazon.com, http://www.ehow.com, http://en.wikipedia.org, http://www.geocities.com, http://experts.about.com, http://www.imdb.com, http://www.hgtv.com, http://www.findarticles.com, http://www.answers.com, http://www.superpages.com, http://www.nextag.com, http://www.bizrate.com, http://cgi.ebay.com, http://www.faqfarm.com, http://www.43things.com, http://www.medhelp.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_4">Websites listed in Figure 4 (entire dataset): http://www.google.com, http://www.myspace.com, http://www.yahoo.com, http://en.wikipedia.org, http://www.amazon.com, http://www.imdb.com, http://www.mapquest.com, http://www.ebay.com, http://mail.yahoo.com, http://www.bankofamerica.com, http://www.geocities.com, http://www.hotmail.com, http://www.ask.com, http://www.bizrate.com, http://profile.myspace.com, http://www.tripadvisor.com, http://www.msn.com</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>We thank Anwar Us Saeed for providing support in implementing parts of the experimental classification approach and Mark Kröll for very helpful comments and criticism. The research of this contribution is funded in part by the Austrian Competence Center program Kplus.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Using part-of-speech patterns to reduce query ambiguity</title>
		<author>
			<persName><forename type="first">J</forename><surname>Allan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Raghavan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>SIGIR Conference on Research and Development in Information Retrieval<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="307" to="314" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The Intention Behind Web Queries</title>
		<author>
			<persName><forename type="first">R</forename><surname>Baeza-Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Calderón-Benavides</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>González-Caro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SPIRE 2006</title>
				<meeting>SPIRE 2006</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="98" to="109" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automatic web query classification using labeled and unlabeled training data</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Beitzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Jensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Frieder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Grossman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">D</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolcz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGIR 2005</title>
				<meeting>SIGIR 2005<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="581" to="582" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A taxonomy of web search</title>
		<author>
			<persName><forename type="first">A</forename><surname>Broder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGIR Forum</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="3" to="10" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">N-gram-based text categorization</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Cavnar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Trenkle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SDAIR</title>
				<meeting>SDAIR</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="161" to="175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A hierarchical taxonomy of human goals</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Chulef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Read</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Walsh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Motivation and Emotion</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="191" to="232" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">On the optimality of the simple bayesian classifier under zero-one loss</title>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Pazzani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="103" to="130" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A goal-oriented web browser</title>
		<author>
			<persName><forename type="first">A</forename><surname>Faaborg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lieberman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. CHI 2006</title>
				<meeting>CHI 2006</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="751" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">A study using n-gram features for text categorization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fürnkranz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
		<respStmt>
			<orgName>Austrian Institute for Artificial Intelligence</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Tech rep</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">The future of search</title>
		<author>
			<persName><forename type="first">K</forename><surname>Greene</surname></persName>
		</author>
		<ptr target="http://www.technologyreview.com/Biztech/19050/" />
		<imprint>
			<date type="published" when="2007-07-16">July 16, 2007</date>
		</imprint>
		<respStmt>
			<orgName>MIT Technology Review</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Efficient text categorization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Grobelnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mladenic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECML-98 Workshop on Text Mining</title>
				<meeting><address><addrLine>Chemnitz, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The Lumiere project: Bayesian user modeling for inferring the goals and needs of software users</title>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Breese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Heckerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rommelse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. UAI 1998</title>
				<meeting>UAI 1998</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="256" to="265" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">How are we searching the World Wide Web? A comparison of nine search engine transaction logs</title>
		<author>
			<persName><forename type="first">B</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Spink</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing and Management</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="248" to="263" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Speech and Language Processing: An introduction to natural language processing</title>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Martin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Speech Recognition</title>
				<imprint>
			<publisher>Prentice Hall</publisher>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">When is information explicitly represented?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kirsh</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1990">1990</date>
			<publisher>UBC Press</publisher>
			<biblScope unit="page" from="340" to="365" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Automatic identification of user goals in Web search</title>
		<author>
			<persName><forename type="first">U</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. WWW &apos;05</title>
				<meeting>WWW &apos;05<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="391" to="400" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">GOOSE: A goal-oriented search engine with commonsense</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lieberman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Selker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AH 2002</title>
				<meeting>AH 2002<address><addrLine>London, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="253" to="263" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">NLTK: The Natural Language Toolkit</title>
		<author>
			<persName><forename type="first">E</forename><surname>Loper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">The design of everyday things</title>
		<author>
			<persName><forename type="first">D</forename><surname>Norman</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A picture of search</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Torgeson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. InfoScale 2006</title>
				<meeting>InfoScale 2006<address><addrLine>Hong Kong</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Where do goals come from: the underlying principles of goal-oriented requirements engineering</title>
		<author>
			<persName><forename type="first">G</forename><surname>Regev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wegmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. RE 2005</title>
				<meeting>RE 2005<address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="353" to="362" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Understanding user goals in web search</title>
		<author>
			<persName><forename type="first">D</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Levinson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. WWW 2004</title>
				<meeting>WWW 2004<address><addrLine>New York, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Analysing interaction problems with cyclic interaction theory: Low-level interaction walkthrough</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ryu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Monk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PsychNology Journal</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="304" to="330" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Machine learning in automated text categorization</title>
		<author>
			<persName><forename type="first">F</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="47" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">How do users express goals on the web? - An exploration of intentional structures in web search</title>
		<author>
			<persName><forename type="first">M</forename><surname>Strohmaier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Granitzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Scheir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liaskos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">We Know&apos;07 International Workshop on Collaborative Knowledge Management for Web Information Systems, in conjunction with WISE&apos;07</title>
				<meeting><address><addrLine>Nancy, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Natural language information retrieval: TREC-5 report</title>
		<author>
			<persName><forename type="first">T</forename><surname>Strzalkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carballo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Text REtrieval Conference</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="164" to="173" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Data mining: practical machine learning tools and techniques</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Morgan Kaufmann Series in Data Management Systems</title>
				<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>2nd edn</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Case study research: design and methods</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Yin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Social Research Methods</title>
		<imprint>
			<date type="published" when="2002">2002</date>
			<publisher>SAGE Publications</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Yu</surname></persName>
		</author>
		<title level="m">Modelling strategic relationships for process reengineering</title>
				<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, University of Toronto</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
