<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ambiguity of Queries and the Challenges for Query Language Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juliane Stiller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Gäde</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivien Petras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Berlin School of Library and Information Science, Humboldt-Universität zu Berlin</institution>
          ,
          <addr-line>Dorotheenstr. 26, 10117 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, a sample set of 510 simple searches from the TEL action log 2009 is analyzed for query content and query language. More than half of the queries are for named entities, which has consequences for query language disambiguation. A manual identification of query language finds that often a definite language cannot be determined, because many named entities are not translated. Problems and challenges for query category and language identification are discussed. Further analysis shows that IP address and interface language are not very strong indicators for determining the query language.</p>
      </abstract>
      <kwd-group>
        <kwd>LogCLEF</kwd>
        <kwd>log file analysis</kwd>
        <kwd>query language</kwd>
        <kwd>named entities</kwd>
        <kwd>query language detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>One of the challenges in cross-lingual information retrieval systems is to identify the
user's information need, which is expressed in short and decontextualized queries in
multiple languages. To process a query adequately (e.g. to stem or translate it
correctly), it is essential to determine the language of the query.</p>
      <p>In this paper, we explore what the challenges for query language identification and
classification are, especially in the context of digital libraries. We extracted 510
search queries from the TEL action log corpus 2009 and analyzed them on a
conceptual and linguistic level. Of special interest was the identification of query
characteristics from the corpus. We examine the ambiguity of query terms and the
resulting challenges in determining the query language.</p>
      <p>We also looked at other signals that might help to determine the language of a
query, such as the IP address or the interface language. To this end, we analyzed the relationship
between interface language, IP address, the country of origin of users and the query
language.</p>
      <p>
        With our analysis we follow up on results generated from the TEL corpus in the
previous LogCLEF track in 2009<sup>1</sup>, which are briefly summarized here. Data from The
European Library (TEL) and Tumba! were evaluated with the aim of analyzing and
classifying user queries in order to understand search behavior in multilingual contexts
and to improve search systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this context, Oakes and Xu [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] analyzed the
query language used under certain interface languages. Furthermore, they found
that users rarely switch the query language during their sessions. The CELI research
institute tried to identify translations of search queries, assuming that users of a
multilingual digital library will repeat queries in different languages [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Ghorab et al.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] looked at general statistics to compare the behavior of users from different
linguistic or cultural backgrounds and to identify communities. They observed that
20% of term changes involved language changes. Lamm et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] investigated user
search performance and interaction with the TEL interface. They defined successful
and unsuccessful user actions and discovered different search behaviors of users
from different countries. Hofmann et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] pointed out the limitations of query logs
and proposed gaining more context information through the semantic enrichment of
queries, linking them to sources of background information such as Wikipedia,
since the most frequent queries are named entities.
      </p>
      <p>The paper is organized as follows: Section 2 briefly describes the TEL simple
search log corpus and provides some general statistics. In Section 3 we present and
discuss the analysis of our sample query corpus and introduce categories for this sample
of digital library queries. Section 4 deals with the problems in query category and
language detection and provides examples of characteristic difficulties. We conclude
the paper by investigating additional signals for language identification: the
interface language and the language of the country
the searcher is from.</p>
    </sec>
    <sec id="sec-2">
      <title>2 The TEL Simple Searches</title>
      <p>For LogCLEF 2010, two corpora are provided. The first one contains logs from the
Deutsche Bildungsserver, the second one logs from The European Library (TEL). We
created a multilingual test corpus with TEL queries from the simple search interface.
Log files from two different periods were available. In the first period from January
2007 to June 2008, action log files and server logs could be used for research. We
analyzed the second set of data, which were action logs from the period of January to
December 2009.</p>
      <p>This one-year log file contained 762,485 log entries. We extracted the
queries whose log entry contained either a simple search or an advanced search indicator
(search_sim / search_adv). Queries that were entered on the
result page or in the full record view were not selected. Simple searches
accounted for 137,827 of the entries and advanced searches for 32,528.</p>
      <p>As the advanced search offers some categorization of the query through
facets such as “title” or “author”, we used only simple search queries for our sample</p>
      <sec id="sec-2-1">
        <title>1 http://www.uni-hildesheim.de/logclef/</title>
        <p>corpus, as they do not give any context information regarding the intent of the user.
Therefore, the query entered by the user and the information saved in the logs are the
only signals the system can get.</p>
        <preformat>2,guest,127.0,5E390977758E505C871AFB99E1342988,en,"(""toto"")",search_sim,"/en/search/collections/a0268,a0365/",0,,,,2009-01-28 00:00:00.0</preformat>
        <p>The query is shown in the sixth field of the log entry (see Figure 1).</p>
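        <p>As a minimal sketch (not the extraction procedure actually used for this study), a log entry like the one above could be parsed with Python's standard csv module, which handles the doubled quotation marks inside the quoted query field. The function name extract_query, the position of the action indicator (seventh field, as observed in the sample entry) and the removal of the surrounding ("...") wrapper are assumptions made for illustration.</p>
        <preformat>
```python
import csv
import io

# Sample entry copied from the log excerpt in the text.
SAMPLE = (
    '2,guest,127.0,5E390977758E505C871AFB99E1342988,en,"(""toto"")",'
    'search_sim,"/en/search/collections/a0268,a0365/",0,,,,2009-01-28 00:00:00.0'
)

SEARCH_ACTIONS = ("search_sim", "search_adv")


def extract_query(line):
    """Return (action, query) for a search entry, or None for any other entry."""
    rows = list(csv.reader(io.StringIO(line)))
    if not rows:
        return None
    row = rows[0]
    # Assumption: the action indicator sits in the seventh field, as in the sample.
    if 7 > len(row) or row[6] not in SEARCH_ACTIONS:
        return None
    query = row[5].strip()  # the query is the sixth field
    # Assumption: queries are wrapped as ("...") and the wrapper can be dropped.
    if query.startswith('("') and query.endswith('")'):
        query = query[2:-2]
    return row[6], query


print(extract_query(SAMPLE))
```
        </preformat>
        <p>Applied to the sample entry, the sketch returns the action search_sim together with the query toto; entries with any other action are skipped.</p>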
        <sec id="sec-2-1-1">
          <title>2.1 Query Length</title>
          <p>
            Because our sample corpus was drawn from the simple search queries that initiated a
search session, we also examined the entire set of simple search queries to determine
their average length. Query length matters because it indicates
the amount of textual information embedded
in a query. Early studies investigated query length in web search engines; for a
comparison of the major studies see Jansen &amp; Pooch [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. To name one of these,
Spink et al. [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] conducted a longitudinal study of Excite transaction logs and found that, over
the years, the content users look for changed but the
structure of the queries did not. They also found that the average query
length remained between 2.4 and 2.6 terms.
          </p>
          <p>
            A first analysis of queries in the cultural heritage domain was conducted by Jones
et al. [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] who looked at queries from the New Zealand Digital Library. Similar
patterns as for the web search queries were found. The average number of search
terms in a query is 2.5.
          </p>
          <p>We counted the words per query, separated by white space, in the simple search
corpus. On average, the queries consist of 2.34 terms. Approximately 43% of all
simple search queries contain only one term, 27% two terms, 13% three terms, 7%
four terms and only 10% five or more terms. Our results confirm
that queries usually consist of only a few keywords, which makes correct
language identification very challenging.</p>
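          <p>The length statistics described above can be sketched as follows. This is an illustrative reconstruction rather than the script used for the study: the function name length_stats and the decision to skip empty queries are assumptions, and the three input queries are merely examples quoted in this paper, not the corpus itself.</p>
          <preformat>
```python
from collections import Counter


def length_stats(queries):
    """Return (mean terms per query, share of queries with 1, 2, 3, 4 and 5+ terms)."""
    # Terms are counted as whitespace-separated tokens; empty queries are skipped.
    lengths = [len(q.split()) for q in queries if q.strip()]
    if not lengths:
        raise ValueError("no non-empty queries")
    buckets = Counter(min(n, 5) for n in lengths)  # bucket 5 means five or more
    total = len(lengths)
    mean = sum(lengths) / total
    shares = {k: buckets[k] / total for k in range(1, 6)}
    return mean, shares


# Queries quoted in this paper, used only as illustrative input.
examples = ["mozart", "candida stellata", "all the russias by e. c. phillips"]
mean, shares = length_stats(examples)
print(round(mean, 2), shares)
```
          </preformat>
          <p>Counting whitespace-separated tokens treats initials such as “e.” and “c.” as separate terms, which is consistent with the word count described above but may overstate the number of content-bearing terms.</p>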
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2 Most Frequent Queries</title>
          <p>The most frequent queries of a retrieval system indicate trends
and the content people expect to find in the digital library. The most frequent queries in
the TEL 2009 log files are “toto”, followed by “mozart” and “napoleon” (see table
1). Of the 10 most frequent queries, 7 contain named entities and express a
search for a person. These queries already indicate the challenges in determining the
language of a query, as they cannot be assigned to one particular language.
Queries are apparently not repeated very often, since the most frequent one appears
only 400 times.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 A Sample Corpus for Query Analysis</title>
      <p>From the simple searches, we randomly extracted 510 queries. The aim was to gain
information about the query language, topic and intent of the queries. Another goal
was to investigate the distribution of proper names and to categorize the different
query types according to their content.</p>
      <sec id="sec-3-1">
        <title>3.1 Sample Corpus Query Statistics</title>
        <p>In line with our findings for the entire simple search corpus, our sample corpus
showed an average query length of 2.43 terms. More than three quarters of the
queries consist of one term (41.5%), two terms (25%) or three terms (14.5%); 7% of
the queries are composed of four terms and 12% have five or more terms. Ten queries contain
only numbers such as ISBN/ISSN or dates, and four queries consist of fewer than 3
characters.</p>
        <p>Through a manual conceptual analysis of the extracted query terms, we categorized
the queries according to their content. We focused on flagging those queries that
might be problematic in terms of language identification and query translation. This
includes proper names, uniform book titles and other entities requiring special
recognition when processed during a cross-lingual search session. We subsumed these
categories under the definition of named entities. Three different query types were
defined in the context of named entities:</p>
        <p>This means that only 38% of our sample corpus queries could be readily translated
with a dictionary-based translation approach, provided that the query language could be
determined.</p>
        <p>2 The queries “toto”, “a” and “test” are queries used for testing by the TEL office. This shows
how important a thorough understanding of the data in log files is. To interpret user
behavior from log file data correctly, it is necessary to exclude misleading log entries such as
test data or search engine crawlers.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Development of Query Categories</title>
        <p>Previous studies have dealt with search engine query classification according to their
intent [10], search goals [11] or topics [12].</p>
        <p>A log file analysis of English AltaVista queries showed that 20% of the queries are
navigational, 48% are informational and 30% are transactional
(excluding all sexually oriented queries) [10]. Rose and Levinson [11] created a
hierarchy of search goals whose first level resembles Broder's taxonomy [10], replacing
the transactional query with a search goal for resources. They found a greater proportion
of informational queries (around 61%) and a smaller proportion of navigational ones (around
15%).</p>
        <p>Other studies focused on automatic query classification [13]. The shortness of
queries poses great challenges for automated query classification [14].</p>
        <p>For our sample corpus we identified six named entity categories and two
categories for non-named-entity terms (table 3).</p>
        <sec id="sec-3-2-1">
          <title>3 This category also contained 10 numbers expressing ISBN or ISSN, one URL.</title>
          <p>As shown in table 4, besides topical searches such as “qualificações salário”, users
mainly search for persons (e.g. “dante”), followed by geographic entities, mainly
countries or towns (e.g. “japan”), and titles (e.g. “social support and health status: a literature
review (1997)”). Queries that can be assigned to more than one class are often a
combination of author (person) and work, such as “all the russias by e. c. phillips”, and are
counted in multiple categories (see table 4).</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Query Languages</title>
        <p>Table 5 shows the languages in which queries were expressed more than 10 times. It
is striking that most of the queries are ambiguous terms for which it was not possible to
identify the language. This was mainly the case for named entities such as persons or
geographic terms, which normally have no spelling variants across languages, e.g.
“paris”. Several queries are not named entities but are still ambiguous across languages,
e.g. “administration” or “culture”. Besides the languages listed, we found queries in
13 other languages. Additionally, 4 Latin terms appeared in the corpus, which
expressed a very specific information need, e.g. “neuroptera myrmeleontidae”.</p>
        <p>We compared our manual language identification with an automated process using the
Google Translate language detector<sup>4</sup>.</p>
        <p>The manual analysis showed that 39% of all queries cannot be assigned to a specific
language, which complicates automatic language detection. In contrast to our
analysis, Google did not flag any query as ambiguous with respect to language. With
the Google API, more than 60% of the queries were detected as English, while our
manual analysis identified 31% as English. This considerable difference can probably be
explained by ambiguous queries being detected as English,
owing to an English language bias in the training or general Web data that Google
uses.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Problems in Query Classification and Language Detection</title>
      <p>Language identification of search engine queries is known to be
challenging, yet it is very important, especially for multilingual information access.
Correct language detection is necessary for further processing of the query, such as
stemming, spell checking, disambiguation or translation, and for deciding in which
language the result list should be presented.</p>
      <p>Web search queries are normally very short. Due to the large number of named
entity queries, especially in the cultural domain, automatic language detection
has to deal with ambiguous terms or even terms that are not easily assigned to a
certain language, such as “Franz Kafka”. In our sample corpus, 61.96% of the queries
contain named entities and 54.70% of the queries consist only of named entities. As
table 6 shows, of the 279 named entity queries we determined 167 to be ambiguous.
These are terms that occur in many languages, such as “Paris” or “Madonna”. Of
course, there are also named entities that can be assigned to a particular language,
such as “Eiffelturm” (German) or “Tour Eiffel” (French).</p>
      <p>The data also show that queries which contain a named entity and another word are less
ambiguous than those that contain only a named entity.</p>
      <p>Named entity recognition is a very important aspect of
identifying the query language. For example, the query term “barber” can either
refer to the English word for “hairdresser” or to the composer “Samuel Barber”. In
this case, correct language detection alone does not ensure the identification of
the user's information need or intent.</p>
      <p>For search engines, there are also cases where correct language detection does not
necessarily imply that the user wants to see the results in the same language. For
example, although the query “candida stellata” is identified as
Latin, a user entering this query from Germany would most probably want to see
German web pages rather than web pages in Latin.</p>
      <p>Table 6 shows the ratio of ambiguous terms in the sets of queries containing named
entities and not containing named entities.</p>
      <p>This shows that many queries whose language cannot be clearly determined
express a search for a named entity. The 23 ambiguous terms that were not
categorized as named entities are numbers, terms existing in several languages
such as “culture” or “administration”, and single characters such as “a”. It is also worth looking
at the ambiguity of the different categories, as shown in table 7.</p>
      <p>In the person category, the proportion of queries whose language cannot be
identified is much higher than for geographic entities or titles of work. This is mainly
because names of persons do not change across languages. It is nevertheless
fundamental that a CLIR system recognizes these entities. Standardized name
authority files such as PND (Personennamendatei) or ULAN (Union List of Artist
Names) are essential to fulfill this task.</p>
    </sec>
    <sec id="sec-5">
      <title>5 IP Address, Interface Language and Query Language</title>
      <p>As demonstrated before, language identification is very hard to implement correctly
in an automated manner.</p>
      <p>It is therefore reasonable to incorporate other aspects in the language detection that
could hint at the language the user is searching in. Especially the correlation between
the query language, the corresponding IP address and the interface language is of
interest. Of course, the IP address might not be reliable in every case. The same user
may use several IP addresses or several users can share one IP address. Furthermore,
it is possible to hide the true location by using proxies. We are also aware of the fact
that users rarely switch the interface language and that many of them work with the
default English interface.</p>
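      <p>How strong such a signal is can be estimated by comparing it with the manually assigned query languages. The following is a minimal sketch under stated assumptions: the function name signal_agreement, the record layout and the language codes are hypothetical, while the counts are those reported in Section 5.2 for queries originating from German-speaking countries.</p>
      <preformat>
```python
def signal_agreement(records):
    """Compare a language signal with manually assigned query languages.

    records: iterable of (signal_language, query_language) pairs, where
    query_language is None if no single language could be assigned.
    """
    total = matched = ambiguous = 0
    for signal, query in records:
        total += 1
        if query is None:
            ambiguous += 1
        elif signal == query:
            matched += 1
    identified = total - ambiguous
    return {
        "total": total,
        "ambiguous": ambiguous,
        "matched": matched,
        "share_of_all": matched / total if total else None,
        "share_of_identified": matched / identified if identified else None,
    }


# Counts reported in Section 5.2 for queries from German-speaking countries;
# the language codes and the record layout are assumptions for illustration.
records = (
    [("de", "de")] * 7 + [("de", "en")] * 17 + [("de", "es")] * 3
    + [("de", "pl")] * 1 + [("de", "other")] * 2 + [("de", None)] * 20
)
result = signal_agreement(records)
print(result)
```
      </preformat>
      <p>Reporting the share both over all queries and over only those with an identifiable language keeps the effect of ambiguous queries visible: with the counts above, 7 of 50 queries match the signal language, or 7 of the 30 queries to which a single language could be assigned.</p>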
      <sec id="sec-5-1">
        <title>5.1 Interface and Query Language</title>
        <p>
          Interface language as a signal to detect the query language was also analyzed during
the last LogCLEF track by Oakes &amp; Xu [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. They found that for the most common
interface languages, namely German, French, English, Portuguese, Dutch and Italian,
the most common query language was identical to the interface language.
        </p>
        <p>Since we also flagged terms that could not be assigned to a particular language, a
slightly different dataset resulted. The queries whose language matches the
interface language under which they were entered make up only a very small proportion of the
total number of queries in these languages. This is probably due to the
fact that not many users switch their interface language.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 IP Addresses and Query Language</title>
        <p>For the shortened IP addresses given in the log files, the respective
country was identified. To make a statement about the relationship between country
of origin and query language, we determined the official language of these countries.
The rows of table 10 show the languages spoken in the countries where queries
originated. This signal seems to be weaker than the interface
language. For example, of the 50 queries originating from German-speaking
countries<sup>5</sup> such as Germany or Austria, only 7 were German, whereas the remaining
queries were in English (17), Spanish (3), Polish (1) and other languages (2), and 20
queries could not be assigned to a single language.</p>
        <p>5 Included countries are: Germany, Austria and Liechtenstein. We excluded countries with
several spoken languages such as Switzerland and Belgium.</p>
        <p>Correct query language identification is decisive for language-dependent retrieval and for
the disambiguation and translation of query terms. It is also used in many retrieval
systems to determine the language of the results presented. Our analysis shows that
search query language identification and named entity recognition need to come
together, especially within a cultural heritage context. Many queries express a
search for proper names of persons, geographic entities or titles of work. Most of
these queries cannot be assigned to a certain language. We also showed that signals
commonly assumed to indicate a user's preferred language are not as
strong as expected.</p>
        <p>The retrieval system, however, should be able to identify named entities and
language preferences, present users with results in a language they can
understand and enable them to judge the relevance of the documents. More research is
therefore needed not only on language detection, a problem that might not be solved
entirely, but also on named entities and their presentation in the search process.</p>
        <p>12. Jansen, B. J., Spink, A., Bateman, J., Saracevic, T.: Real Life Information Retrieval: A
Study of User Queries on the Web. SIGIR Forum: A Publication of the Special Interest
Group on Information Retrieval, 32, 5-18 (1998)</p>
        <p>13. Baeza-Yates, R. A., Calderón-Benavides, L., González-Caro, C. N.: The Intention Behind
Web Queries. In: Lecture Notes in Computer Science. Vol. 4209, String Processing and
Information Retrieval. 13th International Conference, SPIRE 2006, Glasgow, UK, October
11 - 13, 2006, 98-109 (2006)</p>
        <p>14. Kang, I., Kim, G.: Query Type Classification for Web Document Retrieval. In: Proceedings
of the 26th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, SIGIR '03. ACM, New York, NY, 64-71 (2003)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Mandl</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Agosti</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Di Nunzio</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Yeh</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Mani</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Doran</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Schulz</surname>, <given-names>J.M.</given-names></string-name>:
          <article-title>LogCLEF 2009: the CLEF 2009 Multilingual Logfile Analysis Track Overview</article-title>
          .
          <source>In: Working Notes of the Cross Language Evaluation Forum (CLEF)</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Oakes</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A Search Engine Based on Query Logs, and Search Log Analysis at the University of Sunderland</article-title>
          .
          <source>In: Working Notes, LADS Workshop, Cross-Language Evaluation Forum 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bosca</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Cacao Project at the LOGCLEF Track</article-title>
          .
          <source>In: Working Notes, LADS Workshop, Cross-Language Evaluation Forum 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ghorab</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G. J. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wade</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages</article-title>
          .
          <source>In: Working Notes, LADS Workshop, Cross-Language Evaluation Forum 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lamm</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kölle</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Search Path Visualization and Session Performance Evaluation with Log Files from The European Library</article-title>
          .
          <source>In: Working Notes, LADS Workshop, Cross-Language Evaluation Forum 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huurnink</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Semantic Perspective on Query Log Analysis</article-title>
          .
          <source>In: Working Notes, LADS Workshop, Cross-Language Evaluation Forum 2009</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pooch</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A Review of Web Searching Studies and a Framework for Future Research</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          .
          <volume>52</volume>
          (
          <issue>3</issue>
          ),
          <fpage>235</fpage>
          ‐
          <lpage>246</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Spink</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfram</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saracevic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>From E-Sex to E-Commerce: Web Search Changes</article-title>
          .
          <source>Computer</source>
          .
          <volume>35</volume>
          (
          <issue>3</issue>
          ),
          <fpage>107</fpage>
          ‐
          <lpage>109</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Jones</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Cunningham</surname>, <given-names>S.J.</given-names></string-name>,
          <string-name><surname>McNab</surname>, <given-names>R.</given-names></string-name>:
          <article-title>An Analysis of Usage of a Digital Library</article-title>
          .
          <source>In: Proceedings of the Second European Conference on Digital Libraries</source>
          ,
          <fpage>261</fpage>
          ‐
          <lpage>277</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Broder</surname>, <given-names>A.</given-names></string-name>:
          <article-title>A Taxonomy of Web Search</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>36</volume>
          (
          <issue>2</issue>
          ),
          <fpage>3</fpage>
          ‐
          <lpage>10</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Rose</surname>, <given-names>D. E.</given-names></string-name>,
          <string-name><surname>Levinson</surname>, <given-names>D.</given-names></string-name>:
          <article-title>Understanding User Goals in Web Search</article-title>
          .
          <source>In: WWW '04: Proceedings of the 13th International Conference on World Wide Web</source>
          ,
          <fpage>13</fpage>
          ‐
          <lpage>19</lpage>
          . New York, NY, USA: ACM (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>