<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Personalised Information Retrieval at CLEF 2017 (PIR-CLEF): towards a reproducible evaluation framework for PIR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gareth J. F. Jones</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefania Marrara</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilla Sanvitto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Debasis Ganguly</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Procheta Sen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research Labs</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Milano Bicocca</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The Personalised Information Retrieval (PIR-CLEF) Lab workshop at CLEF 2017 is designed to provide a forum for the exploration of methodologies for the repeatable evaluation of personalised information retrieval (PIR). The PIR-CLEF 2017 Lab provides a preliminary pilot edition of a Lab task dedicated to personalised search, while the workshop at the conference is intended to provide a forum for the discussion of strategies for the evaluation of PIR and the extension of the pilot Lab task. The PIR-CLEF 2017 Pilot Task is the first evaluation benchmark based on the Cranfield paradigm, with the potential benefit of producing evaluation results that are easily reproducible. The task is based on search sessions over a subset of the ClueWeb12 collection, undertaken by 10 users using a clearly defined and novel methodology. The collection provides data gathered from the activities undertaken during the search sessions by each participant, including details of relevant documents as marked by the searchers. The PIR-CLEF 2017 workshop is intended to review the design and construction of this Pilot collection and to consider the topic of reproducible evaluation of PIR more generally, with the aim of launching a more formal PIR Lab at CLEF 2018.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>The objective of the PIR-CLEF Lab is to develop and demonstrate the effectiveness of a methodology for the repeatable evaluation of Personalised Information Retrieval (PIR). PIR systems aim to enhance traditional IR systems to better satisfy the information needs of individual users by providing search results that are relevant not only to the query but also to the specific user who submitted it. In order to provide a personalised service, a PIR system maintains information about the user and their preferences and interests. These personal preferences and interests are typically inferred from a variety of modes of interaction between the user and the system. This information is then represented in a user model, which is used either to improve the user's query or to re-rank the retrieved results list so that documents that are more relevant to the user are presented in the top positions of the ranked list.</p>
<p>Existing work on the evaluation of PIR has generally relied on a user-centred approach, mostly based on user studies; this approach involves real users undertaking search tasks in a supervised environment. While this methodology has the advantage of enabling the detailed study of the activities of real users, it has the significant drawback of not being easily reproducible, and it does not support extensive exploration of the design and construction of user models or of their exploitation in the search process. These limitations greatly restrict the scope for algorithmic exploration in PIR. This means that it is generally not possible to make definitive statements about the effectiveness or suitability of individual PIR methods, or to make meaningful comparisons between alternative approaches.</p>
      <p>
        Among existing IR benchmark tasks based on the Cranfield paradigm, the closest task to the evaluation of PIR is the TREC Session track (http://trec.nist.gov/data/session.html) conducted annually between 2010 and 2014. However, this track focused only on stand-alone search sessions, where a "session" is a continuous sequence of query reformulations on the same topic, along with any user interaction with the retrieved results in service of satisfying a specific topical information need, for which no details of the searcher undertaking the task are available. Thus, the TREC Session track did not exploit any user model to personalise the search experience, nor did it allow user actions over multiple search sessions to be taken into consideration in the ranking of the search output. A further attempt to set a framework for the evaluation of personalised search under controlled experimental settings was made at FIRE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The PIR-CLEF 2017 Pilot Task provides search data from a single search session together with personal details of the user undertaking the session. This test collection was created using the methodology described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Since this was a pilot activity, we encouraged participants to attempt the task using existing algorithms and to explore new ideas. We also welcomed contributions to the workshop examining the specification and contents of the task and the provided dataset.
      </p>
      <p>The PIR-CLEF 2017 workshop at the CLEF 2017 Conference brought
together researchers working in PIR and related topics to explore the development
of new methods for evaluation in PIR.</p>
      <p>The remainder of this paper is organised as follows: Section 2 outlines existing related work, Section 3 provides an overview of the PIR-CLEF 2017 Pilot Task, Section 4 considers directions towards more realistic evaluation of PIR, and Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recent years have seen increasing interest in the study of contextualisation in search: in particular, several research contributions have addressed the task of personalising search by incorporating knowledge of user preferences into the search process [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This user-centred approach to search has raised the related issue of how to properly evaluate search results in a scenario where relevance is strongly dependent on the interpretation of the individual user. For this purpose, several user-based evaluation frameworks have been developed, as discussed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        A key issue when seeking to introduce personalisation into the search process is the evaluation of the effectiveness of the proposed method. A first category of approaches to evaluating personalised search systems performs a user-centred evaluation, providing a kind of extension to the laboratory-based evaluation paradigm. The TREC Interactive track [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the TREC HARD track [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are examples of this kind of evaluation framework, which aimed to involve users in interactive tasks in order to obtain additional information about them and the query context being formulated. The evaluation was done by comparing a baseline run that ignored the user/topic metadata with another run that considered it.
      </p>
      <p>
        The more recent TREC Contextual Suggestion track [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was proposed with the purpose of investigating search techniques for complex information needs that are highly dependent on context and user interests. As input, participants in the track were given a set of geographical contexts and a set of user profiles that contain a list of attractions the user has previously rated. The task was to produce a list of ranked suggestions for each profile-context pair by exploiting the given contextual information. However, despite these extensions, the overall evaluation was still system controlled and only a few contextual features were available in the process.
      </p>
      <p>
        TREC also introduced a Session track [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] whose focus was to exploit user
interactions during a query session to incrementally improve the results within
that session. The novelty of this task was the evaluation of system performance
over entire sessions instead of a single query.
      </p>
      <p>
        For all these reasons, the problem of defining a standard approach to the evaluation of personalised search is a hot research topic in need of effective solutions. A first attempt to create a collection in support of PIR research was made at the FIRE Conference held in 2011. The Personalised and Collaborative Information Retrieval track [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] was organised with the aim of extending a standard IR ad-hoc test collection by gathering additional meta-information during the topic development process to facilitate research on personalised and collaborative IR. However, since no runs were submitted to this track, only preliminary studies have been carried out and reported using it.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Overview of the PIR-CLEF 2017 Pilot Task</title>
      <p>The goal of the PIR-CLEF 2017 Pilot Task was to investigate the use of a laboratory-based method to enable a comparative evaluation of PIR. The pilot collection used during PIR-CLEF 2017 was created with the cooperation of volunteer users, and was organised into two sequential phases:
- Data gathering. This phase involved the volunteer users carrying out a task-based search session during which a set of activities performed by the user were recorded (e.g., formulated queries, bookmarked documents, etc.). Each search session was composed of a phase of query development, refinement and modification, with an associated search for each query on a specific topical domain selected by the user, followed by a relevance assessment phase in which the user indicated the relevance of documents returned in response to each query, and a short report-writing activity based on the search activity undertaken.
- Data cleaning and preparation. This phase took place once the data gathering had been completed, and did not involve any user participation. It consisted of filtering and elaborating the information collected in the previous phase in order to prepare a dataset with various kinds of information related to the specific user's preferences. In addition, a bag-of-words representation of the participant's user profile was created to allow comparative evaluation of PIR algorithms using the same simple user model.</p>
      <p>For the PIR-CLEF 2017 Pilot Task we made available the user profile data and raw search data produced by guided search sessions undertaken by 10 volunteer users, as detailed in Section 3.1.</p>
      <p>The aim of the task was to use the provided information to improve the ranking of the search results list over a baseline ranking, as measured against the relevance judgements of the user who entered the query.</p>
      <p>The Pilot Task data were provided in csv format to registered participants in the task. Access to the search service for the indexed subset of the ClueWeb12 collection was provided by Dublin City University via an API.</p>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>For the PIR-CLEF 2017 Pilot Task we made available both user profile data and raw search data produced by guided search sessions undertaken by 10 volunteer users. The data provided included the submitted queries, the baseline ranked lists of documents retrieved in response to each query using a standard search system, the items clicked by the user in the result list, and the document relevance assessments provided by the user on a 4-grade scale. Each session was performed by the user on a topic of her choice selected from a provided list of broad topics, and search was carried out over a subset of the ClueWeb12 web collection.</p>
        <p>The data have been extracted and stored in csv format as detailed in the following. In particular, 7 csv files were provided in a zip folder. The file user's session (csv1) contains the information about each phase of the query sessions performed by each user. Each row of the csv contains:
- username: the user who performed the session
- query session: id of the performed query session
- category: the top-level search domain of the session
- task: the description of the search task fulfilled by the user
- start time: starting time of the query session
- close time: closing time of the search phase
- evaluated time: closing time of the assessment phase
- end time: closing time of the topic evaluation and the whole session.
The file user's log (csv2) contains the search logs of each user, i.e. every search event that was triggered by a user action. Each row contains:
- username: the user who performed the session
- query session: id of the query session within which the search was performed
- category: the top-level search domain
- query text: the submitted query
- document id: the document on which a particular action was performed
- rank: the retrieval rank of the document on which a particular action was performed
- action type: the type of the action executed by the user (query submission, open document, close document, bookmark)
- time stamp: the timestamp of the action.</p>
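The per-session structure of the log file described above can be sketched with a short parsing routine. Note that the column names and the sample rows below are our own illustration of the layout (the fields follow the list above, but the data are invented):

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of the user's log file (csv2); the fields mirror the
# list above (username, query session, category, query text, document id,
# rank, action type, time stamp), but these rows are invented.
CSV2_SAMPLE = """username,query_session,category,query_text,document_id,rank,action_type,time_stamp
u01,s1,Sports,football rules,,,query submission,2017-03-01T10:00:00
u01,s1,Sports,football rules,d42,3,open document,2017-03-01T10:01:10
u01,s1,Sports,football rules,d42,3,bookmark,2017-03-01T10:02:05
u01,s1,Sports,offside rule,,,query submission,2017-03-01T10:03:00
"""

def queries_per_session(csv_text):
    """Group the distinct queries submitted in each (user, session) pair,
    in submission order, using the 'query submission' action events."""
    sessions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["action_type"] == "query submission":
            key = (row["username"], row["query_session"])
            if row["query_text"] not in sessions[key]:
                sessions[key].append(row["query_text"])
    return dict(sessions)
```

With the sample above, `queries_per_session(CSV2_SAMPLE)` yields one session for user u01 containing the two distinct queries in submission order.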
        <p>The file user's assessment (csv3) contains the relevance assessments of a pool of documents with respect to every single query developed by each user to fulfil the given task:
- username: the user who performed the session
- query session: id of the query session within which the evaluation was performed
- query text: the query on which the evaluation is based
- document id: the document id for which the evaluation was provided
- rank: the retrieval rank of the document on which a particular action was performed
- relevance score: the relevance of the document to the topic (1 off-topic, 2 not relevant, 3 somewhat relevant, 4 relevant).</p>
        <p>The file user's info (csv4) contains some personal information about the users:
- username
- age range
- gender
- occupation
- native language.</p>
        <p>The file user's topic (csv5) contains the TREC-style final topic descriptions of the users' information needs that were developed in the final step of each search session:
- username: the user who formulated the topic
- query session: id of the query session which the topic refers to
- title: a short phrase defining the topic, provided by the user
- description: a detailed sentence describing the topic, provided by the user
- narrative: a description of which documents are relevant to the topic and which are not, provided by the user.
The file simple user profile (csv6a) contains the following information for each user (simple version; the applied indexing included tokenization, shingling, and index term weighting):
- username: the user whose interests are represented
- category: the search domain of interest
- a list of triples constituted by:
  - term: a word or n-gram related to the user's searches
  - a normalised score: the term weight, computed as the mean of the term frequencies in the user's documents of interest, where term frequency is the ratio of the number of occurrences of the term in a document to the number of occurrences of the most frequent term in the same document.
The file complex user profile (csv6b) contains, for each user, the same information provided in csv6a, with the difference that the applied indexing was enriched by also including stop-word removal:
- username: the user whose interests are represented
- category: the search domain of interest
- a list of triples constituted by:
  - term: a word or a set of words related to the user's searches
  - normalised score: computed as in csv6a.</p>
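The normalised score defined above (the mean, over the user's documents of interest, of the term's frequency relative to the document's most frequent term) can be written down directly. The function name and the toy documents are our own illustration, not part of the released data:

```python
from collections import Counter

def normalised_score(term, documents):
    """Mean of the term's relative frequency over the user's documents of
    interest: per document, the number of occurrences of the term divided
    by the number of occurrences of that document's most frequent term."""
    tfs = []
    for doc in documents:            # each doc is a list of tokens
        counts = Counter(doc)
        max_count = max(counts.values())
        tfs.append(counts[term] / max_count)
    return sum(tfs) / len(tfs)

# Toy example: "apple" has tf 2/2 = 1.0 in the first document
# and 1/3 in the second, so its score is the mean of the two.
docs = [["apple", "apple", "pear"], ["apple", "plum", "plum", "plum"]]
```

This is a sketch of the weighting scheme as described; the released profiles already contain the precomputed scores.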
        <p>Participants had the possibility to contribute to the task in two different ways:
- The two user profile files (csv6a and csv6b) provide the bag-of-words profiles of the 10 users involved in the experiment, extracted by applying different indexing procedures to the documents. The user's log file (csv2) contains, for each user, all the queries she formulated during the query session. Participants could compare the results obtained by applying their personalisation algorithm to these queries with the results obtained and evaluated by the users on the same queries (and included in the user assessment file csv3). The search had to be carried out on the ClueWeb12 collection using the API provided by DCU. Then, using the 4-grade-scale evaluations of the documents (relevant, somewhat relevant, not relevant, off-topic) provided by the users and contained in the user assessment file csv3, it was possible to compute Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). Note that documents that do not appear in csv3 were considered non-relevant. As further explained in Section 3.2, these metrics were computed both globally (in the literature these are simply AP and NDCG) and for each user query individually, then taking the mean.
- The challenge here was to use the raw data provided in csv1, csv2, csv3, csv4, and csv5 to create user profiles. A user profile is a formal representation of the user's interests and preferences; the more accurate the representation of the user model, the higher the probability of improving the search process. In the approaches proposed in the literature, user profiles are formally represented as bags of words, as vectors, or as conceptual taxonomies, generally defined based on external knowledge resources (such as WordNet and the ODP Open Directory Project). The task request here was more research oriented: is the provided information sufficient to create a useful profile? Which information is missing? The outcome here was a report of up to 6 pages discussing the theme of user information for profiling purposes, proposing possible integrations of the provided data and suggesting a way to collect them in a controlled Cranfield-style experiment.</p>
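The AP computation in the first contribution mode can be sketched as follows. Since AP needs binary relevance, we assume here that grades 3 (somewhat relevant) and 4 (relevant) count as relevant; documents absent from the assessment file are non-relevant, as stated above. The function names and the binarisation threshold are our own choices:

```python
def average_precision(ranked_ids, judgements):
    """AP for one query. `judgements` maps document id -> grade on the
    task's 4-point scale; we binarise by assuming grade >= 3 is relevant
    (an assumption, not part of the task definition), and any document
    absent from the judgements is non-relevant."""
    relevant = {d for d, g in judgements.items() if g >= 3}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for i, doc_id in enumerate(ranked_ids):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / (i + 1)  # precision at this rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """Mean over per-query APs, matching the per-query evaluation mode;
    `runs` is a list of (ranked_ids, judgements) pairs."""
    return sum(average_precision(r, j) for r, j in runs) / len(runs)
```

For example, with judgements `{"a": 4, "b": 2, "c": 3}` and the ranking `["a", "b", "c"]`, the relevant documents a and c are found at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2.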
        <p>Since this was a pilot activity, we encouraged participants to become involved in this task by using existing or new algorithms and/or to explore new ideas. We also welcomed contributions that analyse the task and/or the dataset.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Performance Measures</title>
        <p>At this preliminary stage, well-known information retrieval metrics, such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG), can be considered to benchmark the participants' systems. However, new metrics should be investigated to evaluate the task of personalised search.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Towards More Realistic Evaluation of PIR</title>
      <p>The PIR-CLEF 2017 Pilot Task gathered data from the volunteer searchers over only a single search session; in practice, a personalisation model is generally expected to gather and exploit information about the searcher across multiple sessions. Over the course of these sessions the searcher will have multiple topics associated with their information needs. Some topics will typically recur over a number of sessions, and while some search topics may be entirely semantically separate, others will overlap, and in all cases the user's knowledge of the topic will progress over time, and recall of earlier sessions may in some cases assist the searcher in later sessions looking at the same topic. How to extend the data-gathering methodology to this more realistic and complex situation requires further investigation. There are multiple issues which must be considered, not least how to engage volunteer participants in these more complex tasks over the longer collection periods that will be required. Given the multiple interacting factors highlighted above, work will also be required to consider how to account for these in the design of such an extended PIR test collection and the process of the information collection, to enable meaningful experiments to be conducted to investigate personalisation models and their use in search algorithms.</p>
      <p>
        The design of the PIR-CLEF 2017 Pilot Task makes the additional simplifying assumption of a simple relevance relationship between the individual queries posed to the search engine and the retrieved documents. However, it is observed that users often approach an IR system with a more complex information-seeking intention which can require multiple search interactions to satisfy. Further, we can consider the relationship between the information-seeking intention, as it develops incrementally during the multiple search interactions, and the items retrieved at each stage in terms of usefulness to the searcher rather than simple relevance to the information need [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, operationalising these more complex factors in the development of a framework for the evaluation of PIR is clearly challenging.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>This paper introduced the PIR-CLEF 2017 Personalised Information Retrieval (PIR) Workshop and the associated Pilot Task. The paper first introduced relevant existing work in the evaluation of PIR. The Pilot Task is the preliminary edition of a Lab dedicated to the theme of personalised search that is planned to officially start at CLEF 2018. This is the first evaluation benchmark in this field based on the Cranfield paradigm, with the significant benefit of producing easily reproducible results. A pilot evaluation using this collection has been run to allow research groups working on personalised IR to both experiment with and provide feedback on our proposed PIR evaluation methodology. While the Pilot Task moves beyond the state of the art in evaluation of PIR, it nevertheless makes simplifying assumptions in terms of the user's interactions during a search session; we briefly considered these here, and how to incorporate them into an evaluation of PIR that is closer to real-world user experience will be the subject of further work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Sanvitto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Pasi</surname>
          </string-name>
          <article-title>A Laboratory-Based Method for the Evaluation of Personalised Search</article-title>
          .
          <source>Proceedings of the Seventh International Workshop on Evaluating Information Access (EVIA</source>
          <year>2016</year>
          ),
          <source>a Satellite Workshop of the NTCIR-12 Conference</source>
          , June 7, 2016 Tokyo Japan.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.</given-names>
            <surname>Pasi</surname>
          </string-name>
          .
          <article-title>Issues in personalising information retrieval</article-title>
          .
          <source>IEEE Intelligent Informatics Bulletin</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>L.</given-names>
            <surname>Tamine-Lechani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boughanem</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Daoud</surname>
          </string-name>
          .
          <article-title>Evaluation of contextual information retrieval effectiveness: overview of issues and research</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>134</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Overview of the fourth text retrieval conference (TREC-4)</article-title>
          . In D. K. Harman, editor,
          <source>TREC, volume Special Publication 500-236. National Institute of Standards and Technology (NIST)</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Allan</surname>
          </string-name>
          .
          <article-title>HARD track overview in TREC 2003: High accuracy retrieval from documents</article-title>
          .
          <source>In Proceedings of The Twelfth Text REtrieval Conference (TREC</source>
          <year>2003</year>
          ), pages
          <fpage>2437</fpage>
          ,
          , Gaithersburg, Maryland, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Adriel</given-names>
            <surname>Dean-Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Charles L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Thomas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>Overview of the TREC 2012 contextual suggestion track</article-title>
          .
          <source>In Voorhees and Bucklan.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Clough</surname>
          </string-name>
          .
          <article-title>Overview of the TREC 2014 session track</article-title>
          .
          <source>In Proceedings of The Twenty-Third Text REtrieval Conference (TREC</source>
          <year>2014</year>
          ), Gaithersburg, Maryland, USA.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Debasis</given-names>
            <surname>Ganguly</surname>
          </string-name>
          , Johannes Leveling, and
          <string-name>
            <given-names>Gareth J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Overview of the personalized and collaborative information retrieval (PIR) track at FIRE-2011</article-title>
          . In Prasenjit Majumder, Mandar Mitra, Pushpak Bhattacharyya, L. Venkata Subramaniam, Danish Contractor, and Paolo Rosso, editors, Multilingual Information Access in South Asian Languages - Second International Workshop, FIRE 2010, Gandhinagar, India,
          <source>February 19-21</source>
          ,
          <year>2010</year>
          and Third International Workshop, FIRE 2011, Bombay, India, December 2-
          <issue>4</issue>
          ,
          <year>2011</year>
          , Revised Selected Papers, volume
          <volume>7536</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>227</fpage>-<lpage>240</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Puigcerver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Toselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Sanchez</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task</article-title>
          .
          <source>In Proceedings of CLEF</source>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          .
          <article-title>A new interpretation of Average Precision</article-title>
          .
          <source>In Proceedings of the International ACM SIGIR conference on Research and development in information retrieval (SIGIR '08)</source>
          . pp.
          <fpage>689</fpage>
          -
          <lpage>690</lpage>
          . ACM, New York, NY, USA (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>N.J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hienert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mayr-Schlegel</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Data Requirements for Evaluation of Personalization of Information Retrieval: A Position Paper</article-title>
          ,
          <source>In Proceedings of Working Notes of the CLEF 2017 Labs</source>
          . Dublin, Ireland.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>