<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ECNU at CLEF PIR 2018: Evaluation of Personalized Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qingchun Bai</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayi Chen</string-name>
          <email>jycheng@ica.stc.sh.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qinmin Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang He</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Ryerson University</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Software Engineering, East China Normal University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Personalized Information Retrieval (PIR) is an effective solution when users issue queries with different purposes but receive the same results. The PIR-CLEF 2018 task aims to explore methods and evaluations for PIR. By analyzing the provided data, we generate query-level and session-level baselines. We compare these baselines with the extended models we propose, and the experimental results show that insufficient relevance information has a negative impact on the performance of both the models and the evaluation process. Since personalized ranking based on typical users' interests is not effective in practice, especially when the results of relevance feedback are unsatisfactory, we argue that the PIR task should relate not only to context but also to the users' various search intentions. We offer several suggestions concerning the data and the evaluation process.</p>
      </abstract>
      <kwd-group>
        <kwd>Personalized Information Retrieval</kwd>
        <kwd>Query Expansion</kwd>
        <kwd>Data Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The PIR-CLEF 2018 task aims to explore methods and evaluations for
Personalized Information Retrieval (PIR). PIR has drawn great attention as a way to
understand users' behavior in their interaction with IR systems. Personalized
search is an effective solution when users issue queries with different purposes but
receive the same results.</p>
      <p>
        Existing works [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3,5,4</xref>
        ] have shown that personalized ranking is a good solution for the
PIR task. The foundation of a personalized ranking service is the construction of a
model of the user's interests. Various personalization strategies have been proposed.
For example, in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a method is
discussed to identify a user's interests automatically, based on the
assumption that a user's general preferences may help the search engine disambiguate
the true intention of a query. The approach described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] considered a user's
prior interactions with a wide variety of content to personalize that user's
current web search. More recently, in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a dynamic personalized ranking model was
proposed to recommend the most relevant information by combining different
sources of information.
      </p>
      <p>
        Another line of research focuses on understanding the user's
intent from search session information [
        <xref ref-type="bibr" rid="ref10 ref2 ref6 ref9">2,6,9,10</xref>
        ]. Results show that it is possible to
understand the user's intent, since people have intentions throughout the process of
seeking information, and these information-seeking intentions can be inferred
from their behavior.
      </p>
      <p>These prior works on personalized information retrieval have focused on
independent issues with independent data. Few of them have focused on
the analysis of the required data and the evaluation of personalized ranking. In fact, we
consider that personalized ranking based on typical users' interests is not
effective in practice: when the results of relevance feedback are poor,
the re-ranking model cannot achieve the desired result.</p>
      <p>Therefore, in this study, we aim to explore the potential of the task and to
understand both the data and the evaluation of personalized search, asking the
following research questions:
- Can the provided PIR data satisfy the task?
- How can personalization be achieved, and what kind of data is needed to support
this research?
- How should it be evaluated?
To achieve these aims, this paper is organized as follows. In Section 2, we briefly
review and analyze the current PIR data. In Section 3, we describe our baseline
methods in detail. In Section 4, experiments and results are presented.
Finally, we discuss the task and its evaluation in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Review and Analysis</title>
      <p>We present statistics about the dataset in this section: Section 2.1 reviews the
current dataset and explores its potential, and Section 2.2 analyzes the query
sessions given in the data.</p>
      <p>Statistics about Dataset In this section, we briefly review the current dataset of
the PIR task and provide a comprehensive analysis of the task. In PIR-CLEF 2018,
the data are provided as six CSV files containing the information below:
- the search tasks (sessions) of ten users;
- the queries submitted by all users and all documents returned by the ClueWeb API;
- relevance scores labeled by users and the original ranks of documents;
- personal information such as gender and job;
- remarks written by users;
- statistical information about the terms in the queries.</p>
      <p>Statistics about Sessions A user can submit several queries within a query
session. These queries aim at different objectives. To find the overall objective of
the user, we gather all the queries into one query that represents the objective. We
then submit this query to the API and evaluate the performance.</p>
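      <p>For illustration, the sketch below shows one way to merge the queries of a session and submit the merged query to the ClueWeb API (endpoint as given in Section 3). The CSV column names (session_id, query) and the response handling are assumptions, since the exact file layout is not reproduced here.</p>
      <preformat>
# Minimal sketch: merge the queries of each session into one query and
# submit it to the ClueWeb API. Column names and the response format
# are assumptions for illustration.
import csv
from collections import defaultdict
import requests

API = "http://clueweb.adaptcentre.ie/WebSearcher/search"

def merge_session_queries(path):
    """Join each session's queries into one query, deduplicating terms."""
    sessions = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            sessions[row["session_id"]].append(row["query"])
    merged = {}
    for sid, queries in sessions.items():
        seen, terms = set(), []
        for q in queries:
            for t in q.lower().split():
                if t not in seen:
                    seen.add(t)
                    terms.append(t)
        merged[sid] = " ".join(terms)
    return merged

def search(query, page=1):
    """Submit one query string to the API; the response format is assumed."""
    r = requests.get(API, params={"query": query, "page": page}, timeout=30)
    r.raise_for_status()
    return r.text
      </preformat>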
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>The users submit their queries to the ClueWeb API
(http://clueweb.adaptcentre.ie/WebSearcher/search?query=queryString&amp;page=pagenumber)
and annotate whether the returned documents are relevant. The users grade relevance
on four levels: relevant, somewhat relevant, not relevant, and off topic, with scores
ranging from four down to one. Following this description, we define documents as
relevant to the query only when their score is four. Figure 1 shows the
framework of the personalized ranking part.</p>
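      <p>As a small sketch of this definition, the mapping below binarizes the four grades exactly as described, treating only the top grade as relevant (the grade strings are assumed to match the labels above):</p>
      <preformat>
# The four user-assigned grades map to scores 4..1; only score 4
# counts as relevant under the definition above.
GRADE_SCORES = {
    "relevant": 4,
    "somewhat relevant": 3,
    "not relevant": 2,
    "off topic": 1,
}

def is_relevant(grade: str) -> bool:
    return GRADE_SCORES[grade.lower()] == 4
      </preformat>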
      <p>[Figure 1. Framework of the personalized ranking part: a query (Q), its query session (QS), and the user (U) form a testing sample; results from the ClueWeb API are re-ranked via the topic-sensitive user model and query expansion to produce a personalized ranking score(Q, U, D).]</p>
      <p>Baselines We propose two baselines: a query-level baseline and a session-level
baseline. In the query-level baseline, we evaluate each query independently. In the
session-level baseline, we collect the relevant documents of all queries in a session,
treat them as the relevant documents of the search task, and evaluate
the performance of each query on its search task. We also assume that the queries
belonging to one session represent different aspects of the user's need, so we merge
all queries in a session into one and submit it to the ClueWeb API. The performance
of each session-level baseline is reported in Table 1.</p>
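      <p>A minimal sketch of the session-level evaluation follows: the relevant documents of all queries in a session are pooled, and each query's ranking is scored against that pool. The data structures (sets of document IDs per query) are assumptions for illustration, and precision at k stands in for whichever metrics Table 1 reports.</p>
      <preformat>
# Sketch of the session-level baseline: pool the relevant documents of
# all queries in a session and score each query against that pool.
from typing import Dict, List, Set

def session_qrels(per_query_rels: Dict[str, Set[str]]) -> Set[str]:
    """Union of the relevant documents of every query in the session."""
    pooled = set()
    for rels in per_query_rels.values():
        pooled |= rels
    return pooled

def precision_at_k(ranking: List[str], rels: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are in the pooled set."""
    return sum(1 for d in ranking[:k] if d in rels) / k
      </preformat>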
      <p>Query Expansion We first take the user's feedback into account. After the user
labels the documents with relevance scores, we choose the ten most frequent
words in the relevant documents as expansion terms and add them to the original
session-level query. We submit the new queries to the API and evaluate the
performance of this method. We then assume that the top-20 documents returned by
the API are relevant and select the ten most frequent words in those documents as
expansion terms. The first method is listed in Table 2 with the suffix "RF", while
the second one is named with the suffix "PRF".</p>
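      <p>The sketch below illustrates both expansion strategies: for "RF" the user-judged relevant documents are passed in, while for "PRF" the top-20 documents returned by the API are passed in instead. Whitespace tokenization is a simplifying assumption.</p>
      <preformat>
# RF picks expansion terms from the user-judged relevant documents;
# PRF passes the top-20 API results instead. Whitespace tokenization
# is a simplifying assumption.
from collections import Counter
from typing import Iterable, List

def expansion_terms(docs: Iterable[str], n_terms: int = 10) -> List[str]:
    """Ten most frequent terms across the given documents."""
    counts = Counter(t for doc in docs for t in doc.lower().split())
    return [term for term, _ in counts.most_common(n_terms)]

def expand_query(session_query: str, docs: Iterable[str]) -> str:
    """Append the expansion terms to the session-level query."""
    return session_query + " " + " ".join(expansion_terms(docs))
      </preformat>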
      <p>Topic-Sensitive User Model We propose a language-modeling approach to
personalized search based on users' search behavior and preferences. To capture
the user's search interests and implicit purpose, we use an
LDA-based approach for modeling users, which does not merely simulate
the search behaviors but also considers the search sessions of the task.</p>
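      <p>One possible realization of this LDA-based user model, sketched with the gensim library (a choice made here for illustration; the paper does not prescribe an implementation): each user is represented by the topic distribution of the documents they judged relevant, and a query can then be compared with the user's distribution.</p>
      <preformat>
# Sketch of an LDA-based user model with gensim (library choice is an
# assumption). The user is modeled by the topics of their relevant docs.
from gensim import corpora, models

def build_user_model(relevant_doc_tokens, num_topics=20):
    """relevant_doc_tokens: list of token lists, one per relevant document."""
    dictionary = corpora.Dictionary(relevant_doc_tokens)
    corpus = [dictionary.doc2bow(toks) for toks in relevant_doc_tokens]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    return dictionary, lda

def topic_distribution(dictionary, lda, tokens, num_topics=20):
    """Dense topic vector for one document or query."""
    bow = dictionary.doc2bow(tokens)
    dense = [0.0] * num_topics
    for topic_id, weight in lda.get_document_topics(bow, minimum_probability=0.0):
        dense[topic_id] = weight
    return dense
      </preformat>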
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Performance of Each Query Session</title>
        <p>We further make a comparison between the baselines and our methods in Table 2.
In the query-level evaluation, our methods are worse than the baselines. In the
session-level evaluation, the relevance feedback method performs better than the
baselines because we can obtain the users' interests from the relevant documents.
However, the pseudo relevance feedback and LDA methods perform worse than the baseline.</p>
        <p>It is within our expectation that all query-level evaluations are worse than
the baseline. We think this phenomenon is caused by the lack of relevant
documents. In the provided data, each user labels only about twenty documents
whose original ranks range from 0 to 100, and few of them are relevant. Assuming
the users get 100 documents returned per query, eighty percent of the relevance
information is lost. In this scenario, any document not occurring in the judged list is
considered irrelevant, which means a relevant document can be counted as
irrelevant from the perspective of the evaluation. Insufficient relevance information
even makes it hard to evaluate certain queries: in sessions 154, 176, and 204, all
judged documents are irrelevant, so even if we find potentially relevant
documents, we cannot know whether the users would consider them relevant.</p>
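        <p>The following sketch makes this evaluation problem concrete: under the standard pooling assumption, any unjudged document counts as irrelevant, so a run that retrieves genuinely relevant but unjudged documents is penalized. (Average precision here stands in for the task's official metrics.)</p>
        <preformat>
# With incomplete judgments, any unjudged document counts as irrelevant,
# so retrieving relevant-but-unjudged documents is penalized.
def average_precision(ranking, judged_relevant):
    """AP against the judged set; unjudged documents score as irrelevant."""
    hits, score = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in judged_relevant:
            hits += 1
            score += hits / i
    return score / max(len(judged_relevant), 1)
        </preformat>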
        <p>We also think the evaluation process should be upgraded. Unlike existing
search tasks such as the TREC tracks, personalized information retrieval focuses more on
individual differences. In PIR-CLEF 2018, some users receive the same task but
the queries they submit are different. For example, users 8, 11, and 12 receive the
same task of traveling, but their queries are about Dublin, Tokyo, and Barcelona.
The individual differences are expressed by the queries, so we think this task is still
an ad-hoc retrieval task. If we want to focus on individual differences, we
need more users to join the data collection.</p>
        <p>We suggest that the complete logs of the users be provided. By analyzing the
relevant documents annotated by the users, we obtain an improvement at the session
level, as listed in Table 2. However, our method can still be improved. In this task
we are provided with users' actions such as opening a document and submitting a query,
but we think these data are not sufficient: only part of the actions are
provided, so we cannot analyze the users' preferences from their actions.</p>
        <p>In conclusion, we put forward three suggestions. First, more
complete relevance labels should be provided. Second, more participants
should join the data collection to provide more personalized data. Third,
we think detailed user actions can help improve performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We have proposed a view of the PIR task which implies that personalization should
be with respect not only to context but also to the various information that people
gather during the course of an information search session. We focus on taking the
user's feedback into account and propose two extended models: a query expansion
method and a topic-sensitive user model. We first conduct experiments on each query
session; the results show wide variations in performance across sessions.
Then we compare the baselines with the extended models. Noting that the topic-sensitive
strategy does not work very well, we conclude that insufficient relevance information
has a negative impact on the performance of the models and on the evaluation process.
In the future, we will extract more useful features and focus on learning-to-rank
approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Ali</surname>
          </string-name>
          .
          <article-title>Dynamic personalized ranking of facets for exploratory search</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>1379</fpage>
          –
          <lpage>1379</lpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Carevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lusky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>van Hoek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Mayr</surname>
          </string-name>
          .
          <article-title>Investigating exploratory search activities based on the stratagem level in digital libraries</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>21</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <article-title>A large-scale evaluation and analysis of personalized search strategies</article-title>
          .
          <source>In International Conference on World Wide Web</source>
          , pages
          <fpage>581</fpage>
          –
          <lpage>590</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>W.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <source>Multiple Attribute Aware Personalized Ranking</source>
          . Springer International Publishing,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Personalized ranking with pairwise factorization machines</article-title>
          .
          <source>Neurocomputing</source>
          , 214(C):
          <fpage>191</fpage>
          –
          <lpage>200</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitsui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Predicting information seeking intentions from search behaviors</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>1121</fpage>
          –
          <lpage>1124</lpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Qiu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cho</surname>
          </string-name>
          .
          <article-title>Automatic identi cation of user interest for personalized search</article-title>
          .
          <source>In Proceedings of the 15th international conference on World Wide Web</source>
          , pages
          <fpage>727</fpage>
          –
          <lpage>736</lpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Teevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          .
          <article-title>Personalizing search via automated analysis of interests and activities</article-title>
          .
          <source>In ACM SIGIR Forum</source>
          , volume
          <volume>51</volume>
          , pages
          <fpage>10</fpage>
          –
          <lpage>17</lpage>
          . ACM,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>W.</given-names>
            <surname>van Hoek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Carevic</surname>
          </string-name>
          .
          <article-title>Building user groups based on a structural representation of user search sessions</article-title>
          .
          <source>In International Conference on Theory and Practice of Digital Libraries</source>
          , pages
          <fpage>459</fpage>
          –
          <lpage>470</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Session search modeling by partially observable markov decision process</article-title>
          .
          <source>Information Retrieval Journal</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>56</fpage>
          –
          <lpage>80</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>