<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chaoyu Ye</string-name>
          <email>psxcy1@nottingham.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Porcheron</string-name>
          <email>me@mporcheron.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Max L. Wilson</string-name>
          <email>max.wilson@nottingham.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mixed Reality Lab, University of Nottingham</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>While there is an increasing amount of interest in evaluating and supporting longer “search sessions”, the majority of research has focused on analysing large volumes of logs and dividing sessions according to obvious gaps between entries. Although such approaches have produced interesting insights into some different types of longer sessions, this paper describes the early results of an investigation into sessions as experienced by the searcher. During interviews, participants reviewed their own search histories, presented their views of “sessions”, and discussed their actual sessions. We present preliminary findings around a) how users understand sessions, b) how these sessions are characterised and c) how sessions relate to each other temporally.</p>
      </abstract>
      <kwd-group>
        <kwd>HCIR</kwd>
        <kwd>Interactive</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Sessions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Information Retrieval (IR) specialists are becoming
increasingly concerned with users who continue to search
beyond a few queries or a few minutes<sup>1</sup>. Although
Information Retrieval, and even Interactive IR, evaluations are well
known, research is recognising situations where people
continue to search after finding seemingly useful results [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Some might be in a larger session involving several related
subtopics, while others may continue to search for
entertaining videos until they struggle to find ‘good’ results [
        <xref ref-type="bibr" rid="ref1 ref3">3,
1</xref>
        ]. Consequently, researchers are interested in how to
evaluate, measure, and ultimately better support searchers who
continue to search for extended sessions.
      </p>
      <p>
        Most research into extended search sessions, described in
detail below, has focused on analysing search engine logs [
        <xref ref-type="bibr" rid="ref1 ref4 ref8">1,
4, 8</xref>
        ] by dividing the logs using obvious periods of inactivity
and either qualitatively [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or quantitatively [
        <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
        ]
characterising them. Some research has investigated human web
behaviour and user goals qualitatively through interviews;
however, our research has focused on using such methods to
better understand real extended search sessions. This
paper begins by summarising literature on sessions and
then describes our research methods and preliminary
findings about extended search sessions.
        <fn>
          <label>1</label>
          <p>The recent NII Shonan event and the forthcoming Dagstuhl are both, for example, focused on this topic.</p>
        </fn>
      </p>
      <p>Presented at EuroHCIR2013. Copyright © 2013 for the individual papers
by the papers’ authors. Copying permitted only for private and academic
purposes. This volume is published and copyrighted by its editors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. UNDERSTANDING “SESSIONS”</title>
      <p>
        Although investigations into web sessions can be dated
back to around 20 years ago (e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), the concept of a session
still lacks a clear definition. A number of researchers have
generated diverse definitions of a session using different
delimiters such as cutoff time, query context, or even the status of
the browser windows (e.g. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). In 1995, Catledge and Pitkow
used a “timeout”, the time between two adjacent activities,
to divide users' web activities into sessions and found that
a 25.5 minute timeout was best [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Their research
focused on general web activity rather than search
sessions, but their 25.5 minute timeout has been used by
many others. He and Göker later aimed to find the optimal
interval that would divide large sessions, whilst not
affecting smaller sessions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Their analysis found that optimal
timeout values vary between 10 and 15 minutes.
      </p>
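      <p>As a minimal sketch of the timeout approach described above, the following Python splits a list of timestamps into sessions whenever the gap between adjacent activities exceeds a cutoff. The function name <monospace>split_sessions</monospace>, the sample log, and the representation of activities as <monospace>datetime</monospace> values are illustrative assumptions, not the procedure used in the cited studies; only the 25.5 minute default and the 10 to 15 minute range come from the text.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timedelta


def split_sessions(timestamps, timeout_minutes=25.5):
    """Group timestamps into sessions.

    A new session starts whenever the gap between two adjacent
    activities is longer than the timeout.
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            # Gap is within the timeout: same session as the previous activity.
            sessions[-1].append(ts)
        else:
            # First activity, or gap exceeds the timeout: start a new session.
            sessions.append([ts])
    return sessions


# Hypothetical log with gaps of 12, 20 and 58 minutes.
log = [
    datetime(2013, 1, 1, 9, 0),
    datetime(2013, 1, 1, 9, 12),
    datetime(2013, 1, 1, 9, 32),
    datetime(2013, 1, 1, 10, 30),
]

for cutoff in (25.5, 15, 10):
    print(cutoff, len(split_sessions(log, cutoff)))
```
]]></preformat>
      <p>With this hypothetical log, the 25.5 minute cutoff yields two sessions, whereas a 10 minute cutoff yields four, which illustrates how sensitive log-based session counts are to the chosen timeout.</p>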
      <p>
        In 2006, Spink et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] defined a session as the entire
series of queries submitted by a user during one interaction
with a search engine, where one session may consist of single
or multiple topics. Their approach focused on topic changes
rather than temporal breaks, yet it is unclear how
they determined “one interaction” with a search engine.
      </p>
      <p>
        A clear definition has also been cited as an important
challenge in other research. While focusing on “revisitation”
behaviour, Jhaveri and Räihä [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Tauscher and
Greenberg [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] found it challenging to differentiate between
in-session revisitation and post-session revisitation, for which
a clear detection of session boundaries would be useful.
      </p>
      <p>
        When focusing on searching, rather than web sessions,
some use the concept of a “query session”. Nettleton et al.
defined a query session as at least one query made to a
search engine, together with the results that were clicked
on and other user behaviours [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. They also evaluated
“session quality” based on the number of clicks, hold
time and ranking of selected documents, and they used these
measures to help determine the difference between sessions.
      </p>
      <p>
        Jansen et al. summarised the three most
representative strategies used to define sessions [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as shown in Table 1. As IP and
cookies were utilised to identify a user, the most frequent
strategies involve temporal cutoffs and topic change.
      </p>
      <p>
        The methods summarised in Table 1 are primarily focused
on temporal and topical boundaries, but other research has
shown clear challenges to these strategies. Mackay and Watters, in
2008, examined tasks that frequently occur as multi-session
tasks, where something thematically consistent occurs over
multiple sessions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Moreover, research into web, browser,
and browser-tab usage has found that some users often keep web
pages open over extended periods, especially in information
gathering tasks, e.g. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These situations indicate that
the logged web behaviour may differ significantly from the
actual behaviours and intentions of the searchers. This
research focuses on the searcher's experience of web sessions,
such that others may continue to develop strategies for more
accurately dividing large scale logs into sessions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENT DESIGN</title>
      <p>
        To understand and characterise real extended search
sessions, we employed similar interview methods to Sellen et
al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Participants were engaged in a 90-120 minute
interview about their own search behaviour. To ground the
interviews in real data, participants focused on printouts of their
own web history, and we used the card sorting technique [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
to probe their mental models of sessions. The procedure was
approved by the school ethics board and pilot tested.
      </p>
      <p>Participants began by providing their web history, and
they were advised to edit their history in advance should
they wish to keep some logged activities private<sup>2</sup>. These logs
were gathered by importing their search histories to Firefox
(if not already there), and creating an XML export using
“History Export 0.4”<sup>3</sup>. This log was then structured and
preliminarily processed using a) automatic methods to find
search URLs, and b) manual investigation to find possible
sessions to discuss in the interview. After providing
demographic information, participants spent around 20 minutes
examining the structured printout of their history, using a
pen to mark sessions. These sessions, unless duplicates of
prior sessions, were written onto separate cards for later
sorting until around 20 cards were produced. Each card had
a number, a title, the activity purpose, the included history items
from the history list, and whether the activity had been completed
successfully or not; an example is shown in Figure 1.</p>
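      <p>The automatic step for finding search URLs is not described in detail here. One plausible sketch, assuming each history entry is available as a plain URL string, is to match the host against known search engines and read the query parameter. The table <monospace>SEARCH_PARAMS</monospace>, its hosts and parameter names, and the function <monospace>extract_query</monospace> are hypothetical illustrations rather than the tooling used in the study.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping from search engine host to the URL parameter holding
# the query text; an illustrative list, not the one used in the study.
SEARCH_PARAMS = {
    "google.com": "q",
    "bing.com": "q",
    "search.yahoo.com": "p",
}


def extract_query(url):
    """Return the search query in a history URL, or None if it is not a search."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    for domain, param in SEARCH_PARAMS.items():
        # Match the domain itself or any of its subdomains (e.g. www.).
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parsed.query).get(param)
            if values:
                return values[0]
    return None


# Hypothetical history entries.
history = [
    "https://www.google.com/search?q=running+shoes",
    "https://www.nottingham.ac.uk/",
    "https://search.yahoo.com/search?p=card+sorting",
]
search_entries = [(url, extract_query(url)) for url in history if extract_query(url)]
```
]]></preformat>
      <p>Entries flagged this way would still need the manual pass described above, since a URL pattern alone cannot show whether consecutive queries belong to the same session.</p>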
      <p>
        The remainder of the interview involved first open, and
then closed, card sorting. Open card sorting allowed the
participants to classify and group the sessions according to
their own ideas, whilst closed card sorting allowed us to
make sure the following dimensions were considered:
purpose, for whom, with whom, location, duration, difficulty,
importance, frequency, and priority. This exercise helped us
explore the features of sessions in more detail. For
example, studying frequency helps to identify the most
frequent sessions and elicit patterns in the user's web activity.
In addition, the reasons for non-success and
difficulty can be investigated via the card sorting of difficulty,
and differences in the user's web behaviour across different
environments can be examined by the sorting of location.
The entire interview was audio recorded, and physical copies
of the card sorts were kept for analysis.
        <fn>
          <label>2</label>
          <p>Although this means we have likely missed common search sessions, like the lengthy adult sessions observed by Bailey et al. [<xref ref-type="bibr" rid="ref1">1</xref>], it was considered an important ethical provision.</p>
        </fn>
        <fn>
          <label>3</label>
          <p>addons.mozilla.org/en-us/firefox/addon/history-export/</p>
        </fn>
      </p>
      <p>This paper describes our preliminary analysis of the first
phase of the study, which involved 11 interviews. Phase two,
which is still under way, involves a slightly refined
methodology to capture more information about topics that emerged
from the initial analysis described below. A more
comprehensive analysis of both phases will be published later.</p>
    </sec>
    <sec id="sec-4">
      <title>4. PRELIMINARY FINDINGS</title>
      <p>
        Based on our preliminary investigation, some potentially
interesting results relating to perceived duration, time of
day, and use of queries were found. We consider each of
these below according to two aspects: activity goal and
activity context. For activity goal, we used Sellen et al.'s [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
6 categories: ‘finding’, ‘information gathering’, ‘browsing’,
‘transaction’, ‘communication’, and ‘housekeeping’. This
scheme did not include any email, so this was added as a
7th category. For activity context, we applied Elsweiler et
al.'s [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] comparison between work and non-work (leisure)
activities, involving: ‘work’, ‘serious-leisure’, ‘project-leisure’,
and ‘casual-leisure’. At this early stage in the project, the
primary author performed the classification individually based
on corresponding examples given in the referenced work.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4.1 Defining Sessions</title>
      <p>In total, 216 sessions have been studied thus far, an average
of 19.6 sessions per participant, as shown in Table 2.
Amongst these, 94 were longer than 5 minutes, 99 featured
search and only 9 sessions were unsuccessful.</p>
      <p>All participants mentioned that activities with the same
purpose and subject should be grouped into one session, as
shown in Table 3. In addition, 8 of the 11 suggested that
similar tasks that happened in different time periods should be
classified as a single session, even when they were not
temporally connected. Some participants said that they always
kept the browser windows open when doing long-term tasks.
Finally, 1 participant said that they cared about the
emotion involved in these web activities, even when they
were doing the same task, such as “buying a pair of shoes”. In
particular, this participant indicated that one topically
consistent session should be divided into two phases: one
disappointingly unproductive and one excitingly productive.</p>
    </sec>
    <sec id="sec-6">
      <title>4.2 Duration</title>
      <p>As duration is one of the targeted dimensions, all
participants were asked for their own definition of what
constitutes a “long session”. 45% of participants defined a
long session as one lasting more than 5 minutes, whereas
27% went with over 30 minutes, 18% more than 1 hour, and
1 participant chose over 2 hours.</p>
      <p>Because participants first defined what they considered
to be a long session, and then later sorted their sessions
into length categories, we investigated the difference
between sessions that met their definition of long, and ones
they remembered as being long during the card sorts.
Participants frequently grouped ‘defined short’ sessions as long
and vice-versa. Consequently, we investigated both
‘over-estimated’ and ‘under-estimated’ sessions in addition to
‘defined long’, ‘long’, ‘actual long’, ‘defined short’, ‘short’, and
‘actual short’ as given in Table 5.</p>
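      <p>One way to read the over- and under-estimation comparison is sketched below. It assumes that a session counts as long by definition when its logged duration exceeds the participant's own threshold, and that the card sort records whether the participant remembered it as long. The function <monospace>classify_estimate</monospace>, its parameters, and the label ‘consistent’ are hypothetical; this is an interpretation of the comparison, not the exact coding scheme used in the analysis.</p>
      <preformat><![CDATA[
```python
def classify_estimate(duration_minutes, threshold_minutes, sorted_as_long):
    """Compare a participant's card-sort judgement with their own definition.

    duration_minutes:  logged duration of the session
    threshold_minutes: the participant's own cutoff for a "long session"
    sorted_as_long:    True if the session was placed in a long category
                       during the card sort
    """
    long_by_definition = duration_minutes > threshold_minutes
    if sorted_as_long and not long_by_definition:
        # Remembered as long although it falls under the participant's cutoff.
        return "over-estimated"
    if long_by_definition and not sorted_as_long:
        # Exceeds the participant's cutoff but was remembered as short.
        return "under-estimated"
    return "consistent"


# Hypothetical sessions: (duration, participant threshold, sorted as long?)
examples = [(3, 5, True), (40, 30, False), (10, 5, True)]
labels = [classify_estimate(*e) for e in examples]
```
]]></preformat>
      <p>Under these assumptions, a 3 minute session sorted as long by a participant with a 5 minute threshold would be labelled over-estimated, while a 40 minute session sorted as short by a participant with a 30 minute threshold would be labelled under-estimated.</p>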
      <p>Firstly, considering the activity goals given in Table 6, the
number of ‘information-gathering’ sessions defined as long
was 5 times that of those ‘defined short’, as was the case
with ‘browsing’. In contrast, the number of ‘finding’
sessions defined as short was 1.5 times the number defined as
long. Overall, nearly 70% of ‘finding’, 42% of
‘information-gathering’, 60.7% of ‘browsing’, 50% of ‘transaction’, and
85.5% of ‘email’ sessions defined as long were over-estimated
by users. Moreover, under-estimation occurred with
‘finding’, ‘information-gathering’, and ‘housekeeping’, although
over-estimation was more frequent with ‘finding’, ‘browsing’,
‘communication’, and ‘email’ sessions.</p>
    </sec>
    <sec id="sec-7">
      <title>4.3 Time of Day</title>
      <p>Figure 3 shows that most of the ‘information-gathering’,
‘finding’ and ‘housekeeping’ sessions seem to occur between 10:00
and 16:00, whilst more ‘browsing’, ‘email’, and
‘communication’ activities were done between 22:00 and 0:00, which
was labelled “before bed time”. Additionally, there is a
peak around 14:00, in which more ‘finding’ and
‘information-gathering’ happened than other kinds of sessions.
Finally, at 23:00, general ‘browsing’ is most prevalent.</p>
      <p>Figure 4 shows that most of the ‘serious-leisure’ sessions
occurred between 18:00 and 22:00. Most of the ‘work’
activities happened between 11:00 and 18:00, which seems to
fit within a typical working day. In the time ‘before bed’,
the most frequent activity is ‘casual-leisure’.</p>
    </sec>
    <sec id="sec-8">
      <title>4.4 Search Queries</title>
      <p>As shown in Figure 5, sessions with more search queries tend
to be classified as ‘defined long’, ‘long’, and ‘actual long’
more often than those with fewer queries. An interesting observation is
that what the user defined as a long session features a
relatively low average number of search queries compared with
‘long’ and ‘actual long’ sessions. Equally, sessions defined as
‘short’ by the user actually feature relatively more queries
compared to ‘short’ and ‘actual short’. This may indicate
that the user did not consider the number of queries
performed when defining the duration of sessions and failed to
realise the effect of this behaviour.</p>
    </sec>
    <sec id="sec-9">
      <title>5. CONCLUSIONS</title>
      <p>Although this paper only describes a preliminary analysis
of over 200 sessions from 11 participants, we have begun to
see some potentially interesting early findings. Firstly,
participants varied greatly in their opinions about their own
sessions, with some matching topical divisions, some temporal
divisions, and some a combination of the two. The largest group
of participants (45%) judged “long sessions” as being longer than 5
minutes, but many had inaccurate recollections of the length
of sessions. Long sessions were typically a mix of casual and
serious leisure that often involved information gathering and
browsing behaviour, while the majority of work related
sessions were typically short. We also noticed that some of
these activities may be related to certain times of the
day. All of the findings will be further explored after phase
two of the study, but early insights suggest that real
extended search sessions could be more accurately modelled
based on additional factors such as: time of day, activity
goal, activity context, and number of queries.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Grosenick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reinholdtsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Salada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>User task understanding: a web search engine perspective</article-title>
          .
          <source>In NII Shonan Meeting on Whole-Session Evaluation of Interactive Information Retrieval Systems</source>
          , Kanagawa, Japan,
          October
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Catledge</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Pitkow</surname>
          </string-name>
          .
          <article-title>Characterizing browsing strategies in the World-Wide web</article-title>
          .
          <source>Computer Networks and ISDN Systems</source>
          ,
          <volume>27</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1065</fpage>
          –
          <lpage>1073</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Wilson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Lunn</surname>
          </string-name>
          .
          <article-title>Understanding casual-leisure information behaviour</article-title>
          .
          In A. Spink and J. Heinström, editors,
          <source>Library and Information Science</source>
          , pages
          <fpage>211</fpage>
          –
          <lpage>241</lpage>
          . Emerald Group Publishing Limited,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Göker</surname>
          </string-name>
          .
          <article-title>Detecting session boundaries from Web user logs</article-title>
          .
          <source>Methodology</source>
          , pages
          <fpage>57</fpage>
          –
          <lpage>66</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Blakely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Koshman</surname>
          </string-name>
          .
          <article-title>Defining a session on Web search engines</article-title>
          .
          <source>JASIST</source>
          ,
          <volume>58</volume>
          (
          <issue>6</issue>
          ):
          <fpage>862</fpage>
          –
          <lpage>871</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jhaveri</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.-J.</given-names>
            <surname>Räihä</surname>
          </string-name>
          .
          <article-title>The advantages of a cross-session web workspace</article-title>
          .
          <source>In CHI2005 Ext. Abstracts</source>
          , page
          <fpage>1949</fpage>
          . ACM Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mackay</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Watters</surname>
          </string-name>
          .
          <article-title>Exploring Multi-session Web Tasks</article-title>
          . Time, pages
          <fpage>1187</fpage>
          –
          <lpage>1196</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nettleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Calderon-Benavides</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          .
          <article-title>Analysis of web search engine query sessions</article-title>
          .
          <source>In Proc. WebKDD 2006</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rugg</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>McGeorge</surname>
          </string-name>
          .
          <article-title>The sorting techniques: a tutorial paper on card sorts, picture sorts and item sorts</article-title>
          .
          <source>Expert Systems</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ):
          <fpage>80</fpage>
          –
          <lpage>93</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Sellen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Murphy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>How knowledge workers use the web</article-title>
          .
          <source>In Proc. CHI2002</source>
          , pages
          <fpage>227</fpage>
          –
          <lpage>234</lpage>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Spink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <article-title>Multitasking during Web search sessions</article-title>
          . IP&amp;M,
          <volume>42</volume>
          (
          <issue>1</issue>
          ):
          <fpage>264</fpage>
          –
          <lpage>275</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tauscher</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          .
          <article-title>How people revisit web pages: empirical findings and implications for the design of history systems</article-title>
          .
          <source>IJHCS</source>
          ,
          <volume>47</volume>
          (
          <issue>1</issue>
          ):
          <fpage>97</fpage>
          –
          <lpage>137</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Toms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Villa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>McCay-Peet</surname>
          </string-name>
          .
          <article-title>How is a search system used in work task completion</article-title>
          ?
          <source>Journal of Information Science</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>15</fpage>
          –
          <lpage>25</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>