        The Edge Hill Contribution to the INEX
          Interactive Social Book Search Task

                            David Walsh1 and Mark Hall1

      Edge Hill University, St Helens Road, Ormskirk, L39 4QP, United Kingdom,
                     {David.Walsh, Mark.Hall}@edgehill.ac.uk



        Abstract. In our contribution we use log analysis to investigate whether
        participants in the INEX Interactive Social Book Search task are able to use
        the new multi-stage interface and whether it provides any benefits over
        the traditional IR baseline interface. Our initial results show that par-
        ticipants are able to successfully use the new multi-stage interface, with
        no significant learning effects. Additionally, for the non-goal task, the
        multi-stage interface actually enables the participants to collect more
        books than when using the baseline interface.


Keywords: human computer information retrieval, user study, log analysis


1     Introduction

The CLEF1 INEX2 track’s Interactive Social Book Search task gathered data
from participants using one of two interfaces to complete two tasks. The baseline inter-
face implemented a standard Information Retrieval (IR) interface [3] consisting
of a search box, a search result list, and an individual item display. The second
interface (multi-stage) attempted an implementation of Kuhlthau’s multi-stage
search process [4], filtered through Vakkari’s simplification of the model [5]. Two
tasks were tested: the first a non-goal task, where participants were asked to
look for any book they might find interesting; the second a goal-oriented task, where
participants were asked to find books for a given topic (“laymen books on math-
ematics and physics”). Each participant completed both tasks in one of the two
interfaces. Task order was automatically balanced to avoid ordering bias.
    We investigated the following three research questions:

1. RQ1: Does the multi-stage interface enable the participants to explore and
   find a larger number of books?
2. RQ2: Does the multi-stage interface have an additional learning time?
3. RQ3: Do participants make use of all three stages in the multi-stage multi-
   stage interface?
1 Conference and Labs of the Evaluation Forum
2 INitiative for the Evaluation of XML retrieval

2     Time Spent in the System

The first analysis focused on how long the participants spent using the system in
order to determine whether there were any differences between the two systems
and tasks. The experiment system [2] automatically measured the time taken on
the non-goal and goal-oriented tasks for every participant and the main anal-
ysis is based on this data. For the three stages implemented in the multi-stage
interface, the log data acquired by the IR system [1] was processed to determine
how long each participant spent using each of the three stages (“explore”,
“focus”, and “refine”).
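    To make this processing step concrete, the following Python sketch shows one
way per-stage durations could be derived from an ordered event log. The field
names (“timestamp”, “stage”) and the attribution of each inter-event interval to
the stage active at the earlier event are illustrative assumptions, not the
workbench’s actual log schema or processing code.

    # Sketch only: derive per-stage durations from a participant's event log.
    # The field names "timestamp" and "stage" are assumed for illustration and
    # may not match the actual log schema of the experiment workbench.
    from collections import defaultdict
    from datetime import datetime

    def stage_durations(events):
        """Return seconds spent per stage ("explore", "focus", "refine")."""
        durations = defaultdict(float)
        ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["timestamp"]))
        previous = None
        for event in ordered:
            ts = datetime.fromisoformat(event["timestamp"])
            if previous is not None:
                prev_ts, prev_stage = previous
                # Attribute the interval to the stage active at the earlier
                # event; the time after the final event is not counted.
                durations[prev_stage] += (ts - prev_ts).total_seconds()
            previous = (ts, event["stage"])
        return dict(durations)
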
    The first step in the analysis was to determine if the task ordering impacted
the time spent on either of the tasks. Wilcoxon signed-rank tests were used to
compare the task times for all interface and task combinations, showing no
significant differences in task duration for any of the combinations. For the
multi-stage interface, the time spent in each of the three stages was also compared
between the two ordering conditions using a Wilcoxon rank-sum test, which also
showed no significant differences in stage times. For the remainder of the analysis,
the task order can thus be ignored and the times for the two order conditions
aggregated.
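    As an illustration of these ordering checks, the sketch below shows how such
comparisons can be run with SciPy; the arrays are placeholder values, not the
study data, and the actual analysis tooling is not specified here.

    # Sketch only: ordering-effect checks with SciPy; the values are placeholders.
    from scipy.stats import ranksums, wilcoxon

    # Task times (seconds) split by whether the task was completed first or second.
    times_task_first  = [217, 109, 334, 385, 142]   # hypothetical values
    times_task_second = [160, 109, 490, 215, 148]   # hypothetical values

    # Different participants per ordering condition: Wilcoxon rank-sum test.
    stat, p = ranksums(times_task_first, times_task_second)
    print(f"rank-sum test: statistic={stat:.2f}, p={p:.3f}")

    # Paired measurements from the same participants (e.g. explore vs. focus
    # times) would instead use the Wilcoxon signed-rank test.
    explore_times = [54, 26, 73, 30, 79]             # hypothetical values
    focus_times   = [91, 52, 359, 73, 353]           # hypothetical values
    stat, p = wilcoxon(explore_times, focus_times)
    print(f"signed-rank test: statistic={stat:.2f}, p={p:.3f}")
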
    Table 1 shows the task times for the two interfaces and tasks. The data
seems to indicate that participants are faster with the multi-stage interface for
both tasks and that within the interfaces, participants are faster to complete
the non-goal task than the goal-oriented task. However, for neither of these
conditions does a Wilcoxon rank-sum test show significant differences. Thus for
the remaining analysis presented here, we can assume that any differences in
participant performance are due to the task or interface and not due to the time
the participants spent on the task or interface.


Table 1. Task durations for the baseline and multi-stage interfaces for both the non-
goal and goal-oriented tasks. All times are in seconds and formatted “median (1st
quartile, 3rd quartile)”.

              Interface       non-goal task       goal-oriented task
              baseline        217 (109, 334)         385 (142, 436)
              multi-stage     160 (109, 490)         215 (148, 412)




2.1   Multi-Stage Interface Stage Times

In the multi-stage interface participants were able to switch between three stages
(“explore”, “focus”, and “refine”). Table 2 shows the time spent in each of the
three stages for the two tasks. Wilcoxon rank-sum tests were used to test for
ordering effects. There are no ordering effects for the time spent in the explore
and focus stages, but there is an ordering effect in the refine stage. For the non-
goal task, the time spent in the refine stage is longer if it is the second task
(p = 0.012). No ordering effect was shown for the goal-oriented task.


Table 2. Time spent in each of the three stages available in the multi-stage interface.
All times are in seconds and formatted “median (1st quartile, 3rd quartile)”.

       Task             explore stage            focus stage       refine stage
       non-goal          54 (26.75, 73)         91.5 (52.5, 359)     0 (0, 5.75)
       goal-oriented     54 (30.25, 79)     122.5 (73.25, 353)        0 (0, 18)



    The times shown in Table 2 follow similar patterns for both the non-goal
and goal-oriented tasks. Participants spent slightly less than a minute using the
explore stage, and then spent between one and a half and two minutes on the focus
stage. Only a small fraction of participants used the refine stage at all, and those
that did used it only very briefly.
    Considering RQ3, participants obviously do not use the final refine stage,
either because they did not notice the stage in the user interface or because
the label “Refine” did not clearly state what functionality would be available.
Without looking at the participants’ qualitative responses it is impossible to
determine the cause. However, the use pattern for the first two stages is as
expected, with participants first spending time in the explore stage gaining an
overview and then using the focus stage.


3     Books Collected
To investigate RQ1 we looked at the number of books participants added to
their book-bag and which of the three stages they added books from. For RQ2 we
also investigated how quickly they added the first book.

3.1   Total Number of Books Added
The total number of books each participant added to their book-bag was de-
termined using a manual analysis of the log-data. For the baseline interface, the
number of books added to the book-bag was counted and any books that were
subsequently removed from the book-bag subtracted from that count. For the
multi-stage interface, the same process was applied, but book counts were sep-
arated according to which of the stages the books were added from.
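    The following sketch illustrates this counting step; the event names
(“bookbag_add”, “bookbag_remove”) and the optional “stage” field are assumptions
for illustration, since the actual log vocabulary is not reproduced here.

    # Sketch only: net book-bag counts per stage for one participant and task.
    # Event names and the "stage" field are illustrative assumptions.
    from collections import Counter

    def books_collected(events):
        per_stage = Counter()
        for event in events:
            stage = event.get("stage", "baseline")  # baseline interface has no stages
            if event["action"] == "bookbag_add":
                per_stage[stage] += 1
            elif event["action"] == "bookbag_remove":
                per_stage[stage] -= 1
        return per_stage  # e.g. Counter({"explore": 2, "focus": 1})
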
    The resulting data-set was checked for task ordering effects using Wilcoxon
rank sum tests and no significant ordering effects were found for any of the
interface / task combinations. For the multi-stage interface, the same checks were
applied to the more detailed per-stage data and only for the explore stage in the goal-
oriented task was there a significant ordering bias. If the goal-oriented task was
the second task, then significantly fewer books were added to the book-bag in
the explore stage (Wilcoxon signed rank, p = 0.035). As there were no overall
ordering effects, the further analysis did not take task ordering into account.
    Table 3 shows that significantly more books were added in the non-goal task
using the multi-stage interface than using the baseline interface (Fig. 1, Wilcoxon
signed rank test, p = 0.011). No such effect is visible in the goal-oriented task.
This seems to indicate that the multi-stage interface provides significant benefit
to the user when they do not yet have an explicit goal that they are searching
for. At the same time, the multi-stage interface does not impact the performance
when the user has an explicit goal in mind.


Table 3. Median number of books added to the book-bag. Numbers are “median (1st
quartile, 3rd quartile)”.

              Interface      non-goal task      goal-oriented task
              baseline           1 (0, 2)              3 (1, 4)
              multi-stage        2 (1, 4)             3.5 (2, 5)




Fig. 1. Box-plot showing the number of books added to the book-bag for the baseline
(a) and the multi-stage (b) interface in the non-goal task.




3.2   Multi-Stage Interface Details

To investigate RQ1 further we used the dataset from the previous section, but
now looked in detail at the number of books added in the three stages of the
multi-stage interface (Tab. 4). Task ordering effects were investigated and no
significant effects were found.
    Table 4 clearly shows that the refine stage was not used to add any books,
which is in line with the timing results that showed that the refine stage was
essentially not used. Interestingly, although the participants spent more time in
the focus stage, they collected more items in the initial explore stage. While the
data seems to indicate that in the goal-oriented task participants collected more
books in the explore stage, the effect is not significant.


Table 4. Median number of books added in each stage of the multi-stage interface. All
results are formatted “median (1st quartile, 3rd quartile)”.

         Task             explore stage        focus stage     refine stage
         non-goal             1 (1, 2)           0 (0, 1)         0 (0, 0)
         goal-oriented        2 (0, 2)           0 (0, 2)         0 (0, 0)




3.3   Time to Collect the First Book

To investigate RQ2, we analysed how quickly participants added their first book
in each task. The log was manually analysed and the time between the session
start and the point at which the first book was added to the book-bag was
determined.
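    This measure can be sketched as follows; again the event name “bookbag_add”
and the timestamp field are illustrative assumptions.

    # Sketch only: seconds from session start to the first book-bag addition.
    from datetime import datetime

    def time_to_first_book(events):
        start = datetime.fromisoformat(events[0]["timestamp"])
        for event in events:
            if event["action"] == "bookbag_add":
                added = datetime.fromisoformat(event["timestamp"])
                return (added - start).total_seconds()
        return None  # participant never added a book
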
    Table 5 shows the median times to collect the first book. While it looks as
if the multi-stage interface enables the participant to find the first book faster, the
difference is not statistically significant.


Table 5. Time to add first item. All times are in seconds and formatted “median (1st
quartile, 3rd quartile)”.

            Interface       non-goal task            goal-oriented task
            baseline     113.50 (76.25, 225.00)      112.0 (54.0, 164.5)
            multi-stage       101 (50, 271)           88.5 (58.0, 175.8)




4     Interaction Patterns

The final analysis looked at the interaction patterns, using a user-interaction
bi-gram analysis. To create the interaction bi-gram distributions needed for the
analysis, the log was processed in the following steps:

 1. Generate interaction string – in the initial step each user-system interac-
    tion was mapped to a single letter. Using this mapping, for each participant
    and each of the participant’s tasks a string representation of their interac-
    tions with the system was generated. Consecutive repeats of a letter were
    reduced to a single occurrence;
 2. Create participant pattern distribution – based on the interaction
    strings, all bi-grams were determined and bi-gram frequency distributions
    calculated;
 3. Aggregate distributions – the participants’ bi-gram distributions were
    aggregated into interface and task bi-gram distributions;
 4. Filter distributions – the interface and task bi-gram distributions were
    filtered. All bi-grams that occurred fewer than three times were aggregated
    into a single value. This ensures that a potentially large number of interaction
    patterns that only occurred once or twice do not skew the results, while at
    the same time not completely losing that data. A sketch of the full pipeline
    is given after this list.
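
    The following sketch illustrates the four steps above. The action-to-letter
mapping is reconstructed from the legends of Tables 6 and 7, and the log format
is an assumption; the actual processing scripts may differ.

    # Sketch only: interaction bi-gram pipeline (steps 1-4 above).
    from collections import Counter
    from itertools import groupby

    # Letters taken from the legends of Tables 6 and 7; the mapping of raw log
    # actions to these letters is an illustrative assumption.
    ACTION_LETTERS = {"add": "A", "facet": "F", "view": "I", "load": "L",
                      "metadata": "M", "paginate": "P", "query": "Q"}

    def interaction_string(events):
        """Step 1: map actions to letters and collapse consecutive repeats."""
        letters = (ACTION_LETTERS[e["action"]] for e in events)
        return "".join(letter for letter, _ in groupby(letters))

    def bigram_distribution(interaction):
        """Step 2: bi-gram frequency distribution for one participant/task."""
        return Counter(a + b for a, b in zip(interaction, interaction[1:]))

    def aggregate_and_filter(distributions, min_count=3):
        """Steps 3 and 4: aggregate per-participant distributions and pool
        all bi-grams occurring fewer than min_count times into one value."""
        total = sum(distributions, Counter())
        filtered = Counter({bg: n for bg, n in total.items() if n >= min_count})
        filtered["rare"] = sum(n for n in total.values() if n < min_count)
        return filtered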

    Before analysing the data in any more depth, potential ordering effects were
investigated; only for the goal-oriented task with the multi-stage interface
is there a significant difference in the interaction pattern distributions (χ2 test,
p = 0.047). As this significance is borderline and there are no significant differences
in any of the other metrics tested, the influence of task order is ignored
for the purpose of this analysis.
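    As an illustration, the χ2 comparison of two bi-gram distributions can be run
as in the sketch below. The counts are the top-ten values from Table 6 only, not
the full filtered distributions, so the resulting p-value will not reproduce the
figures reported here.

    # Sketch only: chi-square comparison of two bi-gram frequency distributions.
    from scipy.stats import chi2_contingency

    bigrams     = ["AI", "AQ", "IA", "IQ", "LI", "LQ", "PI", "QA", "QI", "QL"]
    baseline    = [  1,   47,   71,   24,   10,   30,   23,    4,   56,    7]
    multi_stage = [ 45,   16,   40,   19,   10,   26,   36,   20,   46,   13]

    chi2, p, dof, expected = chi2_contingency([baseline, multi_stage])
    print(f"chi-square={chi2:.2f}, dof={dof}, p={p:.3f}")
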
    Comparing the two interfaces shows a significant difference in the bi-gram
distributions between the baseline and multi-stage interfaces on the goal-oriented
task (χ2 test, p = 0.036). Looking at the bi-gram distribution (Tab. 6) shows
that the main difference is at which point participants added books to their
book-bag. In the baseline interface this primarily happened after the participants
had viewed the book’s detail (IA), while in the multi-stage interface it happens
directly from the search results list (QA). The behaviour after adding a book
to the book-bag is also different. In the baseline interface, the next action is to
run another query (AQ), while in the multi-stage interface it is to view another
book (AI).


Table 6. Top-ten most frequent bi-grams for the baseline and multi-stage interfaces.
Major differences have been highlighted in bold. Actions: A – add book to book-bag;
I – view a book; L – load a page; P – paginate; Q – run a query.

         Interface     AI   AQ    IA    IQ    LI   LQ   PI   QA    QI   QL
         baseline       1    47   71    24    10   30   23    4    56    7
         multi-stage   45    16    40   19    10   26   36   20    46    13



   Comparing the two tasks within both interfaces shows a significant difference
between the non-goal and goal-oriented tasks in the multi-stage interface (χ2 test,
p < 0.001), but none in the baseline interface. From the bi-gram distribution
of the multi-stage interface (Tab. 7), three differences stand out. In the goal-
oriented task, participants made more use of the pagination functionality to
see more items (PI) and also viewed more items after selecting a search facet
(FI). In the non-goal task, participants more frequently viewed different bits of
meta-data after viewing an item (IM).


Table 7. Top-ten most frequent bi-grams for the non-goal and goal-oriented tasks using
the multi-stage interface. Major differences have been highlighted in bold. Actions: A –
add book to book-bag; F – add a facet; I – view a book; L – load a page; M – view
meta-data; P – paginate; Q – run a query.

         Task            AI   AL   FI   FQ    IA   IM    LF   MI    PI   QI
         non-goal        31   23    9    22   35    65   27    46    6    37
         goal-oriented   45   20   19    21   40    37   15    50   36    46



    The differences are in line with what would be expected, given the task dif-
ferences. In the goal-oriented task, participants use the faceting and pagination
functionality to dig into the results, a pattern that is not as relevant when the
task is non-goal. At the same time, in the non-goal task, participants interact
more with the books’ meta-data, as the participants use the meta-data to develop
the search goal.



5    Conclusions


In conclusion, the goal of the initial log analysis was to determine how participants
used the multi-stage interface. The initial question was whether there would be
a learning impact on the participants’ performance. The results clearly show
that participants are able to use the new multi-stage interface just as well as
the baseline interface and that there are no learning effects. For the non-goal
task, the multi-stage interface even outperforms the baseline interface. Clearly,
the initial explore stage, designed to support open-ended exploration, enables
the user to explore better and thus collect more books.
    Finally, considering both the number of books collected and the time
spent in the three stages of the multi-stage interface, participants clearly only
make use of the first two stages. For the first two stages, the behaviour is as
expected, with more time spent in the second focus stage, compared to the first
explore stage. Interestingly, the majority of books were collected using the first
explore stage, a result that warrants further examination. The use of the final refine
stage requires further analysis, as it is clearly not used much and the reasons for this
need to be investigated.




References
1. Hall, M.M., Katsaris, S., Toms, E.: A pluggable interactive IR evaluation work-
   bench. In: European Workshop on Human-Computer Interaction and Information
   Retrieval. pp. 35–38 (2013), http://ceur-ws.org/Vol-1033/paper4.pdf
2. Hall, M.M., Toms, E.: Building a common framework for IIR evaluation. In: CLEF
   2013 - Information Access Evaluation. Multilinguality, Multimodality, and Visual-
   ization. pp. 17–28 (2013)
3. Hearst, M.A.: Search User Interfaces. Cambridge University Press (2009)
4. Kuhlthau, C.C.: Inside the search process: Information seeking from the user’s per-
   spective. JASIS 42(5), 361–371 (1991)
5. Vakkari, P.: A theory of the task-based information retrieval process: a summary
   and generalisation of a longitudinal study. Journal of Documentation 57(1), 44–60
   (2001)



