<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploratory Search in an Audio-Visual Archive: Evaluating a Professional Search Tool for Non-Professional Users</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marc Bron</string-name>
          <email>m.m.bron@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jasmijn van Gorp</string-name>
          <email>j.vangorp@uu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Nack</string-name>
          <email>nack@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten de Rijke</string-name>
          <email>derijke@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISLA, University of Amsterdam</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TViT, Utrecht University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <sec id="sec-1-1">
        <title>H.5.2 [User interfaces]: Evaluation/methodology</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>General Terms</title>
      <sec id="sec-2-1">
        <title>Exploratory search, Usability evaluation</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>INTRODUCTION</title>
      <p>
        Traditionally, archives have been the domain of archivists and
librarians, who retrieve relevant items for a user’s request through
their knowledge of the content in, and organization of, the archive.
Increasingly, archives are opening up and publishing their content
online, making their collections directly accessible for the general
public. There are two major problems that these non-professional
users face. First, most users are unfamiliar or only partially
familiar with the archive content and its representation in the repository.
(Copyright © 2011 for the individual papers by the papers' authors.
Copying permitted only for private and academic purposes. This volume is
published and copyrighted by the editors of euroHCIR2011.)
The internal representation is designed from the expert point of
view, i.e., the type of information included in the metadata, which
does not necessarily match the expectations of the general public.
This leads to an increase in exploratory types of search [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as users
are unable to translate their information need into terms that
correspond with the representation of the content in the archive. The
second problem is that archives provide users with professional search
tools to search through their collections. Such tools were
originally developed to support professional users in searching through
the metadata descriptions in a collection. Given their knowledge of
the collection, professionals primarily exhibit directed search
behavior [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but it is unclear to what extent professional search tools
support non-professional users in exploratory search.
      </p>
      <p>
        The focus of most work on improving exploratory search is
on professionals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this paper we present a small-scale user
study where non-professional users perform exploratory search tasks
in an audio-visual archive using a search tool originally developed
for media professionals and archivists. We investigate the
following hypotheses: (i) a search interface designed for professional
users does not provide satisfactory support for non-professional
users on exploratory search tasks; and (ii) users with high
performance on exploratory search tasks have different search behavior
than users with lower performance.
      </p>
      <p>To investigate the first hypothesis, we evaluate the search
tool's performance objectively, in terms of the number of correct
answers found for the search tasks, and subjectively, through a
usability questionnaire. To investigate the second hypothesis, we
analyse the click data logged during search.</p>
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTAL DESIGN</title>
      <p>The environment. The setting for our experiment was the
Netherlands Institute for Sound and Vision (S&amp;V), the Dutch national
audiovisual broadcast archive. In the experiment we used the archive’s
collection consisting of around 1.5 M (television) programs with
metadata descriptions provided by professional annotators.</p>
      <p>We also utilized the search interface of S&amp;V
(http://zoeken.beeldengeluid.nl). The interface is
available in a simple and an advanced version. The simple version
is similar to search engines known from the web: it has a single
search box, and submitting a query results in a ranked list of 10
programs. Clicking on one of the programs opens a
page with the complete metadata description of that program.
Table 1 shows the metadata fields available for a program. Instead of
the usual snippets presented with each item in a result list, the
interface shows the title, date, owner and keywords for each item on the
result page. Only the keywords and title field provide information
about the actual content of the program while the other fields
provide information primarily used for the organization of programs in
the archive collection. The description and summary fields contain
the most information about the content of programs but are only
available by visiting the program description page.</p>
      <p>We used the advanced version of the interface in the experiment,
which offers, in addition to the search box, two other components:
search boxes operating on specific fields, and filters for certain categories
of terms. Fielded searches operate on specific fields in the program
metadata. The filters become available after a list of programs has
been returned in response to a query. The filters display the top
five most frequent terms in the returned documents for a metadata
field. The metadata fields displayed in the filter component of the
interface are highlighted in bold in Table 1. Once a checkbox next
to one of the terms has been ticked, programs not containing that
term in that field are removed from the result list.
Subjects. In total, 22 first-year university students of media
studies participated in the experiment. The students (16 female,
6 male) were between 19 and 22 years of age. As a reward for
participation the students gained free entrance to the museum of
the archive.</p>
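      <p>As an illustration, the filter mechanism described above (the top five most frequent terms per field; ticking a term removes programs that lack it) can be sketched as follows. This is a minimal sketch, not the archive's actual implementation; the list-of-dicts data layout and the function names are assumptions.</p>

```python
from collections import Counter

def filter_terms(results, field, k=5):
    # Top-k most frequent terms in one metadata field of the current
    # result list: the candidate terms shown in the filter component.
    counts = Counter(t for r in results for t in r.get(field, []))
    return [term for term, _ in counts.most_common(k)]

def apply_filter(results, field, term):
    # Ticking a term's checkbox removes every program that does not
    # contain that term in the given field.
    return [r for r in results if term in r.get(field, [])]

# Toy result list; real programs carry the metadata fields of Table 1.
results = [{"keywords": ["comedy", "satire"]},
           {"keywords": ["drama"]},
           {"keywords": ["comedy"]}]
top = filter_terms(results, "keywords")                 # most frequent first
filtered = apply_filter(results, "keywords", "comedy")  # 2 programs remain
```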
      <p>Experiment setup. Subjects performed the experiment in one of the
five studios available at S&amp;V, with one or two subjects in a
studio at a time. When two subjects were present, they worked
on machines facing opposite sides of the studio. We instructed
subjects not to communicate during the experiment. During the
experiment one instructor was always present in a studio. Before starting,
the subjects learned the goals of the experiment, got a short
tutorial on the search interface and performed a test query. During this
phase the subjects were allowed to ask questions.</p>
      <p>In the experiment each subject had to complete three search tasks
in 45 minutes. If after 15 minutes a task was not finished, the
instructor asked the subject to move on to the next task. Search tasks
are related to matters that could potentially occur within courses
of the student’s curriculum. Each search task required the subjects
to find five answers before moving on to the next task. A correct
answer was a page with the complete metadata description of a
program that fulfilled the information need expressed by the search
task. Subjects could indicate that a page was an answer through a
submit button added to the interface for the experiment.</p>
      <p>We used the following three search tasks in the experiment: (i) For
the course “media and ethnicity” you need to investigate the role
of ethnicity in television-comedy. Find five programs with
different comedians with a non-western background. (ii) For the course
“television geography” you need to investigate the representation of
places in drama series. Find five drama series where location plays
an important role. (iii) For the course “media and gender” you need
to give a presentation about the television career of five different
female hosts of game shows broadcast during the 1950s, 1960s or
1970s. Find five programs that you can use in your presentation.</p>
      <p>
Subjects received the search tasks in random order to avoid
ordering bias. Subjects were encouraged to perform the search in
whatever manner suited them best. During the experiment we logged
all search actions, e.g., clicks, performed by each subject. After a
subject had finished all three search tasks, he or she was asked to fill
out a questionnaire about the experiences with the search interface.
Methodology for evaluation and analysis. We performed two
types of evaluation of the search interface: a usability questionnaire
and the number of correct answers submitted for the search tasks.
The questionnaire consists of three sets of questions. The first set
involves aspects of the experienced search behaviour with the
interface. The second set contains questions about how useful users
find the filter component, fielded search component, and metadata
fields presented in the interface. The third set asks subjects to
indicate the usefulness of a series of term clouds. The primary goal
is not to evaluate the term clouds or their visualization but to find
preferences for information from certain metadata fields. We
generated a term cloud for a specific field as follows. First, we retrieved
the top 1000 program descriptions for the query “comedian.” We then
counted the terms occurring in that field across these documents. The cloud
is a graphical display of the top 50 most frequent
terms in that field, where the size of a term is
relative to its frequency, i.e., the higher the frequency, the bigger the
term. In the questionnaire subjects indicate agreement on a 5-point
Likert scale ranging from one (not at all) to five (extremely). The
second type of evaluation was based on the evaluation methodology
applied at TREC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We pooled the results of all subjects and let
two assessors make judgements about the relevance of the
submitted answers to a search task. An answer is only considered relevant
if both assessors agree. Performance is measured in terms of the
number of correct answers (#correct) submitted to the system.
      </p>
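      <p>The term-cloud construction just described can be sketched as follows. This is a hypothetical illustration: docs stands in for the metadata records of the top 1000 results for the query, and the mapping from frequency to display size is an assumption.</p>

```python
from collections import Counter

def term_cloud(docs, field, top_n=50, max_size=40, min_size=10):
    # Count terms in one metadata field over the retrieved documents,
    # keep the top_n most frequent, and assign each a display size
    # proportional to its frequency (higher frequency -> bigger term).
    counts = Counter()
    for doc in docs:
        counts.update(doc.get(field, "").lower().split())
    top = counts.most_common(top_n)
    if not top:
        return []
    peak = top[0][1]  # highest frequency maps to max_size
    return [(term, min_size + (max_size - min_size) * freq // peak)
            for term, freq in top]

# Toy stand-in for the top-1000 program descriptions.
docs = [{"keywords": "satire comedy"}, {"keywords": "comedy parody"}]
cloud = term_cloud(docs, "keywords", top_n=3)
```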
      <p>
        For the analysis of the search behavior of subjects we looked
at (i) the number of times a search query is submitted using any
combination of components (#queries); (ii) the number of times a
program description page is visited (#pages); and (iii) the number
of times a specific component is used, i.e., the general search box,
filters, and fields. A large value for #queries indicates lookup-type
search behavior, characterized by a pattern of submitting
a query, checking whether the answer can be found in the result list,
and, if it is not, formulating a new query. The new query is not
necessarily based on information gained from the retrieved results but rather
inspired by the subject’s personal knowledge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A large value for
#pages indicates learning-style search behavior. In this search
strategy a subject visits the program description of each search
result to get a better understanding of the organization and content of
the archive. New queries are then also based on information gained
from the previous text analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We check the usage frequency
of specific components to see if performance differences between
subjects are due to alternative uses of interface components.
      </p>
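      <p>The two behavior indicators can be computed from the click logs with a simple aggregation. A minimal sketch, assuming the log is a sequence of (subject, action) pairs; the action labels are invented for illustration.</p>

```python
from collections import Counter

def behavior_indicators(log):
    # Per-subject counts of submitted queries (#queries, the lookup
    # indicator) and visited program description pages (#pages, the
    # learning indicator).
    queries, pages = Counter(), Counter()
    for subject, action in log:
        if action == "query":
            queries[subject] += 1
        elif action == "page_visit":
            pages[subject] += 1
    return queries, pages

log = [("s1", "query"), ("s1", "page_visit"), ("s1", "page_visit"),
       ("s2", "query"), ("s2", "query")]
q, p = behavior_indicators(log)
```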
    </sec>
    <sec id="sec-5">
      <title>RESULTS</title>
      <p>Search interface evaluation. Figure 1 shows the distribution of
the number of correct answers submitted per search task, together
with the distribution of the number of answers (correct or
incorrect) submitted. Out of the possible total of 330 answers, 173 are
actually submitted. Subjects submit the maximum number of five
s
k
s
a
t
#
0
2
0
1
0
#correct
#submitted
answers for 18 of the tasks. This suggests that subjects have
difficulties in finding answers within the given time limit. Subjects find
no correct answers for 31 of the tasks, five subjects find no
correct answer for any of the tasks, and none of the subjects reaches
the maximum of five correct answers for a task. In total 64 out of
173 answers are correct. This low precision indicates that subjects
find it difficult to judge if an answer is correct based on the
metadata provided by the program description. Table 2 shows
questions about the satisfaction of subjects with the interface. Subjects
indicate their level of agreement from one (not at all) to five
(extremely). For all questions the majority of subjects find the amount
of support offered by the interface on the exploratory search tasks
marginal. This finding supports our first hypothesis that the search
interface intended for professional users does not provide
satisfactory support to non-professional users on exploratory search tasks.
Search behavior analysis. Although all subjects are non-experts
with respect to search with this particular interface, some perform
better than others. We investigate whether there is a difference in
the search behavior of subjects that have high performance on the
search tasks and users that have lower performance. We divide
subjects into two groups based on the average number of
correct answers found over the three tasks, i.e., 2.9 out of
the possible maximum of 15. The group with higher performance
(group G) consists of 11 subjects with 3 or more correct answers,
whereas the group with lower performance (group B) consists of
11 subjects with 2 or fewer correct answers.</p>
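      <p>The grouping step can be expressed directly in code. A small sketch under the stated threshold (the mean of 2.9 correct answers, i.e., 3 or more for group G); the subject identifiers are invented.</p>

```python
def split_groups(correct_per_subject, threshold=3):
    # Group G: subjects at or above the threshold of correct answers;
    # group B: subjects below it.
    g = {s: c for s, c in correct_per_subject.items() if c >= threshold}
    b = {s: c for s, c in correct_per_subject.items() if c < threshold}
    return g, b

scores = {"s1": 5, "s2": 2, "s3": 3, "s4": 0}
G, B = split_groups(scores)
```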
      <p>Table 3 shows the averages of the search behavior indicators for
each of the two groups. We first look at the usage frequency of the
filter, field, and search box components by subjects in group G vs.
group B. There is no significant difference between the groups,
indicating that there is no direct correlation between performance on
the search tasks and use of specific search components. Next we
look at search behavior as an explanation for the difference in
performance between the groups. Our indicator for lookup searches,
i.e., #queries, shows a small difference in the number of submitted
queries.</p>
      <sec id="sec-5-1">
        <title>Table 2 questions: To what degree are you satisfied with the search experience offered by the interface? To what degree did the interface support you by suggesting new search terms? To what degree are you satisfied with the suggestions for new search terms by the interface?</title>
        <p>That subjects in both groups submit a comparable number
of queries suggests that the difference in performance is not
due to one group doing more lookups than the other. The
indicator for learning type search, i.e., #pages, shows that there is a
significant difference in the number of program description pages
visited between subjects of the two groups, i.e., subjects in group
G tend to visit program description pages more often than subjects
of group B. We also find that the average time subjects in group G
spend on a program description page is 27 seconds, while subjects
from group B spend on average 39 seconds. These observations
support our hypothesis that there are differences in search behavior
between subjects that have high performance on exploratory search
tasks and subjects with lower performance.</p>
        <p>Usefulness of program descriptions. One explanation for this
difference in performance is that through their search behavior
subjects from group G learn more about the content and organization
of the archive and are able to assimilate this information faster from
the program descriptions than subjects from group B. As subjects
process more program descriptions they learn more about the
available programs and terminology in the domain. This results in a
richer set of potential search terms to formulate their information
need. To investigate whether subjects found information in the
program descriptions useful in suggesting new search terms, we
analyse the second set of questions from the questionnaire. The top half
of Table 4 shows subjects’ responses to questions about the
usefulness of metadata fields present on the search result page.
Considering responses from all subjects, the genre and keyword fields are
found most useful, and the title and date fields as well, although to
a lesser degree. The fields intended for professionals, i.e., origin,
owner, rights, and medium, are found not useful by the majority of
subjects. Between groups B and G there are no significant
differences in subjects' judgements of the usefulness of the fields.</p>
        <p>The bottom part of Table 4 shows subjects' responses to
questions about the usefulness of metadata fields only present on the
program description page and not already shown on the search
result page. Based on all responses, the summary, description, person
and location metadata fields are considered most useful by the
majority of the subjects. These findings further support our argument
that program descriptions provide useful information for subjects
to complete their search tasks.</p>
        <p>When we contrast responses of the two groups we find that group
G subjects consider the description, person, and location metadata
fields significantly more useful than subjects from group B. This
suggests that group B subjects have more difficulties in distilling
useful information from these fields (recall also the longer time
spent on a page). This does not mean that these users cannot
understand the provided information; rather, it suggests that the
chosen modality, i.e., text, might not be the right one. A graphical
representation, for example as term clouds, might be better.
Fields as term clouds. In response to the observations just made,
we also investigated how users would judge visual representations
of search results, i.e., in the form of term clouds directly on the
search result page. Here the goal is not to evaluate the visualization
of the clouds or the method by which they are created. Of interest
to us is whether subjects would find a direct presentation of
information normally “hidden” on the program description page useful.</p>
        <p>Recall from §2 that we generate term clouds for each field on
the basis of the terms in the top 1000 documents returned for a
query. From Table 5 we observe that subjects do not consider
the description and summary clouds useful, while previously these
fields were judged most useful among the fields in the program
description. Both clouds contain general terms from the television
domain, e.g., program and series, which do not provide subjects
with useful search terms. Although this could be due to the use
of frequencies to select terms, these fields are inherently difficult
to visualize without losing the relations between the terms. The
genre, keyword, location and, to some degree, person clouds are all
considered useful, but they support the user in different ways. The
genre field supports the subject in understanding how content in the
archive is organized, i.e., it provides an overview of the genres used
for categorization. The keyword cloud provides the user with
alternative search terms for their original query, for example, satire or
parody instead of cabaret. The location and person clouds offer an
indication of which locations and persons are present in the archive
and how prominent they are. For these fields visualization is easier,
i.e., genre, keywords or entities by themselves are meaningful
without having to represent relations between them. Subjects consider
the title field only marginally useful. For this field the usefulness is
dependent on the knowledge of the subject as titles are not
necessarily descriptive. The subjects also consider the organization field
marginally useful, probably due to the nature of our search tasks,
i.e., two tasks focus on finding persons and in one locations play
an important role. We assume though that in general this type of
information need occurs when the general public starts exploring
the archive. Together, the above findings suggest that subjects find
a direct presentation of short and meaningful terms, i.e., categories,
keywords, and entities, on the search results page useful.
</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION</title>
      <p>We presented results from a user study where non-professional
users perform exploratory search tasks with a search tool originally
developed for media professionals and archivists in an audio-visual
archive. We hypothesized that such search tools provide
unsatisfactory support to non-professional users on exploratory search tasks.
By means of a TREC style evaluation we find that subjects achieve
low recall in the number of correct answers found. In a
questionnaire regarding user satisfaction with the search support offered
by the tool, subjects rate this support as marginal. Both findings
support our hypothesis that a professional search tool is unsuitable for
non-professional users performing exploratory search tasks.</p>
      <p>Through an analysis of the data logged during the experiment,
we find evidence to support our second hypothesis that subjects
with different performance employ different search strategies. Subjects that visit more program
description pages are more successful on the exploratory search
tasks. We also find that subjects consider certain metadata fields on
the program description pages more useful than others. Subjects
indicate that visualization of certain fields as term clouds directly in
the search interface would be useful in completing the search tasks.
Subjects especially consider presentations of short and meaningful
text units, e.g., categories, keywords, and entities, useful.</p>
      <p>In future work we plan to perform an experiment in which we
present non-professional users with two interfaces: the current search
interface and one with a direct visualization of categories,
keywords and entities on the search result page.</p>
      <p>Acknowledgements. This research was partially supported by the
European Union’s ICT Policy Support Programme as part of the
Competitiveness and Innovation Framework Programme, CIP
ICTPSP under grant agreement nr 250430, the PROMISE Network of
Excellence co-funded by the 7th Framework Programme of the
European Commission, grant agreement no. 258191, the DuOMAn
project carried out within the STEVIN programme which is funded
by the Dutch and Flemish Governments under project nr
STE-0912, the Netherlands Organisation for Scientific Research (NWO)
under project nrs 612.061.814, 612.061.815, 640.004.802,
380-70011, the Center for Creation, Content and Technology (CCCT), the
Hyperlocal Service Platform project funded by the Service
Innovation &amp; ICT program, the WAHSP project funded by the
CLARINnl program, and under COMMIT project Infiniti.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] J.-w. Ahn,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>He</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Florian</surname>
          </string-name>
          .
          <article-title>Semantic annotation based exploratory search for information analysts</article-title>
          .
          <source>Inf. Proc. &amp; Management</source>
          ,
          <volume>46</volume>
          (
          <issue>4</issue>
          ):
          <fpage>383</fpage>
          -
          <lpage>402</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>The TREC test collections</article-title>
          . In E. M. Voorhees and
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Harman</surname>
          </string-name>
          , editors, TREC:
          <article-title>Experiment and evaluation in information retrieval</article-title>
          .
          <source>MIT</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Huurnink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          , W. van den Heuvel, and M. de Rijke.
          <article-title>Search behavior of media professionals at an audiovisual archive</article-title>
          .
          <source>J. Am. Soc. Inf. Sci. and Techn</source>
          .,
          <volume>61</volume>
          :
          <fpage>1180</fpage>
          -
          <lpage>1197</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marchionini</surname>
          </string-name>
          .
          <article-title>Exploratory search: from finding to understanding</article-title>
          .
          <source>Comm. ACM</source>
          ,
          <volume>49</volume>
          (
          <issue>4</issue>
          ):
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>April 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kules</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Drucker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schraefel</surname>
          </string-name>
          .
          <article-title>Supporting exploratory search: Special issue</article-title>
          .
          <source>Comm. ACM</source>
          ,
          <volume>49</volume>
          (
          <issue>4</issue>
          ),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>