=Paper= {{Paper |id=None |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-763/euroHCIR2011_proceedings.pdf |volume=Vol-763 }} ==None== https://ceur-ws.org/Vol-763/euroHCIR2011_proceedings.pdf
euroHCIR2011
4th July 2011 – Newcastle, UK

Proceedings of the
1st European Workshop on
Human-Computer Interaction with
Information Retrieval
A workshop at BCS-HCI2011


Executive Summary
EuroHCIR2011 was the first in a series of new workshops aimed at stimulating the European Human Computer Interaction and Information Retrieval (HCIR) community, in a similar manner to the series of successful workshops held in the USA. The workshop, which won industry sponsorship from LexisNexis, was highly successful, accepting 11 short papers and drawing participants from a dozen countries across Europe. In addition to the 8 insightful presentations and 3 poster presentations, Ann Blandford, from University College London’s Interaction Centre, gave an inspiring keynote about their work on Exploratory Search and Serendipity.


Organised by

Max L. Wilson
Future Interaction Technologies Lab, Swansea University, UK
m.l.wilson@swansea.ac.uk

Birger Larson
The Royal School of Library and Information Science, Denmark
blar@iva.dk

Tony Russell-Rose
UXLabs, UK
tgr@uxlabs.co.uk

James Kalbach
LexisNexis, UK
James.kalbach@lexisnexis.co.uk

Sponsored by

Session 1
    Exploratory Search in an Audio-Visual Archive: Evaluating a Professional Search Tool for Non-Professional Users
        Marc Bron, Jasmijn Van Gorp, Frank Nack and Maarten De Rijke

    Supplying Collaborative Source-code Retrieval Tools to Software Developers
        Juan M. Fernández-Luna, Juan F. Huete and Julio Rodriguez-Cano

    Interactive Analysis and Exploration of Experimental Evaluation Results
        Emanuele Di Buccio, Marco Dussin, Nicola Ferro, Ivano Masiero, Giuseppe Santucci and Giuseppe Tino

    A Taxonomy of Enterprise Search
        Tony Russell-Rose, Joe Lamantia and Mark Burrell

Session 2
    Back to MARS: The unexplored possibilities in query result visualization
        Alfredo Ferreira, Pedro B. Pascoal and Manuel J. Fonseca

    The Mosaic Test: Benchmarking Colour-based Image Retrieval Systems Using Image Mosaics
        William Plant, Joanna Lumsden and Ian Nabney

    Evaluating the Cognitive Impact of Search User Interface Design Decisions
        Max L. Wilson

    The potential of Recall and Precision as interface design parameters for information retrieval systems situated in everyday environments
        Ayman Moghnieh and Josep Blat

Posters
    Towards User-Centered Retrieval Algorithms
        Manuel J. Fonseca

    Design Thinking Search User Interfaces
        Arne Berger

    The Development and Application of an Evaluation Methodology for Person Search Engines
        Roland Brennecke, Thomas Mandl and Christa Womser-Hacker

               Exploratory Search in an Audio-Visual Archive:
                 Evaluating a Professional Search Tool for
                          Non-Professional Users

                                             Marc Bron                             Jasmijn van Gorp
                                 ISLA, University of Amsterdam                    TViT, Utrecht University
                                       m.m.bron@uva.nl                              j.vangorp@uu.nl

                                             Frank Nack                             Maarten de Rijke
                                 ISLA, University of Amsterdam              ISLA, University of Amsterdam
                                           nack@uva.nl                               derijke@uva.nl


ABSTRACT
As archives are opening up and publishing their content online, the general public can now directly access archive collections. To support access, archives typically provide the public with their internal search tools that were originally intended for professional archivists. We conduct a small-scale user study where non-professionals perform exploratory search tasks with a search tool originally developed for media professionals and archivists in an audio-visual archive. We evaluate the tool using objective and subjective measures and find that non-professionals find the search interface difficult to use in terms of both. Analysis of search behavior shows that non-professionals who often visit the description pages of individual items in a result list are more successful on search tasks than those who visit fewer pages. A more direct presentation of entities present in the metadata fields of items in a result list can be beneficial for non-professional users on exploratory search tasks.

Categories and Subject Descriptors
H.5.2 [User interfaces]: Evaluation/methodology

General Terms
Measurement, Performance, Design, Experimentation

Keywords
Exploratory search, Usability evaluation

Copyright © 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of euroHCIR2011.

1. INTRODUCTION
Traditionally, archives have been the domain of archivists and librarians, who retrieve relevant items for a user’s request through their knowledge of the content in, and organization of, the archive. Increasingly, archives are opening up and publishing their content online, making their collections directly accessible for the general public. There are two major problems that these non-professional users face. First, most users are unfamiliar or only partially familiar with the archive content and its representation in the repository. The internal representation is designed from the expert point of view, i.e., the type of information included in the metadata, which does not necessarily match the expectation of the general public. This leads to an increase in exploratory types of search [5], as users are unable to translate their information need into terms that correspond with the representation of the content in the archive. The second problem is that archives provide users with professional search tools to search through their collections. Such tools were originally developed to support professional users in searching through the metadata descriptions in a collection. Given their knowledge of the collection, professionals primarily exhibit directed search behavior [3], but it is unclear to what extent professional search tools support non-professional users in exploratory search.

The focus of most work on improving exploratory search is towards professionals [1]. In this paper we present a small-scale user study where non-professional users perform exploratory search tasks in an audio-visual archive using a search tool originally developed for media professionals and archivists. We investigate the following hypotheses: (i) a search interface designed for professional users does not provide satisfactory support for non-professional users on exploratory search tasks; and (ii) users with high performance on exploratory search tasks have different search behavior than users with lower performance.

In order to investigate the first hypothesis we evaluate the search tool performance objectively in terms of the number of correct answers found for the search tasks and subjectively through a usability questionnaire. To answer the second hypothesis, we perform an analysis of the click data logged during search.

2. EXPERIMENTAL DESIGN
The environment. The setting for our experiment was the Netherlands Institute for Sound and Vision (S&V), the Dutch national audiovisual broadcast archive. In the experiment we used the archive’s collection consisting of around 1.5 M (television) programs with metadata descriptions provided by professional annotators.

We also utilized the search interface of S&V (http://zoeken.beeldengeluid.nl). The interface is available in a simple and an advanced version. The simple version is similar to search engines known from the web. It has a single search box and submitting a query results in a ranked list of 10 programs. When a user clicks on one of the programs, the interface shows a page with the complete metadata description of the program. Table 1 shows the metadata fields available for a program. Instead of
the usual snippets presented with each item in a result list, the interface shows the title, date, owner and keywords for each item on the result page. Only the keywords and title fields provide information about the actual content of the program while the other fields provide information primarily used for the organization of programs in the archive collection. The description and summary fields contain the most information about the content of programs but are only available by visiting the program description page.

We used the advanced version of the interface in the experiment, which next to the search box offers two other components: search boxes operating on specific fields and filters for certain categories of terms. Fielded searches operate on specific fields in the program metadata. The filters become available after a list of programs has been returned in response to a query. The filters display the top five most frequent terms in the returned documents for a metadata field. The metadata fields displayed in the filter component of the interface are highlighted in bold in Table 1. Once a checkbox next to one of the terms has been ticked, programs not containing that term in that field are removed from the result list.

Table 1: All metadata fields available for programs. We differentiate between fields that describe program content and fields that do not. Bold indicates fields used by the filter component.

content descriptors                             organizational descriptors
field         explanation                       field    explanation
description   program highlights                medium   storage medium
person        people in program                 genre    gameshow; news
keyword       terms provided by annotator       rights   parties allowed to broadcast
summary       summary of the program format     owner    owner of the broadcast rights
organization  organization in program           date     broadcast date
location      locations in program              origin   program origin
title         program title

Subjects. In total, 22 first year university students from media studies participated in the experiment. The students (16 female, 6 male) were between 19 and 22 years of age. As a reward for participation the students gained free entrance to the museum of the archive.

Experiment setup. In each of the five studios available at S&V either one or two subjects performed the experiment at a time in a single studio. In case two subjects were present, each of them worked on machines facing opposite sides of the studio. We instructed subjects not to communicate during the experiment. During the experiment one instructor was always present in a studio. Before starting, the subjects learned the goals of the experiment, got a short tutorial on the search interface and performed a test query. During this phase the subjects were allowed to ask questions.

In the experiment each subject had to complete three search tasks in 45 minutes. If after 15 minutes a task was not finished, the instructor asked the subject to move on to the next task. Search tasks are related to matters that could potentially occur within courses of the student’s curriculum. Each search task required the subjects to find five answers before moving on to the next task. A correct answer was a page with the complete metadata description of a program that fulfilled the information need expressed by the search task. Subjects could indicate that a page was an answer through a submit button added to the interface for the experiment.

We used the following three search tasks in the experiment: (i) For the course “media and ethnicity” you need to investigate the role of ethnicity in television-comedy. Find five programs with different comedians with a non-western background. (ii) For the course “television geography” you need to investigate the representation of places in drama series. Find five drama series where location plays an important role. (iii) For the course “media and gender” you need to give a presentation about the television career of five different female hosts of game shows broadcast during the 1950s, 1960s or 1970s. Find five programs that you can use in your presentation.

Subjects received the search tasks in random order to avoid any bias. Also, subjects were encouraged to perform the search by any means that suited them best. During the experiment we logged all search actions, e.g., clicks, performed by each subject. After a subject had finished all three search tasks, he or she was asked to fill out a questionnaire about the experiences with the search interface.

Methodology for evaluation and analysis. We performed two types of evaluation of the search interface: a usability questionnaire and the number of correct answers submitted for the search tasks. The questionnaire consists of three sets of questions. The first set involves aspects of the experienced search behaviour with the interface. The second set contains questions about how useful users find the filter component, fielded search component, and metadata fields presented in the interface. The third set asks subjects to indicate the usefulness of a series of term clouds. The primary goal is not to evaluate the term clouds or their visualization but to find preferences for information from certain metadata fields. We generated a term cloud for a specific field as follows. First, we got the top 1000 program descriptions for the query “comedian.” We counted the terms for a field for each of the documents. The cloud then represented a graphical display of the top 50 most frequent terms in the fields of those documents, where the size of a term was relative to its frequency, i.e., the higher the frequency the bigger the term. In the questionnaire subjects indicate agreement on a 5 point Likert scale ranging from one (not at all) to five (extremely). The second type of evaluation was based on the evaluation methodology applied at TREC [2]. We pooled the results of all subjects and let two assessors make judgements about the relevance of the submitted answers to a search task. An answer is only considered relevant if both assessors agree. Performance is measured in terms of the number of correct answers (#correct) submitted to the system.

For the analysis of the search behavior of subjects we looked at (i) the number of times a search query is submitted using any combination of components (#queries); (ii) the number of times a program description page is visited (#pages); and (iii) the number of times a specific component is used, i.e., the general searchbox, filters and fields. A large value for #queries indicates a lookup-type search behavior. It is characterized by a pattern of submitting a query, checking if the answer can be found in the result list and, if it is not, formulating a new query. The new query is not necessarily based on information gained from the retrieved results but rather inspired by the subject’s personal knowledge [4]. A large value for #pages indicates a learning-style search behavior. In this search strategy a subject visits the program description of each search result to get a better understanding of the organization and content of the archive. New queries are then also based on information gained from the previous text analysis [4]. We check the usage frequency of specific components to see if performance differences between subjects are due to alternative uses of interface components.

3. RESULTS
Search interface evaluation. Figure 1 shows the distribution of the number of correct answers submitted for a search task, together with the distribution of the number of answers (correct or incorrect) submitted. Out of the possible total of 330 answers, 173 are actually submitted. Subjects submit the maximum number of five
answers for 18 of the tasks. This suggests that subjects have difficulties in finding answers within the given time limit. Subjects find no correct answers for 31 of the tasks, five subjects find no correct answer for any of the tasks, and none of the subjects reaches the maximum of five correct answers for a task. In total 64 out of 173 answers are correct. This low precision indicates that subjects find it difficult to judge if an answer is correct based on the metadata provided by the program description. Table 2 shows questions about the satisfaction of subjects with the interface. Subjects indicate their level of agreement from one (not at all) to five (extremely). For all questions the majority of subjects find the amount of support offered by the interface on the exploratory search tasks marginal. This finding supports our first hypothesis that the search interface intended for professional users does not provide satisfactory support to non-professional users on exploratory search tasks.

[Figure 1: Distribution of amount correct/submitted answers. Histogram of #tasks (y-axis, 0 to 30) against the number of answers (x-axis, 0 to 5), with one series for #correct and one for #submitted.]

Table 2: Questionnaire results about the satisfaction of subjects with the search interface. Agreement is indicated on a 5 point Likert scale ranging from one (not at all) to five (extremely).

question                                                                                          mode   avg
To what degree are you satisfied with the search experience offered by the interface?             2      2.3
To what degree did the interface support you by suggesting new search terms?                      2      2.4
To what degree are you satisfied with the suggestions for new search terms by the interface?      2      2.3

Search behavior analysis. Although all subjects are non-experts with respect to search with this particular interface, some perform better than others. We investigate whether there is a difference in the search behavior of subjects that have high performance on the search tasks and users that have lower performance. We divide subjects into two groups depending on the average number of correct answers found aggregated over the three tasks, i.e., 2.9 out of the possible maximum of 15. The group with higher performance (group G) consists of 11 subjects with 3 or more correct answers, whereas the group with lower performance (group B) consists of 11 subjects with 2 or fewer correct answers.

Table 3 shows the averages of the search behavior indicators for each of the two groups. We first look at the usage frequency of the filter, field, and search box components by subjects in group G vs. group B. There is no significant difference between the groups, indicating that there is no direct correlation between performance on the search tasks and use of specific search components. Next we look at search behavior as an explanation for the difference in performance between the groups. Our indicator for lookup searches, i.e., #queries, shows a small difference in the number of submitted queries. That subjects in both groups submit a comparable number of queries suggests that the difference in performance is not due to one group doing more lookups than the other. The indicator for learning-type search, i.e., #pages, shows that there is a significant difference in the number of program description pages visited between subjects of the two groups, i.e., subjects in group G tend to visit program description pages more often than subjects of group B. We also find that the average time subjects in group G spend on a program description page is 27 seconds, while subjects from group B spend on average 39 seconds. These observations support our hypothesis that there are differences in search behavior between subjects that have high performance on exploratory search tasks and subjects with lower performance.

Table 3: Analysis of search behavior of subjects. Significance is tested using a standard two-tailed t-test. The symbol ▲ indicates a significant increase at the α < 0.01 significance level.

         filter   field   searchbox   #queries   #pages
B avg    21.3     29.5    44.8        35.2       21.2
G avg    15.2     44.0    42.0        34.3       35.7▲

Usefulness of program descriptions. One explanation for this difference in performance is that through their search behavior subjects from group G learn more about the content and organization of the archive and are able to assimilate this information faster from the program descriptions than subjects from group B. As subjects process more program descriptions they learn more about the available programs and terminology in the domain. This results in a richer set of potential search terms to formulate their information need. To investigate whether subjects found information in the program descriptions useful in suggesting new search terms, we analyse the second set of questions from the questionnaire. The top half of Table 4 shows subjects’ responses to questions about the usefulness of metadata fields present on the search result page. Considering responses from all subjects, the genre and keyword fields are found most useful and the title and date fields as well, although to a lesser degree. The fields intended for professionals, i.e., origin, owner, rights, and medium, are found not useful by the majority of subjects. Between group B and G there are no significant differences in subjects’ judgements of the usefulness of the fields.

Table 4: Questions about the usefulness of metadata fields on program description pages and the mode and average (avg) of the subjects’ responses: for all subjects, the good (G) and bad (B) performing group. We use a Wilcoxon signed rank test for the ordinal scale. The symbol △ (▲) indicates a significant increase at the α < 0.05 (0.01) level.

                                                      all      B              G
question                              field           mode     mode    avg    mode   avg
Degree to which fields on the         date            3        2       2.2    3      3.0
result page were useful in            owner           1        1       1.6    1      2.0
suggesting new terms                  rights          1        1       1.3    1      1.4
                                      genre           4        1       2.8    4      3.9
                                      keyword         4        1,5     3.1    4      3.5
                                      origin          1        1,2     1.7    1      2.0
                                      title           3,4      2       2.2    4      3.0
                                      medium          1        1       1.5    1,2    1.6
Degree to which fields in             summary         4        1,4     2.8    5      3.8
program descriptions were             description     4        4       3.3    4△     4.1
useful in suggesting new terms        person          4        1,3,4   2.8    4▲     3.8
                                      location        1,3,4    1,3     2.0    4▲     3.0
                                      organization    1        1       1.8    1,2    2.0

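The totals reported in Section 3 can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not the analysis code used in the study. The function names `is_correct` and `split_groups` and the per-subject counts in the usage example are hypothetical; only the totals (22 subjects, 3 tasks, 5 answers per task, 173 submitted answers, 64 correct answers) and the grouping threshold of 3 correct answers come from the text.

```python
# Totals reported in the text (Sections 2 and 3).
N_SUBJECTS = 22
N_TASKS = 3
ANSWERS_PER_TASK = 5
N_SUBMITTED = 173
N_CORRECT = 64

# Upper bound on submitted answers: every subject submits five answers per task.
max_answers = N_SUBJECTS * N_TASKS * ANSWERS_PER_TASK

# Fraction of submitted answers that were judged correct.
precision = N_CORRECT / N_SUBMITTED

# Average number of correct answers per subject over the three tasks.
avg_correct = N_CORRECT / N_SUBJECTS


def is_correct(assessor_a: bool, assessor_b: bool) -> bool:
    """An answer counts as correct only if both assessors judge it relevant."""
    return assessor_a and assessor_b


def split_groups(correct_per_subject, threshold=3):
    """Split subjects into group G (>= threshold correct) and group B (the rest)."""
    group_g = sorted(s for s, c in correct_per_subject.items() if c >= threshold)
    group_b = sorted(s for s, c in correct_per_subject.items() if c < threshold)
    return group_g, group_b


if __name__ == "__main__":
    print(max_answers)            # 330
    print(round(precision, 2))    # 0.37
    print(round(avg_correct, 1))  # 2.9
    # Hypothetical per-subject counts, for illustration only.
    toy = {"s01": 0, "s02": 2, "s03": 3, "s04": 7}
    print(split_groups(toy))      # (['s03', 's04'], ['s01', 's02'])
```

The computed values agree with the 330 possible answers and the average of 2.9 correct answers per subject quoted above, and the threshold of 3 matches the split into groups G and B.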
The bottom part of Table 4 shows subjects’ responses to questions about the usefulness of metadata fields only present on the program description page and not already shown on the search result page. Based on all responses, the summary, description, person and location metadata fields are considered most useful by the majority of the subjects. These findings further support our argument that program descriptions provide useful information for subjects to complete their search tasks.

When we contrast responses of the two groups we find that group G subjects consider the description, person, and location metadata fields significantly more useful than subjects from group B. This suggests that group B subjects have more difficulties in distilling useful information from these fields (recall also the longer time spent on a page). This does not say that these users cannot understand the provided information. All that is indicated is that the chosen modality, i.e., text, might not be the right one. A graphical representation, for example as term clouds, might be better.

Fields as term clouds. In response to the observations just made, we also investigated how users would judge visual representations of search results, i.e., in the form of term clouds directly on the search result page. Here the goal is not to evaluate the visualization of the clouds or the method by which they are created. Of interest to us is whether subjects would find a direct presentation of information normally “hidden” on the program description page useful.

Recall from §2 that we generate term clouds for each field on the basis of the terms in the top 1000 documents returned for a query. From Table 5 we observe that subjects do not consider the description and summary clouds useful, while previously these fields were judged most useful among the fields in the program description. Both clouds contain general terms from the television domain, e.g., program and series, which do not provide subjects with useful search terms. Although this could be due to the use of frequencies to select terms, these fields are inherently difficult

the archive. Together, the above findings suggest that subjects find a direct presentation of short and meaningful terms, i.e., categories, keywords, and entities, on the search results page useful.

4. CONCLUSION
We presented results from a user study where non-professional users perform exploratory search tasks with a search tool originally developed for media professionals and archivists in an audio-visual archive. We hypothesized that such search tools provide unsatisfactory support to non-professional users on exploratory search tasks. By means of a TREC-style evaluation we find that subjects achieve low recall in the number of correct answers found. In a questionnaire regarding the user satisfaction with the search support offered by the tool, subjects indicate this to be marginal. Both findings support our hypothesis that a professional search tool is unsuitable for non-professional users performing exploratory search tasks.

Through an analysis of the data logged during the experiment, we find evidence to support our second hypothesis that subjects follow different search strategies. Subjects that visit more program description pages are more successful on the exploratory search tasks. We also find that subjects consider certain metadata fields on the program description pages more useful than others. Subjects indicate that visualization of certain fields as term clouds directly in the search interface would be useful in completing the search tasks. Subjects especially consider presentations of short and meaningful text units, e.g., categories, keywords, and entities, useful.

In future work we plan to perform an experiment in which we present non-professional users with two interfaces: the current search interface and one with a direct visualization of categories, keywords and entities on the search result page.

Acknowledgements. This research was partially supported by the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, CIP ICT-
to visualize without losing the relations between the terms. The
                                                                           PSP under grant agreement nr 250430, the PROMISE Network of
genre, keyword, location and, to some degree, person clouds are all
                                                                           Excellence co-funded by the 7th Framework Programme of the Eu-
considered useful, but they support the user in different ways. The
                                                                           ropean Commission, grant agreement no. 258191, the DuOMAn
genre field supports the subject in understanding how content in the
                                                                           project carried out within the STEVIN programme which is funded
archive is organized, i.e., it provides an overview of the genres used
                                                                           by the Dutch and Flemish Governments under project nr STE-09-
for categorization. The keyword cloud provides the user with alter-
                                                                           12, the Netherlands Organisation for Scientific Research (NWO)
native search terms for his original query, for example, satire or
                                                                           under project nrs 612.061.814, 612.061.815, 640.004.802, 380-70-
parody instead of cabaret. The location and person clouds offer an
                                                                           011, the Center for Creation, Content and Technology (CCCT), the
indication of which locations and persons are present in the archive
                                                                           Hyperlocal Service Platform project funded by the Service Innova-
and how prominent they are. For these fields visualization is easier,
                                                                           tion & ICT program, the WAHSP project funded by the CLARIN-
i.e., genre, keywords or entities by themselves are meaningful with-
                                                                           nl program, and under COMMIT project Infiniti.
out having to represent relations between them. Subjects consider
the title field only marginally useful. For this field the usefulness is
dependent on the knowledge of the subject as titles are not neces-         REFERENCES
sarily descriptive. The subjects also consider the organization field      [1] J.-w. Ahn, P. Brusilovsky, J. Grady, D. He, and R. Florian. Se-
marginally useful, probably due to the nature of our search tasks,             mantic annotation based exploratory search for information an-
i.e., two tasks focus on finding persons and in one locations play             alysts. Inf. Proc. & Management, 46(4):383 – 402, 2010.
an important role. We assume though that in general this type of           [2] D. K. Harman. The TREC test collections. In E. M. Voorhees
information need occurs when the general public starts exploring               and D. K. Harman, editors, TREC: Experiment and evaluation
                                                                               in information retrieval. MIT, 2005.
                                                                           [3] B. Huurnink, L. Hollink, W. van den Heuvel, and M. de Rijke.
Table 5: Questions about the usefulness of term clouds based                   Search behavior of media professionals at an audiovisual
on specific metadata fields. Agreement is indicated on a 5 point               archive. J. Am. Soc. Inf. Sci. and Techn., 61:1180–1197, 2010.
Likert scale ranging from one (not at all) to five (extremely).            [4] G. Marchionini. Exploratory search: from finding to under-
 cloud          mode avg cloud               mode avg                          standing. Comm. ACM, 49(4):41 – 46, April 2006.
                                                                           [5] R. White, B. Kules, S. Drucker, and M. Schraefel. Supporting
 title            2        2.8    description    1         2.5
                                                                               exploratory search: Special issue. Comm. ACM, 49(4), 2006.
 person           2,3      2.9    genre          4         3.4
 location         4        3.3    summary        1         2.3
 organization     2        2.2    keyword        4         3.8
       Supplying Collaborative Source-code Retrieval Tools
                     to Software Developers

Juan M. Fernández-Luna
Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR.
Universidad de Granada, 18071 Granada, Spain
jmfluna@decsai.ugr.es

Juan F. Huete
Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR.
Universidad de Granada, 18071 Granada, Spain
jhg@decsai.ugr.es

Julio C. Rodríguez-Cano
Centro de Desarrollo Territorial Holguín. Universidad de las Ciencias Informáticas,
80100 Holguín, Cuba
jcrcano@uci.cu

ABSTRACT
Collaborative information retrieval (CIR) and search-driven software development (SDD) are both emerging research fields. The first was born in response to the problem of satisfying the shared information needs of groups of users that collaborate explicitly, and the second to explore source-code retrieval as an essential activity during the software development process. Taking advantage of recent contributions in CIR and SDD, in this paper we introduce a plug-in that can be added to the NetBeans IDE in order to enable remote teams of developers to use collaborative source-code retrieval tools. We also include experimental results to confirm that CIR&SDD techniques yield better search results than individual strategies.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and presentation (e.g., HCI)]: Group and Organization Interfaces; H.3.3 [Information Storage and Retrieval]: Search Process.

General Terms
Design, Human Factors.

Keywords
Collaborative Information Seeking and Retrieval, Search-driven Software Development, Multi-user Search Interface.

Copyright © 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of EuroHCIR2011.
EuroHCIR ’11 Newcastle, UK

1. INTRODUCTION

               “Collaboration” seems to be the buzzword this year,
                 just like “knowledge management” was last year.
                                                    – David Coleman

   In the last few years, Information Retrieval (IR) systems have become critical tools for software developers. Today we can use vertical IR systems focused on integrated development environment (IDE) extensions for source-code retrieval, such as Strathcona [5], CodeConjurer [6], and CodeGenie [1], but from the perspective of a development team these only allow individual interaction.

   One of the reasons that existing IR systems do not adequately support collaboration is that there are no good models and methods that describe users’ behavior during collaborative tasks. To address this issue, the community has adopted CIR as an emerging research field charged with establishing techniques to satisfy the shared information needs of group members, starting from the extension of the IR process with knowledge about the queries, the context, and the explicit collaboration habits among group members. The CIR community identifies four fundamental features in this multidisciplinary field that can enhance the value of collaborative search tools: user intent transition, awareness, division of labor, and sharing of knowledge [2].

   In addition, SDD is a new research area motivated by the observation that software developers spend most of their time searching for pertinent information that they need in order to solve their tasks at hand. We identified the SDD context as a very interesting field where collaborative IR features could be greatly exploited. For this reason we use the phrase collaborative SDD to refer to the application of different collaborative IR techniques in the SDD process [3].

   It is known that some IDEs incorporate tools that support developers’ collaboration practices, but without placing emphasis on source-code retrieval. In this sense, the objective of this paper is to present the results of a comparison of traditional SDD and collaborative SDD. In both search scenarios, we use the NetBeans IDE plug-in COSME (COllaborative Search MEeting) with the appropriate configurations. COSME endows the NetBeans IDE with traditional and collaborative source-code retrieval tools.

   This paper is organized as follows: Section 2 presents a brief overview of related work and places our research in context. Then, we describe our software tool and method, explaining the different aspects of our experimental evaluation. Finally, we discuss the results and present some concluding remarks.

2. RELATED WORK
   There is a small body of work that investigates methods to join collaborative information retrieval and search-driven software development. On the one hand, some researchers have identified different search scenarios where it is necessary to extend IR systems with collaborative capabilities. For example, in the Web context, SearchTogether [8] is a system which enables remote users to synchronously or asynchronously collaborate when searching the Web. It supports
collaboration with several mechanisms of group awareness, division of labor, and persistence. On the other hand, the SDD community presents different prototypes and systems. For example, Sourcerer [1] is an infrastructure for large-scale indexing and analysis of open source code. Sourcerer crawls the Internet looking for Java code from a variety of locations, such as open source repositories, public web sites, and version control systems.

   CIR systems can be applied in several domains, such as travel planning, organizing social events, working on a homework assignment or medical environments, among many others. We identified software development as another possible application field where much evidence of collaboration among programmers on a development task can be found. For example, concurrent editing of models and processes requires synchronous collaboration between architects and developers who cannot be physically present at a common location [7].

   However, current SDD systems do not support explicit collaboration among developers with shared technical information needs, who frequently look for additional documentation on the API (Application Programming Interface), read posts from people having the same problem, search the company’s site for help with the API, or look for source code examples where other people successfully used the API. Fortunately, in the last few years, some researchers have realized that collaboration is an important feature, which should be analyzed in detail in order to be integrated with operational IR systems, upgrading them to CIR systems.

   As an approach to these situations, we propose in this work the COSME plug-in [4]. Its contribution to current SDD is explicit support for teams of developers, enabling them to collaborate on both the process and the results of a search. COSME provides collaborative search functions for exploring and managing source-code repositories and documents about technical information in the software development context.

   In order to support such CIR techniques, COSME provides some collaborative services in the context of SDD:

   • The embedded chat tool enables direct communication among different developers.

   • Relevant search results can be shared with the explicit recommender mechanisms.

   • Another important feature is the automatic division of labor. By implementing an effective division of labor policy the search task can be split across team developers, thereby avoiding considerable duplication of effort.

   • Through awareness mechanisms all developers are always informed about the team activities to save effort. Awareness is a valuable learning mechanism that helps the less experienced developers to view the syntax used by their teammates, which can inspire them to reformulate their queries.

   • All search results can be annotated, either for personal use, like a summary, or in the team context, for discussion threads and ratings.

3.    THE COSME PLUG-IN
   To support software developers with shared technical information needs we implemented the COSME front-end as a NetBeans IDE plug-in. The principal technologies that we used to implement it include the CIRLab framework [2], the NetBeans IDE platform, Java as programming language, and AMENITIES (A MEthodology for aNalysis and desIgn of cooperaTIve systEmS) as software engineering methodology. COSME is designed to enable either synchronous or asynchronous, but explicit, remote collaboration among teams of developers with shared technical needs. In the following section we outline COSME.

3.1    Current Features
   Figure 1 is a screenshot showing various features of our COSME plug-in. We refer to the circled numbers in the following text.

   1. Search Control Panel: It comprises three collapsible panels: (a) configuration, where the developers can select the search options and engines to accomplish the search tasks; (b) filters, which show the user’s fields of interest according to the collection contents; and (c) collection type, which specifies the type of items in the search results.

   2. Search Results Window: The search results can be classified according to three different source-code localizations: (d) results can be obtained as a consequence of division of labor techniques introduced by the collaborative search session (CoSS) chairman. A CoSS is a group of end-users working together to satisfy their shared information needs. A CoSS can have only one developer in the role of chairman; (e) or by explicit recommendations made by group members of their CoSS; (f) finally, search results can also be obtained by individual search.

   3. Item Viewer: It shows full item content in different formats, e.g. pdf, plain text, and Java source-code files. All item formats are shown to the developers within the NetBeans IDE.

   4. CoSS Portal: Developers can use the chat tool embedded in the CoSS Portal to negotiate the creation of a collaborative search session or to join any active CoSS. For each CoSS, the chairman can establish the integrity criteria, membership policy, and division of labor principles.

4.    EXPERIMENTAL EVALUATION
   In this section we show how collaborative features applied to SDD improve on the traditional operation without them. If we consider the null hypothesis ($H_0$) that $A_{TSDD} \geq A_{CSDD}$, our alternative hypothesis ($H_1$) is that collaborative work should help to improve the retrieval performance in an SDD task: $A_{TSDD} < A_{CSDD}$, where TSDD stands for Traditional SDD and CSDD for Collaborative SDD. To evaluate our proposal we compare 10 group interactions in two different kinds of search scenarios (SS) on SDD, $SS_{2k+1}$ and $SS_{2(k+1)}$, $k \in \{0, \ldots, 4\}$. $SS_{2k+1}$ represents a team of developers that uses a conventional IR system, which means that developers do not have access to techniques of division of labor, sharing of knowledge, or awareness (traditional SDD, TSDD), while $SS_{2(k+1)}$ represents a team of developers that uses a CIR system. Thus, 5 teams worked in a TSDD context (those with odd subindexes) and the other 5 with CSDD (even subindexes). In both search scenarios, we used COSME with the appropriate configuration for each setting.
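
To make the contrast between the two settings more tangible, the following minimal Python sketch illustrates one very simple division-of-labor policy of the kind available only to the collaborative teams: a ranked result list is dealt out round-robin so that no two team members inspect the same item. This is an illustration only. The function name `split_results` and the round-robin rule are assumptions made for the example; the text does not specify which policy COSME or CIRLab actually applies.

```python
from collections import defaultdict


def split_results(ranked_results, members):
    """Deal a ranked result list out to team members, round-robin.

    Hypothetical policy for illustration: each result goes to exactly one
    member, so nobody duplicates a teammate's inspection effort.
    """
    if not members:
        raise ValueError("at least one team member is required")
    assignment = defaultdict(list)
    for rank, item in enumerate(ranked_results):
        # Result at rank r goes to member r mod team size.
        assignment[members[rank % len(members)]].append(item)
    return dict(assignment)


if __name__ == "__main__":
    results = ["JTable.java", "JXTable.java", "GridControl.java",
               "JideTable.java", "JList.java"]
    for member, items in split_results(results, ["ana", "ben"]).items():
        print(member, items)
```

A round-robin split keeps each member's share balanced across the ranking, so every developer sees some highly ranked items; other policies (for example splitting by collection or by query term) would fit the same interface.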
                       Figure 1: Screenshot of NetBeans IDE with COSME plug-in installed


   The search scenario was a common task proposed to a group of developers without a Java background: select the most relevant classes to manage GUI (Graphical User Interface) components using different Java APIs with a total of 2420 files, specifically Jidesoft (634), OpenSwing (434), SwingX (732) and Swing (620). We have focussed on these APIs because they are directly related to the context of the experiment, although they are not complete: we have only considered their most relevant API packages for the experiment.

   For evaluation purposes, we created our own test collection: a group of 10 experts proposed a set of 100 topics strongly related to the objective of the experimentation, then their corresponding queries were submitted to each of the following search engines: Lucene, Minion, Indri and Terrier. A document pool was obtained by ranking fusion and later the experts, grouped in pairs, determined the relevant documents for each topic.

   In collaborative SDD, it is very important to analyze the interaction among group members; therefore, unlike the evaluation of a traditional SDD system, we cannot fix the queries. Each participating group could freely formulate their queries to the search engine. In order to compare team results, the search engine identified the queries formulated by the members of the teams that were most similar to those formulated by experts. If the system found enough similarity and if such queries occur in all the groups, then these queries are considered to deal with the same topic and are selected for group comparison purposes. The similarity measure between queries is calculated by Equation 1. A user query ($q_u$) and an expert query ($q_e$) are considered to be the same if they are within a given similarity threshold. A new query $q_u'$ is obtained by applying the Porter stemmer algorithm to the terms of $q_u$, and analogously we obtain $q_e'$.

\[
\mathrm{sim}(q_u, q_e) = \frac{|q_u' \cap q_e'|}{|q_u' \cup q_e'|} \qquad (1)
\]

   In Equation 1, $\mathrm{sim}(q_u, q_e)$ is a value between 0 and 1. For this experiment we assumed that there exists an expert’s relevance judgement for $q_u$ only if $\mathrm{sim}(q_u, q_e) \geq \frac{(N+1)/2}{N}$, where $N = |q_u' \cup q_e'|$, selecting the relevance judgements that correspond to the maximum similarity for each $q_e$.

   In order to measure the effectiveness of the described $SS_{TSDD}$ and $SS_{CSDD}$ scenarios, we considered as evaluation measures the metrics proposed by Pickens et al. in [9], i.e. selected precision ($P_s$, the fraction of documents judged relevant by the developer that were marked relevant in the ground truth) and selected recall ($R_s$) as their dependent measures. To summarize effectiveness in a single number we use the $F1_s$ measure.

   The $F1_s$ measure was computed from the documents that each team selected for each common topic. For the statistical analysis of the results, we use the non-parametric Wilcoxon test (all against all). The Monte Carlo method was used, with 99% confidence intervals and 10000 samples. The resulting significance values (Sig.) are shown in Table 1.

   We observed significant differences between TSDD and CSDD groups, considered two by two. As the $F1_s$ values for CSDD groups are better than those computed for TSDD groups in those cases, we conclude that when teams work supported by collaborative tools, they obtain better results. From Table 1, we can see that apart from $SS_5$, each $SS_{TSDD}$ has at least one $SS_{CSDD}$ with significantly different values of $F1_s$. With these results we accept $H_1$, because $A_{TSDD} < A_{CSDD}$.
Significance (Sig.) of the pairwise differences in F1s:

         SS1      SS2      SS3      SS4      SS5      SS6      SS7      SS8      SS9
 SS2     0.062
 SS3     0.180    0.051
 SS4     0.022†   0.212    0.038†
 SS5     0.272    0.069    0.152    0.054
 SS6     0.045†   0.201    0.080    0.290    0.056
 SS7     0.215    0.031†   0.340    0.090    0.206    0.042†
 SS8     0.053    0.131    0.061    0.190    0.072    0.158    0.070
 SS9     0.243    0.072    0.201    0.029†   0.344    0.068    0.238    0.042†
 SS10    0.065    0.098    0.041†   0.290    0.072    0.235    0.045†   0.132    0.058

 †: significant difference (0.01 ≤ Sig < 0.05)
 ‡: highly significant difference (Sig < 0.01)

                                             Table 1: Wilcoxon Test Results.
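
For readers who want to reproduce the per-topic scores that feed the comparison in Table 1, the sketch below computes selected precision, selected recall and their harmonic mean. It is an illustration under assumptions: selected precision follows the definition given in the text, while selected recall is taken here to be the fraction of ground-truth relevant documents that the team selected, and F1s is taken to be the usual harmonic mean of the two; the function name and the sample file names are hypothetical.

```python
def selected_scores(selected, relevant):
    """Return (Ps, Rs, F1s) for one team on one topic.

    selected: documents the team judged relevant.
    relevant: documents marked relevant in the ground truth.
    Rs and F1s follow the standard recall / harmonic-mean definitions
    (an assumption; the text only spells out Ps).
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    ps = hits / len(selected) if selected else 0.0
    rs = hits / len(relevant) if relevant else 0.0
    f1s = 2 * ps * rs / (ps + rs) if (ps + rs) > 0 else 0.0
    return ps, rs, f1s


if __name__ == "__main__":
    team_selection = ["JTable.java", "JXTable.java"]
    ground_truth = ["JTable.java", "JXTable.java",
                    "JList.java", "JTree.java"]
    print(selected_scores(team_selection, ground_truth))
```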


5.   CONCLUSIONS AND FUTURE WORKS
   Collaboration in SDD is just being recognized as an important research area. While in some cases collaborative SDD can be handled by conventional search engines, we need to understand how the collaborative nature of source-code retrieval affects the requirements on search algorithms. Research in this direction needs to adopt the theories and methodologies of SDD and CIR, and supplement them with new constructs as appropriate. In this work we present COSME, a collaborative SDD tool that helps team developers to find better sources than searching with traditional SDD strategies, as well as an experimental approach that confirms our hypotheses.

   Our ongoing work focuses on the COSME back-end, which poses fundamental research challenges and provides new opportunities to let group members collaborate in new ways:

   (i) Profile Analysis. We aim to analyze the user-generated data using various techniques from the study of different collaborative virtual environments and recommender systems. With the results, our goal is to provide better personalized search results, support the users while searching and recommend relevant, trustworthy collaborators to users.

   (ii) P2P/hybrid-network Retrieval. Due to scalability and privacy issues we favor a distributed environment by means of a P2P (peer-to-peer) retrieval feature based on a hybrid architecture to store the user-generated data and collections (CASPER: CollAborative Search in PEer-to-peer netwoRks). The main challenge in this respect is to ensure reliable and efficient data analysis.

6. ACKNOWLEDGMENTS
   This work has been partially supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018), the Spanish MICIN project TIN2008-06566-C04-01 and the Andalusian Consejería de Innovación, Ciencia y Empresa project TIC-04526. We would also like to thank Carmen Torres for support and discussions, and all of our experiment participants.

7. REFERENCES
[1] S. Bajracharya, J. Ossher, and C. Lopes. Sourcerer: An internet-scale software repository. In SUITE ’09: Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation, pages 1–4, Washington, DC, USA, 2009. IEEE Computer Society.
[2] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. Cirlab: A groupware framework for collaborative information retrieval research. Information Processing and Management, 44(1):256–273, 2009.
[3] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. Improving search–driven development with collaborative information retrieval techniques. In HCIR ’09: IIIrd Workshop on Human–Computer Interaction and Information Retrieval, Washington DC, USA, 2009.
[4] J. M. Fernández-Luna, J. F. Huete, R. Pérez-Vázquez, and J. C. Rodríguez-Cano. Cosme: A netbeans ide plugin as a team–centric alternative for search driven software development. In Group 2010: Ist Workshop on Collaborative Information Seeking, Florida, USA, 2010.
[5] R. Holmes. Do developers search for source code examples using multiple facts? In SUITE 2009: First International Workshop on Search-Driven Development Users, Infrastructure, Tools and Evaluation, Vancouver, Canada, 2009.
[6] W. Janjic. Lowering the barrier to reuse through test-driven search. In SUITE 2009: First International Workshop on Search-Driven Development Users, Infrastructure, Tools and Evaluation, Vancouver, Canada, 2009.
[7] M. Jiménez, M. Piattini, and A. Vizcaíno. Challenges and improvements in distributed software development: A systematic review. 2009.
[8] M. R. Morris and E. Horvitz. Searchtogether: an interface for collaborative web search. In UIST ’07: Proceedings of the 20th annual ACM symposium on User interface software and technology, pages 3–12, New York, NY, USA, 2007. ACM.
[9] J. Pickens, G. Golovchinsky, C. Shah, P. Qvarfordt, and M. Back. Algorithmic mediation for collaborative exploratory search. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 315–322, New York, NY, USA, 2008. ACM.

                          Interactive Analysis and Exploration of
                             Experimental Evaluation Results

Emanuele Di Buccio, University of Padua, Italy, dibuccio@dei.unipd.it
Marco Dussin, University of Padua, Italy, dussinma@dei.unipd.it
Nicola Ferro, University of Padua, Italy, ferro@dei.unipd.it
Ivano Masiero, University of Padua, Italy, masieroi@dei.unipd.it
Giuseppe Santucci, Sapienza University of Rome, Italy, santucci@dis.uniroma1.it
Giuseppe Tino, Sapienza University of Rome, Italy, tino@dis.uniroma1.it

ABSTRACT
This paper proposes a methodology based on discounted cumulated gain measures and visual analytics techniques to improve the analysis and understanding of IR experimental evaluation results. The proposed methodology is geared to favour a natural and effective interaction of researchers and developers with the experimental data, and it is demonstrated by developing an innovative application based on the Apple iPad.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: [Search process]; H.3.4 [Systems and Software]: [Performance evaluation (efficiency and effectiveness)]

General Terms
Experimentation, Human Factors, Measurement, Performance

Keywords
Ranking, Visual Analytics, Interaction, Discounted Cumulated Gain, Experimental Evaluation, DIRECT

1. INTRODUCTION
   The Information Retrieval (IR) field has a strong and long-lived tradition, dating back to the late 1950s and early 1960s, as far as the assessment of the performance of an IR system is concerned. In particular, in the last 20 years, large-scale evaluation campaigns, such as the Text REtrieval Conference (TREC)¹ in the United States and the Cross-Language Evaluation Forum (CLEF)² in Europe, have conducted cooperative evaluation efforts involving hundreds of research groups and industries, producing a huge amount of valuable data to be analysed, mined, and understood.

¹ http://trec.nist.gov/
² http://www.clef-campaign.org/

   The aim of this work is to explore how we can improve the comprehension of, and the interaction with, the experimental results by IR researchers and IR system developers. We imagine the following scenarios: (i) a researcher or a developer is attending the workshop of one of the large-scale evaluation campaigns and wants to explore and understand the experimental results while listening to the presentation discussing them; (ii) a team of researchers or developers is working on tuning and improving an IR system and needs tools and applications that allow them to investigate and discuss the performance of the system under examination in a handy and effective way.
   These scenarios call for: (a) proper metrics that allow us to understand the behaviour of a system; (b) effective analysis and visualization techniques that allow us to get an overall idea of the main factors and critical areas which have influenced performance, in order to be able to dig into details; (c) tools and applications that allow us to interact with the experimental results in a way that is both effective and natural.
   To this end, we propose a methodology which allows us to quickly get an idea of the distance of an IR system from both its own optimal performance and the best performance possible. We rely on the (normalized) discounted cumulated gain (n)DCG family of measures [7] because they have been shown to be especially well-suited not only to quantify system performance but also to give an idea of the overall user satisfaction with a given ranked list, considering the persistence of the user in scanning the list.
   The contribution of this paper is to improve on the previous work [7, 11] by trying to better understand what happens when documents with different relevance grades are flipped in a ranked list. This is achieved by providing a formal model that allows us to properly frame the problem and quantify the gain/loss with respect to an optimal ranking, rank by rank, according to the actual result list produced by an IR system.
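   As an illustration of what quantifying the gain/loss "rank by rank" can look like in practice, the following minimal Python sketch applies the definitions given in Section 2: the discounting function of Eq. (1), the cumulated gain, an optimal (non-increasing) ordering of the relevance grades, the per-rank difference in discounted gain, and the relative position of each document. It is not the authors' implementation. The function names, the toy relevance grades, and the choice of logarithm base x = 2 are assumptions made only for this example; the paper does not fix the value of x.

```python
import math


def discounted_gain(grade, rank, base=2):
    """Discounted relevance of a document with grade `grade` at 1-based `rank`.

    Follows Eq. (1): no discount up to rank `base`, then division by
    log_base(rank). The default base=2 is an assumption for this sketch.
    """
    if rank <= base:
        return float(grade)
    return grade / (math.log(rank) / math.log(base))


def dcg(grades, base=2):
    """Cumulated discounted gain at every rank 1..n of the ranked list."""
    total, curve = 0.0, []
    for rank, grade in enumerate(grades, start=1):
        total += discounted_gain(grade, rank, base)
        curve.append(total)
    return curve


def optimal_grades(grades):
    """Grades of an optimal permutation: sorted by non-increasing relevance."""
    return sorted(grades, reverse=True)


def delta_gain(grades, base=2):
    """Per-rank gain/loss of the actual list against the optimal ordering."""
    optimal = optimal_grades(grades)
    return [
        discounted_gain(actual, rank, base) - discounted_gain(best, rank, base)
        for rank, (actual, best) in enumerate(zip(grades, optimal), start=1)
    ]


def relative_position(grades):
    """Relative position per rank: 0 inside the optimal interval of the
    document's grade, positive if ranked above it, negative if below."""
    min_index, max_index = {}, {}
    for rank, grade in enumerate(optimal_grades(grades), start=1):
        min_index.setdefault(grade, rank)  # first rank holding this grade
        max_index[grade] = rank            # last rank holding this grade
    result = []
    for rank, grade in enumerate(grades, start=1):
        if rank < min_index[grade]:
            result.append(min_index[grade] - rank)
        elif rank > max_index[grade]:
            result.append(max_index[grade] - rank)
        else:
            result.append(0)
    return result


if __name__ == "__main__":
    toy_grades = [2, 3, 0, 1]  # hypothetical relevance grades, best = 3
    print("DCG curve:        ", dcg(toy_grades))
    print("optimal DCG curve:", dcg(optimal_grades(toy_grades)))
    print("per-rank gain/loss:", delta_gain(toy_grades))
    print("relative position: ", relative_position(toy_grades))
```

   In this sketch the per-rank differences sum to the gap between the actual and the optimal cumulated gain at the last rank, which is one way to see how individual misplaced documents account for the overall loss.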
   The proposed model provides the basis for the development of Visual Analytics (VA) techniques that give us the possibility to get a quick and intuitive idea of what happened in a result list and what determined its perceived performance. Visual Analytics [8, 10, 14] is an emerging multi-disciplinary area that takes into account both ad-hoc and classical Data Mining (DM) algorithms and Information Visualization (IV) techniques, combining the strengths of human and electronic data processing. Visualisation becomes the medium of a semi-automated analytical process, where human beings and machines cooperate using their respective distinct capabilities for the most effective results. Decisions on which direction the analysis should take in order to accomplish a certain task are left to the final user. While IV techniques have been extensively explored [4, 13], combining them with automated data analysis for specific application domains is still a challenging activity [9]. Moreover, the Visual Analytics community acknowledges the relevance of interaction for visual data analysis, and that current research activities very often focus only on visual representation, neglecting the interaction design, as clearly stated in [14]. This refers to two different typologies of interaction: 1) interaction within a visualization and 2), closer to the contribution of this paper, interaction between the visual and the analytical components.
   The idea of exploring and applying VA techniques to experimental evaluation in the IR field is quite innovative, since it has never been attempted before and, due to the complexity of the evaluation measures and the amount of data produced by large-scale evaluation campaigns, there is a strong need for better and more effective representation techniques. Moreover, visualizing and assessing ranked lists of items, to the best of the authors’ knowledge, has not been addressed by the VA community. The few related proposals use rankings for presenting the user with the most relevant visualizations, see, e.g., [12], or for browsing the ranked result, see, e.g., [5], but do not deal with the problem of observing the position of a ranked item and comparing it with an ideal solution in order to assess and improve the ranking quality. A first attempt in such a direction is in [1], where the authors explored the basic issues associated with the problem, providing basic metrics and introducing a web-based VA system that allows for exploring the quality of a ranking with respect to an optimal solution.
   On top of the proposed model, we have built a running prototype where the experimental results and data are stored in a dedicated system accessible via standard Web services. This allows for the design and development of various client applications and tools for exploiting the managed data. In particular, in this paper, we have started to explore the possibility of adopting the Apple iPad³ as an appropriate device to allow users to easily and naturally interact with the experimental data, and we have developed an initial prototype that allows us to interactively inspect the actual experimental data in order to get insights about the behaviour of an IR system.
   Overall, the proposed model, the proposed visualization techniques, and the implemented prototype meet all the requirements (a-c) for the two scenarios introduced above.
   The paper is organized as follows. Section 2 introduces the model underlying the system together with its visualization techniques; Section 3 describes the interaction strategies of the system; Section 4 describes the overall architecture of the system; and Section 5 concludes the paper, pointing out ongoing research activities.

³ http://www.apple.com/ipad/

Copyright © 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of euroHCIR2011.

2. THE PROTOTYPE
   According to [7] we model the retrieval results as a ranked vector of $n$ documents $V$, i.e., $V[1]$ contains the identifier of the document predicted by the system to be most relevant, $V[n]$ the least relevant one. The ground truth function $GT$ assigns to each document $V[i]$ a value in the relevance interval $\{0..k\}$, where $k$ represents the highest relevance score, e.g. $k = 3$ in [7]. The basic assumption is that the greater the position of a document, the less likely it is that the user will examine it, because of the required time and effort and the information coming from the documents already examined. As a consequence, the greater the rank of a relevant document, the less useful it is for the user. This is modeled through a discounting function $DF$ that progressively reduces the relevance of a document, $GT(V[i])$, as $i$ increases:

$$DF(V[i]) = \begin{cases} GT(V[i]), & \text{if } i \leq x \\ GT(V[i]) / \log_x(i), & \text{if } i > x \end{cases} \qquad (1)$$

The quality of a result can be assessed using the discounted cumulative gain function $DCG(V, i) = \sum_{j=1}^{i} DF(V[j])$, which estimates the information gained by a user who examines the first $i$ documents of $V$.
   The $DCG$ function allows for comparing the performance of different search engines, e.g., by plotting the $DCG(i)$ values of each engine and comparing the behavior of the curves.
   However, if the user’s task is to improve the ranking performance of a single search engine by looking at the misplaced documents (i.e., those ranked too high or too low with respect to the other documents), the $DCG$ function does not help: the same value $DCG(i)$ could be generated by different permutations of $V$, and it does not point out the loss in cumulative gain caused by misplaced elements. To this aim, we introduce the following definitions and novel metrics.
   We denote with $OptPerm(V)$ the set of optimal permutations of $V$ such that $\forall OV \in OptPerm(V)$ it holds that $GT(OV[i]) \geq GT(OV[j])\ \forall i, j \leq n \wedge i < j$, that is, $OV$ maximizes the values of $DCG(OV, i)\ \forall i$. In other words, $OptPerm(V)$ represents the set of the optimal rankings for a given search result. It is worth noting that each vector in $OptPerm(V)$ is composed of $k + 1$ intervals of documents sharing the same $GT$ values. As an example, assuming a result vector composed of 12 elements and $k = 3$, a possible sequence of $GT$ values of an optimal vector $OV$ is <3,3,3,3,2,2,2,2,1,1,0,0>. According to this, we define the $max\_index(V, r)$ and $min\_index(V, r)$ functions, with $0 \leq r \leq k$, that return the greatest and the lowest indexes of the elements in a vector belonging to $OptPerm(V)$ that share the same $GT$ value $r$. As an example, considering the above 12 $GT$ values, $min\_index(V, 2) = 5$ and $max\_index(V, 2) = 8$.
   Using the above definitions we can define the relative position function $R\_Pos(V[i])$ for each document in $V$ as follows:

$$R\_Pos(V[i]) = \begin{cases} 0, & \text{if } min\_index(V, GT(V[i])) \leq i \leq max\_index(V, GT(V[i])) \\ min\_index(V, GT(V[i])) - i, & \text{if } i < min\_index(V, GT(V[i])) \\ max\_index(V, GT(V[i])) - i, & \text{if } i > max\_index(V, GT(V[i])) \end{cases}$$

   $R\_Pos(V[i])$ allows for pointing out misplaced elements and understanding how much they are misplaced: 0 values denote documents that are within the optimal interval, while negative and positive values denote elements that are respectively below and above the optimal interval. The absolute value of $R\_Pos(V[i])$ gives the minimum distance of a misplaced element from its optimal interval.
   Depending on the actual relevance and rank position, the same value of $R\_Pos(V[i])$ can produce different variations of the $DCG$ function. We measure the contributions of misplaced elements with the function $\Delta Gain(V, i)$ that compares, $\forall i$, the actual values of $DF(V[i])$ with the corresponding values in $OV$, $DF(OV[i])$: $\Delta Gain(V, i) = DF(V[i]) - DF(OV[i])$.

Figure 1: The iPad prototype interface.

3. INTERACTION
   A multi-touch prototype interface based on the model presented in Section 2 has been designed for the iPad device. It has been developed and tested on iOS 4.2⁴ with the integration of the Core Plot⁵ plotting framework for the graphical visualization of data. The interface allows the end user to compare the curve of the ranked results, for a given experiment/topic, with the optimal one and with the ideal one. This facilitates failure analysis by making it easy to locate misplaced elements (blue or red items), which pop up from the visualization together with the extent of their displacement and the impact they have on $DCG$.
   Figure 1 shows a screenshot of the current interface: the main list on the left represents the top $n = 200$ ranked results for a given experiment/topic and can be easily scrolled by the user. Each row corresponds to a document ID; a short snippet of the content is included in the subtitle of each cell, and more information on a specific result (i.e. relevance score, $DCG$, $R\_Pos$, $\Delta Gain$) can be viewed by touching the row. On the right side there are two coloured vectors which show the $R\_Pos$ and $\Delta Gain$ functions. The $R\_Pos$ vector presents the results using different color shadings: light green, light red and light blue respectively for documents that are within, below and above the optimal interval. It allows for locating misplaced documents and, thanks to the shading, understanding how far they are from the optimal position. Similarly, the $\Delta Gain$ vector codes the function using colors: light blue refers to positive values, light red codes negative values, and green codes 0 values. Moreover, if the user touches a specific area of the $R\_Pos$ vector (simulated by the gray circle in Figure 1), the main results list automatically scrolls back, providing the end user with a detailed view of the corresponding documents. The rightmost part of the screen shows the $DCG$ graphs of the ideal, the optimal and the experiment vector, i.e. the ranking curves. The navigation bar displays a back button on the right which lets the user visualize the results for a different topic.

⁴ http://developer.apple.com/
⁵ http://code.google.com/p/core-plot/
⁶ http://java.sun.com/blueprints/corej2eepatterns/

4. ARCHITECTURE
   The design of the architecture of the system benefits from what has been learned in ten years of work for CLEF and in the design and implementation of the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT), the system developed in CLEF since 2005 to manage all the aspects of an evaluation campaign [2, 3].
   The architecture follows a modular design, as sketched in Figure 2, with the aim of clearly separating the logic entailed by the application into three levels of abstraction (data, application, and interface logic) that are able to communicate with each other, are easily extensible, and can be implemented using modular and reusable components. The Data Logic layer, depicted at the bottom of Figure 2, deals with the persistence of the information coming from the other layers. From the implementation point of view, data stored in databases and indexes are mapped to resources and communicate with the upper levels through the mechanism granted by the Data Access Object (DAO) pattern⁶ (see point (1) in Figure 2). The Application Logic
                                                                                                                                                                       Acknowledgements
                                                                                                                                                                       The work reported in this paper has been partially sup-
                                                                                                                                              atio
                                                                                                                                                   n                   ported by the PROMISE network of excellence (contract
                                                                                                                                         plic
                                                                                                                                       Ap and e
                                                                      5
                                                                                                                                               rfa
                                                                                                                                                   c                   n. 258191), as a part of the 7th Framework Program of the
                                                                                                                                         Inte ogic
                                                                                                                                             L
                                                                                                                                                                       European commission (FP7/2007-2013).

                                                                                                                                                                       6.   REFERENCES
                                                                               4
                                                                                                                                                                        [1] N. Ferro, A. Sabetta, G. Santucci, G. Tino, and F.
                     Acc
                               ess
                                                                                                                                                                            Veltri. Visual comparison of ranked result cumulated
                                      Con                                                                                                          n
                                         trol                                                                                                  atio                         gains. In Proc. of EuroVA 2011. Eurographics, 2011.
                RE                                                                                                                        plic
                             STfu
                                 lW                                                                                                     Ap Logic
                                   eb
                                                Serv
                                                       ice
                                                                               3
                                                                                                                                                                        [2] M. Agosti, G. Di Nunzio, M. Dussin, and N. Ferro. 10
                                                                                                             6
                                                                                                                                                                            Years of CLEF Data in DIRECT: Where We Are and
              Resource




                                                                               2
                                                                                                                                                                            Where We Can Go. In Proc. of EVIA 2010, pages
                                                                                                          Logging Infrastructure
                                 Resource




                                                                                                                                                                            16–24. Tokyo, Japan, 2010.
                                                       Resource




                                                                                                                                                                        [3] M. Agosti and N. Ferro. Towards an Evaluation
                                                                                                                                          Da
                                                                                                                                               ta L
                                                                                                                                                   ogic                     Infrastructure for DL Performance Evaluation. In
              Resource DAO




                                                                                                                                                                            Evaluation of Digital Libraries: An Insight to Useful
                                 Resource DAO




                                                                                                                                                                            Applications and Methods. Chandos Publishing,
                                                       Resource DAO




                                                                                                                                           tab
                                                                                                                                               ase
                                                                                                                                                   s                        Oxford, UK, 2009.
                                                                                                                                         Da and s
                                                                                                                                              d exe                     [4] S. K. Card and J. Mackinlay. The structure of the
                                                                                                                                            In
               1

                                                                                                                                                                 ion
                                                                                                                                                                            information visualization design space. In Proc. of
                                                                                                                                                       r   act
                                                                                                                                                   bst                      InfoVis ’97, pages 92–99, Washington, DC, USA,
                                                                                                                                              nA
                                                                                                                                          atio
                                                                                                                                     plic
                                                                                                           n
                                                                                                                                   Ap                                       1997. IEEE Computer Society.
                                                                                                       atio
                                                                                                    ent
                                                                                     n Im
                                                                                            ple
                                                                                                m
                                                                                                                                                                        [5] M. Derthick, M. G. Christel, A. G. Hauptmann, and
                                                                                 atio
                                                                            plic                                                                                            H. D. Wactlar. Constant density displays using
                                                                          Ap

                                                                                                                                                                            diversity sampling. In Proc. of the IEEE Information
                                                                                                                                                                            Visualization, pages 137–144, 2003.
     Figure 2: The Architecture of the Application.
                                                                                                                                                                        [6] R. T. Fielding and R. N. Taylor. Principled design of
                                                                                                                                                                            the modern web architecture. ACM TOIT, 2:115–150,
layer is in charge of the high-level tasks made by the sys-                                                                                                                 2002.
tem, such as the enrichment of raw data, the calculation                                                                                                                [7] K. Järvelin and J. Kekäläinen. Cumulated Gain-Based
of metrics and the carrying out of statistical analyses on                                                                                                                  Evaluation of IR Techniques. ACM TOIS,
experiments. These resources (2) are therefore accessible                                                                                                                   20(4):422–446, October 2002.
via HTTP through a RESTful Web service [6], sketched at                                                                                                                 [8] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg,
point (3). After the validation of credentials and permissions                                                                                                              J. Kohlhammer, and G. Melançon. Information
made by the access control mechanism (4), it is possible for                                                                                                                visualization. chapter Visual Analytics: Definition,
remote devices such as web browsers or custom clients (5)                                                                                                                   Process, and Challenges, pages 154–175.
to create, modify, or delete resources attaching their rep-                                                                                                                 Springer-Verlag, Berlin, Heidelberg, 2008.
resentation in XML7 or JSON8 format to the body of an                                                                                                                   [9] D. Keim, J. Kohlhammer, G. Santucci, F. Mansmann,
HTTP request, and to read them as response of specific                                                                                                                      F. Wanner, and M. Schäfer. Visual Analytics
queries. A logging infrastructure (6) grants the tracking of                                                                                                                Challenges. In Proc. of eChallenges 2009, 2009.
all the activities made at each layer and can be used to ob-                                                                                                           [10] D. A. Keim, F. Mansmann, J. Schneidewind, and
tain information about the provenance of all the managed                                                                                                                    H. Ziegler. Challenges in visual data analysis. In Proc.
resources.                                                                                                                                                                  of IV’06, pages 9–16, 2006.
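The resource-oriented access described above can be sketched as follows. This is a minimal illustration only: the base URL, the resource path and the field names are hypothetical (the text does not specify them), and the requests are built but never sent, so no credentials or access-control details are assumed.

```python
import json
import urllib.request

# Hypothetical base URL: the real service address is not given in the text.
BASE_URL = "http://example.org/api"


def build_request(method, path, resource=None):
    """Build (but do not send) an HTTP request for one resource.

    For create and modify operations the JSON representation of the
    resource travels in the body of the request, as described in the text.
    """
    body = None
    headers = {"Accept": "application/json"}
    if resource is not None:
        body = json.dumps(resource).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


# Create, read and delete one resource; the field names are illustrative only.
create = build_request("POST", "/experiments", {"id": "exp-1", "metric": "nDCG"})
read = build_request("GET", "/experiments/exp-1")
delete = build_request("DELETE", "/experiments/exp-1")

for request in (create, read, delete):
    print(request.get_method(), request.full_url)
```

An XML representation would follow the same shape, with a different body encoding and `Content-Type` header.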
                                                                                                                                                                       [11] H. Keskustalo, K. Järvelin, A. Pirkola, and
5.     CONCLUSIONS                                                                                                                                                          J. Kekäläinen. Intuition-Supporting Visualization of
   We have presented a model and a prototype which allow                                                                                                                    User’s Performance Based on Explicit Negative
users to easily interact with the experimental results and to                                                                                                               Higher-Order Relevance. In Proc. of SIGIR ’08, pages
work together in a cooperative way while actually accessing                                                                                                                 675–681. ACM Press, NY, USA, 2008.
the data. This first step uncovers new and interesting pos-                                                                                                            [12] J. Seo and B. Shneiderman. A rank-by-feature
sibilities for the experimental evaluation and for the way in                                                                                                               framework for interactive exploration of
which researchers and developers usually carry out such ac-                                                                                                                 multidimensional data. In Proc. of the IEEE
tivities. For example, the proposed techniques may alleviate                                                                                                                Information Visualization, pages 65–72, 2004.
the burden of certain tasks, such as failure analysis, which                                                                                                           [13] B. Shneiderman. The eyes have it: a task by data type
are often overlooked due to their demanding nature, thus                                                                                                                    taxonomy for information visualizations. In Proc. of
making them easier and more common to perform and, as a                                                                                                                     the 1996 IEEE Symposium on Visual Languages,
consequence, improving the overall comprehension of system                                                                                                                  pages 336 –343, 1996.
behaviour. This will be explored in future work.                                                                                                                       [14] J. J. Thomas and K. A. Cook. A visual analytics
Patterns/DataAccessObject.html                                                                                                                                              agenda. IEEE Computer Graphics and Applications,
7
  http://www.w3.org/XML/                                                                                                                                                    26:10–13, 2006.
8
  http://www.ietf.org/rfc/rfc4627.txt
                               A Taxonomy of Enterprise Search
         Tony Russell-Rose                                     Joe Lamantia                                  Mark Burrell
               UXLabs Ltd.                                         Endeca                                      Endeca
                 London                                          101 Main St.                                101 Main St.
                   UK                                          Cambridge, USA                              Cambridge, USA
             +44 7779 936191                                   +1 617 674 6000                             +1 617 674 6000
           tgr@uxlabs.co.uk                            jlamantia@endeca.com                          mburrell@endeca.com



ABSTRACT                                                                 problem solving strategies and tactics that information seekers
                                                                         employ over extended periods of time (e.g. Kuhlthau, 1991).
Classic IR (information retrieval) is predicated on the notion of
                                                                         In this paper, we examine the needs and behaviours of varied
users searching for information in order to satisfy a particular
                                                                         individuals across a range of search and discovery scenarios
“information need”. However, it is now accepted that much of
                                                                         within various types of enterprise. These are based on an analysis
what we recognize as search behaviour is often not informational
                                                                         of the scenarios derived from numerous engagements involving
per se. For example, Broder (2002) has shown that the need
                                                                         the development of search and business intelligence solutions
underlying a given web search could in fact be navigational (e.g.
                                                                         utilizing the Endeca Latitude software platform. In so doing, we
to find a particular site or known item) or transactional (e.g. to
                                                                         extend the classic IR concept of information-seeking to a broader
find a site through which the user can transact, e.g. through
                                                                         notion of discovery-oriented problem solving, accommodating the
online shopping, social media, etc.). Similarly, Rose & Levinson
                                                                         much wider range of behaviours required to fulfil the typical goals
(2004) have identified consumption of online resources as a
                                                                         and objectives of enterprise knowledge workers.
further category of search behaviour and query intent.
                                                                         Our approach to enterprise discovery is an activity-centred model
In this paper, we extend this work to the enterprise context,            inspired by Don Norman’s Activity Centred Design, which
examining the needs and behaviours of individuals across a range         “organizes according to usage” whereas “...traditional human
of search and discovery scenarios within various types of                centred design organizes according to topic, in isolation, outside
enterprise. We present an initial taxonomy of “discovery modes”,         the context of real, everyday use.” (Norman 2006). This approach
and discuss some initial implications for the design of more             is an extension of previous activity-centred modelling efforts
effective search and discovery platforms and tools.                      which focused on “captur[ing] a systematic and holistic view of
                                                                         what users need to accomplish when undertaking information
Categories and Subject Descriptors                                       retrieval tasks more complex than searching” (Lamantia 2006),
H.3.3 [Information Search and Retrieval]: Search process;                employing Grounded Theory to provide methodological structure
H.3.5 [Online Information Services]: Web-based services                  (Glaser & Strauss 1967).
General Terms                                                            In this context, we present an alternative model focused on
Human Factors.                                                           information discovery rather than information seeking per se,
                                                                         which has at its core an initial taxonomy of the “modes of
                                                                         discovery” that knowledge workers employ to satisfy their
Keywords                                                                 information search and discovery goals. We then discuss some
Enterprise search, information seeking, user behaviour,                  initial implications of this model for the design of more effective
knowledge workers, search modes, information discovery, user             search and discovery platforms and tools.
experience design.
                                                                         2. INFORMATION RETRIEVAL MODELS
1. INTRODUCTION                                                          The classic model of IR assumes an interaction cycle consisting of
To design better search and discovery experiences we must                four main activities: the identification of an information need, the
understand the complexities of the human-information seeking             specification of an appropriate query, the examination of retrieval
process. Numerous theoretical frameworks have been proposed to           results, and reformulation (where necessary) of the original query.
characterize this complex process, notably the standard model            This cycle is then repeated until a suitable result set is found
(Sutcliffe & Ennis 1998), the cognitive model (Norman 1988) and          (Salton 1989).
the dynamic model (Bates, 1989). In addition, others have
                                                                         In both the above models, the user’s information need is assumed
investigated search as a strategic process, examining the various
                                                                         to be static. However, it is now acknowledged that information
                                                                         seekers’ needs often change as they interact with a search system.
 Copyright © 2011 for the individual papers by the papers' authors.      In recognition of this, alternative models of information seeking
 Copying permitted only for private and academic purposes. This volume   have been proposed. For example, Bates (1989) proposed the
 is published and copyrighted by the editors of euroHCIR2011.            dynamic “berry-picking” model of information seeking, in which
                                                                         the information need (and consequently the query) changes
                                                                         throughout the search process. This model also recognises that
                                                                         information needs are not satisfied by a single, final result set, but
by the aggregation of results, insights and interactions along the       There are however some guiding principles that we can apply to
way.                                                                     facilitate convergence on a stable set. For example, an ideal set of
Bates’ work is particularly interesting as it explores the               modes would exhibit properties such as: Consistency (they
connections between the dynamic model and the search strategies          represent approximately the same level of abstraction);
and tactics that professional information-seekers employ. In             Orthogonality (they operate independently to each other); and
particular, Bates identifies a set of 29 individual tactics, organised   Comprehensiveness (they address the full range of discovery
into four broad categories (Bates, 1979). Likewise, O’Day &              scenarios).
Jeffries (1993) examined the use of information search results by        The initial set of discovery modes to emerge from this analysis
clients of professional information intermediaries and identified        consists of a set of nine, arranged into three top-level categories
three distinct “search modes” or major categories of search              consistent with those of Marchionini (2006). The nine modes are
behaviour: (1) Monitoring a known topic or set of variables over         as follows, each shown with a brief definition:
time; (2) Following a specific plan for information gathering; (3)
Exploring a topic in an undirected fashion.                              1. Lookup
O’Day and Jeffries also observed that a given search would often         1a. Locating: To find a specific (possibly known) item; 1b.
evolve over time into a series of interconnected searches,               Verifying: To confirm or substantiate that an item or set of items
delimited by certain triggers and stop conditions that indicate the      meets some specific criterion; 1c. Monitoring: To maintain
transitions between modes or individual searches executed as part        awareness of the status of an item or data set for purposes of
of an overall enquiry or scenario. Moreover, O’Day & Jeffries            management or control.
also attempted to characterise the analysis techniques employed
by the clients in interpreting the search results, identifying the       2. Learn
following six primary categories: (1) Looking for trends or              2a. Comparing: To examine two or more items to identify
correlations; (2) Making comparisons; (3) Experimenting with             similarities & differences; 2b. Comprehending: To generate
different aggregations/scaling; (4) Identifying critical subsets; (5)    insight by understanding the nature or meaning of an item or data
Making assessments; (6) Interpreting data to find meaning.               set; 2c. Exploring: To proactively investigate or examine an item
More recent investigations into the relationship between                 or data set for the purpose of serendipitous knowledge discovery.
information needs and search activities include that of
                                                                         3. Investigate
Marchionini (2006), who identifies three major categories of
search activity, namely “Lookup”, “Learn” and “Investigate”.             3a. Analyzing: To critically examine the detail of an item or data
                                                                         set to identify patterns & relationships; 3b. Evaluating: To use
3. A TAXONOMY OF ENTERPRISE                                              judgment to determine the significance or value of an item or data
                                                                         set with respect to a specific benchmark or model; 3c. Synthesizing:
SEARCH AND DISCOVERY                                                     To generate or communicate insight by integrating diverse inputs
The primary source of data in this study is a set of user scenarios      to create a novel artefact or composite view.
captured during numerous engagements involving the
development of search and business intelligence solutions                Evidently, the output of this process has been optimized for the
utilizing the Endeca Latitude software platform. These scenarios         current data set and in that respect represents an initial
take the form of a simple narrative that illustrates the user’s end      interpretation that will need to evolve further. For example,
goal and the primary task or action they take to complete it,            “monitoring” may appear to be a lookup activity when considered
followed by a brief description of their job function or role, for       in the context of a simple alert message, but when viewed as a
example:                                                                 strategic activity performed by an executive in the context of an
                                                                         organisational dashboard, a much greater degree of interaction
     •    “I need to understand a portfolio’s exposures to assess        and complexity is implied. Conversely, “exploring” is a concept
          portfolio-level investment mix” (Portfolio Manager)            whose level of abstraction may prove somewhat higher than the
                                                                         others, thus breaking the consistency principle suggested above.
     •    “I need to understand the quality performance of a part
          and module set in manufacturing and the field so that I        However, the true value of the modes will be realised not by their
          can determine if I should replace that part”                   conceptual purity or elegance but by their utility as a design
          (Engineering)                                                  resource. In this respect, they should be judged by the extent to
                                                                         which they facilitate the design process in capturing important
These scenarios were manually analyzed to identify themes or
                                                                         characteristics common to enterprise search and discovery
modes that appeared consistently throughout the set. For example,
                                                                         experiences, whilst flexibly accommodating arbitrary variations in
in each of the scenarios above there is an articulation of the need
                                                                         domain, information resources, etc.
to develop an understanding or comprehension of some aspect of
the data, implying that “comprehending” may constitute one such
discovery mode. Inevitably, this analysis process was somewhat           4. MODE SEQUENCES AND PATTERNS
iterative and subjective, echoing the observations made by Bates         A further interesting observation arising from the above analysis
(1979) in the identification of her search tactics: “While our goal      is that the mapping between scenarios and modes is not one-to-
over the long term may be a parsimonious few, highly effective           one. Instead, some scenarios are seen to involve a number of
tactics, our goal in the short term should be to uncover as many         modes, sometimes with a primary or dominant mode, and often
as we can, as being of potential assistance. Then we can test the        with an implied linear sequence. Moreover, certain sequences of
tactics and select the good ones. If we go for closure too soon,         modes tend to re-occur more frequently than others, forming
i.e., seek that parsimonious few prematurely, then we may miss           specific “mode chains” or patterns, analogous to higher-level
some valuable tactics.”                                                  syntactic units. These patterns provide a framework for
understanding the transitions between modes (echoing the triggers      scale independent, orthogonal, semantically distinct, conceptually
identified by O’Day & Jeffries), and allude to the existence of        connected, and flexibly sequenceable. Such a profile -- analogous
natural seams that can be used to provide further insight              to notes in the musical scale, or the words and phrases we
into enterprise search and discovery behaviour.                        assemble into sentences -- should allow the modes to serve as a
These mode chains echo the above-mentioned efforts to create           language for the design of variable scale activity-centered
goal-based information retrieval models, which yielded modes           discovery solutions through common constructive mechanisms
and a set of broadly applicable “information retrieval patterns that   such as concatenation, combination and nesting. And if the modes
describe the ways users combine and switch modes to meet goals:        do act as an elementary grammar for discovery, then sustained use
Each pattern is assembled from combinations of the same four           as a functional and interaction design language should result in
[elemental] modes” (Lamantia 2006).                                    the creation of larger and more complex units of meaning which
                                                                       offer cumulative value.
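The nine modes and three categories defined in Section 3 can be written down as a small vocabulary from which sequences are assembled by concatenation. The sketch below is illustrative only: the names `Category`, `MODES`, `modes_in` and `chain` are hypothetical, and the definitions are transcribed from the list in Section 3.

```python
from enum import Enum


class Category(Enum):
    LOOKUP = "Lookup"
    LEARN = "Learn"
    INVESTIGATE = "Investigate"


# mode name -> (category, definition), transcribed from Section 3.
MODES = {
    "Locating": (Category.LOOKUP,
                 "To find a specific (possibly known) item"),
    "Verifying": (Category.LOOKUP,
                  "To confirm or substantiate that an item or set of items "
                  "meets some specific criterion"),
    "Monitoring": (Category.LOOKUP,
                   "To maintain awareness of the status of an item or data "
                   "set for purposes of management or control"),
    "Comparing": (Category.LEARN,
                  "To examine two or more items to identify similarities "
                  "& differences"),
    "Comprehending": (Category.LEARN,
                      "To generate insight by understanding the nature or "
                      "meaning of an item or data set"),
    "Exploring": (Category.LEARN,
                  "To proactively investigate or examine an item or data "
                  "set for the purpose of serendipitous knowledge discovery"),
    "Analyzing": (Category.INVESTIGATE,
                  "To critically examine the detail of an item or data set "
                  "to identify patterns & relationships"),
    "Evaluating": (Category.INVESTIGATE,
                   "To use judgment to determine the significance or value "
                   "of an item or data set with respect to a specific "
                   "benchmark or model"),
    "Synthesizing": (Category.INVESTIGATE,
                     "To generate or communicate insight by integrating "
                     "diverse inputs to create a novel artefact or "
                     "composite view"),
}


def modes_in(category):
    """Return the mode names that belong to one top-level category."""
    return [name for name, (cat, _) in MODES.items() if cat is category]


def chain(*names):
    """Concatenate modes into a sequence, rejecting names outside the taxonomy."""
    unknown = [name for name in names if name not in MODES]
    if unknown:
        raise ValueError(f"not in taxonomy: {unknown}")
    return tuple(names)


print(modes_in(Category.LOOKUP))
print(chain("Monitoring", "Analyzing", "Evaluating"))
```

Keeping the vocabulary closed, so that `chain` refuses unknown names, is one simple way to make a mode sequence checkable against the taxonomy.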
                                                                       Professional experience with employing the modes as both an
                                                                       analytical framework for understanding discovery needs and as a
                                                                       design grammar for the definition of discovery solutions suggests
                                                                       that both implications are valid. Further, our observations of
                                                                       using the modes suggest the existence of recognizable patterns in
                                                                       the design of discovery solutions. We will briefly discuss some of
                                                                       the patterns observed, doing so at three common levels of solution
                                                                       scale: on the level of a single functional or interface element, for
                                                                       whole screens or interfaces composed of multiple functional
                                                                       elements, and for applications comprising multiple screens.

                                                                       5.1 Single element patterns
                                                                       5.1.1 Comparison Views
                                                                       One of the most common design patterns is to support the need
               Figure 1. Discovery mode network                        for the Compare mode by creating A/B type comparison views
The five most frequent mode patterns are listed below. These have      that present two display panes - each containing data display
been assigned descriptive (if somewhat informal) labels to aid         charts or tables; or single items or groups of items - side by side to
their characterisation, along with the sequence of modes they          emphasize similarities and differences.
represent and an associated example scenario:
                                                                       5.1.2 Contextual Views
     1.   Comparison-driven optimization: (Analyze-Compare-            Another common design pattern supports the Analysis mode by
          Evaluate) e.g. “Replace a problematic part with an           allowing a fore-grounded view of a single chart, table, item, or
          equivalent or better part without compromising quality       list, accompanied by its contextual ‘halo’ - the full body of
          and cost”                                                    information available about the element such as status, origin,
     2.   Exploration-driven optimization: (Explore-Analyze-           format, relationships to other elements; annotations; etc.
          Evaluate) e.g. “Identify opportunities to optimize use of
          tooling capacity for my commodity/parts”                     5.2 Whole screen patterns
     3.   Strategic Insight (Analyze-Comprehend-Evaluate) e.g.         5.2.1 Dashboard
          “Understand a lead's underlying positions so that I can      One of the most common screen-level design patterns is to
          assess the quality of the investment opportunity”            support the Monitoring and Synthesis modes by presenting a
                                                                       collection of metrics which in aggregate provide the status of
     4.   Strategic Oversight (Monitor-Analyze-Evaluate) e.g.
                                                                       independent processes, groups, or progress versus goals in a
          “Monitor & assess commodity status against
                                                                       ‘dashboard’ style screen.
          strategy/plan/target”
     5.   Comparison-driven Synthesis (Analyze-Compare-                5.2.2 Visual Discovery Screen: 4-Dimensions
          Synthesize) e.g. “Analyze and understand consumer-           A second common screen-level design pattern for discovery
          customer-market trends to inform brand strategy &            experiences is the visual discovery screen, which supports modes
          communications plan”                                         such as Exploration, Evaluation, and Verification by layering views
                                                                       that present visualizations of several dimensions of a single axis
Further insight may be derived by examining how the mode
                                                                       of focus such as a core process, organizational unit, or KPI. When
patterns combine across all the scenarios to form a “mode
                                                                       switching between layered views, the axis in focus remains the
network”, as shown in Figure 1. Evidently, some modes act as
                                                                       same, but the data and presentation in the dimensions adjusts to
“terminal” nodes, i.e. entry points or exit points to a discovery
                                                                       match the preferred discovery mode.
scenario. For example, Monitor and Explore feature only as entry
points at the initiation of a scenario, whilst Synthesize and
Evaluate feature only as exit points to a scenario.
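The way mode patterns combine into a network can be illustrated with a short sketch. It is built only from the five patterns listed above, not from the full scenario set behind Figure 1, so the edge counts are illustrative. The function names are hypothetical.

```python
from collections import Counter

# The five most frequent mode patterns, as listed in the text.
PATTERNS = {
    "Comparison-driven optimization": ("Analyze", "Compare", "Evaluate"),
    "Exploration-driven optimization": ("Explore", "Analyze", "Evaluate"),
    "Strategic Insight": ("Analyze", "Comprehend", "Evaluate"),
    "Strategic Oversight": ("Monitor", "Analyze", "Evaluate"),
    "Comparison-driven Synthesis": ("Analyze", "Compare", "Synthesize"),
}


def build_network(patterns):
    """Count each directed transition between consecutive modes."""
    edges = Counter()
    for sequence in patterns.values():
        for source, target in zip(sequence, sequence[1:]):
            edges[(source, target)] += 1
    return edges


def terminal_nodes(edges):
    """Return modes that only start transitions and modes that only end them."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    return sorted(sources - targets), sorted(targets - sources)


edges = build_network(PATTERNS)
entry_only, exit_only = terminal_nodes(edges)
print(entry_only)  # ['Explore', 'Monitor']
print(exit_only)   # ['Evaluate', 'Synthesize']
```

Even on this reduced input, the entry-only and exit-only modes agree with the terminal nodes reported for the full network.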
                                                                       5.3 Application-level patterns
                                                                       5.3.1 Differentiated Application
5. DESIGN PRINCIPLES FOR SEARCH                                        The ‘Differentiated Application’ pattern assembles a collection of
AND DISCOVERY SOLUTIONS                                                individual screens whose distinct compositions and designs
The modes establish a ‘taskonomy’ or collection of defined             support individual discovery modes of Analysis, Comparison,
discovery activities which are structurally consistent, domain and     Evaluation and Monitoring in aggregate to address the ‘Strategic
                                                                       Oversight’ mode sequence. Application-level patterns often
address a spectrum of discovery needs for a group of users with        In addition, we have proposed an alternative model focused on
differing organizational responsibilities, such as management vs.      information discovery rather than information seeking which has
detailed analysis.                                                     at its core a taxonomy of “modes of discovery” that knowledge
                                                                       workers employ to satisfy their information search and discovery
6. DISCUSSION                                                          goals. We have also examined some of the initial implications of
The above analysis is predicated on the notion that the user           this model for the design of more effective search and discovery
scenarios provide a unique insight into the information needs of       platforms and tools.
enterprise knowledge workers. However, a number of caveats
                                                                       Suggestions for future work include further iterations on the
apply to both the data and the approach.
                                                                       “propose-classify-refine” cycle using independent data. This data
Firstly, the scenarios were originally generated to support the        should ideally be acquired based on a principled sampling strategy
development of a specific implementation rather than for the           that attempts where possible to address any biases introduced in
analysis above. Therefore, the principles governing their creation     the creation of the original scenarios. In addition, this process
may not faithfully reflect the true distribution or priority of        should be complemented by empirical research and observation of
information needs among the various end user populations.              knowledge workers in context to validate and refine the discovery
modes and triggers that give rise to the observed patterns of usage.

Secondly, the particular sample we selected for this study was based on a
number of pragmatic factors (including availability), which may not faithfully
represent the true distribution or priority among enterprise organizations.
Thirdly, the data will inevitably contain some degree of subjectivity,
particularly in cases where scenarios were generated by proxy rather than with
direct end-user contact. Fourthly, the data will inevitably contain some degree
of inconsistency in cases where scenarios were documented by different
individuals.

We should also acknowledge a number of caveats concerning the process itself.
In inductive work with foundations in qualitatively centered frameworks such as
Grounded Theory, it is expected that a number of iterations of a
“propose-classify-refine” cycle will be required for the process to converge on
a stable output (e.g. Rose & Levinson, 2004). In addition, those iterations
should involve a variety of critical viewpoints, with the output tested and
refined using a separate, independent sample on each iteration. Likewise, the
process by which scenarios are classified would benefit from further rigour:
this is a critical part of the process and of course relies on human judgement
and inference, but that judgement needs to go beyond simple word matching and
be consistently applied to each scenario so that subtle distinctions in meaning
and intent can be accurately identified and recorded.

That said, some interesting comparisons can already be made with the existing
frameworks. For example, the first and third of the search modes suggested by
O’Day and Jeffries have also been identified as distinct discovery modes in our
own study, and the second (arguably) maps on to one or more of the mode chains
identified above. Likewise, the search results analysis techniques that O’Day &
Jeffries identified also present some interesting parallels.

7. CONCLUSIONS AND FUTURE DIRECTIONS
To design better search and discovery experiences we must understand the
complexities of the human-information seeking process. In this paper, we have
examined the needs and behaviours of varied individuals across a range of
search and discovery scenarios within various types of enterprise. In so doing,
we have extended the classic IR concept of information-seeking to a broader
notion of discovery-oriented problem solving, accommodating the much wider
range of behaviours required to fulfil the typical goals and objectives of
enterprise knowledge workers.

8. REFERENCES
[1] Bates, Marcia J. 1979. "Information Search Tactics." Journal of the
    American Society for Information Science 30: 205-214
[2] Bates, Marcia J. 1989. "The Design of Browsing and Berrypicking Techniques
    for the Online Search Interface." Online Review 13: 407-424.
[3] Broder, A. 2002. A taxonomy of web search, ACM SIGIR Forum, v.36 n.2,
    Fall 2002
[4] Kuhlthau, C. C. 1991. Inside the information search process: Information
    seeking from the user's perspective. Journal of the American Society for
    Information Science, 42, 361-371.
[5] Lamantia, J. 2006. “10 Information Retrieval Patterns” JoeLamantia.com,
    http://www.joelamantia.com/information-architecture/10-information-retrieval-patterns
[6] Glaser, B. & Strauss, A. 1967. The Discovery of Grounded Theory:
    Strategies for Qualitative Research. New York: Aldine de Gruyter.
[7] Marchionini, G. 2006. Exploratory search: from finding to understanding.
    Commun. ACM 49(4): 41-46
[8] Norman, Donald A. 1988. The psychology of everyday things. New York, NY,
    US: Basic Books.
[9] Donald A. Norman. 2006. Logic versus usage: the case for activity centered
    design. Interactions 13, 6
[10] O'Day, V. and Jeffries, R. 1993. Orienteering in an information landscape:
     how information seekers get from here to there. INTERCHI 1993: 438-445
[11] Rose, D. and Levinson, D. 2004. Understanding user goals in web search,
     Proceedings of the 13th international conference on World Wide Web,
     New York, NY, USA
[12] Salton, G. (1989). Automatic Text Processing: The Transformation,
     Analysis, and Retrieval of Information by Computer. Addison-Wesley,
     Reading, MA.
[13] A.G. Sutcliffe and M. Ennis. Towards a cognitive theory of information
     retrieval. Interacting with Computers, 10:321–351, 1998.
         Back to MARS: The unexplored possibilities in query
                        result visualization

                   Alfredo Ferreira                           Pedro B. Pascoal                   Manuel J. Fonseca
               INESC-ID/IST/TU Lisbon                      INESC-ID/IST/TU Lisbon               INESC-ID/IST/TU Lisbon
                  Lisbon, Portugal                            Lisbon, Portugal                     Lisboa, Portugal
            alfredo.ferreira@ist.utl.pt                        pmbp@ist.utl.pt                      mjf@inesc-id.pt


ABSTRACT
A decade ago, Nakazato proposed 3D MARS, an immersive virtual reality
environment for content-based image retrieval. Even so, the idea of taking
advantage of post-WIMP interfaces for content-based multimedia retrieval was
not explored further. Considering the latest low-cost, off-the-shelf hardware
for visualization and interaction, we believe it is time to explore immersive
virtual environments for multimedia retrieval. In this paper we highlight the
advantages of such an approach, identifying possibilities and challenges.
Focusing on a specific field, we introduce a preliminary immersive virtual
reality prototype for 3D object retrieval. However, the concepts behind this
prototype can be easily extended to other media.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval;
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Interaction
Styles, Input Devices and Strategies

Keywords
Multimedia Information Retrieval, 3D Object Retrieval, Immersive Virtual
Environment

1. INTRODUCTION
Despite advances in multimedia information retrieval (MIR), this field is still
in its infancy, especially when compared to its textual counterpart. Current
textual search engines are mature, and their widespread use makes them familiar
to most users. The current scenario in MIR is quite different. Indeed, existing
content-based MIR solutions are far from being widely used by the common user.

A few exceptional systems have thrived with relative success, such as
Retrievr¹, a search tool for Flickr² based on visual queries. However, most
existing solutions still face major drawbacks and challenges. Among the many
identified in the survey by Datta et al. [5], we highlight two. First, queries
rely mostly on meta-information, often keyword-based. This means that, on
closer analysis, searches can be reduced to text information retrieval of
multimedia objects. Second, result visualization follows the traditional
paradigm, where the results are presented as a list of items on a screen. These
items are usually thumbnails, but can be just filenames or metadata. Such a
methodology greatly hinders the interpretation of query results on collections
of videos or 3D objects.

¹ http://labs.systemone.at/retrievr/
² http://www.flickr.com/

Copyright © 2011 for the individual papers by the papers’ authors. Copying
permitted only for private and academic purposes. This volume is published and
copyrighted by the editors of euroHCIR2011.

Notably, a decade ago, a new visualization system for content-based image
retrieval (CBIR) was proposed by Nakazato and Huang from the University of
Illinois. 3D MARS [11] was an immersive virtual reality (VR) environment for
image retrieval. It worked on the NCSA CAVE [4], which provided a fully
immersive experience, and later on desktop VR systems. However, despite this
ground-breaking work and recent developments in the interaction domain, the
multimedia information retrieval community has taken little advantage of
immersive virtual environments.

In this paper we take the work of Nakazato and Huang as a starting point for
the exploration of new possibilities for result visualization in multimedia
information retrieval. With the spread of stereoscopic viewing and
latest-generation interaction devices beyond the lab environment and into our
everyday lives, we believe that in a short time users will expect richer
results from multimedia search engines than just a list of thumbnails.
Following this rationale, and although it could be applied to any type of
media, we will focus our approach on 3D object retrieval (3DOR).

2. TRADITIONAL 3DOR APPROACHES
The first and most notable 3D search engine, at least among researchers working
in this area, is the Princeton 3D Model Search Engine [8]. This remarkable work
provides content-based retrieval of 3D models from a collection of more than
36000 objects. Four query specification options are available: text based; by
example; by 2D sketch; and by 3D sketch. The results of these queries are
presented as an array of model thumbnails.

In addition to queries by example and sketch-based queries, the FOX-MIIRE
search engine [1] introduced the query by
photo. This was the first tool capable of retrieving a 3D model from a
photograph of a similar object. However, as with the Princeton engine, the
results are displayed as a thumbnail list.

Outside the research field, Google 3D Warehouse³ offers a text-based search
engine for the common user. This online repository contains a very large number
of different models, from monuments to cars and furniture, humans and
spaceships. However, searching for models in this collection is limited to
textual queries or, when models represent real objects, to their georeference.
The results, in turn, are displayed as a list of model images, with the
opportunity to manipulate a 3D view of a selected model.

Generally, query specification and result visualization in commercial tools for
3D object retrieval, usually associated with online 3D model stores, do not
differ much from those presented above. The query is specified through keywords
or by example and results are presented as a list of model thumbnails.

These traditional approaches to query specification and result visualization
take advantage of the latest advances in neither computer graphics nor
interaction paradigms. Current hardware and software are capable of handling
millions of triangles per frame and generating complex effects in real time.
Additionally, the increasingly common use of new human-computer interaction
(HCI) paradigms and devices has brought new possibilities for multi-modal
systems.

3. NEW PARADIGMS IN HCI
The recent dissemination among common users of new HCI paradigms and devices
(e.g. Nintendo Wiimote⁴ or Microsoft Kinect⁵) has brought new possibilities for
multi-modal systems. For decades, the “windows, icons, menus, pointing device”
(WIMP) interaction style prevailed outside the research field, while post-WIMP
interfaces were being devised and explored [16], but without major impact on
the everyday use of computer systems.

In particular, the use of gestures to interact with a system has been part of
the interface scene since the very early days. A pioneering multimodal
application was “Put-that-there” [2], by Bolt. In “Put-that-there”, the user
commands simple shapes on a large-screen graphics display surface. This
approach combined gestures and voice commands to interact with the system.
However, only recently has such an interaction paradigm been introduced in
off-the-shelf commodity products.

Recent technological advances have allowed the development of low-cost,
lightweight, easy-to-use systems. With limited resources, novel and more
natural HCI can be developed and explored. For instance, Lee [10] used a
Wiimote and took advantage of its high-resolution infra-red camera to implement
a multipoint interactive whiteboard, finger tracking and head tracking for
desktop virtual reality displays. Post-WIMP has finally reached the masses.

³ http://sketchup.google.com/3dwarehouse/
⁴ http://www.nintendo.com/wii/console/controllers
⁵ http://www.xbox.com/en-US/kinect

Figure 1: The interface of 3D MARS.

Generally, post-WIMP approaches abandon the traditional mouse and keyboard
combination, favouring devices with six degrees of freedom (DoF). Unlike the
traditional WIMP interaction style, where it is necessary to map the inputs
from a 2D interaction space to a 3D visualization space, six DoF devices allow
a straightforward direct mapping between device movements and rotations and the
corresponding effects in three-dimensional space. This represents a huge leap
for the concept of direct manipulation, which, according to Shneiderman [14],
provides rapid incremental operations and allows the immediate visualization of
effects on a manipulated object. This helps make the interaction more
comprehensible, predictable and controllable.

Combining six DoF devices with stereoscopy, it is possible to achieve a
multi-modal immersive interaction with direct and natural manipulation of
object shapes within virtual environments. This may be experienced using
immersive displays (e.g., HMDs, CAVEs) [7] or desktop setups [15].

Despite the growing interest in applying these new paradigms in HCI, no
relevant efforts have been made to explore the latest technological advances
for multimedia information retrieval. Indeed, to the best of our knowledge, no
research or new solution that takes advantage of immersive virtual environments
for information retrieval has been presented since Nakazato’s 3D MARS [11].

4. 3D MARS
The 3D MARS system demonstrates that the use of 3D visualization in multimedia
retrieval has two benefits. First, more content can be displayed at the same
time without items occluding one another. Second, by assigning different
meanings to each axis, the user can determine which features are important as
well as examine the query result with respect to three different criteria at
the same time.

Nakazato focused his work on query result visualization. Thus 3D MARS supports
only a query-by-example mechanism to specify the search. The user selects one
image from a list and the system retrieves and displays the most similar images
from the image database in a 3D virtual space. The image location in this space
is determined by its distance
to the query image, where more similar images are closer to
the origin of the space. The distance along each coordinate axis
depends on a pre-defined set of features. The X-axis, Y-axis
and Z-axis represent the color, texture and structure of images,
respectively.
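
As a rough illustration of the placement rule described above, the following Python sketch positions each result by its per-feature distance to the query, one feature per axis, so that the query sits at the origin. The paper does not say how 3D MARS computes its feature distances, so the Euclidean metric, the toy feature vectors and the names `feature_distance`, `place_results` and `nearest_first` are all assumptions made for this example.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def place_results(query, results, axes=("color", "texture", "structure")):
    """Map each result to an (x, y, z) position, one feature distance per axis.

    A result identical to the query on every feature lands at the origin.
    """
    positions = {}
    for name, features in results.items():
        positions[name] = tuple(
            feature_distance(query[axis], features[axis]) for axis in axes
        )
    return positions


def nearest_first(positions):
    """Order result names by their overall distance to the origin (the query)."""
    return sorted(
        positions,
        key=lambda name: math.sqrt(sum(c * c for c in positions[name])),
    )


# Hypothetical feature vectors, chosen only to make the arithmetic easy to follow.
query = {"color": [0, 0], "texture": [0, 0], "structure": [0.5]}
results = {
    "img_a": {"color": [0, 0], "texture": [0, 0], "structure": [0.5]},
    "img_b": {"color": [3, 4], "texture": [0, 0], "structure": [0.5]},
    "img_c": {"color": [0, 0], "texture": [1, 0], "structure": [2.5]},
}

positions = place_results(query, results)
print(positions["img_b"])        # (5.0, 0.0, 0.0): differs from the query in color only
print(nearest_first(positions))  # ['img_a', 'img_c', 'img_b']
```

Passing a different `axes` tuple changes which feature drives which axis, which mirrors the idea of assigning a meaning to each axis.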

The interaction with the query results is done through a wand that the user
holds while freely walking around the CAVE, as depicted in Figure 1. By wearing
shutter glasses, the user can see a stereoscopic view of the world, which
provides a fully immersive experience. In such a solution, visualizing query
results goes far beyond scrolling through a list of thumbnails. The user
navigates among the results in a three-dimensional space.

3D MARS was the catalyst for the call made in this paper: explore immersive
visualization systems for multimedia information retrieval. Following that
idea, we devised an immersive 3D virtual reality system for displaying the
results of queries for 3D object retrieval.

5. IMMERSIVE 3DOR
Taking advantage of the new paradigms in HCI, we propose an immersive VR system
for 3D object retrieval (Im-O-Ret). The version of the system presented in this
paper relies on a large-screen display, the LEMe Wall [6], and a six DoF
interaction device, the SpacePoint Fusion, an off-the-shelf device developed by
PNI Sensor Corporation.

However, minimal effort is required to have the system working with HMD or
stereoscopic glasses, as well as with other input devices, such as the Wiimote
or Kinect.

Regardless of the hardware details, Im-O-Ret allows the user to browse the
results of a query to a collection of 3D objects in an immersive virtual
environment. The objects are distributed in the virtual 3D space according to
their similarity. This is measured by the distance of each result to the query,
which sits at the origin of the coordinate system. A different shape matching
algorithm is assigned to each of the three axes. The similarity to the query
returned by the corresponding algorithm determines the coordinate. The current
version of Im-O-Ret uses the Lightfield Descriptors [3] for the X-axis, the
Coord and Angle Histogram [13] for the Y-axis, and the Spherical Harmonics
Descriptor [9] for the Z-axis. Figure 2 illustrates a user browsing the results
of a query.

Figure 2: User exploring query results in Im-O-Ret

5.1 Possibilities
Similar to 3D MARS, this work opens a myriad of new possibilities. By assigning
different shape matching algorithms to each axis, one can adapt the query
mechanism to specific domains, producing more precise results. By applying
transparency to results, it is possible to overlay the results of distinct
queries. Effects such as glow or special colors can be added to results in
order to convey additional information.

Since query results are not images or thumbnails, but three-dimensional models,
it is possible to navigate around them in the virtual environment and even
manipulate them. Moreover, instead of a static view of the result, displaying
it as a 3D object that rotates about one axis offers a better perception of the
model. Adding stereoscopy improves the visualization even more, since the user
gains depth perception of the environment.

The combined use of virtual environments (VE) and devices with six DoF provides
a more complete visualization and makes interaction more natural,
comprehensible and predictable. Their use will also add some challenges to the
implementation of such a system.

5.2 Challenges
While in traditional 3DOR systems the query results are presented as a list of
thumbnails ordered by a given similarity measure, when we move to a virtual
environment, the distribution of results in a 3D space becomes a challenge. How
query results should be arranged in 3D space to be meaningful to the user
remains an open question. In our approach we selected three shape descriptors
and assigned each one to a coordinate axis, but this is a preliminary approach.
We believe that a final solution is more complex than this. Further
investigation of this topic is clearly required.

On the other hand, the way users navigate in an immersive environment and
interact with the objects in it is still an open issue. Norman [12] stated that
gesturing is a natural, automatic behaviour, but the unintended interpretations
of gestures can create undesirable states. With this in mind, it is important
to aim for an interface that is both predictable and easy to learn.

Above all, an important challenge remains open. No easy query specification
mechanism has been presented, either in traditional search engines or with new
HCI paradigms. Although sketch-based queries apparently provide good results,
they greatly depend on the ability of the user to draw a 3D model, which
hinders the goal of a widely used, content-based 3D search engine.

6. CONCLUSIONS
We believe that recent advances in low-cost, post-WIMP enabling technology can
be seen as an opportunity to overcome some drawbacks of current multimedia
information retrieval solutions. Combined with the dissemination of
stereoscopic visualization as a commodity, these interaction paradigms will
acquaint common users with immersive virtual reality environments.
In this paper we highlight that such a scenario is fertile ground to be
explored by search engines for multimedia information retrieval. In that
context, we identified two major research topics: query result visualization
and query specification. While the latter requires further study, we have
already started tackling the former.

We developed a novel visualization approach for 3D object retrieval. Im-O-Ret
offers users an immersive virtual environment for browsing the results of a
query to a collection of 3D objects. The query results are displayed as 3D
models in a 3D space, instead of the traditional list of thumbnails. The user
can explore the results, navigating in that space and directly manipulating the
objects.

Looking back at 3D MARS, the initial work proposed by Nakazato, we realize it
was a valid idea that fell almost into oblivion. We expect that our preliminary
work, which builds on concepts introduced by 3D MARS, can demonstrate the merit
of our call to explore the possibilities offered by immersive virtual
environments to multimedia information retrieval.

7. ACKNOWLEDGMENTS
The work described in this paper was partially supported by the Portuguese
Foundation for Science and Technology (FCT) through the project 3DORuS,
reference PTDC/EIA-EIA/102930/2008 and by the INESC-ID multiannual funding,
through the PIDDAC Program funds.

8. REFERENCES
[1] T. F. Ansary, J.-P. Vandeborre, and M. Daoudi. 3d-model search engine from
    photos. In Proceedings of the 6th ACM international conference on Image and
    video retrieval, CIVR ’07, pages 89–92, New York, NY, USA, 2007. ACM.
[2] R. A. Bolt. Put-that-there: Voice and gesture at the graphics interface. In
    Proceedings of the 7th annual conference on Computer graphics and
    interactive techniques, SIGGRAPH ’80, pages 262–270, New York, NY, USA,
    1980. ACM.
[3] D.-Y. Chen, X.-P. Tian, Y. te Shen, and M. Ouhyoung. On visual similarity
    based 3d model retrieval. volume 22 of EUROGRAPHICS 2003 Proceedings, pages
    223–232, 2003.
[4] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti. Surround-screen
    projection-based virtual reality: the design and implementation of the
    cave. In Proceedings of the 20th annual conference on Computer graphics and
    interactive techniques, SIGGRAPH ’93, pages 135–142, New York, NY, USA,
    1993. ACM.
[5] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas,
    influences, and trends of the new age. ACM Comput. Surv., 40:5:1–5:60,
    May 2008.
[6] B. R. de Araújo, T. Guerreiro, R. J. Costa, J. A. P. Jorge, and J. M.
    Pereira. Leme wall: Desenvolvendo um sistema de multi-projecção. 13º
    Encontro Português de Computação Gráfica, Vila Real, Portugal, 2005.
[7] T. DeFanti, D. Acevedo, R. Ainsworth, M. Brown, S. Cutchin, G. Dawe, K.-U.
    Doerr, A. Johnson, C. Knox, R. Kooima, F. Kuester, J. Leigh, L. Long,
    P. Otto, V. Petrovic, K. Ponto, A. Prudhomme, R. Rao, L. Renambot,
    D. Sandin, J. Schulze, L. Smarr, M. Srinivasan, P. Weber, and G. Wickham.
    The future of the cave. Central European Journal of Engineering, 1:16–37,
    2011. 10.2478/s13531-010-0002-5.
[8] T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin, and
    D. Jacobs. A search engine for 3d models. ACM Trans. Graph., 22:83–105,
    January 2003.
[9] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant
    spherical harmonic representation of 3d shape descriptors. In Proceedings
    of the 2003 Eurographics/ACM SIGGRAPH symposium on Geometry processing,
    SGP ’03, pages 156–164, Aire-la-Ville, Switzerland, 2003. Eurographics
    Association.
[10] J. Lee. Hacking the nintendo wii remote. Pervasive Computing, IEEE,
     7(3):39–45, July-Sept. 2008.
[11] M. Nakazato and T. S. Huang. 3d mars: Immersive virtual reality for
     content-based image retrieval. In Proceedings of 2001 IEEE International
     Conference on Multimedia and Expo (ICME2001), 2001.
[12] D. A. Norman. Natural user interfaces are not natural. interactions,
     17:6–10, May 2010.
[13] E. Paquet and M. Rioux. Nefertiti: a query by content software for
     three-dimensional models databases management. In NRC 97: Proceedings of
     the International Conference on Recent Advances in 3-D Digital Imaging and
     Modeling, page 345, Washington, DC, USA, 1997. IEEE Computer Society.
[14] B. Shneiderman. Direct manipulation for comprehensible, predictable and
     controllable user interfaces. In Proceedings of the 2nd international
     conference on Intelligent user interfaces, IUI ’97, pages 33–39, New York,
     NY, USA, 1997. ACM.
[15] B. Sousa Santos, P. Dias, A. Pimentel, J.-W. Baggerman, C. Ferreira,
     S. Silva, and J. Madeira. Head-mounted display versus desktop for 3d
     navigation in virtual reality: a user study. Multimedia Tools Appl.,
     41:161–181, January 2009.
[16] A. van Dam. Post-wimp user interfaces. Commun. ACM, 40:63–67,
     February 1997.
       The Mosaic Test: Benchmarking Colour-based Image
            Retrieval Systems Using Image Mosaics

                     William Plant                             Joanna Lumsden                          Ian T. Nabney
              School of Engineering and                    School of Engineering and             School of Engineering and
                  Applied Science                              Applied Science                       Applied Science
                  Aston University                             Aston University                      Aston University
                 Birmingham, U.K.                             Birmingham, U.K.                      Birmingham, U.K.


ABSTRACT
Evaluation and benchmarking in content-based image retrieval has always been a
somewhat neglected research area, making it difficult to judge the efficacy of
many presented approaches. In this paper we investigate the issue of
benchmarking for colour-based image retrieval systems, which enable users to
retrieve images from a database based on low-level colour content alone. We
argue that current image retrieval evaluation methods are not suited to
benchmarking colour-based image retrieval systems, due in main to not allowing
users to reflect upon the suitability of retrieved images within the context of
a creative project and their reliance on highly subjective ground-truths. As a
solution to these issues, the research presented here introduces the Mosaic
Test for evaluating colour-based image retrieval systems, in which test-users
are asked to create an image mosaic of a predetermined target image, using the
colour-based image retrieval system that is being evaluated. We report on our
findings from a user study which suggests that the Mosaic Test overcomes the
major drawbacks associated with existing image retrieval evaluation methods, by
enabling users to reflect upon image selections and automatically measuring
image relevance in a way that correlates with the perception of many human
assessors. We therefore propose that the Mosaic Test be adopted as a
standardised benchmark for evaluating and comparing colour-based image
retrieval systems.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software—Performance
evaluation; H.2.8 [Database Management]: Database Applications—Image Databases

1. INTRODUCTION
Colour-based image retrieval systems such as Chromatik [1], MultiColr [5] and
Picitup [10] enable users to retrieve images from a database based on colour
content alone. Such a facility is particularly useful to users across a number
of different creative industries, such as graphic, interior and fashion design
[6, 7]. Surprisingly, however, little research appears to have been conducted
into evaluating colour-based image retrieval systems. Currently, there is no
standardised measure and image database to evaluate the performance of an image
retrieval system [8]. The most commonly applied evaluation methods are those of
precision and recall [8] and the target search and category search tasks [11].
The precision and recall measure is used to evaluate the accuracy of image
results returned by a system in response to a query, whilst the target search
and category search tasks are both user-based evaluation strategies in which
test-users are asked to retrieve images from a database that are relevant to a
given target, using the image retrieval system that is being evaluated.

In this research, we argue that the image retrieval system evaluation
strategies listed above are not suitable for evaluating and benchmarking
colour-based image systems for two fundamental reasons. Firstly, none of the
above evaluation methods allow test-users to perform an important process often
conducted by creative users, known as reflection-in-action [12]. In
reflection-in-action, a creative project is modified by a user and then
reviewed by the user after the modification. After assessing their
modification, the creative individual will then decide whether to maintain or
discard the modification to the project. As an example, a graphic designer will
add an image to a web page before making an assessment as to its aesthetic
suitability. Secondly, the cat-
                                                                             egory search and precision and recall measures require an
Keywords                                                                     image database and associated ground-truth (a manually
Image databases, content-based image retrieval, image mo-                    generated list pre-defining which images in the database are
saic, performance evaluation, benchmarking.                                  similar to others) for defining image relevance during a sys-
                                                                             tem evaluation. Such human-based definitions of similarity,
                                                                             however, can often be highly subjective resulting in retrieved
                                                                             images being incorrectly assessed as irrelevant.

                                                                             As a result of these drawbacks, no method currently exists
                                                                             for reliably evaluating colour-based image retrieval systems.
                                                                             The following section introduces the Mosaic Test which has
                                                                             been developed to address the current problem, providing
                                                                             a reliable means for benchmarking colour-based image
                                                                             retrieval systems.

Copyright © 2011 for the individual papers by the papers’ authors. Copying
permitted only for private and academic purposes. This volume is published
and copyrighted by the editors of euroHCIR2011.
2. THE MOSAIC TEST                                                to indicate their subjective experience of workload (using
For the Mosaic Test, participants are asked to manually cre-      the NASA TLX scales [2]) post test.
ate an image mosaic (comprising 16 cells) of a predetermined
target image. An image mosaic (first devised by Silvers [14])     The time (number of seconds), subjective workload (user
is a form of art that is typically generated automatically        NASA-TLX ratings) and relevance (image mosaic accuracy)
through use of content-based image analysis. A target im-         measures achieved by colour-based image retrieval systems
age is divided into cells, each of which is then replaced by a    evaluated using the Mosaic Test can be directly compared
small image with similar colour content to the correspond-        and used for benchmarking. When comparing the Mosaic
ing cell in the target image. Viewed from a distance, the         Test measures achieved by different systems, the more ef-
smaller images collectively appear to form the target image,      fective colour-based image retrieval system will be the one
whilst viewing an image mosaic close up reveals the detail        that enables users to create the most accurate image mo-
contained within each of the smaller images. An example of        saics, fastest and with the least workload.
an automatically generated image mosaic is shown in Fig-
ure 1.                                                            2.1   Mosaic Test Tool
                                                                  To support users in their manual creation of image mosaics
                                                                  using the Mosaic Test, we have developed a novel software
                                                                  tool in which an image mosaic of a predetermined target
                                                                  image can be created using simple drag and drop functions.
                                                                  We refer to this as the Mosaic Test Tool. The Mosaic Test
                                                                  Tool has been designed so that it can be displayed simul-
                                                                  taneously with the colour-based image retrieval system un-
                                                                  der evaluation (as can be seen in Figure 2). This removes
                                                                  the need for users to constantly switch between application
                                                                  windows, and permits users to easily drag images from the
                                                                  colour-based image retrieval system being tested to their im-
                                                                  age mosaic in the Mosaic Test Tool. It is important to note
Figure 1: An example of an image mosaic. The
                                                                  that the facility to export images through drag and drop
region highlighted green in the image mosaic (right)
                                                                  operations is the only requirement of a colour-based image
has been created using the images shown (left).
                                                                  retrieval system for it to be compatible with the Mosaic Test
                                                                  Tool and thus the Mosaic Test.
For target images in the Mosaic Test, photographs of jelly
beans are used. The images of jelly beans produce a bright,
interesting target image for participants to create in mosaic
form and the generation of an image mosaic that appears
visually similar to the target image is also very achievable.
More importantly, retrieving images from a database com-
prising large areas of a small number of distinct colours is a
practice commonly performed by users in creative industries.
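
To make the mosaic structure described at the start of this section concrete, the following is a minimal sketch of how a target image can be divided into cells and how a tile with similar colour content might be chosen for one cell. It illustrates automatic mosaic generation only; in the Mosaic Test itself participants select the images manually. The 4 x 4 grid layout, the mean-colour matching rule and all function names are assumptions made for illustration and are not taken from the paper.

```python
def grid_cells(width, height, rows=4, cols=4):
    """Return (left, top, right, bottom) boxes for a rows x cols grid.

    A 4 x 4 grid gives the 16 cells used in the Mosaic Test; the exact
    layout is an assumption, the paper only states the number of cells.
    """
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * width // cols
            top = row * height // rows
            right = (col + 1) * width // cols
            bottom = (row + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def mean_colour(pixels):
    """Mean (r, g, b) of a non-empty list of RGB tuples."""
    count = len(pixels)
    return tuple(sum(pixel[channel] for pixel in pixels) / count
                 for channel in range(3))


def best_tile(cell_pixels, tiles):
    """Pick the tile whose mean colour is nearest to the cell's mean colour.

    tiles is a hypothetical dict mapping a tile name to its RGB pixels.
    Mean-colour matching is a stand-in for any content-based measure.
    """
    target = mean_colour(cell_pixels)

    def squared_distance(name):
        tile_mean = mean_colour(tiles[name])
        return sum((a - b) ** 2 for a, b in zip(tile_mean, target))

    return min(tiles, key=squared_distance)


cells = grid_cells(400, 400)
chosen = best_tile([(250, 10, 10)],
                   {"red": [(255, 0, 0)], "blue": [(0, 0, 255)]})
```

Mean colour is used here only because it is the simplest possible descriptor; any of the colour descriptors discussed later could be substituted in `best_tile`.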

To complete their image mosaics, participants must identify
the colours required to fill an image mosaic cell (by inspect-
ing the corresponding region in the target image), and re-
trieve a suitably coloured image from the 25,000 contained
within the MIRFLICKR-25000 image collection [4] using the
colour-based image retrieval system under evaluation. When
selecting images for use in their image mosaic, users can add,
move or remove images accordingly to assess the suitability
of images within the context of their image mosaic. It is
in this way that the Mosaic Test overcomes the first ma-
jor drawback of existing evaluation methods, by enabling
participants to perform the creative practice of reflection-in-   Figure 2: The Mosaic Test Tool (left) and an image
action [12]. Upon completion of an image mosaic, the time         retrieval system under evaluation (right) during a
required by the user to finish the image mosaic is recorded,      Mosaic Test session.
along with the visual accuracy of their creation in com-
parison with the initial target image. Through analysing          The target image and image mosaic are displayed simulta-
the accuracy of user-generated image mosaics (in a manner         neously on the Mosaic Test Tool interface to allow users to
which correlates with the perception of a number of different     manually inspect and identify the colours (and colour lay-
human assessors), the Mosaic Test is able to overcome the         out) required for each image mosaic cell. As can be seen
second drawback associated with existing evaluation tech-         in Figure 2, the target image (the image the user is trying
niques. This is because it does not rely on a highly subjective   to replicate in the form of an image mosaic) is displayed in
image database ground-truth. The image mosaic accuracy            the top half of the Mosaic Test Tool. Coupled with the ease
measure adopted for use with the Mosaic Test is discussed         in which images can be added to, or removed from, image
further in Section 3.1. Additionally, participants are asked      mosaic cells, users of the Mosaic Test Tool can simply as-
sess the suitability of a retrieved image by dragging it to the
appropriate image mosaic cell and viewing it alongside the
other image mosaic cells.

3. USER STUDY
To evaluate the Mosaic Test, we recruited 24 users to participate in a user study. Participants
were given written instructions explaining the concept of an image mosaic and the functionality
of the Mosaic Test Tool. A practice session was undertaken by each participant, in which they
were asked to complete a practice image mosaic using a small selection of suitable images.
Participants were then asked to complete 3 image mosaics using 3 different colour-based image
retrieval systems. To ensure that users did not simply learn a set of database images suitable
for use in a solitary image mosaic, 3 different target images were used. These target images
were carefully selected so that the number of jelly beans (and thus colours) in each was evenly
balanced, with only the colour and layout of the jelly beans varying between the target images.
To also ensure that results were not affected by a target image being more difficult to create
in image mosaic form than another, the order in which the target images were presented to
participants remained constant whilst the order in which the colour-based image retrieval
systems were used was counterbalanced. After completing the 3 image mosaics, participants were
asked to rank each of their creations in ascending order of ‘closeness’ to its corresponding
target image.

We wanted to investigate whether the Mosaic Test does overcome the drawbacks of existing
evaluation strategies so that it may be adopted as a reliable benchmark of colour-based image
retrieval systems. Firstly, we hypothesised that users in the study would perform
reflection-in-action and so we wanted to observe whether this was indeed true for participants
when judging the suitability of images retrieved from the database. Secondly, we were eager to
investigate which method should be adopted for measuring the accuracy of an image mosaic in the
Mosaic Test.

3.1 Assessing Image Mosaic Accuracy
As an image mosaic is an art form intended to be viewed and enjoyed by humans, it seems logical
that the adopted measure of image mosaic accuracy - i.e., how close an image mosaic looks to its
intended target image - should correlate with the inter-image distance perceptions of a number
of human assessors. An existing measure for automatically computing the distance between an
image mosaic and its corresponding target image is the Average Pixel-to-Pixel (APP) distance [9].
The APP distance is expressed formally in Equation (1), where i indexes one of a total of n
corresponding pixels in the mosaic image M and target image T, and r, g and b are the red, green
and blue colour values of a pixel.

  APP = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(r_M^i - r_T^i)^2 + (g_M^i - g_T^i)^2 + (b_M^i - b_T^i)^2}    (1)

We were eager to compare the existing APP image mosaic distance measure with a variety of image
colour descriptors (and associated distance measures) commonly used for content-based image
retrieval, to discover which best correlates with human perceptions of image mosaic distance.
To do this, we calculated the image mosaic distance rankings according to the existing measure
and several colour descriptors (and their associated distance measures), and then calculated the
Spearman’s rank correlation coefficient between each of the tested distance measures and the
rankings assigned by the users in our study.

For the image colour descriptors (and associated distance measures), we firstly tested the
global colour histogram (GCH) as an image descriptor. A colour histogram contains a normalised
pixel count for each unique colour in the colour space. We used a 64-bin histogram, in which
each of the red, green and blue colour channels (in an RGB colour space) was quantised to 4 bins
(4 x 4 x 4 = 64). We adopted the Euclidean distance metric to compare the global colour
histograms of the image mosaics and corresponding target images. We also tested local colour
histograms (LCH) as an image descriptor. For this, 64-bin colour histograms were calculated for
each image mosaic cell (for the image mosaic descriptor), and its corresponding area in the
target image (for the target image descriptor). The average Euclidean distance between all of
the corresponding colour histograms (in the image mosaic and target image LCH descriptors) was
used to compare LCH descriptors. Finally, we tested (along with their associated distance
measures) the MPEG-7 colour structure (MPEG-7 CST) and colour layout (MPEG-7 CL) descriptors
[13], as well as the auto colour correlogram descriptor (ACC) [3].

The auto colour correlogram (ACC) of an image can be described as a table indexed by colour
pairs, where the k-th entry for colour i specifies the probability of finding another pixel of
colour i in the image at a distance k. For the MPEG-7 colour structure descriptor (MPEG-7 CST),
a sliding window (8 × 8 pixels in size) moves across the image in the HMMD colour space [13]
(reduced to 256 colours). With each shift of the structuring element, if a pixel with colour i
occurs within the block, the total number of occurrences in the image for colour i is
incremented to form a colour histogram. The distance between two MPEG-7 CSTs or two ACCs can be
calculated using the L1 (or city-block) distance metric. Finally, the MPEG-7 colour layout
descriptor (MPEG-7 CL) [13] divides an image into 64 regular blocks, and calculates the dominant
colour of the pixels within each block [13]. The cumulative distance between the colours (in the
YCbCr colour space) of corresponding blocks forms the measure of similarity between 2 MPEG-7 CL
descriptors.

    Accuracy Measure    rs       Significant (5%)
    MPEG-7 CST          0.572    YES
    APP                 0.275    NO
    GCH                 0.242    NO
    MPEG-7 CL           0.198    NO
    LCH                 0.176    NO
    ACC                 0.154    NO

Table 1: The Spearman’s rank correlation coefficients (rs) between the image mosaic distance
rankings made by humans and the rankings generated by the tested colour descriptors.
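
The pixel-based and histogram-based distance measures described in Section 3.1 can be sketched in a few lines. The following is an illustrative sketch and not the implementation used in the study. It assumes that images are supplied as equally sized lists of 8-bit (r, g, b) tuples, that the 4 bins per channel are uniform, and that the function names are free to choose (they are hypothetical). The MPEG-7 descriptors and the ACC are not covered.

```python
import math


def app_distance(mosaic_pixels, target_pixels):
    """Average Pixel-to-Pixel (APP) distance, as in Equation (1)."""
    if not mosaic_pixels or len(mosaic_pixels) != len(target_pixels):
        raise ValueError("images must be non-empty and of equal size")
    total = 0.0
    for (rm, gm, bm), (rt, gt, bt) in zip(mosaic_pixels, target_pixels):
        total += math.sqrt((rm - rt) ** 2 + (gm - gt) ** 2 + (bm - bt) ** 2)
    return total / len(mosaic_pixels)


def colour_histogram(pixels, bins=4):
    """Normalised RGB histogram with bins**3 entries (64 by default).

    Uniform quantisation of each 8-bit channel is an assumption.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = ((r * bins // 256) * bins * bins
                 + (g * bins // 256) * bins
                 + (b * bins // 256))
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]


def euclidean(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def gch_distance(mosaic_pixels, target_pixels):
    """Global colour histogram (GCH) distance."""
    return euclidean(colour_histogram(mosaic_pixels),
                     colour_histogram(target_pixels))


def lch_distance(mosaic_cells, target_cells):
    """Local colour histogram (LCH) distance: the average GCH distance
    over corresponding cells, each cell given as a list of pixels."""
    distances = [gch_distance(m, t) for m, t in zip(mosaic_cells, target_cells)]
    return sum(distances) / len(distances)


target = [(255, 0, 0), (0, 255, 0)]
mosaic = [(255, 0, 0), (0, 0, 255)]
app = app_distance(mosaic, target)
gch = gch_distance(mosaic, target)
```

In this toy example the first pixel matches exactly and the second differs in two channels, so both measures report a non-zero distance; identical images give a distance of zero under every measure.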
4. RESULTS                                                        ture descriptors from the user-generated image mosaics and
Table 1 shows the Spearman’s rank correlation coefficients        their corresponding target images, and calculating the L1
(rs ) calculated between the human-assigned rankings and          (or city-block) distance between them. As a result of our
each of the rankings generated by the tested colour descrip-      findings, we propose that the Mosaic Test be adopted in all
tors. We compare the rs correlation coefficient for each mea-     future research evaluating the effectiveness of colour-based
sure tested with the critical value of r, which at a 5% sig-      image retrieval systems. Future work will be to publicly re-
nificance level with 22 d.f. (24 − 2) equates to 0.423. Any       lease the Mosaic Test Tool and procedural documentation
rs value greater than this critical value can be considered a     for other researchers in the domain of content-based image
significant correlation at a 5% level.                            retrieval.
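
The significance check described above can be sketched as follows. This is an illustration only: the paper does not spell out how the human and computed rankings were paired, so the `spearman_rs` helper and its tie handling (average ranks) are assumptions. The correlation values and the critical value of 0.423 are taken directly from Table 1 and the text.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Critical value quoted in the text for 22 d.f. at the 5% level.
CRITICAL_RS = 0.423

# Correlation coefficients reported in Table 1.
TABLE_1 = {
    "MPEG-7 CST": 0.572,
    "APP": 0.275,
    "GCH": 0.242,
    "MPEG-7 CL": 0.198,
    "LCH": 0.176,
    "ACC": 0.154,
}


def significant_measures(table, critical=CRITICAL_RS):
    """Names of measures whose rs exceeds the critical value."""
    return [name for name, rs in table.items() if rs > critical]


significant = significant_measures(TABLE_1)  # ["MPEG-7 CST"]
```

Applying the threshold to the reported coefficients reproduces the final column of Table 1: only the MPEG-7 CST value exceeds 0.423.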

5. DISCUSSION                                                     7.   REFERENCES
We observed the actions taken by the participants of the user      [1] Exalead. Chromatik. Accessed December 1, 2010, at:
study when creating their image mosaics. It was clear that             http://chromatik.labs.exalead.com/.
the majority of users performed reflection-in-action when          [2] S. G. Hart. NASA-Task Load Index (NASA-TLX); 20
assessing the relevance (or suitability) of images retrieved           Years Later. In Proceedings of the Human Factors and
from the database for use in their image mosaics. As partic-           Ergonomics Society 50th Annual Meeting, pages
ipants of a Mosaic Test were able to perform this reflection-          904–908, 2006.
in-action [12], it is clear that the Mosaic Test also overcomes    [3] J. Huang, S. R. Kumar, M. Mitra, W. Zhu, and
the first of the two major drawbacks present in current im-            R. Zabih. Image Indexing Using Color Correlograms.
age retrieval evaluation methods. As shown in Table 1, the             In Computer Vision and Pattern Recognition, pages
MPEG-7 colour structure descriptor (MPEG-7 CST) was                    762–768, 1997.
the only colour descriptor (and associated distance measure)       [4] M. J. Huiskes and M. S. Lew. The MIR Flickr
we found to correlate with human perceptions of image mo-              Retrieval Evaluation. In ACM International
saic distance at the 5% significance level. Therefore, by mea-         Conference on Multimedia Information Retrieval,
suring the L1 (or city-block) distance between the MPEG-7              pages 39–43, 2008.
CSTs of the target image and user-generated image mosaics,         [5] idée Inc. idée MultiColr Search Lab. Accessed
the Mosaic Test can automatically calculate the relevance              November 2, 2010 at
of retrieved images in a manner that correlates with human             http://labs.ideeinc.com/multicolr.
perception, thus overcoming the second major drawback of
                                                                   [6] Imagekind Inc. Shop Art by Color. Accessed
existing image retrieval evaluation methods for benchmark-
                                                                       November 2, 2010, at:
ing colour-based image retrieval systems (the reliance on a
                                                                       http://www.imagekind.com/shop/ColorPicker.aspx.
highly subjective image database ground-truth).
                                                                   [7] T. K. Lau and I. King. Montage: An Image Database
                                                                       for the Fashion, Textile, and Clothing Industry in
6. CONCLUSION                                                          Hong Kong. In Third Asian Conference on Computer
Current image retrieval system evaluation methods have two             Vision, pages 410–417, 1998.
fundamental drawbacks that result in them being unsuit-
                                                                   [8] H. Müller, W. Müller, D. M. Squire,
able for evaluating and benchmarking colour-based image
                                                                       S. Marchand-Maillet, and T. Pun. Performance
retrieval systems. These evaluation strategies do not enable
                                                                       Evaluation in Content-Based Image Retrieval:
users to perform the practice of reflection-in-action [12], in
                                                                       Overview and Proposals. Pattern Recognition Letters,
which creative users assess project modifications within the
                                                                       22(5):593–601, 2001.
context of the creative piece they are working on. The
existing image retrieval system evaluation methods also rely       [9] S. Nakade and P. Karule. Mosaicture: Image Mosaic
heavily upon highly subjective image database ground-truths            Generating System Using CBIR Technique. In
                                                                       International Conference on Computational
when assessing the relevance of images selected by test users
                                                                       Intelligence and Multimedia Applications, pages
or returned by a system. As a result of these drawbacks, no
                                                                       339–343, 2007.
method currently exists for reliably evaluating and bench-
marking colour-based image retrieval systems. In this paper,      [10] Picitup. Picitup. Accessed January 21, 2011, at:
we have introduced the Mosaic Test which has been devel-               http://www.picitup.com/.
oped to address the current problem, by providing a reliable      [11] W. Plant and G. Schaefer. Evaluation and
means by which to evaluate colour-based image retrieval sys-           Benchmarking of Image Database Navigation Tools. In
tems.                                                                  International Conference on Image Processing,
                                                                       Computer Vision, and Pattern Recognition, pages
The findings of a user study reveal that the Mosaic Test               248–254, 2009.
overcomes the two major drawbacks associated with existing        [12] D. A. Schön. The Reflective Practitioner: How
evaluation methods used in the research domain of image                Professionals Think in Action. Basic Books, 1983.
retrieval. As well as providing valuable effectiveness data       [13] T. Sikora. The MPEG-7 Visual Standard for Content
relating to efficiency and user workload, the Mosaic Test              Description - An Overview. IEEE Transactions on
enables participants to reflect on the relevance of retrieved          Circuits and Systems for Video Technology, 11(6),
images within the context of their image mosaic (i.e., per-            2001.
form reflection-in-action [12]). The Mosaic Test is also able     [14] R. Silvers. Photomosaics: Putting Pictures in their
to automatically measure the relevance of retrieved images             Place. Master’s thesis, Massachusetts Institute of
in a manner which correlates with the perceptions of mul-              Technology, 1996.
tiple human assessors, by computing MPEG-7 colour struc-
                         Evaluating the Cognitive Impact of
                       Search User Interface Design Decisions
                                                     Max L. Wilson
                                           Future Interaction Technology Labs
                                    Department of Computer Science, College of Science
                                                 Swansea University, UK
                                                m.l.wilson@swansea.ac.uk
ABSTRACT                                                                   highlighted options in unused filters that were related to
The design of search user interfaces has developed                         guide searchers [10]. Frequently, however, we informally
dramatically over the years, from simple keyword search                    noted that searchers spent increasing periods of time on
systems to complex combinations of faceted filters and                     visually comprehending the interface before making their
sorting mechanisms. These complicated interactions can                     first move. In follow up studies, we saw minimal
provide the searcher with a lot of power and control, but at               interaction with facets during the first visit, but recorded a
what cost? Our own work has seen users experience a sharp                  significant increase in the use of faceted features during
learning curve with faceted browsers, even before they                     subsequent return visits. It is the hypothesis of our
begin interacting. This paper describes a forthcoming                      forthcoming work that this non-use of such powerful
period of work that intends to investigate the cognitive                   features is caused by an increased cognitive load created by
impact of incrementally adding features to search user                     the associated increased complexity of the SUI. It is this
interfaces. We intend to produce search user interface                     cognitive impact that we believe can be measured and
design recommendations to help designers maximize                          attributed to specific design decisions.
support for searchers while minimizing cognitive impact.
                                                                           mSpace is one specific faceted browser, but the principle of
Author Keywords                                                            faceted browsing can be implemented in many different
Search, Exploratory Search, User Interface Design,                         ways [2]. We also hypothesize that not only the presence,
Cognitive Load Theory                                                      but also the subsequent design of SUI features can
                                                                           have an impact. The following sections cover some related
ACM Classification Keywords                                                work before describing our plans to evaluate the cognitive
H5.2. Information interfaces and presentation (User                        impact that adding features to SUIs can have.
Interfaces): evaluation/methodology, screen design. H3.3.
Information search and retrieval: Search process.                          RELATED WORK
                                                                           SUI design is affected by many factors. Interaction
INTRODUCTION                                                               designers can decide how best to support searchers, but
User Interface (UI) Designers are always concerned with                    designs may be limited by the metadata that is available
supporting users effectively and intuitively, but a common                 about the possible results. Both the underlying data and the
recent focus for Search User Interface (SUI) designers has                 graphical design may also have an impact, then, on how the
been to increase the interactive power and control that                    chosen interaction will look and feel. As perhaps the most
searchers have over results. As a community, we want to                    recognized SUI for many users around the world, Google
support users in exploring, discovering, comparing, and                    has always maintained a very clean and clear white design1,
choosing results that meet their needs. SUI designers,                     and makes very incremental, careful design changes that stay
therefore, are concerned with maximizing the use of                        within that design. Competitor search engines have notably
powerful interface features while maintaining a clear and                  changed over the years, with many now being very similar
intuitive design.                                                          to Google in terms of interaction design, while trying to
                                                                           keep their own visual design consistent.
In our prior work, we developed mSpace [7] as a faceted
browser that lets searchers use combinations of orthogonal                 For more exploratory websites that sell a wide range of
metadata filters to narrow their search. We developed                      products, or provide large collections of information or
advanced interactions for faceted browsers that took                       documents, there are now many different features that
advantage of visual location within the SUI, and                           support people, from tabular or dropdown-based sorting


                                                                           1 http://searchengineland.com/qa-with-marissa-mayer-google-vp-search-products-user-experience-10370
mechanisms, to categories, clusters, filters, and facets.         where two systems provide the same support, one may be
Some websites that provide these features are frustrating         harder or easier to use because of its simple visual design.
and difficult to use, while others are simple, intuitive, and     Our conclusion is that to understand the success of a SUI,
successful. In these systems it is often the way that the ideal   we must analyse both the support in terms of functionality,
support has been developed that has affected their success.       and the cognitive impact it creates. Being able to
In a study of the success of different faceted browser            understand and predict these two things would help us to
implementations, Capra et al [1] directly compared two            design and build better SUIs
faceted browsers to a government website, all over the same
hierarchical government dataset, and discovered that the          EVALUATING THE SUPPORT PROVIDED BY SUIS
customized hierarchical design of the original website            Beyond the common practice of performing task-oriented
supported searchers far better than the functionally more         user studies, my own doctoral work focused on the design
powerful faceted browsers.                                        of an analytical evaluation metric for SUIs, called the
                                                                  Search Interface Inspector2 (Sii). Sii calculates the support
Both the choice of content and the visual design have both        for different types of users based upon the set of features in
been shown to have an impact on usability. White et al            the interface, and how many interactions they take to use
showed that the text that includes the search terms is best,      [9]. To analyse a SUI, the evaluator catalogues the features
and that highlighting these terms also improves search [12].      of the design and calculates how many interactions are
Similarly, Lin et al. have shown that simply highlighting         required to perform a set of known search tactics. The
the domain name in the URL bar significantly reduces the          method then interpolates the likely support for different
chances that users will be caught be fishing attacks [4].
                                                                  types of searchers (explorers or searchers that know what
Zheng et al [13] have also shown that users can make often-
                                                                  they are looking for, for example), based upon the types of
accurate snap judgments about the credibility of websites         tactics they are likely to perform. Sii can be used to
within half a second. Further, Wilson et al [10] noted that       compare several designs and produces a series of 3
the success of adding guiding highlights to their faceted         interactive graphs that allow evaluators perform an
browser was affected by the choice of highlight-colour and        investigative analysis of the results.
its implied meaning.
                                                                  Sii is based on detailed established information seeking
The choice of SUI features within a single implementation         theory and rewards the design of search functionality that
has also been shown to have an impact on search success.          has simple interaction. Consequently, however, Sii rewards
Diriye et al compared a keyword search interface with a           the addition of new simple functionality, without being able
revised version that also included query suggestions [3].         to estimate the increasing complexity of the SUI as new
Their results showed that such features slowed down
                                                                  features are added. To remedy this problem, a chapter of the
searchers who were performing simple lookup tasks, but
                                                                  thesis investigated Cognitive Load Theory and initially
supported those who were performing more complicated              specified a similar metric that calculated the cognitive load
exploratory tasks. Similarly, Wilson and Wilson have also         of a UI. This second measure of intrinsic cognitive load was
found early results indicating that the simple presence,          proposed for inclusion in Sii, estimated the intrinsic
without interaction, of a keyword cloud provides additional       cognitive load of a SUI. Similar to how the original metric
support, where subsequent interaction provides very little        was correlated with study results, one aim of the work
gain [11] during exploratory tasks. Wilson and Wilson’s           described below is to further refine and validate this
results suggest that searchers can learn more about the           analytical measure of the cognitive impact of SUIs.
result set from seeing the terms in the keyword cloud, than
actually using them to filter the results.                        Cognitive Load Theory highlights that capacity for learning
                                                                  is affected by three aspects: intrinsic, extrinsic, and
The location of features within a SUI has also been shown         germane cognitive load. Intrinsic cognitive load is created
to have an impact. Morgan and Wilson studied the visual
                                                                  by the materials providing the learning experience, or in our
layout of search thumbnails, predicting that having a rack of
                                                                  case the SUI. Extrinsic cognitive load is created by the
thumbnails at the top of the user interface would allow           complexity in the task at hand. Germane cognitive load is
searchers to make faster judgments when trying to re-find         then required to process what is learned and commit it to
pages [5]. Their results showed that a rack of thumbnails         long-term memory. If intrinsic load and extrinsic load are
was significantly more disruptive to searchers when the           too high, then there may not be enough space load left for
target page was not in the results, than the support it           germane cognitive load. Although, it is commonly accepted
provided when it was.                                             that effort can increase overall capacity, the aim should still
The studies above indicate that the success of SUIs can be        be to reduce intrinsic cognitive load by improving the
attributed to the appropriateness of the functionality            design of learning materials or SUIs [6]. Reducing intrinsic
provided, where unnecessary functionality can slow users          load creates space for users to perform increasingly
down. Further, the studies indicate that the success of SUIs
can be determined by simple visual or spatial changes that
do not necessarily impact functionality. Consequently,
                                                                  2
                                                                      http://mspace.fm/sii
complex tasks, or opens up germane cognitive load so that what is being learned can be retained.

EVALUATING THE COGNITIVE IMPACT OF SUIS
The general structure of the studies we are planning is to use brain scanners to record the cognitive impact that different SUIs have on a user. The initial phases will focus on identifying and measuring such responses to significant and obvious differences, before trying to capture changes to more subtle designs and, hopefully, in situ. Initially, we will be using EPOC Emotiv headsets³, as shown in Figure 1, to take readings. These headsets are commercialized versions of EEG scanners, but are designed for use in more natural contexts. EEG scanners, as with many other brain scanning systems, are typically affected by simple body movements and so are often restricted to confined conditions. Such scanners, therefore, are often not suitable for task-based evaluations, which require action and movement. In psychology, EEG scanners are typically used in constrained environments where users are only allowed to move their thumbs to answer yes or no. Consequently, this work requires scanners that can be used in more natural contexts while performing everyday searching tasks. In the future, funding permitting, we also intend to buy an fNIRS scanner, which has been shown to be suitable for task-based evaluation conditions [8]. We intend to use these measurements to understand the impact of design decisions, in order to make clear recommendations to SUI designers.

³ http://www.emotiv.com/

Figure 1: EPOC Emotiv Headset

Phase 1 – the impact of additional features
Beginning this summer, with two summer interns, we will be performing our first studies, which will simply display SUIs of incremental complexity to participants. We will begin with a simple keyword search design, and add features such as recommendations and filters. The order in which interfaces are shown to participants will be randomized to avoid learning and familiarity bias. The aim of this phase is to prove that the learning curves experienced by users exist and that the cognitive load can be measured objectively. We hope that the results will give initial insight into the amount of impact that different features have, which may in turn help us make hypotheses about design issues. This phase will help us identify the cost of adding a feature, where task success would allow us to measure their benefit.

Phase 2 – capturing impact in the context of tasks
Where the first phase above allows us to learn to recognize the signs from EEG signals, we intend to try to detect cognitive load in situ, and in the context of a task. We will be setting participants specific simple and exploratory tasks, whilst controlling the type of user interface features they see, to capture the cognitive impact as they start. This phase will help us identify whether the impact of a search user interface is affected by task context.

Phase 3 – the impact of different implementations
While adding features creates an obvious change in the user interface, different features can be put in different places in the SUI and also be implemented differently. Google, for example, puts suggested refinements at the bottom of the page, while Bing has them on the side. Bing also chooses to provide a mix of refinements and alternative directions. In Phase 3 we intend to analyse both of these kinds of variables to see if they have significant impacts on cognitive load. This phase will help us identify whether the cost of adding SUI features can be minimized by refining their design.

Discussion
There are many challenges remaining in this planned work. So far, we have planned very controlled comparisons of SUI changes, but in real life these systems are used in the context of complex tasks and for extended periods of time. Controlled situations will help identify cause and effect, but other similar objective measurements, like eye trackers, still require interpretation. We hope to expand on these methods, and the findings of existing brain scanning HCI research [8], by addressing this issue over time. Finally, although this research is primarily interested in the development of SUIs and how they affect people learning to use powerful search features, there are many other things that can be distracting in general UI design. These methods will likely expand to help address other design questions; we, however, are particularly aiming to answer questions about encouraging exploratory search and learning, by increasing the power of SUIs, while reducing their impact on searchers.

CONCLUSIONS
This work has yet to begin formally, but we intend to learn more about the impact that very simple design decisions can have on searchers. From previous experience of searcher success in evaluations, both industry and academia know that such changes can seriously impact the success of a search user interface. This work will use objective measurements of brain response to help us identify the factors that make search user interfaces hard to comprehend. We hope that such measurements will a) help us analyse the cost-benefit trade-off of adding additional
support to search user interfaces, and b) help us develop design recommendations for implementing search user interface features so that they have minimal impact.

REFERENCES
1. Robert Capra, Gary Marchionini, Jung Sun Oh, Fred Stutzman, and Yan Zhang. 2007. Effects of structure and interaction style on distinct search tasks. In Proc. JCDL '07. ACM, New York, NY, USA, 442-451.
2. Edward C. Clarkson, Shamkant B. Navathe, and James D. Foley. 2009. Generalized formal models for faceted user interfaces. In Proc. JCDL '09. ACM, New York, NY, USA, 125-134.
3. Abdigani Diriye, Ann Blandford, and Anastasios Tombros. 2010. Exploring the impact of search interface features on search tasks. In Proc. ECDL '10.
4. Eric Lin, Saul Greenberg, Eileah Trotter, David Ma, and John Aycock. Does Domain Highlighting Help People Identify Phishing Sites. In Proc. CHI 2011 (in press).
5. Rhys Morgan and Max L. Wilson. 2010. The Revisit Rack: grouping web search thumbnails for optimal visual recognition. In Proc. ASIS&T '10.
6. Sharon Oviatt. 2006. Human-centered design meets cognitive load theory: designing interfaces that help people think. In Proc. MULTIMEDIA '06. ACM, New York, NY, USA, 871-880.
7. m.c. schraefel, Max Wilson, Alistair Russell, and Daniel A. Smith. 2006. mSpace: improving information access to multimedia domains with multimodal exploratory search. Commun. ACM 49, 4 (April 2006), 47-49.
8. Erin Treacy Solovey, Audrey Girouard, Krysta Chauncey, Leanne M. Hirshfield, Angelo Sassaroli, Feng Zheng, Sergio Fantini, and Robert J.K. Jacob. 2009. Using fNIRS brain sensing in realistic HCI settings: experiments and guidelines. In Proc. UIST '09. ACM, New York, NY, USA, 157-166.
9. Max L. Wilson, m.c. schraefel, and Ryen W. White. 2009. Evaluating advanced search interfaces using established information-seeking models. J. Am. Soc. Inf. Sci. Technol. 60, 7 (July 2009), 1407-1422.
10. Max L. Wilson, Paul André, and m.c. schraefel. 2008. Backward highlighting: enhancing faceted search. In Proc. UIST '08. ACM, New York, NY, USA, 235-238.
11. M. J. Wilson and M. L. Wilson. Tag Clouds and Keyword Clouds: evaluating zero-interaction benefits. In Ext. Abstracts CHI '11.
12. Ryen W. White, Ian Ruthven, and Joemon M. Jose. 2002. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes. In Proc. SIGIR '02. ACM, New York, NY, USA, 57-64.
13. Xianjun Sam Zheng, Ishani Chakraborty, James Jeng-Weei Lin, and Robert Rauschenberger. 2009. Correlating low-level image statistics with users - rapid aesthetic and affective judgments of web pages. In Proc. CHI '09. ACM, New York, NY, USA, 1-10.
  The potential of Recall and Precision as interface design
  parameters for information retrieval systems situated in
                   everyday environments
          Ayman Moghnieh                                                                                     Josep Blat
       Universitat Pompeu Fabra                                                                     Universitat Pompeu Fabra
      C/Tanger 122-140, E-08018                                                                    C/Tanger 122-140, E-08018
           Barcelona, Spain                                                                             Barcelona, Spain
    ayman.moghnie@upf.edu                                                                             josep.blat@upf.edu



ABSTRACT
In this paper, we investigate ways for a tighter integration of IR and HCI in new urban contexts, as HCI expands its reach outside the workplace towards environments where efficiency and performance no longer constitute the backbone of interaction requirements. In particular, we propose to use Recall and Precision as design parameters to describe the information settings and performance of situated interfaces acting as retrieval systems in these environments. To explore this notion, we follow an inductive design research process by which different prototypes are designed, developed, and evaluated. Our experience shows that Recall and Precision, as design parameters, help to reflect the information requirements onto the interface design, and contribute to adapting IR to the contemporary challenges it faces, although more work is needed to consolidate its role vis-à-vis the growing ubiquity of computer technologies.

Categories and Subject Descriptors
H.5.2 User Interfaces.

General Terms
Design, Experimentation, Human Factors, Theory.

Keywords
Information Retrieval, Human-Information Interaction, Situated Interfaces, Interface and Interaction Design

1. INTRODUCTION
As computer technologies become more ubiquitous and versatile, and get further integrated in human environments, several genres of situated information interfaces (e.g. interactive peripheral displays, ambient displays, and interactive surfaces) are starting to assume a mediating role between people and digital information spaces in different environments. From an HCI perspective, these situated interfaces, primarily found in public and semi-public environments such as malls, public transportation, building entrances, and public squares, represent new border zones that maintain connectivity and mutual presence between the real and the digital worlds, and actively sustain flows of useful or relevant information towards nearby people who in turn search, discover, and interact with the displayed information.

The human interaction with information via situated interfaces creates new challenges for conventional information retrieval (IR) systems: first, the relationship between people and digital information spaces becomes more explicit and the technology that supports it more ubiquitous. Second, the human interaction with information spaces adopts a more direct approach supported by the coming of age of new interaction paradigms (e.g. touch, gesture, speech) that emulate the manipulation of objects. Third, the information space hosted by a situated interface tends to be specialized in subjects and themes befitting the environment where the interface is situated, and the goals and interests of the people present in it. Fourth, the interaction properties may vary considerably in terms of interaction duration and the amount of user attention delegated to the situated interface [1].

These challenges, among others [2], justify the search for a tighter coupling of interface and interaction design, and IR systems, by which IR as a supporting technology for interacting with information contributes to making the interface design more transparent and the human-information interaction more fluid and direct. Therefore, we reason that the performance of situated interfaces as IR systems ought to be attuned according to the nature of each specific interaction scenario, given that a maximization of IR performance may not be adequate for answering the interaction design requirements in all kinds of user experiences with situated interfaces [5, 10]. Consequently, IR performance tilts towards becoming a design issue that determines some of the characteristics of situated interfaces that mediate this interaction.

Currently, two metrics (Recall and Precision) are used to assess the performance of IR systems in response to user queries [3]. Recall is the fraction of all elements in the information space that are relevant to the user query which are actually retrieved. Precision is the fraction of the retrieved elements that are relevant to the user query. However, the query as a middleman between humans and information spaces goes against the transparent design of situated interfaces that support a direct interaction with information spaces. In addition, the information spaces hosted by situated interfaces are usually predetermined or pre-queried in accordance with the specific interests of potential
users and the characteristics or nature of the environments where the hosting interfaces are situated. Instead of querying, the explicit momentary needs of users are answered by direct interaction with the visualized information. This effectively converts the relevance of the displayed information to the user interests from a performance factor to a design issue.

Therefore, we argue that the definition of Recall and Precision can be loosened or reinterpreted to respectively describe the quantity of retrieved information elements and their visual diversity as displayed on the interface, since relevance is no longer a performance factor from an HCI stance. These two metrics can consequently act as parameters that bind the design and performance of situated interfaces as retrieval systems to the informational expectations of users, by controlling the amount and diversity of visualized information in order to maximize the transparency of their designs to support a direct human-information interaction.

In order to explore this idea further, we followed a line of inductive design research by conceptualizing, designing, and evaluating experimental prototypes. We first introduce two sets of prototypes devised to understand how users perceive the quantity and visible diversity of information objects. We then define parameterization scales for Recall and Precision based on these experiments. In order to develop a thoughtful understanding of how Recall and Precision, which we will respectively refer to as R and P, can act as design parameters for situated interfaces, we use them in the analysis, design, and evaluation of five different situated interfaces. Next, we investigate how these two parameters can be dynamically controlled by users through the design of two interactive interfaces for searching and browsing news articles. We conclude by assessing our experience and discussing the viability and implications of our approach.

2. RECALL AND PRECISION FROM A PERCEPTUAL STANCE

Figure 1. An instance of the InformationCasserole prototypes

InformationCasserole is a series of video prototypes (figure 1) designed to study the effect that the number of visualized elements (R) has on the way humans perceive the information revealed on the interface. They show classified ads from magazines and newspapers floating on different levels in a glass container filled with slowly moving water. Therefore, their settings emulate a transparent interface design and foster a direct relationship between the human and digital information spaces.

Miller’s Law argues that the total number of different objects that humans can simultaneously hold in their working memory is approximately seven [4]. This affects the manner by which information is perceived when the cardinality of the visualized set of objects increases. In particular, there is a natural observable tendency to perceptually cluster or group these objects recursively whenever the perceivable number exceeds Miller’s threshold. To observe this phenomenon, eight 10-minute think-aloud sessions were organized with eight different university students who watched InformationCasserole showing magazine ads progressively being added to the water container, and commented on how the number of ads shown in the casserole affects the way they perceive the set of visualized ads.

We observed that when one object is shown, it tends to engage the subjects in a prolonged and detailed examination. This changes when two to seven objects are displayed since subjects become more interested in identifying relations among the objects and comparing them. The interest in object relations abates with a higher object number, and instead the relations among clusters or collections of objects start to proportionally grab attention. When the number of visualized objects crosses a certain threshold, which we estimate at Miller’s number squared, the casserole becomes perceptually saturated and the subjects begin to treat the set of ads as a space, reasoning about different regions in it. In conclusion, we find that the quantity of visualized objects (R) is perceived in four different density thresholds, and to each we accord a parameter value: R=0 for visualizing no or a single object; R=1 for a single collection of seven or fewer objects; R=2 for seven or fewer collections; and R=3 for a single information space or more than seven squared objects. This is reflected in figure 2.

Figure 2. R as a design parameter

In order to study the effects that the visible diversity of information objects (P) has on the manner by which people perceive information, eight paper-based prototypes similar to the InformationCasserole were conceived. Each prototype shows a combination of twelve to fifteen information objects from different genres (e.g. classified ads, news headlines, blog posts, news pictures, movie posters, youtube videos, secondhand goods, and city events). The object genre was emphasized and differentiated by aesthetic design. The visible object diversity encourages people to search for relations among visualized objects [6]. Therefore, the combinations, ranging from one to eight genres, were designed to encourage subjects to search for patterns and relations among the objects. Six twenty-minute think-aloud sessions were organized with subjects who were asked to search for and identify different genres of objects in each of the eight combinations presented in random order.

As expected, the subjects perceptually clustered the objects primarily according to their genre. However, they sometimes tended to search for inner divisions in objects of the same genre (e.g. clustering movies according to their cinematic kind or news articles in familiar news categories), or to merge related genres as a single genre (e.g. news articles and blog posts, or movie posters and news pictures). In total, the subjects perceived the diversity of
objects (P) at four different levels, and to each level we assign a     ·      The amount of available user attention (e.g. MetroWindow
corresponding parameter value that decreases with the                       has little attention available, in contrast with DigiJuke).
number of visible object genres: the first level is a single-genre      ·      The duration of human interaction with information (e.g.
diversity (P=3); the second level is a diversity of two to three            NewsWall remains in contact for prolonged durations, while the
genres (P=2); the third level refers to diversity of three to four          interaction with YouServe is more momentarily).
genres (P=1); the fourth level describes a diversity of five to seven
genres of objects (P=0). Figure 3 shows the number of visible           ·      The convergence or divergence of the information seeking
genres of objects in each of the eight combinations as seen by the          tasks (e.g. YouServe supports finding a specific library service,
subjects, and the P value of each of the four identified diversity          while Arts&Movies is designed to acquaint people with many
levels.                                                                     movies).
                                                                               Table 1. Values of R and P parameters for each interface
                                                                                   Situated interface              Recall      Precision
                                                                            Arts&Movies                              2             1
                                                                            DigiJuke                                 3             3
                                                                            YouServe                                 1             2
                                                                            NewsWall                                 1             1
                                                                            MetroWindow                              0             3

                                                                        The results of this R and P qualification are summarized in table
               Figure 3. P as a design parameter
                                                                        1. They show how R and P can characterize, from a perceptual
                                                                        stance, the role of a situated interface as an information retrieval
3. SITUATED INTERFACES AS IR                                            engine, and parameterize the design of its information settings
SYSTEMS                                                                 accordingly. For example, when the user objectives are to search
In order to assess how R and P act as design parameters for the         for specific objects (e.g. YouServe), R is minimized, while P can
information settings of situated interfaces, the following five         be maximized when the search converges on specific genres (e.g.
interfaces that act as retrieval systems in real-world environments     MetroWindow) or minimized when it diverges to cover many
were analyzed, and for each a corresponding design was                  genres (e.g. NewsWall). A maximized R signals that the
developed and evaluated in settings that resemble or emulate its        interaction tackles a large number of objects. In this case, when P
deployment environment.                                                 is maximized (e.g. DigiJuke), it determines that this large number
                                                                        is a single collection of similar objects, or, when it is minimized
The Arts&Movies is a situated interface intended for movie              (e.g. Arts&Movies), it signals that this large number of objects is a
theatre lobbies to support the search and discovery of new              visually diversified information space.
interesting movies through an animated visualization that draws
attention to relationships between movies and concepts. The             The designers also developed the interfaces' information
DigiJuke is installed inside a bar to allow people to browse and        architecture and aesthetic design, but these activities lie outside
select songs on the touch-screen, and play their video clips            the scope of this paper. The final designs are shown in figure 4.
accompanied by related images on the projection display. The
YouServe prototype is collocated in a university library lobby to
assist people in familiarizing themselves with the available library
services, and finding a service relevant to specific needs. The
NewsWall is a large display situated in the news production room
of a broadcasting corporation. The prototype subtly visualizes the
constantly evolving news information space on the web. The
MetroWindow is designed for metro wagons and broadcasts
summarized local news about cultural and civic events in the city
of Barcelona.
In related works [7, 8] we have argued how R and P, as design
parameters, can be quantified during requirement analysis and
used alongside other aspects to conceptualize the design of
information interfaces. For each situated interface, a couple of
                                                                                   Figure 4. The final designs of the situated interfaces
designers analyzed the characteristics of three entities: the
deployment environment, the humans present in it, and the
adequate information space, which was defined based on an               4. USER CONTROL OVER R AND P
understanding of the needs and goals of the humans alongside the        Based on the discerned ability of R and P to describe the
nature of the environment and the information and activity flows        information settings of situated interfaces and consequently their
that it hosts. Based on this analysis, the designers qualified the      performance as information retrieval systems, we explored the
values of R and P for each situated interface, and consequently         possibility of allowing users to control them dynamically in
described its information settings, namely the quantity of              classic search and retrieval scenarios. Therefore, we designed two
information to visualize and its visible diversity. This                experimental prototypes (figure 5) for querying a large
qualification of R and P was defined in accordance with several         information space of news articles, by which users can set and
non-disjoint or co-dependent situational aspects of human-              control the values of both R and P. The prototypes were evaluated
information interaction such as:                                        to assess the feasibility of this approach and its utility.
The NewSearch prototype places two slide-bars adjacent to              re-querying, a more profound study should be conducted for
the query textbox for setting R and P explicitly, and returns an       further analysis. Such an endeavor will constitute the essence of our
equivalent clustered visualization of news articles. Users control     future work.
the number of clusters (discerned by color) by P and their average
cardinality by R. The 3DQuery prototype uses a tag-map as a new        6. DISCUSSION
concept for defining user queries, and shows a corresponding map
                                                                       The approach that we presented in this paper demonstrates that a
of news articles. The tag-map is a rectangular box where users can
                                                                       tighter integration of HCI and IR is possible, by exploring the
place different tags of distinct sizes. The position of each tag
                                                                       potential of R and P as design parameters for the information
determines that of the corresponding cluster of news articles, and
                                                                       settings of situated interfaces. The use of these two performance
the tag size the cluster cardinality.
                                                                       metrics as design parameters may be seen as controversial;
                                                                       however, it is justified given that efficiency and information
                                                                       relevance no longer constitute the backbone of user expectations
                                                                       in all cases of human-information interaction. Instead, new
                                                                       aspects of human-information interaction (e.g. emotional,
                                                                       cognitive, experiential, situational, and cultural) are affecting the
                                                                       manner by which we conceptualize information systems. Our
                                                                       approach does not comprehensively address all these aspects, and
                                                                       therefore can be complemented by introducing new parameters to
                                                                       reflect with a higher affinity the aspects of human-information
                                                                       interaction onto the system design.
  Figure 5. NewSearch (left) and 3DQuery (right) prototypes
Each prototype was evaluated by a different group of ten subjects      7. ACKNOWLEDGEMENTS
in the lab. The subjects were asked to browse and read the             The authors would like to thank Oriol Galimany and other
collection of news articles for fifteen minutes, and then answer a     members of the Interactive Technology Group at Universitat
set of open-ended questions concerning their utility and usability.    Pompeu Fabra for their support.
The user evaluations of both prototypes showed that their learning
curve is not negligible. Subjects were not naturally inclined to use   8. REFERENCES
the slide-bars of NewSearch to control the information settings.       [1] Vogel, D. and Balakrishnan, R. 2004. Interactive public
An explanation for this may well be that they are accustomed to a          ambient displays: transitioning from implicit to explicit,
given query paradigm and the difficulty lies in making the                 public to personal, interaction with multiple users.
paradigm change [9]. However, this issue requires further                  Proceedings of UIST '04, pp. 137- 146.
investigations. Subjects found it easy to use the tag-map paradigm
in general, but it was deemed too complicated for simple queries       [2] NJ Belkin. Some (what) grand challenges for information
and more useful for prolonged search and exploration since it              retrieval. ACM SIGIR Forum, 2008
allows users to dynamically adjust queries and therefore               [3] R.A. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern
eliminates or reduces the need for re-querying.                            Information Retrieval. Addison-Wesley Longman Publishing
                                                                           Co., Inc., Boston, MA, USA.
The experience and knowledge gathered with the design and
evaluation of these two prototypes will be used for developing         [4] Miller G. The Magical Number Seven, Plus or Minus Two:
future prototypes that intend to delegate more intuitively a               Some Limits on Our Capacity for Processing Information.
dynamic control over the information settings of information               The Psychological Review, 1956.
retrieval interfaces to their users.                                   [5] L. Hallnäs and J. Redström. 2001. Slow Technology,
                                                                           Designing for Reflection. Personal Ubiquitous Comput. 5, 3
5. CONCLUSIONS                                                             (January 2001), 201-212.
During the course of this paper we have explored ways to tightly       [6] Koffka, K. (1935): Principles of Gestalt Psychology. London,
integrate IR and HCI in a variety of human-information                     Routledge & Kegan Paul Ltd.
interaction scenarios where interfaces act as information retrieval
                                                                       [7] Moghnieh, A., & Blat, J. (2009). A basic framework for
systems. In particular, we studied how R and P as design
                                                                           integrating social and collaborative applications into learning
parameters can describe the information settings of these
                                                                           environments. Proceedings of m-ICTE’09 Vol. 2 (pp. 1057-
interfaces. Both aspects were parameterized on a 0-3 scale on the
                                                                           1061), 2009.
basis of conducted experiments to analyze different possible
information settings. Consequently, five situated interfaces were      [8] Moghnieh, A., Sayago, S., Arroyo, E., Sopi, G., and Blat, J.
designed and analyzed to discern how R and P are qualified                 Parameterized User-Centered Design for Interacting with
during requirement analysis, and how together they describe the            Multimedia Repositories. In Proc. MMEDIA '09, IEEE.
information settings of situated interfaces, and therefore help        [9] B. Buxton. 2007. Sketching User Experiences: Getting the
reflect the interaction requirements onto the interface design.            Design Right and the Right Design. Morgan Kaufmann
Finally, we investigated the feasibility and utility of delegating         Publishers Inc. CA, USA.
control of R and P dynamically to users during classic search and      [10] S. Bødker. 2006. When second wave HCI meets third wave
retrieval scenarios, and concluded that while this approach is              challenges. In Proceedings of NordiCHI '06.
clearly advantageous for exploration tasks and tasks that require
                  Towards User-Centered Retrieval Algorithms

                                                             Manuel J. Fonseca
                                          Department of Computer Science and Engineering
                                            INESC-ID/IST/Technical University of Lisbon
                                            R. Alves Redol, 9, 1000-029 Lisboa, Portugal
                                                                mjf@inesc-id.pt


ABSTRACT                                                                    not be able to find what they want or they may not even be
Nowadays almost all retrieval algorithms (for text, images,                 able to submit a query to the system.
drawings, etc.) are mainly concerned with achieving good                      For illustration purposes let us consider the following hy-
system-centered measures, such as precision and recall. How-                pothetical scenario: “We developed a system for retrieving
ever, these systems are used by users, who try to achieve                   generic complex vector drawings, such as techni-
goals through the execution of tasks. To better satisfy the                 cal drawings, architectural plans or clipart drawings. We
users’ needs we must involve them in the development pro-                   evaluated it using query-by-example and a set of predefined
cess of the retrieval systems.                                              drawings, achieving a good precision and recall measure. Af-
   In this paper, we argue that a user-centered approach,                   terwards, when we delivered the system to users, we noticed
where users are included in the development cycle of the                    that they were not able to use it, because they could not find
overall retrieval system, can lead to improved retrieval algo-              the (first) drawing that they must use as a query to find the
rithms and also to better user satisfaction while using the                 desired drawing. Moreover, users do not want to search for
system.                                                                     the complete drawing, but only by a subpart of the drawing.”
                                                                              This scenario could be avoided if, before developing the
                                                                            retrieval system, we asked users what their needs were, what
Categories and Subject Descriptors                                          they wanted to do with the system and how they wanted
H.3.3 [Information Storage and Retrieval]: Information                      to do it. To collect all this information we need to apply
Search and Retrieval; H.5.2 [Information Interfaces and                     a user-centered approach where users are involved in the
Presentation]: User Interfaces - Graphical user interfaces                  development of the retrieval system and algorithms.
(GUI)                                                                         In this paper we defend a user-centered approach as a
                                                                            way to create better retrieval algorithms and improve the
                                                                            overall retrieval system. We start by briefly describing the
General Terms                                                               user-centered approach and the iterative cycle used in the
Design, Human Factors                                                       user interface design. In Section 3 we describe our appli-
                                                                            cation of the user-centered approach in the development of
Keywords                                                                    retrieval algorithms. Finally, we present some conclusions.
User-Centered Design, User-centered approach, Retrieval al-
gorithms                                                                    2.   USER-CENTERED DESIGN
                                                                               The user-centered design (UCD) is a design methodology,
1.    INTRODUCTION                                                          where the needs, skills and limitations of the users are taken
   The majority of the retrieval algorithms, whether they                   into account during all stages of the development of the sys-
are for text, images, drawings, 3D objects, audio, video, etc.,             tem. The key premise of the user-centered design is that
are mainly interested in performing well for system-centered                the active involvement of the users in the development pro-
measures, like for instance precision and recall. However,                  cess as well as in the evaluation of the interactive products
these systems are used by users who want to perform spe-                    can lead to well-designed systems that best meet the desired
cific tasks and achieve specific goals. We can develop a good               usability goals. These systems will take advantage of users'
retrieval system that performs well against a predefined                    skills, will be relevant to their work and activities, and will
ground truth, but when we deliver it to users they may                      help them rather than constrain their actions.
                                                                               The first principle of UCD [4] states that we
                                                                            first need to identify who the users will be (profile, skills,
                                                                            limitations, etc.) and what tasks they perform and/or wish
                                                                            to perform. The second principle mentions that the systems
                                                                            should be exposed to users in the early stages of development
                                                                            to collect feedback from them. Finally, the third principle is
Copyright © 2011 for the individual papers by the papers’ authors. Copy-    iterative design. The results and feedback from user testing
ing permitted only for private and academic purposes. This volume is pub-   should be used to fix and improve the system. The UCD
lished and copyrighted by the editors of euroHCIR2011.
EuroHCIR ’11 Newcastle, UK                                                  assumes an iterative cycle with identification of the users’
.                                                                           needs, design of the solution and evaluation, repeated as
often as necessary, as depicted in Figure 1.                      (system- and user-centered measures) should be used to im-
                                                                  prove the system and to refine the user and functional re-
                         User and Task                            quirements of the retrieval system.
                           Analysis                                  One of the things that we observed in one evaluation ses-
                                                                  sion with users was that users did not care about where
                                                                  in the order of retrieval the intended drawing appeared, the
                                                                  important fact being that it was there. One of the users pro-
                                                                  duced this comment: “It [the system] found it [the drawing]!
                                                                  That is what counts!” However, when we evaluate retrieval
                                                                  systems, the majority of the existing measures and ground
            Evaluation                      Solution              truth datasets privilege precision. Of course this system-
                                          Design and              centered evaluation is important, but we should also take
                                          Prototyping             into account the users' perspective, where they privilege re-
                                                                  call.

                                                                  3.1   An Example
                                                                     Involving the users can affect the way we develop the re-
     Figure 1: User-centered design iterative cycle.
                                                                  trieval algorithms. In recent years we developed a generic
                                                                  approach for complex vector drawing retrieval, based on the
                                                                  topology and geometry of the elements present in the draw-
3.    USER-CENTERED RETRIEVAL                                     ing. These two features were used to describe the content
   Typically when we want to develop a new retrieval ap-          of the drawings, and during matching, we first compare the
proach, we look at the media to retrieve (text, audio, video,     drawings using topology and then we compare the geome-
drawings, images, etc.), identify the features that best de-      try of those with similar topologies, giving the same weight
scribe the media, create a matching algorithm and finally         to both features (for more details see [1]). This generic re-
we compute precision and recall. Although this methodol-          trieval approach was used to develop one system for retriev-
ogy allows us to create retrieval systems, we believe that        ing technical drawings [3] and another for retrieving clipart
including the user in the development cycle will allow us to      drawings [2].
deliver better and more usable retrieval systems that will           Before we developed this solution and the two retrieval
allow users to achieve their goals, and not only systems that     systems, we performed user and task analysis to understand
have a good precision and recall performance.                     how users wanted to make queries to this type of system.
   Moreover, we should not develop retrieval systems, and         We noticed that they preferred to draw sketches of the drawing
that includes descriptor computation, matching algorithms         that they were looking for rather than submit an existing draw-
and presentation of the results, without first identifying a      ing to perform a query-by-example. Moreover, most of the
set of user needs and functional requirements (first step in      time they did not have a drawing similar to the one that
the user-centered design). We need to know our users, their       they were looking for.
skills, their background, their profile. We must identify their      The two systems were both evaluated with users, and from
needs and requirements, their goals and how they achieve          those evaluations we observed that the way users search for
them. In summary, we need to do a user and task analysis          technical drawings was different from the way they search
before we start developing our retrieval system. User and         for clipart drawings [6]. While in the case of technical draw-
task analysis should not only influence the design of the         ings users draw more complete sketches with several visual
user interface, but also the design of the retrieval approach     elements, and consequently defining a richer topological con-
or algorithm.
   For instance, users could use various strategies to perform
a search in a drawing retrieval system. They could use a
drawing that they already have, in a file, to search for sim-
ilar drawings using query-by-example, or they could draw
a sketch of the drawing that they want to find. As we can
see, the retrieval solution (feature extraction, indexing and
matching algorithms) will be different in each case. While
in the first case we only need to compare two drawings of
the same complexity and with the same characteristics (sets
of lines and polygons), in the second case we need to com-
pare complex drawings with sketches (typically simpler and
with fewer elements). Thus, the way users perform the task
to achieve their goal influences the retrieval approach that
we should develop.
   After developing the retrieval solution based on the user
requirements, we should evaluate the retrieval system, using
not only system-centered measures, but also user-centered
measures, such as time to complete tasks, error rates, sat-
isfaction, etc. As in the user-centered design of interactive     Figure 2: Sketch specifying a query to find a tech-
systems, results from the evaluation of the retrieval system      nical drawing.
                                                                   4.   CONCLUSIONS
                                                                      In this paper we defended a user-centered approach for
                                                                   the development of retrieval systems. As in user interface
                                                                   design, for retrieval systems it is also important to
                                                                   know our users, adapt the algorithms to them, and involve
                                                                   the users in the evaluation of the system.
                                                                      We believe, and have confirmed, that the involvement
                                                                   of the user in the development cycle of retrieval systems can
                                                                   lead to better systems that satisfy users' needs and are
Figure 3: Sketch specifying a query to find a clipart              better adapted to them.
drawing.
                                                                   5.   ACKNOWLEDGMENTS
                                                                     This work was supported by FCT through the PIDDAC
figuration, as illustrated in Figure 2; for clipart drawings,      Program funds (INESC-ID multiannual funding) and the
users produced simpler sketches, with fewer elements and           Crush project, PTDC/EIA-EIA/108077/2008.
with a poorer topological description (see Figure 3).
   Due to this observation during tests with users, we refined
our retrieval algorithm for retrieving clipart drawings [5],       6.   REFERENCES
putting more emphasis on the geometry than on topology.            [1] M. J. Fonseca. Sketch-Based Retrieval in Large Sets of
With this change we were able to achieve a better precision            Drawings. PhD thesis, Instituto Superior Técnico /
and recall measure for clipart drawings, and we adapted our            Technical University of Lisbon, July 2004.
retrieval system to the users’ way of sketching queries.           [2] M. J. Fonseca, B. Barroso, P. Ribeiro, and J. A. Jorge.
                                                                       Retrieving clipart images by content. In Proceedings of
3.2    Discussion                                                      the 3rd International Conference on Image and Video
   We cannot develop our retrieval algorithms without in-              Retrieval (CIVR’04), volume 3115 of Lecture Notes in
volving our users in the development cycle. As in the                  Computer Science, pages 500–507. Springer-Verlag,
design of interactive systems, also in the development of re-          Dublin, Ireland, July 2004.
trieval systems we must involve the users.                         [3] M. J. Fonseca, A. Ferreira, and J. A. Jorge.
   They must be involved in the initial phase, so we can               Content-based retrieval of technical drawings.
understand how they search for information, what they                  International Journal of Computer Applications in
know, what their limitations are and what their                        Technology (IJCAT), 23(2–4):86–100, 2005.
profile is. With this we are able to identify users' needs and     [4] J. D. Gould and C. Lewis. Designing for usability: key
functional requirements.                                               principles and what designers think. Commun. ACM,
   Later on, during the development of the algorithms we               28(3):300–311, 1985.
should take into account this input and adapt the algorithms       [5] P. Sousa and M. J. Fonseca. Geometric matching for
to provide “good results” for “our” users, and not for the users       clip-art drawing retrieval. Journal of Visual
in general, or for the system.                                         Communication and Image Representation (JVCI),
   Finally, during the evaluation stage, besides computing             20(2):71–83, February 2009.
the traditional system-centered measures, for a set of datasets    [6] P. Sousa and M. J. Fonseca. Sketch-based retrieval of
defined as ground truth, we should also involve users in the           drawings using spatial proximity. Journal of Visual
evaluation to collect quantitative and qualitative measures.           Languages and Computing (JVLC), 21(2):69–80, April
Information gathered during evaluation should be used to               2010.
improve the retrieval algorithms and the overall retrieval
system in the next iteration of the iterative cycle of the
user-centered approach.
           Design Thinking for Search User Interface Design
                                                                Arne Berger
                                                   Chemnitz University of Technology
                                                       Strasse der Nationen 62
                                                           09107 Chemnitz
                                                              Germany


                                               arne.berger {at} informatik.tu-chemnitz.de



Copyright © 2011 for the individual papers by the papers' authors.
Copying permitted only for private and academic purposes. This volume
is published and copyrighted by the editors of euroHCIR2011.

ABSTRACT
This paper uses a brief example to describe how design methods, namely
those developed in design thinking, can help search user interface
design to innovate throughout the software development process.

Categories and Subject Descriptors
H.5.2 [Ergonomics, Evaluation/methodology]: Design Methods in Search
User Interface Design

General Terms
Measurement, Documentation, Performance, Design, Human Factors,
Experimentation

Keywords
Design Thinking, User Interface Design, Design Methods, Qualitative
Studies

1. INTRODUCTION
Since Tim Brown's ingenious talk on TED [1], Design Thinking (DT) has
had a huge impact on the business and design world. By injecting the
way designers think into accustomed business processes, CEOs hoped to
gain a competitive advantage. Designers, on the other hand, hoped their
overall influence might increase. However, the field has more to offer
than bringing creative techniques to supposedly uncreative domains. The
first publications on the matter appeared as early as the late 1960s
[2, 3, 4] as a way to externalize the enigmatic design process. Since
then, the creative application of design methods (DM) has proven its
effectiveness, fun and relevance countless times [5, 6].
Despite its persistent application in typical creative domains, the
radical application of DM to digital age products is still a young
discipline.

1.1 Design Thinking vs. Design Methods
The difference between DT, as coined and developed at Stanford [7], and
DM, as defined by Jones among many others [3], needs to be clarified in
a separate publication. For now, the author (a designer) is grateful to
see the broad spectrum of DM finally being brought to attention due to
the success of DT. However, there are far more methods to use than the
51 methods suggested by DT [8], and there are far more feasible design
processes than those defined in DT. Because of the brevity of this paper
and for the sake of better understanding, DT is used as an expression
for the design process, while DM is used as an expression for any design
method from the DT or any other DM toolbox.

2. CURRENT STATE OF DESIGN METHODS IN SEARCH USER INTERFACE DESIGN
The possibilities of DM are still poorly integrated into product
development. However, a subset of DM, namely User Centered Design (UCD),
is fairly well established in the domain of interface design, including
that of search user interface design. UCD significantly helps in
evaluating user needs but often fails to innovate. UCD relies on a
relatively narrow and strict set of methods compared to what DT and DM
have to offer [9]. Those methods are capable of gaining insight and
evaluating interfaces but do not encourage an innovation process for
future user interfaces.

The following description of a typical workflow is written from the
perspective of a user interface design professional working in an
academic development environment that is dominated by information
retrieval experts. It abstracts the prototypical UCD process of
developing search user interfaces.

2.1 Current Process of Search User Interface Design
1. Users' tasks and problems are observed via Site Visits or Website
Analytics [10]. Those methods help to gain insight into specific user
problems. Combining both is nowadays considered the holy grail of
gaining insight into users' issues [10].

2. Information retrieval experts and search user interface designers use
methods like brainstorming to plan a software product. Brainstorming is
used mainly as a conversation starter, but it also serves to frame the
current state of technical possibilities.

3. Users' problems (step 1) are interpreted, and solutions are attempted
with the help of the technical possibilities (step 2); these solutions
are then implemented.

4. The usability of the search user interface proposed in step 3 is
evaluated via user studies comparable to the ones in step 1.

Iterations: The steps above are repeated iteratively several times. With
the help of prototypes the interface is refined before a final
implementation takes place. However, these steps only help to streamline
the interface. They are of limited use for innovating an interface in
the way DT makes possible.
2.2 Critique of the Current Process
We believe that the process of nailing down the problem and suggesting a
viable solution after framing technical possibilities and observing
users is insufficient. Those well established methods have the main
advantage of providing hard numerical measures. This is even more true
when measures like precision and recall are used to learn how well a
system performs. Those standardized measurements make it easy to compare
different solutions. However, relying on hard measures reveals only
those insights that can be expressed in numbers and derived from them.

On the other hand, soft properties of a search user interface like »what
users really want«, »fun of use«, »suitability to unusual tasks« and in
part »user satisfaction« are next to impossible to measure via hard
numbers. Although efforts exist [11], the measurement of qualitative
soft properties is hard to standardize. Outcomes are therefore less
clear cut and often cannot be compared statistically. As the academic
viewpoint in the field tends towards analytic comparison, soft
properties are seldom explored, described and measured. Subsequent
findings therefore often fail to be implemented.
Based on the above, we propose the radical application of DT in search
user interface design via »participatory prototypes«. This concept
integrates users and developers alike. We demonstrate its process
briefly in the next section and explain its application in the examples
that follow.

3. PROPOSED DESIGN THINKING PROCESS FOR SEARCH USER INTERFACES
In the business world (see introduction) DT is foremost a process used
for innovating new products.
The DT process is defined as follows [8]:
Understand: Understand problem and context.
Observe: Externalize future users' problems via e.g. extreme user
interviews or empathy maps.
Define: Interpret and weight the knowledge gained in the previous steps
via e.g. ad-hoc personas.
Ideate: Use common or uncommon creative techniques, e.g. body storming,
to generate many ideas.
Prototype: Visualize and communicate ideas with the help of fast and
cheap prototypes made with paper, Lego bricks or the product box method.
Test: Future users test those prototypes, via e.g. story telling
techniques.
We believe that DT can and should be incorporated in any possible stage
of a development cycle. Interface design prototypes are extraordinarily
easy to manufacture and cost next to nothing.

We suggest applying the DT process more closely to the development of
search user interfaces to benefit from its many advantages, especially
to force the pace of innovation.

3.1 Prototype Categories
As the label »prototype« may be misleading, we tend to think of anything
capable of producing feedback as a prototype. To make further
understanding easier we classify prototypes as follows, in the order of
their advancement:

3.1.1 Very Low-Fi Prototype (Conceptual Model)
Generated by: user
Function: none, may not be technically feasible
Workflow: only conceptual
Visual Design: none
Medium: analog
Modality: any
Usually user generated, often not understandable without the creator's
explanations. It only describes a preliminary workflow of operations and
functions and is not necessarily technically feasible.

3.1.2 Low-Fi Prototype (e.g. Paper Prototype)
Generated by: user, designer
Function: none, may not be technically feasible
Workflow: preliminary, mimicking operations
Visual Design: none
Medium: analog
Modality: any
Usually presented via the Wizard-of-Oz technique, it incorporates as
many operations as possible and always fakes function.

3.1.3 Mock-Up
Generated by: designer
Function: none, may not be technically feasible
Workflow: mimicking operations closely
Visual Design: none
Medium: digital
Modality: any
Is often (and should be) visually unappealing, mimicking operations
closely, but fakes function.

3.1.4 Dummy (often referred to as Click Dummy)
Generated by: designer
Function: none, may not be technically feasible
Workflow: mimicking operations
Visual Design: existing, often visually polished
Medium: digital
Modality: any
Incorporates a polished visual design, mimicking operations, but fakes
function. May or may not incorporate the proposed interaction paradigm.
The most common implementation of the latter is a browser based click
dummy that fakes the functions of a mobile touchscreen device.

3.1.5 High-Fi Prototype
Generated by: designer, developer
Function: incorporates some or most of the proposed functions
Workflow: mimicking operations
Visual Design: existing, often visually polished
Medium: digital
Modality: same as end product
Is similar to a Dummy but also incorporates some of the proposed
functions. It also incorporates the proposed interaction paradigm.

3.1.6 Alpha Grade Version
Generated by: developer
Function: incorporates some or most of the proposed functions
Workflow: mostly operational
Visual Design: may or may not exist
Medium: digital
Modality: any
A prototype proposed by developers that demonstrates most basic
functions and usually does not feature a polished design.

3.1.7 Beta Version
Generated by: developer
Function: incorporates some or most of the proposed functions
Workflow: fully operational
Visual Design: existing
Medium: digital
Modality: same as end product
A visually polished prototype, most often proposed by developers. It is
a functioning program that may have bugs or quirks and is mainly used in
order to get rid of those.

3.2 Observations for Prototypes
As this brief listing suggests, most of the prototyping work in search
user interface design is done by a designer, which helps to maintain a
conversation between what users want and what developers can implement.

There are usually no direct prototypes from the users. Users' comments
or observations are interpreted multiple times. First they are made
operable via prototypes crafted by designers, which are subsequently
interpreted by the developers.

From the perspective of a developer, prototypes are used only for
evaluation towards the end of the implementation cycle. As a lot of code
and effort has gone into them by then, heavy changes are avoided;
ideally, earlier prototypes have already made such changes unnecessary.

While the main goal of DT is to encourage interdisciplinary user groups
to create innovative prototypes, it does not focus on direct prototypes
from users or developers.

3.3 Implications for Process
We want to continuously integrate user prototypes into the development,
and we also encourage a process where developers explain technical
feasibility via prototypes even in very rough and early stages.
This realization came through practical usage of various DM in a couple
of projects. The following section briefly describes how we introduced
participatory prototypes to search user interface design for the
creation of playlists for mobile video consumption. Two other successful
projects include Design Thinking for a customized faceted navigation and
Design Thinking for a multitouch interface for searching in large
multimedia repositories.

4. DESIGN THINKING THE CREATION OF PLAYLISTS FOR MOBILE VIDEO CONSUMPTION
We wanted to address a problem known to many smartphone users on the
move. We understand that, whether commuting or going out with friends,
users usually avoid constructing complex search queries to find suitable
content to watch.
To define the problem, we asked users what they miss and want from a
mobile TV application. Two main points emerged: With services like
YouTube, consumers are left having to refine a search query several
times or to use non-customized item lists such as »most viewed«. On the
other hand, in traditional TV a moderator weaves a golden thread and
guides viewers via this potentially emotional connection through a
series of video clips.
After an ideation session the most promising prototype was a mixed breed
of playlists, woven together by emotional metadata. To gain insight into
users' mindsets regarding the construction of those personalized
playlists we applied various DM.
To find out which emotional content attributes users are looking for, we
asked participants to map out a virtual space of content properties and
show how they would navigate within it. This method usually helps to
discover the pathways and interests through which people make sense of a
particular content space. The results eventually help to make sense of
how to construct queries for filter specification.

Users were asked to individually draw a map or diagram of what comes to
their mind when being on the move and having a mobile video handset
available, whether sitting on public transportation alone or being in a
pub with friends. The six users had 15 minutes to draw a map or scheme
and were asked to freely associate parameters to form a personalized
playlist. Given the mindset of being on the move, users formed questions
from a simple vocabulary and subsequently wanted to change only certain
parameters after watching a few video items. A discussion with all
participants followed.

The results led to the assumption that users are interested in direct
mood filters. Most of the user generated maps feature mood clusters or
the simple question »how« in a list of questions.

Based on those findings the developers of the future interface, with the
help of a designer, proposed a low fidelity prototype containing a
filter named »How« together with more filters based on the four cardinal
questions Who, Where, When, What. This was done because all those
metadata fields could be filled with metadata readily available in the
existing database. To prove the concept it was introduced to twelve
users. Users' feedback on this approach was insightful in two ways. On
one hand, users at large expressed their general approval of the
advantages that might arise from constructing exhaustive content filters
with just a few steps of interaction. On the other hand, the
pre-structured characteristic was heavily criticized. However, the
rigidly defined prototype inspired incredibly rich feedback from
participants. This proposal in combination with open ended questions has
proved to be a fast
and convenient way to gain user feedback on a large variety of issues
without a lot of explanation. The main insight is that all users found
and used the filter option »how«. Most user feedback was given on this
feature alone. Findings are discussed in depth in [12].

TV-Anytime [13] is a metadata standard that defines metadata for
broadcasts. It is commonly used to describe video items and also
features 53 moods. For the sake of technical interoperability we wanted
to stay within the realm of this particular metadata standard but also
wanted to make the proposed moods more accessible for users. Based on
those technical restrictions and the previous results we individually
asked 45 potential users to sort the moods into self-defined categories
that made sense to them.

At least two completely different ways of sorting prevailed. One group
of users preferred an order that resembles a classification into movie
genres, while a second group preferred to sort the moods according to
emotional dependencies. While 45 users were enough to reveal two groups,
the users in the first group were too few to yield significant results.
Focusing on the larger group (35 participants), seven mood categories
were filled unanimously. Apart from a very few moods, all other moods
were consistently grouped together. This could make the previously
discussed low fidelity prototype more flexible in navigating complete
mood sets. Based on those findings, users proposed an interface that
asks questions in an order that is determined more by them. A subsequent
High-Fi prototype was built, incorporating 1000 video items. It allows
the selection of a variety of moods as well as a combination of filters
derived from the five cardinal questions. A formal user study is now
underway.

5. Acknowledgements
This publication was prepared as a part of the research initiative
sachsMedia (http://sachsmedia.tv), which is funded by the German Federal
Ministry of Education and Research under the grant reference number
03IP608. The authors take sole responsibility for the contents of this
publication.

References
[1] http://www.ted.com/talks/tim_brown_urges_designers_to_think_big.html
    (accessed Apr 29, 2011)
[2] Archer. Design as a discipline. Design Studies (1979) vol. 1 (1)
    pp. 17-20
[3] Jones. Design Methods. John Wiley and Sons (1992)
[4] Newell et al. The processes of creative thinking. (1959)
[5] Lawson. How designers think: the design process demystified.
    Elsevier (2006)
[6] Schön. The reflective practitioner: how professionals think in
    action. Basic Books (1983)
[7] Kelley. The Art of Innovation: Lessons in Creativity from IDEO.
    Crown Business (2001)
[8] Plattner. Design Thinking. Mi Wirtschaftsbuch (2009)
[9] Cooper. About Face 3. Wiley and Sons (2007)
[10] Hearst. Search User Interfaces. Cambridge University Press (2009)
[11] Hassenzahl et al. AttrakDiff: Ein Fragebogen zur Messung
     wahrgenommener hedonischer und pragmatischer Qualität. In:
     Proceedings of Mensch & Computer (2003)
[12] Knauf, Berger, et al. Constraints and simplification for a better
     mobile video annotation and content customization process. In
     Workshop Proceedings of the EuroITV. (2010)
[13] TV-Anytime Phase 1: Metadata schemas
     http://www.etsi.org/deliver/etsi_ts/102800_102899/1028220301/01.02.01_60/ts_1028220301v010201p.pdf
     (accessed Oct 10, 2010)
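
Illustrative sketch for Section 4 (not part of the original study): the
card-sorting step, in which 45 participants grouped the 53 TV-Anytime
moods into self-defined categories, is the kind of data that can be
summarized by counting how often two moods end up in the same category.
The paper does not state how the sorts were analysed, so the function
name co_occurrence, the three participants and the five mood labels
below are made-up placeholders used only to show the idea.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

Sort = Dict[str, List[str]]  # one participant: category label -> moods placed in it


def co_occurrence(sorts: List[Sort]) -> Dict[frozenset, float]:
    """Share of participants who placed each pair of moods in the same category."""
    counts: Counter = Counter()
    for sort in sorts:
        for moods in sort.values():
            # Every unordered pair inside one category counts once per participant.
            for a, b in combinations(sorted(set(moods)), 2):
                counts[frozenset((a, b))] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}


# Placeholder data: three hypothetical participants, five made-up mood labels.
sorts = [
    {"light": ["happy", "funny"], "dark": ["sad", "scary"], "other": ["calm"]},
    {"comedy": ["happy", "funny", "calm"], "thriller": ["scary"], "drama": ["sad"]},
    {"up": ["funny", "happy"], "down": ["sad", "scary", "calm"]},
]

agreement = co_occurrence(sorts)
# Pairs that every participant grouped together ("filled unanimously").
unanimous = sorted(sorted(pair) for pair, share in agreement.items() if share == 1.0)
print(unanimous)
```

Pairs with a high share could seed shared mood categories in a prototype,
while pairs with a low share point to moods that participants understand
differently.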
              The Development and Application of an
        Evaluation Methodology for Person Search Engines
           Roland Brenneke                                    Thomas Mandl                          Christa Womser-Hacker
          Information Science                               Information Science                          Information Science
        University of Hildesheim                          University of Hildesheim                     University of Hildesheim
         Marienburger Platz 22                             Marienburger Platz 22                        Marienburger Platz 22
               Germany                                           Germany                                      Germany
    roland.brenneke@gmx.de                            mandl@uni-hildesheim.de                    womser@uni-hildesheim.de

Copyright © 2011 for the individual papers by the papers' authors.
Copying permitted only for private and academic purposes. This volume
is published and copyrighted by the editors of euroHCIR2011.

EuroHCIR 2011. The 1st European Workshop on Human-Computer Interaction
and Information Retrieval. July 4th 2011. Newcastle, UK

ABSTRACT
This paper presents a user oriented evaluation methodology for comparing
person search services on the Web. Many established system oriented
methods from information retrieval cannot be applied to this domain. Our
user oriented methodology is applied to a test comparing the person
search engines yasni, pipl.com and 123people. The user study with over
30 participants led to relevant results. The coverage of data object
types within the person search engine results differs considerably. In
particular, the number of pictures and social media network entries
which are presented by the systems and which are perceived by the test
users differs greatly. The results also revealed a tendency to judge
people more positively when more information was found.

1. INTRODUCTION
Person search engines are important specialized search services on the
Web. These systems consult other services for information about a person
and integrate it in one interface. They can be regarded as meta search
services or one-stop shops for personal information. Mostly, they are
tailored for normal people and not for celebrities and other famous
people. As such, person search differs from named entity search in
general.
Especially with Web 2.0 and its ease of publishing content on the Web,
many people deposit much information about themselves or content they
created on various sites. Users need to have the proper information
competence to foresee the consequences of such behavior. Often, users
are advised not to publish too much information. Online reputation
management is becoming an important issue. On the side of the users,
social networks and person search services lead to information ethical
considerations about the use of personal information.

Searching for information about others is a very frequent information
need and a reason for using a search service. According to Google
Trends, the most popular person search services receive over 200,000
hits per day. However, 90% of the users do not rely on person search
engines but use general Web search or go directly to social networks to
find out about people. Nevertheless, 10% is still a significant share
and hit rates for person search engines are constantly high. In
addition, many of these searches may have a high impact. Many recruiters
use person search engines for checking on candidates.

A questionnaire study among 548 enterprises was published in 2010 [5].
This Social Media HR Report 2010 revealed that in 2009 over 59% of the
companies had used the internet to check on applicants. Almost 10% had
already turned down an application because of information on the Web.
Companies who do not use the Web for checking on applicants state that
lack of time and ethical questions are the main reasons not to do so
[5].
An international study showed that this behaviour is more widespread in
the US than in European countries [3]. Interviews with decision makers
in German companies revealed that they are well aware of the potential
of retrieving applicant information [11].

The use of person search engines for job applicants is only one
potential usage scenario; however, it is a very prominent one. Other
than that, there are many reasons why a user would want to search for a
person. And despite the use of a named entity in the search, the
information need is rather vague and can be rephrased as “Find out
something about person X”.
The success of a person search engine depends on many factors. Person
search engines are meta services which extract results from a large
variety of different online media. The presentation of these results in
the user interface is an essential factor for the success of the search
service. If a result is far down on the result page and the user never
scrolls there, potentially relevant items cannot be found. That means
that the search capability is only one success factor for person search
engines. Consequently, our experiment was designed as a user test. We
intended to evaluate the user experience and the success with the person
search engine as a tool, rather than specific system components or
absolute retrieval performance.

2. RELATED WORK
The evaluation of retrieval systems is central in information retrieval
research because the system performance cannot be predicted. The most
influential retrieval evaluation methodology is called the Cranfield
paradigm. Information retrieval research has adopted an evaluation
scheme which tries to ignore subjective differences between users in
order to be able to compare systems and algorithms. The user is replaced
by a prototypical and constant user. Relevance judgments are provided by
domain experts [8, 10].
Cranfield evaluations have often been criticised for several             We selected people who had posted a large amount of information
reasons. The main objections come from advocates of user                 about themselves in the network. Again, this was done to obtain
oriented studies. The search situation of users depends on many          similar and comparable difficulty for the three test cases. Three
individual and contextual factors which can only be captured in          person search engines were selected for the comparative test. We
user experiments [6]. The real user experience and the success in        chose yasni, pipl.com and 123people because they were very
a real world situation cannot be measured with the laboratory style      popular at the time of the study according to Google Trends. All
experiments based on the Cranfield paradigm [12].                        three companies claim that they exploit only information available
Person search engines have a higher chance to succeed than               on the public Web.
general purpose search services. The retrieval with named entities
is known to be easier than searches without named entities [9].          4. STUDY
The selection of a person search engine hints at the type of result.     Students of the University of Hildesheim were recruited through a
Consequently, synonymy between names and words is a smaller              mailing list of students. Participation was voluntary and no
problem than in general purpose search engines. Synonymy                 gratification was given. None of the participants had a computer
between names, on the other hand, is a big challenge for person          science background. They all were frequent Internet users and had
search engines.                                                          searched for people before but only 10% had used a person search
                                                                         engine before. The others used Google or social networks to find
3. METHODOLOGY                                                           information on people.
The balance between control and realism is a challenge for each          The issue of relevance is always a crucial one in information
experiment. For the presented study, we chose a user experiment          retrieval evaluation. In our study, any item could contribute to the
to test person search engines because an approach purely                 full picture of the applicant. Despite the clearly defined scenario,
dedicated to retrieval power does not mirror the user experience         it remains vague which information is needed and what type of
for person search engines well. It is necessary to limit the realism     information is useful. It is difficult to assign relevance to items or
in a user experiment in order to allow comparison across                 even weights to categories. The user interfaces of the person
participants in the test. We selected a job applicant scenario in        search engines present the items in categories such as social
order to make the experiment interesting for the users. Applicant        network entries or videos.
search is a very prominent usage type. The method was successful         A questionnaire study [7] showed that users search mainly for the
in making the experiment attractive. The test users liked the            following items in the order presented when retrieving
experiment very much and through word of mouth, more                     information about a specific person:
applicants wanted to register for the experiment than were needed.
                                                                              •    Contact information
The selection of persons for the task defines the content for the
                                                                              •    Profile on a social network
test. It seemed necessary to identify people for whom much
                                                                              •    Photo
information can be found on the Web. If there were no videos,
working results like presentations or social network entries, then            •    Information about professional accomplishments or
the performance of the person search engine could not be tested                    interests
with our experiment. So even if the persons selected are not
representative in terms of amount of online information for the          The most frequently researched item, contact information, does not
whole population or all persons who are indexed in a person              apply to our scenario because the persons had sent a letter of
search service, it increases the validity of the test to select persons  application. The next two most frequent items are included. The
with a large amount of online information.                               fourth item is rather vague, as are some of the items following it, as
                                                                         far as the categories of person search engines are concerned. As a
Three people were carefully selected who had similar                     consequence, the data available does not justify the assignment of
qualifications. For them, a job profile was developed which was          weights to some items. In our study, all clicks on items were
given to the participants together with the names of the people.         scored equally. The results will also show which of the items were
The users were asked to search for these people who would be             most popular. The time per applicant was limited to 10 minutes.
interviewed for the position and check if they were appropriate.         The entire experiment took 45 minutes on average including the
The job description and the name of each applicant were given to         pre- and post questionnaire.
the test persons. Each of the candidates was well qualified for the
job but had one negative aspect in his online data. One was an           One search service modified the interface after the first two tests.
advocate of nuclear power and the job was offered by an                  So it was necessary to eliminate three test sessions from the
alternative energy company. The second applicant was a serial            results and recruit further test users. This shows that not only the
entrepreneur who portrayed himself on Facebook in pictures with          dynamics of the personal data presents a challenge for the test but
attractive women and sports cars. The third applicant had party          also the ongoing modifications of the search engine. Overall, 34
photos online where he could be seen smoking cigarettes and he           users took part in the experiment. Due to the problems of a relaunch of
described himself as lazy in one social network while he had a           one service, we could consider the experiments of 10 users of
very business oriented self image in another social network.             123people, 11 users of Pipl and 10 users of Yasni.

Obviously, such a scenario has some limitations. Person search           Each test person worked with one search engine on all three
engines need to disambiguate between people with the same                applicants. This between-groups approach was mainly
name. We decided to choose people who are not ambiguous in               applied to avoid a long learning phase for each of the person
order to have the same difficulty for each person. Such issues are       search engines. All tests were recorded with appropriate software.
evaluated in the system-oriented campaign WePS [1].
                                   Figure 1: Popularity of person search engines according to Google Trends

5. RESULTS
The result description focuses on the information perceived by
users and the performance of the test users in the application task.
The information items clicked by the users were categorized. It
can be seen that the services lead to a similar number of clicks
when summed up over all users. Each of the services resulted in
between 110 and 120 clicks for the ten test persons. In the case of
Pipl, 11 test persons were considered. Each engine leads to a
sufficient number of entries and has abundant information on the
applicants in our scenario. This was a goal of the test design and
was accomplished.
The type of information which was encountered was quite
different. It can easily be seen that 123people facilitates access to
photos whereas Pipl leads more users to social network entries. A
comparative analysis for the services for the most popular item
types is shown in Table 1.
In the post test questionnaire, users were asked about their
subjective impression of the service they had used. In the overall
satisfaction, 123people was rated highest. For the page structure,
Pipl received the best grades, and for the coverage of different
business networks, Yasni was rated as most successful. In the latter
case, the finding from the objective click data was confirmed.
Further details on the results are provided in [2].                            Figure 2: Clicks on items in the three person search engines
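
The click counting described above can be illustrated with a minimal sketch. It assumes a simple click log of (engine, participant, item category) tuples; the log format, the function names and the sample rows are hypothetical placeholders and are not data from the study. Every click is scored equally, as in our experiment.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (engine, participant id, item category).
# These rows are illustrative placeholders, not data from the study.
clicks = [
    ("123people", "p01", "Photo"),
    ("123people", "p01", "Videoclip"),
    ("123people", "p02", "Photo"),
    ("Pipl", "p11", "Social network"),
    ("Pipl", "p12", "Social network"),
    ("Yasni", "p21", "Business network"),
]


def tally_clicks(log):
    """Count clicks per engine and category, weighting every click equally."""
    counts = defaultdict(Counter)
    for engine, _participant, category in log:
        counts[engine][category] += 1
    return counts


def total_clicks(counts):
    """Sum the category counts into one total per engine."""
    return {engine: sum(per_cat.values()) for engine, per_cat in counts.items()}


counts = tally_clicks(clicks)
totals = total_clicks(counts)
for engine in counts:
    print(engine, totals[engine], dict(counts[engine]))
```

Summing over all users of an engine yields the per-service totals, while the per-category counts show which item types each engine leads users to.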


                                                   Table 1: Comparison of data types encountered

                        Item                    123people       Pipl        Yasni
                        Photo                      ++            +−          −−
                        Business network           −             −           ++
                        Social network             −             ++          +
                        Homepage/Blog              +             +           +−
                        Microblog                  +             +−          +
                        Yellow pages               +−            −−          +
                        Forum post                 −             +−          +
                        Videoclip                  +             +−          +−

                        Publication, Presentation, Email address, Address, Phone number:
                        no rating is possible because of a very low number of clicks.

                        Perception:  ++ Excellent   + Good   +− Moderate   − Poor   −− Unperceived
For two services, applicant 1 was selected by the majority of the    [3] CrossTab Marketing Services. 2010. Europäischer
test users. These two services had identified most items for this        Datenschutztag: Studie zur Online Reputation
applicant. For yasni, applicant 2 was chosen as the best                 Trustworthy Computing Group, Microsoft (Hrsg.).
applicant despite the fact that the other two services found on           http://www.microsoft.com/germany/sicherheit/datenschutzstudie.
average 10 items more for this person. Applicant 3 was given              mspx
the last place for all three person search services. For each        [4] Hellmann, R.; Griesbaum, J.; Mandl, T. 2010. Quality in
service, he is the applicant with the fewest items. There might be       Blogs: How to find the best User Generated Content. In:
a trend to rate people higher when more information is available         13th Intl Conf on Business Information Systems (BIS 2010)
online.                                                                  Berlin, 3.-5. May. Berlin et al.: Springer [LNBIP 47] pp.
                                                                         47-58.
6. RESUME                                                            [5] Zur Jacobsmühlen, T. (2010): Social Media HR Report
We presented a holistic evaluation methodology for person                2010 Stepstone.de & HRM.de (eds.).
search engines. The performance of these search services is              http://www.jacobsmuehlen.de/studie/
measured by observing the perception of test users. The test
methodology is built on a realistic scenario and use case but it     [6] Lamm, K.; Greve, W.; Mandl, T.; Womser-Hacker, C.
does not cover all the relevant quality aspects of person search         2010. The Influence of Expectation and System
engines. The important capability to resolve the ambiguity of            Performance on User Satisfaction with Retrieval Systems.
names was not dealt with. In future work, it might be promising          In: Proc EVIA 2010: The First Intl Workshop on
to develop a performance based test for this task only.                  Evaluating Information Access June 2010 National
                                                                         Institute of Informatics (NII) Tokyo, Japan, June 15-18,
The complete information seeking behaviour and its success is            http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings
also not measured with our test. In a realistic scenario, people         8/EVIA/09-EVIA2010-LammK.pdf
might access the social media networks through a person search
engine and continue their search mainly there. This issue could      [7] Madden, M.; Smith, A. 2010. Reputation Management and
be resolved by observing real behaviour.                                 Social Media: How people monitor their identity and
                                                                         search for others online. PEW Internet & American Life
In the test, the search engine 123people was the winner. It not          Project. http://pewinternet.org/Reports/2010/Reputation-
only led users to the highest number of items, but it was also           Management.aspx
subjectively judged to be the best person search engine.
However, in several aspects other systems performed better and       [8] Mandl, T. 2008. Recent Developments in the Evaluation of
were judged better. The evaluation showed that the different             Information Retrieval Systems: Moving Toward Diversity
tools are all based on the freely available data on the Web but          and Practical Applications. In: Informatica – An Intl.
that they lead to different results. The most sought items in our        Journal of Computing and Informatics vol. 32. pp. 27-38.
test were photos, entries and profiles in social and business        [9] Mandl, T.; Womser-Hacker, C. 2005. The Effect of Named
networks and personal homepages. Each of the engines                     Entities on Effectiveness in Cross-Language Information
exhibited a strength in one of these items, e.g. 123people for           Retrieval Evaluation. In: Proc 2005 ACM SAC Symposium
photos because they are shown as top results. This is also               on Applied Computing (SAC). Santa Fe, New Mexico,
confirmed by the questionnaire study among American                      USA. March 13.-17. 2005. pp. 1059-1064.
recruiters [7].                                                      [10] Robertson, S. 2008. On the history of evaluation in IR. In:
For the users who publish information about themselves and                Journal of Information Science 34(4). pp. 439-456
who become information providers by doing that the issue of          [11] Schäuble, T.; Griesbaum, J.; Mandl, T. 2009. Mehr-
information competence will become more and more important.               wertpotenziale von Online-Social-Business-Netzwerken für
Personal Online Identity Management is a growing field and                die Personalbeschaffung von Fach- und Führungskräften.
several new companies are entering the market.                            In: Informatik 2009 - Beiträge 39. Jahrestagung der
                                                                          Gesellschaft für Informatik e.V. (GI) Lübeck [LNI P-154]
7. REFERENCES                                                             pp. 2166 – 2180.
[1] Artiles, J.; Borthwick, A.; Gonzalo, J.; Sekine, S.; Amigó,
    E. 2010. WePS-3 Evaluation Campaign: Overview of the             [12] Tawileh, W.; Mandl, T.; Griesbaum, J. 2010. Evaluation of
    Web People Search Clustering and Attribute Extraction                 five web search engines in Arabic language. In: LWA–
    Tasks. In: CLEF Working Notes                                         Lernen - Wissensentdeckung – Adaptivität: Proc Work-
    http://nlp.uned.es/weps/weps-3/papers                                 shopwoche GI, Universität Kassel. Workshop Information
                                                                          Retrieval.
[2] Brenneke, R. 2010. Evaluation von Personen-                           http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir1.pdf
    suchmaschinen und Umgang mit persönlichen Daten im
    Internet. Master Thesis, University of Hildesheim,
    Germany. International Information Management.