=Paper=
{{Paper
|id=Vol-1174/CLEF2008wn-iCLEF-Vundavalli2008
|storemode=property
|title=Mining the Behaviour of Users in a Multilingual Information Access Task
|pdfUrl=https://ceur-ws.org/Vol-1174/CLEF2008wn-iCLEF-Vundavalli2008.pdf
|volume=Vol-1174
|dblpUrl=https://dblp.org/rec/conf/clef/Srinivasarao08
}}
==Mining the Behaviour of Users in a Multilingual Information Access Task==
<pdf width="1500px">https://ceur-ws.org/Vol-1174/CLEF2008wn-iCLEF-Vundavalli2008.pdf</pdf>
<pre>
 Mining the Behaviour of Users in a Multilingual
            Information Access Task
                                Srinivasarao Vundavalli
                                   SIEL, LTRC, IIIT
                                   Hyderabad, India
                          srinivasarao@research.iiit.ac.in


                                             Abstract
     This paper summarizes the participation of IIIT-H in the CLEF 2008 interactive task.
     Our goal was to mine the logs and extract conclusions about the behavior of users when
     facing a strictly multilingual information access task. We were provided with search logs
     generated by an online game of known-item image retrieval from Flickr. In this paper we
     describe two analyses. First, we looked for differences in search behavior according to
     users' language skills. Second, we clustered the users based on their score, their precision
     and the number of hints they asked for, and studied the behavior of the most successful user
     cluster, the least successful (unsuccessful) user cluster and the users in between. Our
     results show that most users start with the monolingual interface and soon realize that the
     cross-lingual interface is more useful, and that users are more comfortable searching in
     their mother tongue or in languages they know.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information
Search and Retrieval

General Terms
Interactive information retrieval, cross-language information retrieval

Keywords
CLEF, iCLEF, Flickr, multilingual search, user behavior


1    Introduction
The CLEF [1] interactive track (iCLEF [2]) has been devoted, since 2001, to the study of
cross-language retrieval from a user-centered perspective. The aim has always been to investigate
real-life cross-language problems in a realistic scenario, and to obtain indications on how best to
aid users in solving them. Multilingual information retrieval is particularly interesting from an
interactive point of view, because the need for search assistance is substantially higher than in
monolingual information retrieval: normally, the user can quickly adapt to the system’s modus
operandi, but not to an unknown target language. iCLEF 2006 concentrated on realistic
multilingual search. Its main limitation was that the experiment could not produce large-scale
user logs.
    iCLEF 2008 proposes a task that consists of searching for images in a naturally multilingual
database, Flickr [3], which holds millions of photographs shared by people all over the planet,
tagged and described in a mixture of most languages spoken on earth. The focus was on collecting
large-scale user logs and letting the participants mine those logs to learn how users behave when
they need to search in unknown languages.
    We used the search logs provided to us to study how users behave when facing a multilingual
information access task. Users' language skills are important in examining cross-language search:
when users are provided with a cross-lingual interface, how do their language skills influence
their search behavior?


2     Methodology
We followed the iCLEF guidelines, which are briefly summarized here.

   Task definition
   The task is known-item image retrieval based on photos from Flickr: the user is given an
image, and the goal is to find that image again in Flickr. The user does not know in advance in
which languages the image is annotated; therefore, searching in multiple languages is essential to
find the images successfully. The advantage of this kind of search task is that it has clear goals
for the user and a clearly defined measure of success (the image is either found or not), and,
while searching for the required image, users will invoke different (and potentially interesting)
search patterns.

   Default MLIR front-end to Flickr
   Participants are given a multilingual information retrieval interface to Flickr with the
following functionalities:
    • Multilingual search: query in one language, get search results in up to six languages (English,
      Spanish, French, Italian, Dutch and German).
    • Term-to-term translations between six languages (English, Spanish, German, French, Dutch
      and Italian) using freely available dictionaries (taken from http://xdxf.revdanica.com/down/).
    • Selection of “best” target translations according to
         – Presence in the Flickr related terms for the query, which often include target-language
           terms because they co-occur with the query terms in images annotated in multiple
           languages, something which is not unusual in the Flickr database; and
         – String similarity between the source and target words. This was included because the
           free dictionaries used did not have information about the most frequent sense/translation.
    • Users can pick or remove translations, and add their own translations. Back-translations
      were not provided to support this process, in order to study correlations between target
      language abilities (active, passive, none) and the selection of translations.
    • Control over the game-like features of the task: flow of images, users ranking, etc.
    • Provision of search suggestions (Flickr related terms plus tags from displayed results)


    Participation
    Participants in iCLEF 2008 could essentially choose between two tasks:
    1. Search log analysis: participants have access to the search logs and can freely perform
data mining studies on them. Examples include looking for differences in search behavior according
to language skills, or looking for correlations between search success and search strategies.
   2. Interactive experiments: participants can recruit their own users and conduct their own
experiments with the interface. For instance, they can recruit a set of users with passive abilities
and another with active abilities in certain languages and, besides studying the search logs, they
can perform observational studies on how they search, conduct interviews, etc.

   We chose the search log analysis task and mined the behavior of users when facing a strictly
multilingual information access task.
   We clustered the users based on three parameters: score, number of hints and precision. Each
user is assigned a weight

      w = (s × p) / (h + 1)

   where
   s is the score of the user,
   p is the precision of the user, and
   h is the number of hints taken by the user.

    Based on this weight, the users were clustered using threshold limits. We studied the behavior
of the most successful users, the least successful (unsuccessful) users and the users in between
these two groups.
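The weighting and threshold step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study: it reads the weight as s × p / (h + 1), so that users who took no hints still get a defined weight, and the UserStats record, the two thresholds and the example users are all hypothetical, since the actual threshold values are not reported.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserStats:
    """Hypothetical per-user summary extracted from the search logs."""
    user_id: str
    score: float      # s: the user's score in the game
    precision: float  # p: the user's precision
    hints: int        # h: number of hints the user asked for


def weight(u: UserStats) -> float:
    # Assumption: the weight is read as s * p / (h + 1); the "+ 1" keeps it
    # defined for users who asked for no hints.
    return u.score * u.precision / (u.hints + 1)


def cluster(users: List[UserStats], low: float, high: float) -> Dict[str, List[str]]:
    """Split users into three groups using two hypothetical thresholds (low < high)."""
    groups: Dict[str, List[str]] = {
        "least_successful": [],
        "in_between": [],
        "most_successful": [],
    }
    for u in users:
        w = weight(u)
        if w >= high:
            groups["most_successful"].append(u.user_id)
        elif w < low:
            groups["least_successful"].append(u.user_id)
        else:
            groups["in_between"].append(u.user_id)
    return groups


# Illustrative users and thresholds, not taken from the iCLEF 2008 logs.
users = [
    UserStats("u1", score=120, precision=0.9, hints=0),  # weight 108.0
    UserStats("u2", score=60, precision=0.5, hints=2),   # weight 10.0
    UserStats("u3", score=5, precision=0.0, hints=3),    # weight 0.0
]
print(cluster(users, low=5.0, high=50.0))
```

Raising or lowering the two thresholds moves users between the three clusters; a user with zero precision always lands in the least successful group regardless of score.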
    We also looked for differences in the search behavior of the users according to their language
skills.


3     Results and Discussions
3.1    Behavior of User clusters
The search logs contain 307 users in total. Using the user clusters built from the weight assigned
to each user, we studied the behavior of the most successful users, the least successful users and
the users in between. The results are shown in Table 1.
    41% of the successful users looked at no more than the first two pages of search results, and
30% of them did not ask for hints. The successful users reformulated the query frequently instead
of going through many result pages. Among the unsuccessful users, 65% searched for at most one
image and 80% searched for at most two images; 45% of them viewed at least three pages of search
results, and 34% asked for at least one hint. The unsuccessful users rarely reformulated the query:
they did so only after going through many result pages without finding the image.
    On average, users across the whole search log reformulated the query around 9 times per image,
looked at around 8 result pages per image and asked for around 1 hint per image.
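Per-image averages such as these can be computed by grouping log events into (user, image) sessions. The sketch below assumes a hypothetical, simplified event record (LogEvent with the kinds "query", "page" and "hint"); the real iCLEF log format is richer, and this sketch counts every query issued for an image rather than separating the initial query from later reformulations.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class LogEvent(NamedTuple):
    """Hypothetical, simplified log record; the real iCLEF log format differs."""
    user: str
    image: str
    kind: str  # assumed event types: "query", "page", "hint"


KINDS = ("query", "page", "hint")


def per_image_averages(events: Iterable[LogEvent]) -> Dict[str, float]:
    """Average number of queries, result pages and hints per (user, image) session."""
    sessions: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for e in events:
        if e.kind in KINDS:
            sessions[(e.user, e.image)][e.kind] += 1
    n = len(sessions)
    if n == 0:
        return {k: 0.0 for k in KINDS}
    # Counter returns 0 for kinds that never occurred in a session.
    return {k: sum(c[k] for c in sessions.values()) / n for k in KINDS}


# Illustrative events only, not taken from the iCLEF 2008 logs.
events = [
    LogEvent("u1", "img1", "query"),
    LogEvent("u1", "img1", "page"),
    LogEvent("u1", "img1", "query"),  # a reformulation
    LogEvent("u1", "img1", "hint"),
    LogEvent("u2", "img1", "query"),
    LogEvent("u2", "img1", "page"),
    LogEvent("u2", "img1", "page"),
]
print(per_image_averages(events))  # {'query': 1.5, 'page': 1.5, 'hint': 0.5}
```

Averaging per session rather than per user gives each searched image equal weight, which matches the "per image" wording of the figures above.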

3.2    Differences in search behavior according to language skills
Language skills are important in examining cross-language search. Except for a small number
of users, almost all users preferred to search in their mother tongue or in their active languages
and avoided unknown languages. A few users used passive languages in the cross-language interface.
Users asked for more hints when searching in languages other than their mother tongue, and they
reformulated the query much more frequently when searching in their mother tongue than in other
languages. Many users seemed to assume that they could find everything in their interface language
(mainly their mother tongue); after searching for some time, they realized that this was not the
case here.
    The majority of the users are native speakers of Spanish, with native speakers of Italian and
English in second and third place, respectively.
    Our results indicate that users predominantly search in their native language and use other
(unknown or passive) languages relatively infrequently. When using an active language, users
behaved more like they do in their native language than when using an unknown language.
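A comparison by language skill needs each query to be labelled with the user's skill in the query language. The sketch below assumes hypothetical per-user profiles that map language codes to "native", "active" or "passive", treats any unlisted language as "unknown", and reports the share of queries issued at each skill level; the profile layout and the example data are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def skill_of(profile: Dict[str, str], lang: str) -> str:
    # Languages missing from the (hypothetical) profile count as "unknown".
    return profile.get(lang, "unknown")


def query_share_by_skill(
    queries: Iterable[Tuple[str, str]],
    profiles: Dict[str, Dict[str, str]],
) -> Dict[str, float]:
    """Fraction of queries issued at each skill level.

    queries: (user, query_language) pairs; profiles: user -> {language: skill}.
    """
    counts = Counter(skill_of(profiles.get(user, {}), lang) for user, lang in queries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {skill: n / total for skill, n in counts.items()}


# Illustrative data only, not taken from the iCLEF 2008 logs.
profiles = {
    "u1": {"es": "native", "en": "active", "it": "passive"},
    "u2": {"it": "native", "en": "active"},
}
queries = [
    ("u1", "es"), ("u1", "es"), ("u1", "en"), ("u1", "de"),
    ("u2", "it"), ("u2", "it"), ("u2", "fr"), ("u2", "en"),
]
print(query_share_by_skill(queries, profiles))
# {'native': 0.5, 'active': 0.25, 'unknown': 0.25}
```

The same labelling could be reused to split hint counts or reformulation counts by skill level instead of raw query counts.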
            Users                                        Average number of   Average number of   Average number
                                                         reformulations      result pages        of hints taken
                                                         per image           visited per image   per image
            Most successful users                        12.22               6.12                0.49
            Users between successful and unsuccessful    10.2                7.87                1.21
            Least successful (unsuccessful) users        7.52                13.27               1.86
            All users (in the search log)                9.23                7.72                1.27


Table 1: Behavior of the most successful users, the least successful (unsuccessful) users, the
users in between, and the average behavior of all users in the search log.


3.3    User Questionnaires
The search log contains the answers to the questionnaire that was shown to users after they found
or gave up on an image. When users gave up on an image, one of the main reasons was that there were
too many images for their search; the other main reason was that they could not find suitable
keywords for that image. When users found an image, one of the main points they reported in the
questionnaire was that the image was easy to find.
    In the ‘give up’ questionnaire, 66% of the users said that there were too many images for their
search, so they could not find the image, and 20% gave up on an image because they could not find
suitable keywords for it.
    In the ‘found image’ questionnaire, 75% of the users said that the image was easy to find.

3.4    Observations
    • Users feel more confident when searching in the languages they know.
    • 18% of the users have a precision of 1 and 18% have a precision of 0.

    • On average, the users found more images using the monolingual interface.

    • Most users start with the monolingual interface and soon realize that the cross-lingual
      interface is more useful than the monolingual one.


4     Conclusions
In this paper, we presented the participation of IIIT-H in the CLEF 2008 interactive task. Our
goal was to mine the logs and extract conclusions about the behavior of users when facing a strictly
multilingual information access task. We were provided with search logs generated by an online
game of known-item image retrieval from Flickr.
    Our results show that most users start with the monolingual interface and soon realize that the
cross-lingual interface is more useful, and that users are more comfortable searching in their
mother tongue or in languages they know.
5      References
    1. http://clef.iei.pi.cnr.it/

    2. http://nlp.uned.es/iCLEF/index.htm

    3. http://www.flickr.com

    4. Artiles, J., Gonzalo, J., Lopez-Ostenero, F., Peinado, V.: Are Users Willing to Search
       Cross-Language? An Experiment with the Flickr Image Sharing Repository. [In This volume] (2006)

    5. Marlow, J., Clough, P., Cigarrán Recuero, J., Artiles, J.: Exploring the Effects of
       Language Skills on Multilingual Web Search. In: ECIR 2008, pp. 126-137 (2008)

    6. Clough, P., Gonzalo, J., Karlgren, J., Barker, E., Artiles, J., Peinado, V.:
       Large-Scale Interactive Evaluation of Multilingual Information Access Systems: the iCLEF
       Flickr Challenge. iCLEF 2008 workshop paper.

    7. Karlgren, J., Gonzalo, J., Clough, P.: iCLEF 2006 Overview: Searching the Flickr WWW
       Photo-Sharing Repository. In: CLEF 2006 Proceedings (2007)

</pre>