A Study of Users’ Image Seeking Behaviour in Flickling

Evgenia Vassilakaki, Frances Johnson, R.J. Hartley, David Randall
Dept. Information & Communications
Manchester Metropolitan University
evgenia.vassilakaki@student.mmu.ac.uk, f.johnson/r.j.hartley/d.randall@mmu.ac.uk

Abstract

This study aims to explore users’ image seeking behaviour when searching for a known, non-annotated image in Flickling, the interface provided by the iCLEF2008 track. The task assigned to users was to search for the first three images given after first login. Users did not know in advance in which of the six languages (English, German, Dutch, Spanish, French, Italian) the images were described, forcing them to search across languages. The main focus of our study was threefold: a) to identify the reasons that determined users’ choice of a specific interface, b) to examine whether, and to what extent, users were thinking about languages when searching for images and c) to examine how helpful the translations, if used, proved to be for finding the images. The study used four different methods, both quantitative and qualitative (questionnaires, retrospective thinking aloud, observation and interviews), to address its research questions. Results show that two out of ten users used only the monolingual interface because they did not feel confident with languages, while the rest switched between interfaces for a variety of reasons in which languages played a small part. Only four out of ten users were actually thinking about languages when searching for the images, while the rest were more preoccupied with finding the images and completing the task successfully. As a consequence, only four users paid attention to translations, and they judged only the translations in languages known to them. Overall, the translations were not considered helpful due to their inconsistency in coverage and their tendency to lead to irrelevant results.
Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval; H.2.3 [Database Management]: Languages - Query Languages

General Terms

Languages, Human Factors, Experimentation

Keywords

Multilingual Information Retrieval, User Behaviour, User Image Seeking Behaviour, Flickr, Flickling, iCLEF

1 Introduction

The Cross Language Evaluation Forum (CLEF) is an annual evaluation campaign that aims to promote the development of monolingual and multilingual information retrieval systems for European languages. The main research interest of CLEF has gradually moved over the years from purely textual document retrieval to question answering (QA) and geographic information retrieval [2]. In this context, a pilot interactive track, known as interactive CLEF (iCLEF), was introduced in 2001, focusing initially on document selection questions. Since then, iCLEF has been included as a regular event in CLEF. The iCLEF tracks in CLEF2002 and 2003 focused on examining support mechanisms for query formulation and refinement, as well as user-assisted translation experiments. In the 2004 track, the participants, all using a common evaluation design, tried to assess the ability of their own interactive systems to find specific answers to specific questions in a language other than that of the initial query. In addition, the 2005 track studied the problem of QA from a user-inclusive perspective [8]. Finally, in 2006, the iCLEF track moved to Flickr, a multilingual photo-sharing database, in order to promote interactive experimentation on multilingual search tasks and to study users’ behaviour [6]. In particular, three studies were submitted: a) UNED [1] examined the attitude of users towards cross-language searching when the search system allows three search modes (no translation, automatic translation, assisted translation), b) U.
Sheffield (and IBM) [3] used bilingual Arabic-English students to test the Arabic interface of Flickr that they had developed and finally c) the Swedish Institute of Computer Science (SICS) [7] focused on evaluating information access based on user satisfaction and user confidence.

In this context, the 2008 iCLEF track focuses both on acquiring a large set of search session logs for the participants to mine and on allowing participants to perform their own interactive experiments with the Flickling interface provided, adopting the task predefined by the organizers [4].

The aim of this study is to explore users’ image seeking behaviour when searching for and retrieving known, non-annotated images across languages in Flickling. In particular, this paper sets out to:

• Identify the reasons that determined our users’ choice of a specific interface (monolingual/multilingual).
• Examine whether, and to what extent, users were thinking about languages when searching for and retrieving images.
• Examine whether, and to what extent, users were paying attention to translations when searching for and retrieving images.

The remainder of this paper is structured as follows. Section 2 describes the Flickling interface, our user sample, the task given, the four methods used to gather the data, the way the study was carried out and how the data were processed. Sections 3 and 4 present an analysis of our findings and an extended discussion of them respectively. Finally, section 5 concludes by summarizing the different image seeking behaviours that our users developed while using Flickling.

2 Method

This section presents further details about the test object of the study, the users, the task given, the specific methods used (retrospective thinking aloud, observation and interviews) and the data processing.
2.1 Test Object

The test object of this study was the Flickling interface, a basic cross-language search front-end to the well-known web application Flickr. Flickr was adopted as the target collection by the iCLEF organizers mainly for two reasons [4]: a) it is a multilingual database enabling users to tag and comment on the uploaded images in different languages and b) it can provide the baseline for a series of both realistic and challenging multilingual search tasks.

The Flickling interface is a multilingual information retrieval interface that encompasses the following functionalities [4]: a) multilingual search across six languages (English, Spanish, German, French, Dutch, Italian), b) term-to-term translations between the six languages using freely available dictionaries, c) selection of the best target translations, d) picking and removing of translations, and adding of new ones, by the users, e) provision of search suggestions and f) control over the game-like features of the task.

The Flickling interface was intended for two user groups: a) CLEF participants, researchers who expressed interest in conducting experiments based on the provided multilingual interface and b) Flickr/Web users, ordinary users of the Flickr application who would like to participate in this game offered by the CLEF organizers.

Our main interest in participating in the iCLEF2008 Flickr challenge was to investigate the behaviour of users when asked to search for and retrieve a known, non-annotated image across languages.

2.2 Users

The study was carried out with a sample of 10 users, three male and seven female, ranging in age from 20 to 40. They were all related in one way or another to Manchester Metropolitan University (MMU). In particular, of the ten users, seven were research postgraduate students, one was a taught postgraduate student, one a lecturer and one an MMU staff member.
In addition, four of the users were English native speakers, two Greek, one German, one Spanish, one Arabic and one Luganda (see Table 1). Moreover, one of the users was monolingual, four stated knowledge of one language other than their native language and five were multilingual.

Native Language    No. Users
Arabic             1
English            4
German             1
Greek              2
Luganda            1
Spanish            1

Table 1: Users’ Native Language

In addition, the users were asked to state their level of comprehension of the languages used in Flickling, as well as of any additional language. In particular, of the six non-native English speakers, two stated an Excellent knowledge of English, three a Very Good knowledge and one a Basic knowledge. Regarding German, three of the nine non-native German speakers stated a Basic knowledge of German. Four out of ten users stated knowledge of French, three of them Basic and one Good. Two out of ten users stated knowledge of Italian, Basic and Good respectively. Two out of ten stated a Basic knowledge of Dutch and finally, three of the nine non-native Spanish speakers stated a Basic knowledge of Spanish (see Table 2).

All ten users had searched in the past for an image on the web. In particular, four stated that they “rarely” had, three “sometimes”, two “very often” and one “often”. In addition, nine out of ten stated that they had searched for an image on the web in a language other than their native language and only one had not.
In particular, the nine users identified the following reasons for having done so: “university research, searching for holiday info”, “to increase the numbers of results because I could not find any relevant image by using keywords in my native language”, “because there are only few web resources that I am interested in the Greek language”, “for my course arguments, assignments”, “looking for shoes and clothing in Portuguese and French”, “because the images I wanted were provided in English [Luganda native speaker]”, “I was looking for an image of a region of Poland” and last “because language was not an issue”. The user who had not searched for images on the web in other languages justified it as “not necessary”.

Language    Basic    Good    Very Good    Excellent
English     1        0       3            2
German      3        0       0            0
French      3        1       0            0
Italian     1        1       0            0
Dutch       2        0       0            0
Spanish     3        0       0            0

Table 2: Users’ Knowledge of Languages

Regarding the users’ previous knowledge of and experience with Flickr, only nine out of ten users answered this question, three of whom stated that “Yes”, they had used Flickr in the past. When asked why they had used it, one stated for searching images, one “just to see what it is” and the third stated for uploading, sharing and searching images. The other six users gave the following justifications for not having used Flickr before: “because I was not aware of it”, “not interested, security issues”, “never needed to”, “I don’t know what Flickr is”. The participants were evenly assigned to the conditions in the experiment, with no difference in gender, age or prior knowledge of the Flickling interface.

2.3 Task

Our users were asked to find a given, non-annotated image from Flickr using the Flickling interface. The users did not know in advance in which of the six languages (English, German, Dutch, Spanish, French, Italian) the image was described, forcing them to use both monolingual and multilingual features to find the given image.
Each of our 10 users was asked to search for and retrieve the first three images given after login, using all the features and the help instructions of the Flickling interface. The images presented to users were not controlled but given randomly from a set of 100 stored in the Flickling database.

2.4 Retrospective Thinking Aloud

Retrospective thinking aloud is a widely used method for usability testing of software and interfaces. Its basic principle is to ask potential users to complete a certain task with the test object in question and to describe their thoughts and actions afterwards on the basis of a video recording of their task performance [5]. This method focuses on people’s cognitive processes after they have completed a specific task. It is a method that enables the users, and not the experts, to point out the problems concerning the test object in the usability test.

The use of the retrospective thinking aloud method to carry out our study, like any other method, entails both drawbacks and benefits. In particular, it allows users to complete the task in their own way and at their own pace, spending as much time as they wish, so that they are not likely to perform better or worse than usual. In addition, it enables the exact recording of the time spent on each part of the task, as it reflects the real time that a user has spent on completing the task. Moreover, it allows users to reflect on their own actions while using the test object and to highlight particular causes or milestones that they have personally encountered. In addition, retrospective thinking aloud is considered to be an appealing method for conducting tests across languages, as it is less difficult for users to disclose their thoughts in a foreign language after completing the task.
Apart from these benefits, the use of retrospective thinking aloud also has some drawbacks, the most important of which are: a) the duration of every user session varies according to the time that the user spends on completing the task plus the time that the user needs to describe the video in retrospect and b) there is also a risk that users may forget what they were thinking during specific phases of the task, or withhold some thoughts for reasons of social desirability.

In this context, we used Camtasia Studio (v.5.1), a screen recorder, to capture the users’ search sessions in individual videos, and a digital recorder for the retrospective thinking aloud and for the individual interviews that followed.

2.5 Observation

In addition to retrospective thinking aloud, observation was also adopted. The observation method was used to form specific questions regarding preselected research areas of the test object (translations, layout of the interface, etc.) in an attempt to shed light on specific behaviours of the users on specific occasions. A form was created to assist the facilitator in focusing on specific areas of interest and at the same time reflecting on users’ behaviour. This form was organized according to the areas that were to be tested. Every category had a set of predefined questions/remarks that the facilitator had to fill in according to the user’s behaviour each time, together with additional comments on the questions to be asked of users during the individual interviews. The facilitator, one of the organizers of our study, had to fill in a set of three forms, one for each image of the task, for each user. These forms were coded according to the number assigned to each user and the order of the images (e.g. 01/01, 01/02, 01/03).

2.6 Interviews

The last part of the study consisted of small scale individual interviews with every user after the completion of the retrospective thinking aloud.
The interviews lasted no more than 10 minutes for every user. The questions asked varied according to the user’s answers to the questionnaire, the search session, the retrospective thinking aloud and the notes gathered throughout the experiment. The main goal of these questions was to clarify specific actions in the user’s image seeking behaviour during the search session and expressions that the user used to describe what he/she was doing.

2.7 Experimental Procedure

The study was carried out in 10 individual sessions, all of which were held in the same lab and each of which lasted approximately one to two hours. During each session, users were given general instructions about the way that the study would be carried out and about the task that they had to complete. These instructions were read to each user, explaining that no more instructions would be given throughout the session and that the facilitator was there to observe only. After that, users were asked to fill in the questionnaire on personal details and prior experience. Users were then instructed to register, log in and start completing the task while screen recording software was recording the computer screen. Having done that, users were asked to watch the recorded session as it was played back to them and to describe in retrospect what they were doing and what they were thinking. Finally, a semi-structured interview of no more than 10 minutes was carried out with each user.

2.8 Processing of the data

Once the 10 sessions were completed, transcripts were created based on users’ retrospective thinking aloud and interviews, and the questionnaires and comments on the video recordings of the users’ search sessions were analysed. The analysis of the data gathered focused on the way that users were interacting with and using the Flickling interface and its features to complete the given task.
The transcripts of the retrospective thinking aloud and interviews, as well as the video sessions and observation notes, were examined specifically to identify the users’ decision to choose a specific interface (monolingual/multilingual) and to identify the extent of the role played by languages and translations in forming their image seeking strategy. These parts were then grouped, when possible, to enable better presentation and discussion of the findings. In addition, users occasionally experienced technology problems, such as trouble with the functioning of the interface (problem messages coming up, interrupting the search and thought process of the users) and flickering of the cursor due to the software used to record users’ search sessions. These problems were excluded from the study.

3 Findings

The analysis of data gathered by retrospective thinking aloud, video recordings and interviews focused on three distinct areas: a) the reasons that determined users’ choice of a specific interface, b) whether, and to what extent, users were thinking about languages when searching for images and c) whether, and to what extent, they were paying attention to translations and how helpful these proved to be for finding the images. The findings are presented according to the three research questions in subsections 3.1, 3.2 and 3.3 respectively.

3.1 Reasons

Flickling provided users with two different interfaces, monolingual and multilingual, in order to cope with the problem of searching across languages for the three target images. This study’s first research question aims at identifying the reasons that determined, each time, users’ choice of a specific interface (monolingual/multilingual). Out of the ten users, two used only the monolingual interface and the rest switched between interfaces. The reasons our ten users gave for their behaviour in the thinking aloud process and interviews are stated below.

1.
Only Monolingual Interface

Two out of ten users did not use the multilingual interface at all, even though they were given images to search for in a language unknown to them. The first user, an English native speaker with basic knowledge of French, when informed by the system that the image was annotated in French, stated while still on the monolingual interface: “I kindly instantly gave up, because I am not good in French...I realized that I am never going to find it...So, I decided to give up”. When asked why the subject didn’t use the multilingual interface, the subject answered: “Because I did not trust my abilities with other languages, to be able to put the decent search words in...Because I did not know the keywords to search in other languages”. As a final remark, the subject added: “I was not confident with the languages”.

The second user, a Luganda native speaker with no knowledge of French, stated: “I went for the hint and it said that the image is described in French. So, I felt there is no need to...Well, I thought, I do not speak French, I can’t understand that”. When asked why the subject did not use the multilingual interface, the subject answered: “If I knew how to use another language, then I could use the multilingual and access the same image in another language. But because my first language is not accessible in there [Flickling], then I thought I should keep to monolingual to where I know what I am looking for”. When asked how the subject was planning to cope with the problem of searching for a French annotated image on the monolingual interface by using English keywords, the subject stated: “I thought that the image was not available and all images should be described in English [as well]. So, I thought that it was inaccessible...that I could not get it”.

2. Switching between Monolingual & Multilingual Interface

The remaining eight users switched between monolingual and multilingual interfaces in order to complete the given task.
A variety of reasons to justify these actions were reported by the users during the retrospective thinking aloud process and interviews. In particular, users identified the following reasons why: “In order to increase or decrease the number of results, depending on the results that I had on the beginning of my search”, “I have chosen to use the multilingual interface because I assumed that it would give me the highest possible number of relevant results in relation to my query”, “I am trying to find the right combination of keywords”, “Because of the setting of the image...I believe that this system, if you know where the picture was from, or if you know the place then you can like recognize the language in which you can type in”, “I was looking to isolate words and translate them”, “Simple because I wasn’t getting any of the results that I wanted”, “I tried to increase my chances of getting the image...I am widening my possibilities”, “I am just trying out the system”, “So, it was not there [monolingual English], I guess it was in other language” and lastly: “For me the problem was more kind of how to find where the image was from”. Also, hints played a significant part in users’ choice over an interface. As stated by the users: “I switched to monolingual because the hint told me that the image was described in English”, “ok, I have learned about the hints, so I gave up and asked for a hint...Off I went to multilingual to ask to translate...and then I went back to monolingual and searched for it” and “I went to ask for a hint on language just in case because that seamed to save me lots of time”. Two users, both English native speakers, stayed on multilingual interface though after taking the hint, they both knew that their image was described in English. 
When asked why they had not switched to monolingual, they gave the following explanations for their choice, in turn: “Because I did not think that would make any difference, because I was assuming that it is in English as well” and “Well, because I was there. I did not realize that...I thought, to be honest, I thought, it’s not going to make that much difference really. It is set to do a search in English, so if it does search in other languages that does not make any difference...is not going to increase my chances in monolingual English...maybe, it would but I don’t know that”.

There were also some cases in which users, although seemingly using a specific interface (monolingual or multilingual), stated during the retrospective thinking aloud process, and confirmed afterwards in the interviews, that: “I did not really, even think about it. I was just...I was at that point...I was thinking about getting this title”, “I was not paying attention to the fact that it was multilingual. Maybe, I forgot about that and left it as it was” and “I was so focused on trying to see how to describe the image that I was not paying attention to the interface”.

3.2 Role of Languages

The second research question that our study set out to explore was whether, and to what extent, languages were forming the image seeking behaviour of our users. As already stated, our users did not know in advance in which of the six languages (English, German, Dutch, Spanish, French, Italian) supported by the system the given images were described. The task was set in that way so that users had to take the element of different languages into account. As a consequence, a set of different behaviours was identified, again through the analysis of the data gathered from retrospective thinking aloud and interviews, which can be grouped as follows:

1.
Two users out of ten used only the monolingual interface searching in English, although they knew that the images may not be described in English (see subsection 3.1.1). In particular, the English native speaker stated: “My French are not good, so I decided to give up because I could not find the appropriate translations for my keywords, so I was never to find it”. The subject also added: “I was not confident with the languages”. The Luganda native speaker admitted that: “I would not search an image in any other language; I would only search images in English. If I would search images from my home country, from my background, then I would use my first language. But any other image, I would search in English”. When asked if the subject was thinking about languages while searching, the user answered: “No. It did not...because when I am searching for images on the Internet, I normally get them in English because I imagine that...I guess it’s a little bit of arrogance, I speak English and I imagine that images...That if you put them in Internet, they should have English tags”. 2. The other eight users who were switching between monolingual and multilingual interfaces, can be divided in two groups: a) those who were thinking about languages and b) those for whom languages were not a variable when performing the given task. In particular, four of them stated clearly during retrospective thinking aloud that: “Now, I made the relationship of country, Florida...I write them [keywords] in English”, “I had the feeling that the building which I recognized, was described in German”, “It was not within my results, so, I guessed that it is in other language”, “Because by looking at the tortoise had written on it...it was written in English. 
So, I assumed that it would be in English...And I was also thinking at this time, I wonder if it is English or not...because the child got a little blue and red hat and I was thinking, maybe the child is French...Yes, I changed into multilingual because I think that maybe it is French, with the outside possibilities that it might be Italian” and lastly “Well, that’s probable a bit Anglo-centrism. You know, well, it is a picture in England”.

On the other hand, the remaining four users, when asked if they were thinking about languages during the task, said: “To be honest, I was not thinking about languages...I did not consider it a variable that influences my results”, “I did not bother about languages...I did not really think about them...I did not focus on languages while performing my searches. Maybe, because I am not used to, is not widely used or maybe I am not using languages when retrieving information on the web”, “I was not taking languages under consideration when searching for the images” and “For me it was not a question of language...In my mind language was a very small factor in there [Flickling]. It did not really play any important role”.

3.3 Role of Translations

All users were given at least one image out of three which was described in a language unknown to them. The multilingual interface was provided to cope with this problem and help retrieve the image. The third and last research question of this study was to examine the use of translations and the influence of translations on the users’ information seeking behaviour. We are obliged at this point to exclude the two users who used only the monolingual interface and the four users who used the multilingual interface but gave no thought to the translations. The remaining four users tried both the monolingual and multilingual interfaces, driven by the need to identify the language of the image and the appropriate keywords to retrieve the given images.
In particular, the four users, when asked if they were paying attention to the translations, stated: “Yes, but it did not translate anything. I thought like, it did not give me anything, because it did not translate anything”, “Yes, I did use them”, “Yes, at this point I am trying to figure out how this translation thing works” and “Yes, I was paying attention to the translations”.

In addition, when the users were asked if they could judge the translations that were given to them, they answered: “Overall, I had the feeling that the translations of the system were not that good...I switched to monolingual because the translations were not doing anything”, “I would trust the system to give me the right translations...I would have to for languages unknown to me”, “Because I went for the languages that I had a vague idea about and it did not tell me something that I did not really know” and last “I was not satisfied with the German translations because I can understand German...it’s not the right word in German for a man. So, it should have been something else...in Dutch I don’t know what the translation is, so, I had to accept it, whatever it is...Yes, I was satisfied [Dutch translations] because the computer knows the Dutch language better than I do... maybe that’s not the best translation, so, I just had to accept it. There was anything that I could do about it really”.
Finally, when the users were asked if the translations were helpful in terms of actually contributing to the retrieval of the image, users stated: “Ok, I have got the translations but they are not doing anything to me...at the end, I totally disregard the translations”, “I think that I stop searching for translations, when I stop having much confidence that it was bringing me the right translations...So, I got used to asking for a hint...I thought I am gonna ask for another hint now and then it would tell me in what it was translated it, not necessarily what translation [the system] has given me”, “The words that I was trying to isolate like particular words like London, the different translations there were not coming up...and what it was saying, like gigante in Italian for giant, it told something that I already knew. So it was not isolating the words in the way that I wanted it to. It was just telling me the adjectives were, which I did not really need” and finally “At the end I was not paying attention to the translations, I was purely interested in finding the image as quickly as possible because once more I did not think the translations would necessarily help me”.

4 Discussion

The evaluation of CLIR effectiveness often does not involve the end user. On the one hand, it may be assumed that since the translation is automated the user has no role to play, or possibly that the user has no interest in the translation, provided the system is effective. On the other hand, the non-trivial challenges of designing realistic task scenarios, recruiting participants and analyzing large amounts of data to obtain user assessments or to observe search behaviour can be prohibitive. However, we take the view of Petrelli [9] that effective system design must be in accordance with the end users’ needs, and that to best assist users involved in cross-language information retrieval we need to understand their behaviours and the search problems they face.
Petrelli’s study of users involved in CLIR presented a number of interesting findings. In particular, the users preferred the interface which hid the translation, and language knowledge and sight of the translation affected search behaviour. The present study shows that users form image seeking behaviours according to a series of reasons, among which thinking about languages is the least important. In addition, users were so focused on completing the task that they were not paying attention to interfaces, languages and translations, though these were factors that affected and predefined the results.

In particular, regarding the first research question, we conclude that the two users who used only the monolingual interface did so because they were not confident of their language skills and were not used to searching for images in languages other than English. These users would use the multilingual interface only if they could speak the language in which they were searching and that language were other than English. The other eight users described a variety of reasons in which languages played a small part. Users’ need to minimize or maximize the number of results was the driving force behind switching between interfaces. Moreover, they were more concerned with identifying the images in terms of the keywords and their right combination. In the whole process, languages played an insignificant role. Only four users were trying to identify the language of the images from their context and use it to their benefit. Some were treating the multilingual interface as a translator, trying to isolate specific words, translate them on the multilingual interface and use the translations to retrieve the images on the monolingual interface. One user stated that: “I think I just saw it as a translation tool and not as an integrated translation thing that already was retrieving images. I did not really use it in this way because in my mind, it was only translating my keywords”.
The Hint feature also shaped users’ image seeking behaviour to a large extent. Users became accustomed to using the hints after a few minutes of unsuccessfully searching for the image. On the whole, it would appear that users were so focused on completing the task, “obsessed” (as a user stated) with finding the images, that even from the beginning of the task they were not really thinking about which interface they were going to use and for what reason. Even users who were concerned about languages admitted at the end of the task that they were not paying much attention to the interfaces because they thought that it was not making any difference. Going back to Petrelli’s findings, our users did not express any preference as to whether the translations should be hidden, though they were not satisfied with the position of the translations. They stated that the translations were not obvious because they were on the right side and not on the left, close to the search box. They even commented on the way that the translations were presented. It was not clear to them how the translations were working or how they could write their query and get translations. They also stated that it took some time to figure out how the multilingual interface was built and how they could use the translations to their benefit. Since it was not clear that the system was retrieving results for both their search terms and the translations, users were typing both their search terms and the translations provided into the search box and running the search again. In short, they used it as a translation tool and not as an integrated feature of the multilingual interface. However, at no point did the users state that the presence of translations was distracting or that they would prefer them to be hidden. On the contrary, they said that they used them in order to retrieve relevant results and disregarded them when they were not getting the results that they were expecting or hoping for.
Both Petrelli’s study and ours aimed to explore user information seeking behaviour in cross-language information retrieval, but from different perspectives and for different reasons. As a consequence, the methodologies adopted in the two studies differ. Petrelli used mainly observation and interviews, whereas we used retrospective thinking aloud and interviews as the main methods to record user behaviour, supplemented with questionnaires and observation to further verify our findings. Our aim was to derive findings entirely from users’ thoughts, comments and search behaviour rather than depending on the facilitator’s observations, interpretations and questions asked. Although in retrospective thinking aloud there was some risk of users not remembering exactly what they were thinking, our users were always able to recall what caused them to search for an image in a particular way. Not only did they justify their actions, they also revealed details about their thought process and the reasons that motivated them or made them feel distracted, uncomfortable or even bored while using Flickling. Our study based on retrospective thinking aloud revealed a complex picture of the influence (or not) of language skills and confidence therein, and of perceptions of the role of the multilingual interface, language and translations in image retrieval. Most revealing, and of potential interest to future study of users of CLIR, is the finding that fewer than half of our users appeared to consider identification of the language to be essential in retrieving the image. The majority either lacked confidence in using different languages or were so focused on finding the given images and completing the task that they were not thinking about languages at all. Indicative of this was the comment “...completing the task successfully. What was success for me? That you find the image. In any way I possible could.
I was not focusing on translations...I thought my task is to find that image and I will do whatever I could to find it”. Among those for whom languages played a significant role in the process of identifying keywords to search for the images, the translations were judged to be poor: the translations were not coming up, did not correspond to the users’ keywords, or were judged to lead to the retrieval of irrelevant results. As a consequence, users lost interest and trust in the translations and either stopped using them or stopped paying attention to them. One of the initial aims of this study was to look in greater detail at how working with the translations affected search behaviour with regard to the actual search terms entered by users. Unfortunately, this study could not reach a conclusion on this point because only four out of ten users used the translations, and they did so in a way that was not anticipated. The reasons why this happened are:

• the Experiment Design: the fact that users were given a specific task to complete, in the context of a game, was so overwhelming that they were not thinking about anything other than carrying out the task. The task of finding clues for unambiguous query terms from the images was sufficiently challenging in the Flickling interface that perhaps our users chose to ignore the language and translations. Our users even felt that they were failing the task or disappointing the organizers in some way when they had to give up. As a result, completing the task was regarded as such a challenge that they were not paying attention to anything else.

• the Flickling Interface: the way that the Flickling interface was developed, providing two different interfaces, monolingual and multilingual, to search across languages, created some issues in studying users’ information seeking behaviour.
The option of the two interfaces confused most of the users because they are accustomed to using only one search box on one interface. In addition, users had to interpret what monolingual and multilingual meant, something that again users are not accustomed to doing when using a search engine. Others, driven by this fact, were not paying attention to interfaces; they were just using what was in front of them without making any informed choice. The users who used the multilingual interface did so for various reasons but, as stated, languages played an insignificant role. The four users who used the translations did so to enhance their chances of finding the target image rather than viewing them as an integrated feature of the multilingual interface, and so could not be regarded as forming their search strategy based on the translations. Although it would have been interesting to look at how translations did or did not influence users’ information seeking behaviour, for the reasons stated above we were not able to collect the data we had hoped for.

5 Conclusion

This study aimed to investigate users’ image seeking behaviour when retrieving a known, non-annotated image in Flickling. The task assigned to ten users was to retrieve the first three images given after first login. The images could have been described in any of the six languages (English, German, Dutch, French, Spanish, Italian) supported by Flickling. As a consequence, users had to use the monolingual and multilingual interfaces to search across languages and retrieve the images. This study used a combination of four different methods: a) questionnaires, b) retrospective thinking aloud, c) observation and d) interviews. Together, these quantitative and qualitative methods allowed us to address the research questions of our study. In particular, we identified the reasons why two of our users chose to search only on the monolingual interface and why the other eight switched between interfaces.
We demonstrated that only four users were thinking about languages when trying to retrieve the given images, while the rest of our users were more preoccupied with finding the images and completing the task “successfully”. Consequently, we showed that only these four users were paying attention to the translations provided by the system. These users stated that the translations were not helpful or were not making much difference in finding the given images, since the results were irrelevant to what they were looking for. Taken at face value, this small study also suggests that, if we ask whether a CLIR system should display query translations, the answer is no. Our users were either not interested in the translations or found them to be poor. However, taking the findings to such a conclusion would be foolhardy, given the complexity of the activity highlighted in the users’ comments that they were so engaged in finding the image that language or translations played little or no part. Rather than reaching firm conclusions, this small study has suggested the need for more research into users’ search behaviour with translations (and in image retrieval) if we are to design CLIR systems which will not place additional or unnecessary cognitive demands on the user and will support effective search behaviour and performance.

References

[1] J. Artiles, J. Gonzalo, F. López-Ostenero, and V. Peinado. Are users willing to search cross-language? An experiment with the Flickr image sharing repository. In CLEF, pages 195–204, 2006. http://www.clef-campaign.org/2006/working_notes/workingnotes2006/artilesCLEF2006.pdf.
[2] M. Braschler, G. Di Nunzio, N. Ferro, J. Gonzalo, C. Peters, and M. Sanderson. From CLEF to TrebleCLEF: promoting technology transfer for multilingual information retrieval. In Second DELOS Conference on Digital Libraries, 5-7 December 2007, Tirrenia, Pisa (Italy), pages 1–7, 2007. http://www.trebleclef.eu/getfile.php?id=38.
[3] P. Clough, A. Al-Maskari, and K. Darwish.
Providing multilingual access to Flickr for Arabic users. In CLEF, pages 205–216, 2006. http://www.clef-campaign.org/2006/working_notes/workingnotes2006/cloughCLEF2006.pdf.
[4] P. Clough, J. Gonzalo, J. Karlgren, E. Barker, J. Artiles, and V. Peinado. Large-scale interactive evaluation of multilingual access systems: the iCLEF Flickr challenge. In Workshop on Novel Methodologies for Evaluation in Information Retrieval, 30 March 2008, Glasgow, Scotland, 2008. http://nlp.uned.es/iCLEF/ECIR-evaluation-workshop.pdf.
[5] M. Haak, M. de Jong, and P. J. Schellens. Retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue. Behaviour & Information Technology, 22(5):339–351, 2003. http://www.students.cs.uu.nl/people/jpwester/WO1/Artikelen/p339.pdf.
[6] J. Karlgren, J. Gonzalo, and P. Clough. iCLEF 2006 overview: Searching the Flickr WWW photo-sharing repository. In CLEF, pages 186–194, 2006. http://eprints.sics.se/321/01/iclef-2006-overview-v4.pdf.
[7] F. Olsson and J. Karlgren. Trusting the results in crosslingual keyword-based image retrieval. In Proceedings of Evaluation of Multilingual and Multi-modal Information Retrieval, 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers, page 3, Alicante, Spain, 2007. http://eprints.sics.se/319/01/iclef-2006-SICS.pdf.
[8] C. Peters. Comparative evaluation of cross-language information retrieval systems. In From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments, pages 152–161, 2005. http://dienst.isti.cnr.it/Dienst/Repository/2.0/Body/ercim.cnr.isti/2004-TR-43/pdf?tiposearch=cnr&langver=.
[9] D. Petrelli, M. Beaulieu, M. Sanderson, G. Demetriou, P. Herring, and P. Hansen. Observing users, designing clarity: A case study on the user-centered design of a cross-language information retrieval system.
JASIST, 55(10):923–934, 2004. http://dblp.uni-trier.de/db/journals/jasis/jasis55.html#PetrelliBSDHH04.