I Need More Time!: The Influence of Native Language
                   on Search Behavior and Experience
                             Pengyi Zhang¹, Chang Liu¹, Preben Hansen²

                         1. Department of Information Management, Peking University
                            5 Yiheyuan Rd, Haidian District, Beijing 100871, China
                                        {pengyi, imliuc}@pku.edu.cn

                   2. Department of Computer and Systems Sciences, Stockholm University
                              Borgarfjordsgatan 12, SE-164 07 Kista, Sweden
                                            preben@dsv.su.se


           Abstract. This paper describes our participation in the interactive track (ChiSwe Group) of
           the Social Book Search (SBS) lab organized at CLEF 2016, our first participation in the CLEF
           SBS interactive track. A total of 112 participants (29 native Chinese speakers, 27 native
           English speakers, and 56 native speakers of other languages) took part in the SBS
           Interactive Track. We found that native Chinese speakers devoted more effort to searching
           than native English speakers and native speakers of other languages: they took the longest
           to complete the search tasks, selected the most books, switched most often between the
           search, browse and review modes of the search system, viewed more items and more
           metadata, and annotated more books. However, in the engagement evaluation, Chinese
           speakers gave the highest scores and English speakers gave the lowest.


1    Introduction

This paper describes our participation in the interactive track (ChiSwe Group) of the Social
Book Search (SBS) lab organized at CLEF 2016¹. This is our first participation in the CLEF
SBS interactive track. Our group conducted all of its experiments in China, and all of the
participants recruited by our group are native Chinese speakers. The search behavior of
multilingual users has recently attracted research interest [1]. Using the data set from CLEF
SBS 2015, Bogers et al. examined differences in search behavior between native and
non-native speakers of English [2]. Surprisingly, they found few significant differences
between the two groups. Other researchers have found that searching in a foreign language
requires significantly more time, more query reformulations, and more websites viewed [3,4].
For example, Hansen and Karlgren hypothesized that relevance assessment in a foreign
language, even one in which the assessor has near-native competence, would be more
time-consuming and taxing than assessment in the first language. In their study, assessing
texts in English took longer on average (27 seconds per document) than assessing texts in
Swedish (20 seconds) (Mann-Whitney U test, 95% confidence level) [5]. During the
experiments in China, our participants commented that they had difficulty using the English
interface of the search system and understanding the requirements of the search tasks. This
is the first time that a substantial number of Chinese searchers have participated in this
experiment, and the search system is in English. We therefore explore the role that native
language plays in interactive social book search. To do so, we divided all participants into
three groups: native Chinese speakers, native English speakers, and native speakers of
other languages.

¹ http://social-book-search.humanities.uva.nl/#/interactive

Our main research question is: how does native language influence search behavior and
search experience? Specifically, we examine four research questions:
     RQ1. How does native language influence task completion time?
     RQ2. How does native language influence the number of search interactions, e.g.
     searching, browsing and book-bag behavior?
     RQ3. How does native language influence the perceived usefulness of different
     search tools?
     RQ4. How does native language influence users' engagement during search?


2   Methodology

A total of 112 participants took part in this year's SBS interactive track experiment. The
search log data and questionnaire data include three kinds of information:

- participants' demographic information;
- their search activity logs;
- their answers to questionnaires about search experience and engagement.

We divided the participants into three groups according to their native language:

- 29 native Chinese speakers;
- 27 native English speakers;
- 56 native speakers of other languages.

To answer our research questions, we analyzed participants' questionnaire answers and the
activity log data. In each experiment, participants were required to perform at least one search
task (task 1) and could optionally perform a second task (task 2). To investigate the impact of
task type on search behavior in social book search, the SBS interactive track designed two
types of task 1: focused tasks and open tasks. For search behaviors and for usefulness
judgments of the search tools, we first compared the three groups of searchers across all search
tasks. Then, to compare the two task types, we analyzed task 1 only and compared the three
groups within each task type.
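The grouping and subsetting scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the track's actual data pipeline. The record fields (`native_language`, `task`, `task_type`) and the exact language strings are hypothetical. In practice, self-reported native languages would first need to be normalized.

```python
from collections import defaultdict


def language_group(native_language):
    """Map a self-reported native language (hypothetical field) to an analysis group."""
    lang = native_language.strip().lower()
    if lang == "chinese":
        return "Chinese"
    if lang == "english":
        return "English"
    return "Other"


def analysis_subsets(records):
    """Bucket task records by (scope, language group).

    Every record goes into ("all tasks", group); task-1 records are also
    bucketed by their task type ("focused" or "open"), because only task 1
    carried a task type.
    """
    subsets = defaultdict(list)
    for rec in records:
        group = language_group(rec["native_language"])
        subsets[("all tasks", group)].append(rec)
        if rec["task"] == 1:
            subsets[(rec["task_type"], group)].append(rec)
    return subsets
```

Each bucket can then be passed to the group comparison, for example `analysis_subsets(records)[("focused", "Chinese")]` for the Chinese speakers' focused tasks.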

3   Results
    3.1    Search time
We first compared task completion time among the three groups of participants. Normality
tests showed that completion time was not normally distributed for either task 1 or task 2, so
we compared the groups using Kruskal-Wallis tests. The results show significant differences
among participants with different native languages (p < 0.001 for both task 1 and task 2), as
shown in Table 1.
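For readers who want to reproduce this kind of comparison, the Kruskal-Wallis test can be sketched in plain Python. This is an illustrative implementation, not the authors' analysis code, since the paper does not say which software was used. The sketch works in four steps:

1. It ranks the pooled sample, giving tied values their average rank.
2. It computes the H statistic.
3. It applies the standard tie correction.
4. Because three groups give df = 2, it uses the closed form exp(-H/2) for the chi-square tail probability.

The normality tests that preceded the comparison are not shown.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)  # position where each distinct value first appears
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_3(g1, g2, g3):
    """Tie-corrected Kruskal-Wallis H and p value for exactly three groups (df = 2)."""
    groups = [list(g1), list(g2), list(g3)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    # For df = 2 the chi-square upper-tail probability is exactly exp(-H / 2).
    return h, math.exp(-h / 2)


# Usage with made-up completion times in minutes (not the study's data).
h, p = kruskal_wallis_3([12.5, 20.1, 17.8], [8.0, 9.3, 7.1], [8.4, 6.2, 10.0])
```

The closed-form p value only holds for df = 2. With a different number of groups, a general chi-square survival function would be needed.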

              Table 1. Comparison of task completion time among three groups of participants (by task)

                          Median task completion time (minutes)      Comparison
                 Task        Chinese    English    Other              (p value)
                 task 1       17.79       8.77      8.41               <0.001
                 task 2        7.23       3.60      3.92               <0.001

Post-hoc analysis was then conducted to compare the groups pairwise. As shown in Figure 1,
for both task 1 and task 2, Chinese speakers took significantly longer to complete the tasks
than the other two groups of participants. The median time to complete task 1 was 17.79
minutes for Chinese participants, compared with 8.77 minutes for English participants and
8.41 minutes for participants with other native languages. For task 2, the corresponding
medians were 7.23, 3.60 and 3.92 minutes.
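The paper does not name its post-hoc procedure beyond pairwise comparison. One common choice after a Kruskal-Wallis test is Dunn's test on mean ranks with a Bonferroni adjustment. The sketch below assumes that procedure, which may differ from the one the authors actually used. The function name and input format (a dict mapping a group label to its observations) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    ordered = sorted(values)
    counts = Counter(ordered)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on pooled mean ranks, Bonferroni-adjusted.

    groups: dict mapping a group label to its list of observations.
    Returns {(label_a, label_b): (z, adjusted_two_sided_p)}.
    """
    labels = list(groups)
    pooled = [x for lab in labels for x in groups[lab]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for lab in labels:
        size = len(groups[lab])
        mean_rank[lab] = sum(ranks[start:start + size]) / size
        start += size
    # Variance of a rank under H0, reduced by the usual tie term.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (12 * (n - 1))
    rank_var = n * (n + 1) / 12 - tie_term
    pairs = list(combinations(labels, 2))
    m = len(pairs)  # three comparisons for three language groups
    results = {}
    for a, b in pairs:
        se = math.sqrt(rank_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        results[(a, b)] = (z, min(1.0, m * p_two_sided))
    return results
```

A positive z means the first group in the pair has the higher mean rank, i.e. longer completion times in this context.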


         Figure 1. Boxplot for three groups of users in task completion time, for task1 (left) and task2 (right)
Since only task 1 carries task-type information, we then focused on task 1 to examine whether
the language groups differed in completion time within each of the two task types.

             Table 2. Comparison of task completion time among three groups of participants (by task type)

                          Median task completion time (minutes)      Comparison
               Task type     Chinese    English    Other              (p value)
               Focused        24.62      10.56     10.16               0.011
               Open           13.42       2.61      6.90               0.03


When only focused tasks were considered, completion time also differed significantly among
the three groups of participants (p = 0.011), as shown in Table 2.


  Figure 2. Boxplot for three groups of users in task completion time for "focused tasks"
  Figure 3. Boxplot for three groups of users in task completion time for "open tasks"


The post-hoc analysis (Figure 2) showed that native Chinese speakers (median = 24.62 min)
took significantly longer than native English speakers (median = 10.56 min) and speakers of
other languages (median = 10.16 min). When only open tasks were considered, completion
time also differed significantly among the three groups of participants (p = 0.03). The
post-hoc analysis (Figure 3) showed that native Chinese speakers (median = 13.42 min) took
significantly longer than native English speakers (median = 2.61 min) and speakers of other
languages (median = 6.90 min).


    3.2     Number of Interactions
From the activity log data, we extracted the following indicators of users' search, browsing
and book-bag behavior:
      • Task level: number of books selected (numbook), number of switches between the
         search, browse and review modes (showlayout)
      • Search: number of queries issued, number of search resets
      • Browsing: browse, add a facet, remove a facet, show item, view metadata, similar
         books, paginate (next page)
      • Book bag: add to book bag, remove from book bag, annotate item
For each task, we counted how many times each user performed each of these activities.
None of the indicators was normally distributed, so we used Kruskal-Wallis tests to check
whether any of them differed significantly across the three language groups (native Chinese
speakers, native English speakers, and native speakers of other languages). Table 3
summarizes the test results.

                    Table 3. Kruskal-Wallis H test results of search behaviors by language group

            Category    Indicator              Chi-Square    df    Sig.
            Task        numbook                    14.303     2    0.001**
                        showlayout                  7.299     2    0.026*
            Search      query                         5.1     2    0.078
                        resetsearch                  1.49     2    0.475
            Browsing    browse                      0.405     2    0.817
                        addfacet                    0.819     2    0.664
                        removefacet                 4.738     2    0.094
                        showitem                    8.186     2    0.017*
                        metadata                   14.745     2    0.001**
                        similarbooks                7.187     2    0.028*
                        paginate                    1.929     2    0.381
            Bookbag     addtobookbag                4.834     2    0.089
                        removefrombookbag           5.401     2    0.067
                        annotateitem                8.781     2    0.012*
            ** p < 0.01; * p < 0.05
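The per-task counting described at the start of Section 3.2 can be sketched as follows. The log format is an assumption: each event is taken to be a (participant, task, action) tuple whose action name matches an indicator label in Table 3. The sketch omits numbook, because the paper does not say how the number of selected books was derived from the log.

```python
from collections import Counter, defaultdict

# Indicator labels as used in Table 3 (numbook omitted, see above).
INDICATORS = [
    "showlayout", "query", "resetsearch", "browse", "addfacet", "removefacet",
    "showitem", "metadata", "similarbooks", "paginate",
    "addtobookbag", "removefrombookbag", "annotateitem",
]


def count_indicators(events):
    """Count how often each participant performed each action in each task.

    events: iterable of (participant_id, task, action) tuples (hypothetical format).
    Returns {(participant_id, task): {indicator: count}}, with zeros filled in.
    A (participant, task) pair with no recognized events does not appear.
    """
    counts = defaultdict(Counter)
    for participant, task, action in events:
        if action in INDICATORS:  # ignore log actions that are not indicators
            counts[(participant, task)][action] += 1
    return {key: {name: c[name] for name in INDICATORS} for key, c in counts.items()}
```

The resulting per-task counts for one indicator, split by language group, are what a Kruskal-Wallis test such as the one in Table 3 would take as input.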


         3.2.1 Task level comparison
Number of books selected. The three language groups differed significantly in the number of
books they selected per task (p = 0.001).


            Figure 4: Number of books selected by Language Group (1: Chinese; 2: English; 3: Other)


Figure 4 shows that the Chinese group selected the most books and the English group selected
the fewest.

Switching between search, browsing and review. The three language groups also differed
significantly in how often they switched between the search, browse and review modes
(p = 0.026).
           Figure 5: Number of Switching Layout by Language Groups (1: Chinese; 2: English; 3: Other)


Figure 5 shows that the Chinese group appears to have switched most often between the search,
browse and review modes of the system, whereas the other two groups were similar.

         3.2.2 Search
There was no significant difference across the three language groups in the number of queries
issued (p = 0.078) or the number of search resets (p = 0.475). Browsing and book-bag activities
differed more among the three groups.

         3.2.3 Browsing
Show item and view metadata. The three language groups differed significantly in how many
items they viewed (showitem, p = 0.017) and in how often they viewed book metadata (p = 0.001).
       Figure 6: Number of Item and Metadata Viewing by Language Group (1: Chinese; 2: English; 3: Other)


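Figure 6 shows that the Chinese group viewed more items and more metadata than the other
two groups. Among the remaining browsing activities, only the use of similar books also
differed significantly (p = 0.028, Table 3). Browse, add facet, remove facet and paginate did not
differ significantly.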

         3.2.4 Bookbag Use
Book annotation. The three language groups differed significantly in how many books they
annotated (p = 0.012).
                 Figure 7: Number of Book Annotation by Language Group (1: Chinese; 2: English; 3: Other)


Figure 7 shows that the Chinese group annotated more books than the other two groups. The
group of other-language speakers annotated the fewest books, although they selected more
books than the native English speakers.


         3.2.5 Task Types
We also compared the three language groups within each task type (focused vs. open). Table 4
shows the results.
              Table 4. Kruskal-Wallis H test results of search behaviors by language group, by task type

                                           Focused task                   Open task
  Category    Indicator              Chi-Square   df   Sig.       Chi-Square   df   Sig.
  Task        numbook                    13.199    2   0.001          12.721    2   0.002
              showlayout                  9.538    2   0.008           9.538    2   0.008
  Search      query                         7.4    2   0.025           9.623    2   0.008
              resetsearch                  1.05    2   0.592            2.39    2   0.303
  Browsing    browse                      2.075    2   0.354           0.747    2   0.688
              addfacet                    2.014    2   0.365           1.063    2   0.588
              removefacet                  0.02    2   0.99            4.507    2   0.105
              showitem                    4.113    2   0.128           4.766    2   0.092
              metadata                    7.653    2   0.022           8.454    2   0.015
              similarbooks                3.106    2   0.212           9.717    2   0.008
              paginate                    0.786    2   0.675           2.345    2   0.31
  Bookbag     addtobookbag                 4.92    2   0.085            5.57    2   0.062
              removefrombookbag           1.258    2   0.533            4.34    2   0.114
              annotateitem                1.389    2   0.499           3.727    2   0.155


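When all tasks were pooled, the number of queries did not differ significantly across the
language groups (Table 3, p = 0.078). It did differ within each task type (focused: p = 0.025;
open: p = 0.008, Table 4), and the pattern differed between the two task types. Figure 8 shows
the results.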


                Figure 8: Number of Queries for Focused (left) and Open (right) Tasks by Language Group
                                           (1: Chinese; 2: English; 3: Other)
For the focused task, Chinese users issued many more queries than the other two groups,
whereas English speakers and speakers of other languages issued a similar number of queries.
For the open task, English speakers issued the fewest queries, whereas Chinese speakers and
speakers of other languages issued a similar number.
For the open task, Chinese users also appeared to use the "similar books" feature more
(p = 0.008), relying on system recommendations, whereas the other two groups used it less.

    3.3    Users’ Perceptions of the Usefulness of Search Tools

After completing each search task, participants answered a post-task questionnaire about the
usefulness of each search tool they had used. In this part, we compare the usefulness
judgments of the three native language groups. Since we are interested in differences between
focused and open tasks, we considered only users' evaluations for task 1 here. Kruskal-Wallis
tests were conducted for task 1 as a whole, and then separately for focused and open tasks.
As shown in Table 5, for task 1 the three groups differed significantly in their ratings of two
tools: browse individual books (p = 0.049) and search results (p = 0.007). Post-hoc analysis
showed that English searchers rated browse individual books significantly lower than
searchers with other native languages. Chinese searchers did not differ significantly from
either of the other two groups (Figure 9). For the search results page, English searchers gave
significantly lower ratings than the other two groups, and there was no significant difference
between Chinese and other-language searchers (Figure 10).

              Table 5. Kruskal-Wallis H test results (p values) for the usefulness of search tools by language group

              Search tool                Task 1    Focused task    Open task
              bookbag.notes              0.677     0.963           0.543
              bookbag.similar_books      0.771     0.212           0.597
              browse.individual_books    0.049     0.161           0.073
              browse.topic_explorer      0.398     0.918           0.371
              meta_data.description      0.204     0.040           0.113
              meta_data.publication      0.479     0.152           0.892
              meta_data.reviews          0.054     0.257           0.350
              meta_data.tags             0.665     0.323           0.206
              search.search_box          0.084     0.039           0.777
              search.search_facets       0.633     0.246           0.719
              search.search_history      0.592     0.607           0.904
              search.search_results      0.007     0.063           0.120
              search.search_topic        0.725     0.373           0.440


Figure 9. Boxplot for three groups of users in the usefulness of browsing individual books
Figure 10. Boxplot for three groups of users in the usefulness of search results


When only "focused" tasks were considered, two tools showed significant differences:
meta_data.description (p = 0.040) and search.search_box (p = 0.039). For
meta_data.description, English searchers gave significantly lower ratings than the other two
groups (Figure 11). For the search box, Chinese searchers rated it significantly more useful than
the other two groups of searchers (Figure 12).

Figure 11. Boxplot for three groups of users in the usefulness of meta_data.description
Figure 12. Boxplot for three groups of users in the usefulness of the search box


      3.4      Search Engagements

After completing both search tasks, participants filled out a questionnaire about their
engagement with the search system. The engagement questionnaire consisted of 31 questions
covering six engagement factors: aesthetics, endurability, focused attention, felt involvement,
novelty, and perceived usability. Since the website was available in English only, we
hypothesized that native English speakers, native Chinese speakers and native speakers of
other languages might engage with the system to different degrees because of differences in
their English proficiency.

We first tested the engagement variables for normality and found that none of them was
normally distributed. We therefore compared the three groups of participants using
Kruskal-Wallis tests. The results are shown in Table 6. Seven of the 31 engagement items
differed significantly among the three groups:
- en1 (Exploring this website was worthwhile);
- en4 (My exploration experience was rewarding);
- fa4 (When exploring, I lost track of the world around me);
- fa5 (The time I spent exploring just slipped away);
- fa6 (I was absorbed in exploring);
- fi1 (I was really drawn into my exploration task);
- pu1 (I felt frustrated while exploring this website).
               Table 6. Kruskal-Wallis H test results of engagement by language group

Factor               Item   Statement                                                    Chi-Square   Sig.
Aesthetics           ae1    This website is attractive                                   3.166        0.205
                     ae2    This website was aesthetically appealing                     2.028        0.363
                     ae3    I liked the graphics and images used on this websites        0.257        0.880
                     ae4    This website appealed to my visual senses                    0.523        0.770
                     ae5    The screen layout of this website was visually pleasing      2.969        0.227
Endurability         en1    Exploring this website was worthwhile                        9.119        0.010
                     en2    I consider my experience a success                           2.422        0.298
                     en3    This experience did not work out as I had planned            1.347        0.510
                     en4    My exploration experience was rewarding                      10.401       0.006
                     en5    I would recommend exploring this website to my friends       4.453        0.108
                            and family
Focused attention    fa1    I lost myself in this experience                             2.185        0.335
                     fa2    I was so involved in this experience I lost track of time    3.939        0.140
                     fa3    I blocked out things around me when I was exploring this     4.921        0.085
                            website
                     fa4    When exploring, I lost track of the world around me          6.436        0.040
                     fa5    The time I spent exploring just slipped away                 7.247        0.027
                     fa6    I was absorbed in exploring                                  9.796        0.007
                     fa7    During this experience I let myself go                       1.942        0.379
Felt involvement     fi1    I was really drawn into my exploration task                  14.280       0.001
                     fi2    I felt involved in this exploration task                     3.732        0.155
                     fi3    This exploration experience was fun                          5.284        0.071
Novelty              no1    I continued to explore this website out of curiosity         2.848        0.241
                     no2    The content of the website incited my curiosity              1.296        0.523
                     no3    I felt interested in my exploration task                     2.807        0.246
Perceived usability  pu1    I felt frustrated while exploring this website               8.805        0.012
                     pu2    I found this website confusing to use                        4.826        0.090
                     pu3    I felt annoyed while visiting this website                   2.412        0.299
                     pu4    I found this website confusing to use                        0.386        0.825
                     pu5    Using this website was mentally taxing                       2.267        0.322
                     pu6    this experience was demanding                                0.647        0.724
                     pu7    I felt in control of my exploration experience               2.508        0.285
                     pu8    I could not do some of the things I needed to do             1.691        0.429
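The item codes in Table 6 encode the factor each item belongs to. The sketch below simply restates the table; the variable and function names are our own. It does two things:

- checks that the item counts add up to the 31 questions and six factors described for the questionnaire;
- tallies the seven significant items by factor.

```python
from collections import Counter

# Prefix -> factor, and number of items per prefix, restated from Table 6.
FACTORS = {
    "ae": "aesthetics",
    "en": "endurability",
    "fa": "focused attention",
    "fi": "felt involvement",
    "no": "novelty",
    "pu": "perceived usability",
}
ITEMS_PER_PREFIX = {"ae": 5, "en": 5, "fa": 7, "fi": 3, "no": 3, "pu": 8}

# Items with Sig. < 0.05 in Table 6.
SIGNIFICANT_ITEMS = ["en1", "en4", "fa4", "fa5", "fa6", "fi1", "pu1"]


def item_codes():
    """All engagement item codes, 'ae1' through 'pu8'."""
    return [f"{prefix}{i}" for prefix, k in ITEMS_PER_PREFIX.items()
            for i in range(1, k + 1)]


def factor_of(code):
    """The factor an item belongs to, read from its two-letter prefix."""
    return FACTORS[code[:2]]


codes = item_codes()
print(len(codes), "items in", len({factor_of(c) for c in codes}), "factors")  # → 31 items in 6 factors
print(Counter(factor_of(c) for c in SIGNIFICANT_ITEMS))
```

The tally shows that the significant items fall mainly under focused attention (3 items) and endurability (2 items).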

We then conducted pairwise post-hoc comparisons with Bonferroni correction.
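With three language groups there are three pairwise comparisons. A Bonferroni correction therefore multiplies each pairwise p value by 3 (capped at 1), which is equivalent to comparing the unadjusted p value against α/3. The arithmetic below assumes the conventional α = 0.05, since the paper does not state its significance level explicitly:

```latex
m = \binom{3}{2} = 3, \qquad
p_{ij}^{\mathrm{Bonf}} = \min\left(1,\; m \, p_{ij}\right), \qquad
p_{ij}^{\mathrm{Bonf}} < \alpha \iff p_{ij} < \frac{\alpha}{m} = \frac{0.05}{3} \approx 0.0167
```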
For en1 (Exploring this website was worthwhile), native English speakers differed significantly
from both Chinese speakers and speakers of other languages. English speakers gave
significantly lower ratings (M = 1.67) than Chinese speakers (M = 2.61) and other-language
speakers (M = 2.34).
For en4 (My exploration experience was rewarding), native English speakers (M = 1.63) rated
significantly lower than Chinese speakers (M = 2.68). Other-language speakers (M = 2.18) did
not differ significantly from either group.
For fa4 (When exploring, I lost track of the world around me), native English speakers
(M = 1.07) rated significantly lower than Chinese speakers (M = 1.86). Other-language speakers
(M = 1.32) did not differ significantly from either group.
For fa5 (The time I spent exploring just slipped away), native English speakers (M = 1.41)
rated significantly lower than Chinese speakers (M = 2.25). Other-language speakers
(M = 1.59) did not differ significantly from either group.
For fa6 (I was absorbed in exploring), other-language speakers (M = 1.86) rated significantly
lower than Chinese speakers (M = 2.79). English speakers (M = 2.11) did not differ significantly
from either group.
For fi1 (I was really drawn into my exploration task), Chinese speakers differed significantly
from both English and other-language speakers. They rated significantly higher (M = 2.64) than
English speakers (M = 1.48) and other-language speakers (M = 1.89).
For pu1 (I felt frustrated while exploring this website), Chinese speakers (M = 1.21) rated
significantly lower than English speakers (M = 2.22). Other-language speakers (M = 1.57) did
not differ significantly from either group.

4    Discussion and Conclusion

This short paper presents preliminary results on the influence of native language on search
behavior and search experience in interactive social book search. Earlier work using the
CLEF SBS 2015 data did not find many differences between native and non-native speakers of
English. This year we joined the CLEF SBS interactive track with a substantial group of native
Chinese speakers. We therefore examined differences among native Chinese speakers, native
English speakers, and native speakers of other European languages.

In general, the results show a series of differences among the three groups of participants.
Chinese searchers devoted more effort to searching than English and other-language speakers:
they took the longest to complete the search tasks, selected the most books, switched most
often between the search, browse and review modes of the search system, viewed more items
and more metadata, and annotated more books. This is consistent with the results in [3]. By
comparison, English searchers spent the least effort of the three groups.

Besides native language, participation mode is another possible explanation. All Chinese
speakers searched in the lab in this experiment, whereas all other participants searched
remotely. Since no Chinese speakers participated remotely, we cannot separate the effect of
participation mode from that of native language in this analysis. In future, recruiting Chinese
participants who search remotely would help validate this result.

With respect to the usefulness of search tools, few significant differences were found. Two tools
showed significant differences for task 1: browse.individual_books and search.search_results.
For both, English speakers gave the lowest usefulness scores of the three groups.

For engagement, seven items differed significantly among the three groups. In general, English
searchers gave the lowest engagement scores and Chinese searchers gave the highest. Further
analysis of the data is needed to explain these patterns. One possible explanation is that
Chinese searchers, having devoted the most effort to searching, tended not to give the system
the lowest ratings.


References

    1.   Steichen, B., Ghorab, M. R., O’Connor, A., Lawless, S., & Wade, V. (2014). Towards Personalized
         Multilingual Information Access-Exploring the Browsing and Search Behavior of Multilingual Users.
         In User Modeling, Adaptation, and Personalization (pp. 435-446). Springer International Publishing.
    2.   Bogers, T., Gäde, M., Hall, M. M., & Skov, M. (2016). Analyzing the influence of Language
         Proficiency on Interactive Book Search Behavior. In Proceedings of iConference 2016.
    3.   Chu, P., Jozsa, E., Komlodi, A., & Hercegfi, K. (2012, August). An exploratory study on search
         behavior in different languages. In Proceedings of the 4th Information Interaction in Context
         Symposium (pp. 318-321). ACM.
    4.   Rózsa, G., Komlodi, A., & Chu, P. (2015, May). Online Searching in English as a Foreign Language.
         In Proceedings of the 24th International Conference on World Wide Web Companion (pp. 875-880).
         International World Wide Web Conferences Steering Committee.
    5.   Hansen, P., & Karlgren, J. (2005). Effects of foreign language and task scenario on relevance
         assessment. Journal of Documentation, 61(5), 623-638.