Improving search experience on distributed leisure events
Richard Schaller, Computer Science (AI Group), Uni of Erlangen-Nuremberg, richard.schaller@cs.fau.de
Morgan Harvey, Computer Science (AI Group), Uni of Erlangen-Nuremberg, morgan.harvey@cs.fau.de
David Elsweiler, I:IMSK, University of Regensburg, david@elsweiler.co.uk


ABSTRACT
This paper examines how simple changes to a search system can influence the user's experience when using the system. In previous work, we evaluated user behaviour with a search tool designed to help people discover events of personal interest that are distributed across a city. We established, contrary to our expectations, that users mostly searched for events they already knew about, made many spelling errors and often achieved poor search performance. Taking these findings as inspiration, we made changes to how the system works. In this paper, we describe and motivate the changes and present a naturalistic log-based study (n=860) to examine their effect on user search behaviour.

Presented at EuroHCIR2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1.    INTRODUCTION AND MOTIVATION
When studying search behaviour, most published work focuses on analysis in the context of work tasks. Such tasks are not necessarily related to work. Rather, they involve people performing a sequence of activities in order to accomplish a goal [5]. A work task has a recognisable start and end, may consist of a series of sub-tasks, and results in a meaningful product [2]. Thus the models developed tend to assume that people look for information to close a gap in knowledge [1] that prevents them from completing their current task.

In contrast, we analyse search in the context of leisure activities, where there is no clear focus on a concrete work task. Elsweiler and colleagues [4] proposed a model for what they refer to as casual-leisure search, which deviates from standard work-based models. According to their model, users in casual-leisure situations are not focused on accomplishing a task; instead, they aim to be entertained or to pass time. These needs are influenced by emotional state, physical state or the social context in which the users live. Additionally, such needs differ from work tasks in that the emotions induced by the found content, or even by the search process itself, are weighted more heavily than the raw informational content.

In [7, 8] Schaller et al. analysed mobile search behaviour in the context of a distributed event. They compared the search characteristics and performance observed in a naturalistic user study with those of mobile web search. A main finding is that the analysed queries were much shorter than those of mobile web search. Also, in contrast to web search, most queries were for known items: predominantly event names. Users made a large number of spelling mistakes, perhaps due to the environmental context (e.g. typing on a bumpy bus) or due to unfamiliarity with the correct spelling of the known items. This is probably one of the causes of the poor search performance, with over 40% of searches being unsuccessful. A major conclusion of [8] is that people used the system as a tool for filtering rather than for searching for new things. That paper also gives suggestions for better tailoring the search system towards the observed user behaviour.

In this paper we build upon these results and explore how the proposed changes to the system influenced search characteristics and performance, with the overall aim of improving the search experience. We first describe the design of a study to answer this question. We then report analyses of the interaction logs of a variant of the search system that includes the proposed changes. This variant was tested on an event similar to the one used in [8]: the Long Night of Music 2012, as opposed to the Long Night of Museums 2011, both held in Munich. We acknowledge possible doubts as to the comparability of these Nights due to their different topics. However, we are able to address these doubts by showing that user behaviour is similar enough to allow valid comparisons of system changes. We then show differences and similarities in search behaviour between the original and the improved search system. Finally, we draw conclusions as to the effectiveness of the analysed changes.

2.   DISTRIBUTED EVENTS
A distributed event is a collection of single events held over the same time period and sharing the same general theme. One such event is the Long Night of Munich Museums¹ (LNMuseums), an annual cultural event organised in the city of Munich². It was the context of the study performed in [8]. In addition to a diverse range of small and large museums, other cultural venues, such as the Hofbräuhaus and the botanical garden, open their doors for one evening in October. Many venues organise special activities and exhibitions not otherwise available. A similar distributed event is the Long Night of Music (LNMusic), which also takes place in Munich. Besides pubs, discotheques and clubs, some cultural venues such as churches and museums also take part. This leads to some overlap in the events offered on the two nights.

¹ Name in German: Lange Nacht der Münchner Museen
² The event is organised by Münchner Kultur GmbH (http://www.muenchner.de/museumsnacht/)

Visitors to the Long Nights include both locals and tourists and represent a broad range of age groups and social backgrounds. In 2011 an estimated 20,000 people visited a total of 176 events at 91 distinct locations at the LNMuseums, including exhibitions, galleries and interactive events. The LNMusic is about the same size, with 206 events at 123 locations and approximately 20,000 visitors. Events on both nights take place all over the city, mostly in the city centre, but some, such as the Museum of the MTU Aero Engines
and the Potato Museum, are located in the suburbs. Special bus tours are set up to transport visitors between events.

Events can be discovered by means of a booklet that the organisers distribute for free. It contains descriptions of all events in the order in which they lie along the bus tours. This booklet is necessarily large (110 A6 pages per Long Night) and can be difficult to navigate.

3.   SYSTEM
An Android app was developed in [8] to help visitors to the Long Night find events of personal interest. Once users have found and selected the events they would most like to visit, the system can create a time plan for the evening. The plan takes into account constraints such as the start and end times of events, the time needed to travel between events, and public transport routes and schedules. If the user chooses more events than would fit into the available time, the system tries to maximise the number of scheduled events by leaving out those requiring long travel times. Users can also customise the plans manually by adding, removing and re-ordering the events to be visited. Based on the created plan, the application can guide the user between the chosen events using a map display and textual instructions. Figure 1 provides some screenshots of the app³ as used at the LNMuseums and the LNMusic.

Figure 1: The search screen with a query (left) and the map screen with the planned route (right)

The user has four ways to find events he would like to visit. He can receive recommendations based on a pre-defined profile and a collaborative filtering algorithm built into the app, browse events by bus route, or browse events by genre or type. Alternatively, he can submit free-text queries, which search over the names and descriptions of the events.

As described in [8], the search functionality was implemented in Lucene⁴, with documents represented by the titles and descriptions from the Long Night booklet. Lucene was extended to perform a search based on topics. First, the event descriptions and titles were tokenised and stemmed. Then, to match topically similar words, each token was mapped to one or more topic groups (taken from [3]). In this way, terms such as "dinner" and "food" are mapped to the same groups, so an event description containing one of these words can be found with a query containing the other. To speed up interaction with the system, a query was submitted after each typed character (search-as-you-type). The result list shows the name and nearest bus stop of each retrieved event.

³ A video demo of the application can be found on YouTube (http://www.youtube.com/watch?v=qy1F8fZbowo)
⁴ Lucene version 3.1 (http://lucene.apache.org)

4.     SEARCH SYSTEM CHANGES
Based on insights gained from user interactions with the original search system, as described in [8], we determined that the following improvements could be made:

    • Grep-Like Search:
      Users mainly used the system as a filtering tool, and the search-as-you-type feature may have led them to give up early. The original system only matched whole (or stemmed) words. A user who had typed the first few characters of a word but had not yet finished it was therefore faced with an empty result list. In our new system, used and evaluated at the LNMusic, we extended the search system with a grep-like feature that also matches parts of words, not just complete words. For example, a user looking for the event "Lenbach" only needs to type "Lenb". As a result, users are presented with an empty list of search results less often.
    • Fuzzy Search:
      The user interactions showed that a very large number of spelling errors were being made. This was presumably due to environmental factors, e.g. typing on a bumpy bus, or to the high number of named entities whose spelling people are unfamiliar with. In either case, the system was adapted to better support the user by performing a fuzzy search, as suggested in [8]. We improved our system by using Lucene's fuzzy search mechanism, which uses the Levenshtein edit distance to match terms that differ by only a few characters. For example, the event "Lenbach" will be found even if the user mistakenly types "Lembach".

Both changes aim to let users find the events they are interested in more quickly and easily, and to improve the overall search experience.

The same naturalistic study was used to analyse other parts of the system, so there were other small changes that are unrelated to the search system analysis. First, the number of tabs was increased by adding a recommender tab. The recommendations had previously been combined with the list of already selected events in a single tab. Second, the position of the recommender tab was tested in an A/B test. Both changes are beyond the scope of this paper on search behaviour. They should not have any significant influence on it, because the layout of the search tab itself was not changed.

5.     DATA COLLECTION
We examined users' search behaviour by recording their interactions with our refined app at the LNMusic2012, in the same manner as with the original app at the LNMuseums2011. Again, the app was available for download from the Google Play Store and was advertised on the official Long Night of Music web page. In total the application
was downloaded approximately 1000 times, and 860 users allowed us to record their interaction data (in [8], approx. 500 downloads and 391 users are reported). We recorded all interactions with the application. These included submitted queries, result click-throughs, all interactions with the browsing and recommendation interfaces, generated tours, modifications to tours, and all ratings submitted for events. Users interacted with the system for 11.79 minutes⁵ on average (median 6.46). 57.3% of users interacted for more than 5 minutes; 17.9% for more than 30 minutes.

⁵ These figures were calculated by summing the time periods for which a user was active, discounting times where the system reported no interactions for more than 15 seconds. We further discounted any interaction sequence that contains gaps of non-interaction longer than 30 minutes. Such gaps are likely due to logging problems caused by running out of power, connection problems, app crashes, etc.

Since queries were submitted after every typed character, the recorded queries had to be pre-processed to establish which ones the users actually intended to submit. For example, if the user wanted to search for "food", the system logged "f", "fo" and "foo" as well as "food". Furthermore, to submit a new query the user must first remove the old search terms from the search box. This again produces all prefixes, but this time in decreasing length.

As in [8], we manually judged queries to be intended or not. Three assessors separately annotated all 12,500 logged queries as either intended or not intended. Inter-assessor agreement was very high (Fleiss' kappa = 0.915; 89.8% of the queries labelled as intended by at least one assessor were also labelled as intended by at least one other). This process resulted in a final list of 1,434 search queries. This list is used in the following analyses and compared against the results reported in [8], which are based on 801 search queries.

6.   IS A COMPARISON OF NIGHTS FAIR?
Undoubtedly, the best way to compare two versions of a system is to run experiments under the same external conditions. Unfortunately, this was not possible with our app at the LNMusic. We work together with the organisers to provide a real system for "productive" use and cannot experiment with arbitrary system variations. The alternative would be a lab study. However, we consider this a less preferable option, because we wish to record interaction with the system in a real-world (i.e. non-simulated) setting. We therefore looked into whether data obtained from different events could be fairly compared.

We learned from our experience with past Long Nights that user behaviour is to a large extent independent of the type of Long Night. In [7], interviews with visitors to two different Long Nights revealed that other characteristics besides the topic of an event play a crucial role. These include novelty, the time and location of the event, and the possibility of taking part in the event. It is precisely this that sets a system for casual-leisure activities apart from a system for solving a work task [4].

In this section we give more insight into why our study at the LNMusic2012 and the study presented in [8] at the LNMuseums2011 are comparable. First of all, the two distributed events have a lot in common, although their topics are different. They take place in the same city and are organised by the same company. Hence they have the same advertising, booklet format and price, and even the special bus routes provided are partially the same. We also noticed that some of the events at the LNMusic had also been offered at the LNMuseums, so we investigated how many events the two nights have in common. To do so, we counted how many LNMuseums events had a corresponding LNMusic event at the same location, organised by the same museum, church, bar, etc. Surprisingly, 21.6% of the LNMuseums2011 events had a matching event at the LNMusic2012. The topic of the matched events might differ. However, as mentioned above, the topic is only one of many aspects that matter to visitors when choosing an event.

Second, we looked into the app usage itself. When the app is first started, we ask users to fill out a short questionnaire. Among other things, we ask for the user's age (below 18, 18 to 29, 30 to 39, 40 to 49, 50 to 59 and above 60 years old). Answering these questions was optional, but 246 users at the LNMuseums2011 and 495 users at the LNMusic2012 chose to do so. Based on these data, we compared the age distribution of users on both nights with a χ²-test, omitting the first and last age groups due to their small sample sizes. The test revealed χ² = 1.459 and p = 0.6918, i.e. there is no significant difference between the app users of the two nights.

To establish whether the two studies are comparable, it is important that system usage is similar on both nights. As described in Section 3, the app offers multiple ways (different tabs) of accessing the events. We looked into which of these tabs users were interested in. To this end, we define a tab session as starting when a user switches to a tab and ending when he switches to another tab. Table 1 shows the number of tab sessions. Based on these data, a χ²-test gives χ² = 5.387 and p = 0.1456, again indicating no significant difference between users of the two nights.

                  by Tour    by Genre    Search    Rec.+Rated
    LNMuseums      28.0%      14.5%      15.4%       42.0%
    LNMusic        26.6%      15.6%      15.1%       42.7%

    Table 1: Number of tab sessions (percentage of all tab sessions per night)

Lastly, we considered properties of the search behaviour itself that should not be affected by our changes to the search system. One of the main findings of [8] was the large number of named-entity searches, so we compared the reported numbers with those for our own system. Using the same method as described in [8], we instructed 3 human assessors to assign each search query to one of three categories: specific event name, not a specific event name, or indeterminate. For 82.0% of all queries, at least two of the assessors agreed on one of the three categories (Fleiss' kappa of 0.32). Of the queries with agreement, 84.5% (LNMuseums2011: 59.4%) were marked as clearly named entities and 8.2% (34.6%) as possible named entities. Only 7.2% (6.0%) were labelled as non-named-entity searches. Notably, this low proportion of non-named-entity searches is similar to the one described in [8].

In [7] it is reported that the same system used at the LNMuseums2011 was also evaluated at the Long Night of Science in Erlangen-Nuremberg (LNScience2011). This is also a distributed event, but one dedicated to science. We looked into how search characteristics differ when the same system is used on different nights. We compared query length, in terms of both the number of characters and the number of terms per query, using a (non-parametric) Kruskal-Wallis rank sum test. No significant difference between the usage
at the LNMuseums2011 and the LNScience2011 could be found (for characters: p = 0.1169; for terms: p = 0.6039). In contrast, the same test between queries at the LNMuseums2011 and the LNMusic2012 shows a highly significant difference, with p ≪ 0.01 for both characters and terms. Thus the changes to the search system influenced search behaviour, whereas a change in the overall setting of the distributed event did not.

In conclusion, we believe that comparing app user behaviour across the two nights is appropriate, given the circumstances and the difficulty of obtaining real-world user data for such apps.

7.   INFLUENCES OF THE CHANGES
In [8], many statistics on query characteristics and query performance are given, based on an analysis of search logs, a common technique in the literature [6]. In this section we recalculate these statistics from the data logged with the new system. We then compare behaviour with the two app variants to determine which changes in user behaviour the search system modifications caused. To do so, we consider a number of different indicators of an improved search experience.

The average length of a search query at the LNMusic2012 was 5.6 characters (σ = 3.36) and 1.14 terms (σ = 0.41). This is much shorter than the figures reported in [8] for the LNMuseums2011: 8.9 characters (σ = 5.31) and 1.21 terms (σ = 0.52). A Z-test between the two nights shows that this difference is highly significant for both metrics (p ≪ 0.01 in both cases). It seems that grep-like searching, which also matches partial words, has led people to stop typing much earlier.

We assume that there are two main reasons why users stopped typing. Either they had found what they were looking for (successful search), or they gave up because they could not find it (unsuccessful search). Users of our system had three options for interacting with the entries in the result list: they could view details of an event, mark an event as a candidate for tour inclusion, or add the event to a pre-existing tour. We consider any of these three actions an indicator of search success, and the absence of all three an indicator of an unsuccessful search. Good abandonment was not considered, because the result list contains no information beyond the event name and the nearest bus stop.

Successful queries at the LNMusic2012 averaged 5.39 characters (σ = 3.27) and 1.12 terms (σ = 0.38). This is highly significantly (p ≪ 0.01) shorter than reported in [8]: 9.90 characters (σ = 5.42) and 1.26 terms (σ = 0.57). This means that users have to type about 45% fewer characters on average to find the events they are interested in. Unsuccessful queries, on the other hand, were only slightly shorter in characters, at 6.39 (σ = 3.56) as opposed to 7.47 (σ = 4.80), and slightly longer in terms, at 1.20 (σ = 0.49) as opposed to 1.13 (σ = 0.42).

Of the 1,434 queries entered at the LNMusic, 76.7% resulted in an interaction of the user with an event, meaning they were successful. The remaining 23.3% were unsuccessful, a much better rate than the 40.3% unsuccessful searches reported in [8]. This decrease is highly significant (p ≪ 0.01) and demonstrates that the improved search system was able to help users find the events they were looking for.

In [8], a large proportion (59.75%) of unsuccessful search queries had an empty result list. With the improved search system, this was the case for only 12.57% of unsuccessful queries. But how useful were the entries that the new system "added" to the result lists? We looked at all 2,157 interactions with events (viewing, rating, selecting) in the result lists of the LNMusic2012 app, which were produced by the improved search system. We then ran the corresponding search queries through the old search system and counted whether these events would have appeared in the result list. Only 938 of the event interactions would have been possible with the previous search system. In other words, 56.5% of the interactions performed by users would not have been possible, simply because the events would not have been in the results.

This analysis has revealed indicators of an improved search experience, which shows that the changes proposed in [8] are useful in the context of assistance for distributed events.

8.     DISCUSSION AND CONCLUSIONS
In this paper we analysed how users' query behaviour changed in response to modifications of a search system used at distributed events. We first described two studies performed during two such events, including a description of the system and of the changes to the search system that were tested. Because the LNMuseums and the LNMusic have different topics, we then showed that comparing user behaviour between the two nights is sensible and worthwhile. Building on this preparatory work, we analysed users' search behaviour by comparing search characteristics and search performance. Overall, users typed much shorter search queries, especially in successful searches. A comparison of query performance also revealed a much higher success rate, with the proportion of unsuccessful searches almost halved. Finally, we ran the same search queries through both search systems. This showed that fewer than half of the interactions with events would have been possible with the old system.

In its current form, the search system is designed for users to find events they already know of in advance. But how can users be helped to find events that are new to them? How can we better support the discovery of serendipitous events? Users seldom used the search system for that purpose, so a second tool, such as a recommender, is necessary. The user could then decide whether to look for a concrete event he already knows of or to be inspired by the system. If users understand and accept such a split into two "orthogonal" tools, the approach is worth investigating and could point the way to far better assistance systems for distributed events.

Acknowledgments This work was supported by the Embedded Systems Initiative (http://www.esi-anwendungszentrum.de).

9.     REFERENCES
[1] N. J. Belkin, R. N. Oddy, and H. M. Brooks. ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38(2):61–71, 1982.
[2] K. Byström. Task complexity, information types and information sources: Examination of relationships. PhD thesis, University of Tampere, Dep. of Inf. Studies, 1999.
[3] F. Dornseiff. Der deutsche Wortschatz nach Sachgruppen. DeGruyter, Berlin, New York, 2004.
[4] D. Elsweiler, M. L. Wilson, and B. Kirkegaard Lunn. New Directions in Information Behaviour, chapter Understanding Casual-leisure Information Behaviour. Emerald Pub., 2011.
[5] P. Hansen. User interface design for IR interaction: a task-oriented approach. In CoLIS 3, pages 191–205, 1999.
[6] B. J. Jansen and A. Spink. How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing & Management, 42(1):248–26, 2006.
[7] R. Schaller, M. Harvey, and D. Elsweiler. Entertainment on the go: Finding things to do and see while visiting distributed events. In Proceedings of IIiX, 2012.
[8] R. Schaller, M. Harvey, and D. Elsweiler. Out and about on museums night: Investigating mobile search behaviour for leisure events. In Proc. of Searching4Fun Wksp, ECIR, 2012.
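Section 4 describes two changes to how queries are matched. The following minimal Python sketch contrasts three behaviours: whole-word matching (as in the original system), grep-like partial-word matching, and fuzzy matching by Levenshtein edit distance. It is not the Lucene-based implementation used in the app. The function names, the lower-casing, the absence of stemming and topic mapping, and the threshold of one edit are all assumptions made for illustration.

```python
"""Sketch of three ways to match a single typed query token against an event title.

Assumptions (not taken from the paper): lower-casing is the only normalisation,
there is no stemming or topic mapping, and fuzzy mode allows at most one edit.
This is NOT Lucene's FuzzyQuery implementation.
"""


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def whole_word_match(query: str, title: str) -> bool:
    """Original behaviour (simplified): the query must equal a complete word of the title."""
    return query.lower() in title.lower().split()


def grep_like_match(query: str, title: str) -> bool:
    """Grep-like behaviour: the query may match any part of a title word."""
    return any(query.lower() in word for word in title.lower().split())


def fuzzy_match(query: str, title: str, max_edits: int = 1) -> bool:
    """Fuzzy behaviour: a title word within max_edits edits of the query matches."""
    return any(levenshtein(query.lower(), word) <= max_edits
               for word in title.lower().split())


if __name__ == "__main__":
    title = "Lenbach"
    print(whole_word_match("Lenb", title))   # False: the word is incomplete
    print(grep_like_match("Lenb", title))    # True: a partial word matches
    print(fuzzy_match("Lembach", title))     # True: one substitution (n -> m)
```

Under these assumptions, the partial query "Lenb" fails the whole-word check but passes the grep-like check, and the misspelling "Lembach" lies within one edit of "Lenbach". A deployed system would probably combine both modes and rank the results, which this sketch does not attempt.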