Context-Aware Recommendations for Mobile Shopping

                    Béatrice Lamche                      Yannick Rödl                   Claudius Hauptmann
                     TU München                           TU München                         TU München
                   Boltzmannstr. 3                      Boltzmannstr. 3                    Boltzmannstr. 3
               85748 Garching, Germany              85748 Garching, Germany            85748 Garching, Germany
                  lamche@in.tum.de                  yannick.roedl@tum.de                hauptmac@in.tum.de
                                                      Wolfgang Wörndl
                                                          TU München
                                                        Boltzmannstr. 3
                                                    85748 Garching, Germany
                                                     woerndl@in.tum.de

ABSTRACT                                                            or social environment to recommend items. A context-aware
This paper presents a context-aware mobile shopping recom-          recommender system could for example recommend the “Al-
mender system. A critique-based baseline recommender sys-           bertina” museum rather than visiting the “Prater” amuse-
tem is enhanced by the integration of context conditions like       ment park if the user spends a rainy day in Vienna. This
weather, time, temperature and the user’s company. These            paper evaluates which kind of context information is rele-
context conditions are embedded into the recommendation             vant in a mobile shopping recommender system and how this
algorithm via pre- and post-filtering. A nearest neighbor           information could be utilized to improve recommendations
algorithm, using the concept of an average selection con-           of clothing items in a context-aware recommender system.
text, calculates how contextually relevant a recommendation         By integrating contextual mobile information into the rec-
is. Out of 20 clothing items from the hybrid recommenda-            ommendations, it is expected that the recommended items
tion algorithm, context-aware post-filtering searches for the       better fit the customer’s needs and therefore customers are
nine best-fitting items. The resulting context-aware recom-         more satisfied with the recommender system. The paper is
mender system is evaluated in a user study with 100 test            organized as follows. We first start off with some definitions
participants. The answers of the user study show that the           relevant for context-aware recommender systems and sum-
recommendations were perceived as being better than the             marize related work. The next section defines the context
recommendations of a non-context aware recommender sys-             factors and describes the system’s overall design. The user
tem.                                                                study evaluating the developed system is discussed in sec-
                                                                    tion 4. The paper concludes by summarizing its results and
                                                                    giving an outlook on future research topics.
Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of
Systems—Decision support
                                                                    2.   BACKGROUND AND RELATED WORK
                                                                       A widely used definition in the area of context-aware ap-
                                                                    plications is the definition by Dey:
General Terms
Design, Experimentation, Human Factors.                                  “Context is any information that can be used to
                                                                         characterize the situation of an entity. An entity
                                                                         is a person, place, or object that is considered
Keywords                                                                 relevant to the interaction between a user and an
context-awareness, mobile recommender systems, location-                 application, including the user and applications
based services, user interaction, critiquing, mobile shopping            themselves” [6, p. 5].

                                                                    They define context as relevant information for an interac-
1.     INTRODUCTION                                                 tion between a user and an application. Therefore, if the
  Context-aware recommender systems (CARS) are systems              context of an entity shall be defined, it is necessary to ask
utilizing the user’s context such as the user’s position, weather   which information is relevant to the situation.
                                                                       Context-aware recommender systems (CARS) integrate
                                                                    context into the recommendation process. This process can
                                                                    be described by this three-dimensional recommendation func-
                                                                    tion [2]:
                                                                               R : User × Item × Context → Rating             (1)
                                                                    The rating function (R) considers the Context (which is de-
Copyright held by the author(s).
                                                                    fined by all the different Context Factors) and recommends
LocalRec’15, September 19, 2015, Vienna, Austria.                   items of the item set (Item) to a user by predicting the
rating that this user would give to an item. Context com-           as well as personal data, such as calendar appointments,
plicates the recommendation process as items can be rated           viewed documents and messages, to infer the user’s current
in different contexts. An umbrella for example can be rated         activity so that the user is not required to explicitly define
at good weather conditions very highly, due to the fact that        her profile or preferences. The recommendations include
it looks nice or is small. However, if it was raining the same      stores, restaurants, parks and movies. However, up till now,
umbrella could get a bad rating, due to the fact that it breaks     the techniques for automatic context detection are often un-
at the slightest wind. So the context in the rating function        reliable and immature and require further research [9]. We
brings additional complexity as the recommendation algo-            therefore decided to come up with a solution that takes the
rithm does not only have to match users with items, but             users’ explicit stated preferences into account.
also with the context.                                                 I’m feeling LoCo [10] is a ubiquitous mobile recommender
   Adomavicius and Tuzhilin [2] identified three different points   system that recommends places nearby the user’s current lo-
in the recommendation process where context might be in-            cation, e.g. restaurants and museums. Physical context such
corporated into the process:                                        as the user’s current transportation mode and location are
                                                                    automatically detected. This physical information is used
  1. Contextual Modeling - the recommendation algorithm             for a first filtering step: The user’s mode of transporta-
     is altered such that it includes the context and already       tion and location influences the radius within which places
     considers it when calculating recommendations                  for recommendations are considered. Moreover, the user’s
  2. Contextual Pre-Filtering - the current context is used         mood influences the recommendations: foursquare (a social
     to select only the most relevant data from the dataset         network app to save and share visited places with friends1 )
                                                                    assigns each place to a category, which is mapped by the
  3. Contextual Post-Filtering - the context information is         authors to a particular feeling (e.g. the system recommends
     ignored during the recommendation process, only the            events related to Arts & Entertainment when the user feels
     resulting set is contextualized                                “artsy”). As soon as the user states a mood, places assigned
                                                                    with the category to which the feeling is mapped to are rec-
All of these approaches have their specific strengths and
                                                                    ommended. The system is based on text classification. It
weaknesses. However, it is also possible to combine mul-
                                                                    considers the tags and categories associated with a place the
tiple context-based algorithms.
                                                                    user has visited. The user model is therefore a document,
   Since the consideration of context can enhance the use-
                                                                    which holds all the names, categories and tags associated
fulness of a recommendation for a user, CARS are recently
                                                                    with a visited place. A conducted user study shows that
receiving a lot of attention [1]. For instance Anand and
                                                                    I’m feeling LoCo enhances the user experience and that the
Mobasher [3] define a recommendation process that inte-
                                                                    recommended places were overall satisfying [10]. This mood-
grates context. They distinguish between a user’s short term
                                                                    based approach is in particular reasonable if a recommender
(STM) and long term memories (LTM). Contextual cues are
                                                                    system is aimed to suggest different types of leisure activi-
used to retrieve relevant preference models from LTM that
                                                                    ties since the user’s mood might highly influence the current
belong to the same context as the current interaction. This
                                                                    preferences. However, we consider the relevance of the user’s
information is merged with the current preference model
                                                                    mood as low in our mobile shopping scenario.
stored in STM for generating context-aware recommenda-
                                                                       So far, no research exists that analyzed all the contextual
tions [3]. However, the proposed framework is very general
                                                                    factors that might be useful when recommending clothing
and does not emphasize how it can be applied in a mobile
                                                                    items from different stores for mobile shoppers and investi-
scenario, where the context is different.
                                                                    gated how such a recommender system can be constructed
   Baltrunas et al. [4] investigated the relationship between
                                                                    and is perceived. Such an application could help the user
contextual factors and item ratings in a tourist scenario. The
                                                                    detect new (formerly unknown) brands or stores and find
authors developed a web tool for acquiring subjective rat-
                                                                    clothes matching the user’s fashion style. Compared to ex-
ings regarding points of interest in a mobile scenario within
                                                                    isting mobile recommender systems, clothing items are dif-
a specific context. Users were asked if a specific context
                                                                    ferent in that they frequently change. Such a rec-
factor (e.g. winter season) has a positive or negative influ-
                                                                    ommender system has to be frequently trained or be able
ence on the rating of a particular item. Second, users were
                                                                    to provide good recommendations on a sparse dataset. We
asked to rate example contexts and recommendations. The
                                                                    therefore first acquire the relevant context factors in a mo-
more influential a context factor seemed to be (according to
                                                                    bile shopping scenario and then come up with a promising
the results of the first step), the more contextual conditions
                                                                    approach how to integrate this context into the recommen-
specifying this factor were generated. These imagined rat-
                                                                    dation process.
ings could be used as initial ratings in the database, such
that the cold start problem is minimized. Based on these
results, a predictive model that can be trained offline was         3.     DESIGNING THE PROTOTYPE
developed. Results show that influencing context factors for           We imagine a system that uses the user’s mobile con-
points of interest are, inter alia, distance, season, weather,      text to recommend clothing items available in shops close
time, mood and companion [4]. This methodology seems to             to the user’s position. However, as in our previously devel-
be a very promising approach to acquire contextual ratings,         oped baseline system (see Section 3.1), the new approach
but ratings were only acquired for a travel planning                should still allow critiquing of items. As described in Sec-
recommender system and the generated ratings of this work           tion 2, context can be integrated into the recommender sys-
cannot be directly applied to a mobile shopping scenario.           tem in three different ways: contextual pre-filtering, contex-
   Research has also been done on automatically predict-            tual post-filtering and contextual modeling. In this work,
ing the user’s context. For example in [5], a mobile leisure
recommender system was developed. It uses time, location,
                                                                    1 https://foursquare.com
Figure 1: The context-aware shopping recommender process
                                                                     (a) Item view in CARS         (b) Map view in both systems

                                                                              Figure 3: Detailed Information View


                                                                  mendation algorithm. The algorithm uses a two-fold strat-
                                                                  egy: On a positive critique of an item (touching the thumbs
                                                                  up symbol) it shows items that more closely match the cri-
                                                                  tiqued item. On a negative critique (thumbs down symbol)
                                                                  more diverse items are shown. In both cases, the recommen-
                                                                  dation algorithm uses a k-nearest neighbor algorithm to find
                                                                  the k items that best fit the current requirements. In this
                                                                  case k is set to nine, meaning that in each cycle, the user
                                                                  is shown nine different recommendations. In the following
                                                                  screen, the user selects which of the properties (color, brand,
                                                                  price, type) of the item shall be critiqued. The recommender
                                                                  system then shows more or fewer items (depending on the
(a) Recommendations in CARS       (b) Critiquing View in CARS     critique) of the selected feature(s). By touching an item’s
                                                                  picture in the recommendations view, the system displays a
             Figure 2: User Interaction Design                    result screen, where the user can select the item. The appli-
                                                                  cation also shows the immediate surroundings of the user in
                                                                  a map (see Figure 3). The system described in this section
two approaches (contextual pre-filtering and contextual post-     without context-awareness is used as a baseline for testing
filtering) are combined to improve the recommendations (see       the context-aware recommender system. Furthermore we
Figure 1). Pre-filtering (Section 3.2) is used to determine       have made some adjustments to this content-based recom-
which items of the case base are relevant to the user. Rel-       mender system due to the changed dataset and performance
evance for example depends on the distance the user ac-           problems.
cepts to travel, or the opening hours of a shop. Post-filtering
(Section 3.4) is used to filter the items that shall be recom-    3.2    Contextual Pre-Filtering
mended according to their adequacy to the current context            In the contextual pre-filtering step, we make sure that
by using a nearest neighbor algorithm. In order to build a        only relevant data is loaded into the recommender system.
database of contextually tagged items, a pre-study was ex-        Therefore, the context factors distance to shop, shop crowd-
ecuted asking users to classify items according to contexts       edness, shop opening hours and item in stock are used to
(Section 3.3). This data ensures that some items are already      restrict the case base and avoid unnecessary search in items
contextually tagged, which is needed for the post-filtering of    the user does not want to see. The user may state prefer-
the recommendations. The user interface and interaction           ences for each of these context factors. The user might state
design of our CARS is described in section 3.5.                   a different distance to the shops or that she wants to see
                                                                  crowded places as well.
3.1    The Baseline                                                  The case base is filtered in four steps. First, all shops
  The system presented in [7] forms the baseline for our          that are not within the specified distance, then shops that
CARS. It was developed for the Android platform and incor-        are not open at the specified time and shops that do not
porates an active learning algorithm. The user interfaces of      match the crowdedness criterion are excluded. Finally, it
the baseline system are very similar to our developed CARS        is verified that the item is in stock. After pre-filtering the
and can be seen in Figures 2 and 3. The active learning al-       items based on these conditions, it is verified that at least
gorithm, called adaptive selection, is a critique-based recom-    300 items are available in the case base, as our tests showed
                                                                 The average context specifies in which context an item is
                                                                 selected. If an item was not selected in any context, it can
                                                                 be assumed that this item is liked neither by many users
                                                                 nor in a specific context, and it can therefore receive a
                                                                 higher distance to the current context. Popular items, which
                                                                 are selected in many different contexts, will receive a
                                                                 distance close to 0.5. However, as they are very popular,
                                                                 they should not receive a high distance, and therefore their
                                                                 distance is reduced by a defined percentage.

                                                                  avgContextDist(c, i) = \frac{\sum_{b \in i_a} w_{i,b} \cdot dist(c_f, b)}{N(i_a)} \cdot \frac{N(f)}{\sum_{j \in i_j} w_j}      (2)

                                                                 Equation 2 defines the distance metric. It calculates the dis-
Figure 4: Tool for elicitation of item preferences in contexts   tance between an item’s (i) average context (in which the
                                                                 item is selected) and the current context (c). The first quo-
                                                                 tient calculates the average distance to the current context
that this is the minimum amount of data to adequately react      whereas the second quotient normalizes the distances.
to the user’s preferences. However, if not enough                   The set of all context conditions in which an item has
items are available in the case base, these conditions are relaxed   been chosen is defined by i_a. An individual context condi-
and the user is notified about this step.                        tion in which an item has been chosen is defined by b. For
                                                                 each clothing type, the context factors are of different im-
3.3   Acquisition of Context Relevance                           portance. Hence, different weights (w_{i,b}) can be assigned to
   Before being able to recommend items based on context,        context conditions. We assigned the weights for each cloth-
the relevant context has to be defined. A promising ap-          ing type based on a previously conducted experiment [11].
proach to assess the context relevance for a tourism sce-        The distance function dist(c_f, b) (Equation 3) calculates the
nario is presented by Baltrunas et al. [4] and is therefore      distance between the current context condition c_f (f stands
adapted for our shopping scenario (see also subsection 2).       for the context factor) and a context condition b in which
Using this methodology we assess the following context fac-      the item was chosen. For improved readability the vari-
tors as relevant for our context-aware mobile shopping rec-      ables were renamed to x and y in Equation 3. The number
ommender system: time of the day, day of the week, tem-          of context conditions in which an item has been chosen is
perature, weather, company, distance to shop, crowdedness,       defined by N(i_a). In this work N(i_a) is always a multiple
shop opening hours and item is in stock. In order to ac-         of five - the number of context factors - as we assume all
quire contextual ratings, a convenience sample of the target     context conditions to be set in our artificial environment.
population was asked to specify which items they are likely         In order to make different items (with different overall
to buy in a specified context. We developed a simple Java        weights) comparable, we normalize the distance between
tool (Figure 4) which shows nine pictures and descriptions       zero and one by multiplying with the second quotient of
of clothing items. The testers could specify if they would       the function. Here N(f) defines the number of context fac-
consider buying the product depending on a randomly se-          tors (five) we use for post-filtering. The number of context
lected company, temperature or weather, which is specified       factors is divided by the sum of weights of all context factors
on the right side of the tool. Overall 747 contextual ratings    (w_j) for the specific item (j ∈ i_j).
for 674 different items were created by six users. This data
forms the basis for the decision generation in the contextual
post-filtering algorithm.
                                                                    dist(x, y) = \begin{cases} graphDistance(x, y) & \text{if } y \text{ is nominal} \\ \frac{|x - y|}{range_y} & \text{otherwise} \end{cases}      (3)
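
As a minimal sketch of Equation 3, the following Python snippet computes the distance between two conditions of the same context factor. The entries of WEATHER_GRAPH, the temperature range of 40 degrees and the function signature are illustrative assumptions and are not taken from the study; only the ordering of the weather distances (Sunny and Rainy farther apart than Sunny and Cloudy) follows the paper.

```python
from typing import Dict, FrozenSet, Optional, Union

Condition = Union[str, float]

# Hypothetical pairwise distances for the nominal factor "weather".
# The concrete numbers are assumptions chosen for illustration only.
WEATHER_GRAPH: Dict[FrozenSet[str], float] = {
    frozenset({"sunny", "cloudy"}): 0.3,
    frozenset({"cloudy", "rainy"}): 0.5,
    frozenset({"sunny", "rainy"}): 1.0,
}


def dist(
    x: Condition,
    y: Condition,
    graph: Optional[Dict[FrozenSet[str], float]] = None,
    value_range: Optional[float] = None,
) -> float:
    """Distance between two conditions x and y of one context factor."""
    if graph is not None:
        # Nominal factor: look up the pre-defined distance in the undirected graph.
        return 0.0 if x == y else graph[frozenset({x, y})]
    if not value_range:
        raise ValueError("value_range is required for non-nominal factors")
    # Ordinal, interval or ratio-scaled factor: absolute difference scaled by the range.
    return abs(x - y) / value_range


# Usage: one nominal and one ratio-scaled comparison.
print(dist("sunny", "rainy", graph=WEATHER_GRAPH))
print(dist(10.0, 30.0, value_range=40.0))
```

Storing the nominal distances under unordered pairs keeps the graph undirected, so the lookup gives the same value regardless of argument order.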

                                                                 If the context factor is ordinal, interval or ratio-scaled, the
3.4   Contextual Post-Filtering                                  distances are calculated based on the Euclidean distance.
   Based on the pre-filtered item set the critique-based rec-    Otherwise the graphDistance, a pre-defined distance for
ommender selects 20 items. Out of these 20 items only            nominal attributes, is used. This graphDistance is similar
nine are actually displayed. Therefore, the contextual post-     to the distance used by Lee and Lee [8]. The context factors
filtering algorithm (illustrated in algorithm 1) has to elimi-   weather and company use this graphDistance and define an
nate eleven items in each cycle. The context factors time of     undirected graph with distances between all context condi-
the day, day of the week, company, temperature and weather       tions (e.g. the weather conditions Sunny and Rainy have
are used to post-filter the recommendations. For this pur-       a higher distance than Sunny and Cloudy). The assigned
pose, we use a k-nearest neighbor method because this tech-      distances are used as an input for the distance method. For
nique has proven to be adequate in different CARS. The           the context factor time of the day we use a cycle, as the
most important component in nearest neighbor algorithms          afternoon ends with the night, whereas the night is the first
is the distance metric used. In our approach, the user is        part of the day. For all other conditions it is expected that
not able to rate an item within a given context, but only to     the Euclidean distance provides good results. Although we
select it (and therefore implicitly rate it as good). Based      want to achieve a high item frequency, we consider very pop-
on this consideration, we came up with a distance metric         ular items as being interesting for the user, especially in a
that defines an average context in which an item is selected.    shopping scenario. Therefore, we alter the resulting distance
(avgContextDist(c, i)) if the item was selected in more than
30% of all contexts: the item’s distance is reduced by 20%
so that it is more likely to be displayed to the user. Every
item that is not selected in any context receives a distance of
0.51. We chose this value because it is the average
distance at the second tertile when considering all distances
of items rated in a specific context to a randomly selected
context. This ensures that items which have not been rated
within a specific context in our pre-study (see Section 3.3)
are more likely to be presented to the user than items that
were considered uninteresting in that specific context.
The whole algorithm for contextual post-filtering is
presented as Algorithm 1.
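
The distance computation and the adjustments described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the prototype: the data layout (dictionaries mapping a context factor to a context condition), the dist_fn callback and all function and constant names are hypothetical. Only the values 0.51, 30% and 20% are taken from the text. Ties are resolved here by a stable sort that keeps the input order, which stands in for the baseline similarity measure.

```python
from typing import Callable, Dict, List, Optional, Tuple

DEFAULT_DISTANCE = 0.51    # distance for items never selected in any context
POPULAR_SHARE = 0.30       # share of contexts above which an item counts as popular
POPULAR_REDUCTION = 0.20   # relative reduction applied to popular items

Context = Dict[str, object]  # context factor -> context condition (assumed layout)


def avg_context_dist(
    current: Context,
    selections: List[Context],
    weights: Dict[str, float],
    dist_fn: Callable[[str, object, object], float],
) -> Optional[float]:
    """Equation 2: normalized, weighted average distance to the current context."""
    n_conditions = sum(len(s) for s in selections)
    if n_conditions == 0:
        return None  # never selected; the caller assigns the default distance
    weighted_sum = sum(
        weights[factor] * dist_fn(factor, current[factor], condition)
        for selection in selections
        for factor, condition in selection.items()
    )
    # First quotient: average weighted distance. Second quotient: normalization.
    return (weighted_sum / n_conditions) * (len(weights) / sum(weights.values()))


def context_post_filter(
    items: List[Tuple[str, Optional[float], float]], k: int
) -> List[str]:
    """items holds (item id, average distance or None, share of contexts selected)."""
    scored = []
    for item_id, distance, share in items:
        if distance is None:
            distance = DEFAULT_DISTANCE
        if share > POPULAR_SHARE:
            distance *= 1.0 - POPULAR_REDUCTION
        scored.append((distance, item_id))
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep their input order
    return [item_id for _, item_id in scored[:k]]


# Usage with a toy distance function (an assumption for this example only).
def toy_dist(factor: str, x: object, y: object) -> float:
    return abs(x - y) / 40.0 if factor == "temperature" else float(x != y)


d = avg_context_dist(
    {"temperature": 20, "weather": "sunny"},
    [{"temperature": 10, "weather": "rainy"}],
    {"temperature": 2.0, "weather": 1.0},
    toy_dist,
)
print(d)
print(context_post_filter([("a", 0.40, 0.05), ("b", None, 0.0), ("c", 0.60, 0.50)], 2))
```

In the prototype the function would be called with the 20 items returned by the critique-based recommender and k = 9.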

Algorithm 1 Post-filtering by current and item context
procedure ContextPostFilter(items, context, k)
   for all item in items do
      itemDistance ← avgContextDistance(context, item)
      if itemDistance == null then
          setDefaultDistance(item)
      end if
   end for
   decreaseDistanceForPopularItems(items)
   return kNearestNeighbors(items, k)
end procedure

                                                                  Figure 5: Explicit context determination via questionnaire

                                                                  By clicking on an item’s picture, the user gets to another
                                                                  screen with more detailed information about the item and
                                                                  the store. Here, the user can finally select the item (see
                                                                  Figure 3a). This information should enhance the trust the
                                                                  user has in the recommendations as she can check whether
   The algorithm’s disadvantage is that it weights each fac-      the initial preferences (about distances to shop, crowded-
tor independently without taking into consideration possible      ness, etc.) were incorporated. Moreover, we implemented a
connections between the individual context factors. For ex-       map showing all available shops. On click of a shop we show
ample the connection of rain and being with a friend might        the shop’s opening hours, the crowdedness, the name, the
be more different from rain and being with the family, than       distance to the current position and how many items (out
the individual distances between being with the family and        of the current recommendations) are available at this shop
being with a friend. This detection of dependencies could         (see Figure 3b).
be done by decision trees or other machine learning tech-
niques. Nevertheless, we expect that the algorithm provides       4.    USER STUDY
reasonable recommendations for the user’s current context            The user study was designed in order to test the differ-
without these dependencies. The algorithm calculates the          ences in user perceptions between the baseline application
context distances in less than 100 ms on a Samsung Galaxy         and the context-aware recommender system. We want to
S3 mini for 20 items with the items being set in (overall) 200    find out whether the users perceive a difference in the ac-
different contexts. It allows weighting of context factors for    curacy of recommendations. A second goal of the study is
each clothing type separately and distances for nominal at-       to find out whether users are more satisfied with a recom-
tributes. The method kN earestN eighbors(items, k) sorts          mender system that takes the mobile context into account.
the items by their distance to the current context. In case       Therefore, the goal of the user study is to evaluate if the
of any ties it uses the similarity measure that has already       following hypotheses are true:
been applied in our baseline system (Section 3.1).
                                                                  Hypothesis 1: The integration of context-awareness leads
3.5    Navigation and Interface Design                               to better perceived accuracy compared to non-context-
   When starting the application, the user is asked to set the       aware recommendations.
following context conditions manually: preferred distance to      Hypothesis 2: The integration of context-awareness im-
the shop, opening hours, temperature, weather and com-               proves the overall user satisfaction.
pany. Moreover the user can specify if she wants to exclude
items that are not in stock and shops that are too crowded.       Hypothesis 1 is tested by comparing the ratings of recom-
The conditions for time of the day and day of the week, are       mendations in a context-aware system and a baseline sys-
not captured, as it is expected that the users are aware of       tem. The users should rate how they perceived the recom-
these conditions subconsciously. The context determination        mendations on a seven-point Likert-scale. Hypothesis 2 shall
interfaces can be seen in Figure 5.                               test whether the users are more likely to use, reuse or rec-
   Figure 2a shows an example of a calculated set of recom-       ommend the application. This is an indication on how well
mendations. With the thumbs up or thumbs down button the          the system adapts to the users and how satisfied they are.
user is able to critique the item’s attributes such as price,
brand, clothing type and color (see Figure 2b). Besides this      4.1   Setup
critiquing possibility, the user is able to see some expla-         The user study is designed as a supervised within-subjects
nations such as why the particular item is recommended.           user survey to minimize the number of survey participants
and improve the comparability between the applications. Each user tests both applications (the baseline system and the CARS) and answers a questionnaire afterwards. The order in which the systems are tested alternates between subjects to reduce bias from learning effects. The participants are asked to imagine being in the scenario that the tool generated for them; the location is always Munich. The participant's task is to find only one item that they would like to try on. As soon as the users have found a suitable item, they are asked to select it, so that they can finish the test and answer the corresponding questionnaire. The target population of this application is young smartphone users who like to go shopping. In the user survey qualitative and quantitative data are collected. Qualitative data is measured via a questionnaire. It mainly consists of statements that the user should assess on a seven-point Likert scale (from 1 - strongly agree, to 7 - strongly disagree), e.g. how satisfied the user is with the recommendations and the application in general. The quantitative data is directly measured within the application and includes the number of critiquing cycles, the time between viewing the first recommendations and selecting an item, and the item diversity. Before the user starts using the application, a scenario describing the user's location, weather and company is generated for her (see Figure 6).

          Figure 6: Tool to generate a user's scenario

   The participants are asked to actively select their context in the application and imagine it. This scenario is visually displayed to the users throughout the whole survey on a computer screen directly in front of them. The context conditions not mentioned in the scenario description, such as the crowdedness, can be selected by the user based on her own preferences.
   The dataset used to test the application includes 5157 randomly selected fashion items that were extracted from the Zalando API² of their UK store in February and March 2015. Since our dataset is artificial, we distributed the items equally across all 129 shops and made realistic assumptions for our shops. The shop's opening hours were set to realistic values with moderate modifications to have some differences in the dataset. The crowdedness was set randomly with a probability of 20 % and an item is in stock with a probability of 90 %.

² https://www.zalando.co.uk

4.2    Results
   All in all 100 participants (48 females, 52 males), between the ages of 17 and 30, took part in the user study. The answers to the Likert statements (from 1 - strongly agree, to 7 - strongly disagree) in this work followed either a positively or a negatively skewed distribution and are ordinal scaled instead of interval scaled. Therefore, a two-tailed paired Wilcoxon signed rank test is executed, rather than a paired t-test, to detect whether there are any significant differences between the distributions. The results of the two-sided tests are reported by stating a V and a p value. The V is the sum of ranks assigned to differences with a positive sign. Therefore, a higher V stands for higher differences in the user's decisions. The p value defines how significant the results are. In general we evaluate whether the null hypothesis is likely to be true. The means, as well as the V and p values of the most important metrics of the two systems, are shown in Table 1.
   In order to test the user's perceived prediction accuracy, we asked if the recommended products fitted the individual preferences. The baseline application's mean is 2.71 whereas the CARS mean is 2.34 (Median = 2 for both systems). The Wilcoxon signed rank test reveals that the recommendations of the CARS fitted significantly better to the user's preferences than the baseline's recommendations (V = 1807, p < .01).
   The context-awareness of the applications is evaluated by asking whether the products were in line with the provided scenario. The baseline application's mean is 2.82 whereas the CARS mean is 2.66 (Median = 2 for both applications). The Wilcoxon signed rank test shows V = 1346, p = .54. This means that the users did not perceive either system as being more context-aware than the other.
   When asked whether they are likely to use the application again, the users stated that they are significantly more likely to use the CARS (Median = 2, Mean = 2.64) again than the baseline application (Median = 3, Mean = 3.06) (V = 1563, p < .01).
   The maximum time needed to find an item in the baseline application was 867 seconds (Median = 142 s, Mean = 179 s) and in the CARS application 697 seconds (Median = 149 s, Mean = 182 s). The time needed to select an item is not significantly different between the applications (V = 2302.5, p = .45).
   Another measure for the effectiveness of the recommendation algorithm is the number of critiquing cycles until an item was selected. Participants completed their task in 1.24 fewer cycles on average using CARS (Median = 5, Mean = 6.1 with CARS; Median = 5, Mean = 7.34 with the baseline system). Again a Wilcoxon signed rank test was executed (V = 2393.5, p = .11). However, the result is not significant, meaning that the null hypothesis cannot be rejected.
   One of the goals of the CARS was to reduce the number of times an individual item is shown (item frequency) and thus increase the number of different items (item coverage). All in all the baseline application showed 7506 (1690 different; 22.5 % unique) and the CARS 6390 (1754 different; 27.4 % unique) items. We measured every time that an item was displayed to any user. The maximum number of times
an item was shown was 115 (Median = 3, Mean = 4.392) for the baseline application and 53 (Median = 2, Mean = 3.622) for the CARS. A Wilcoxon signed rank test reveals that there is a significant difference between the samples (V = 285253.5, p < .01), meaning that the CARS showed items significantly less frequently than the baseline. Although the CARS showed fewer items overall, more different items have been shown. This indicates that the recommended items have been more diverse.

Table 1: The means of some important measured values comparing both variations of the system.

                               BASE mean   CARS mean   p value   V value
 Perceived accuracy            2.71        2.34        <.01      1807
 Perceived context-awareness   2.82        2.66        .54       1346
 Intention to return           3.06        2.64        <.01      1563
 Time                          179 s       182 s       .45       2302.5
 Cycles                        7.34        6.1         .11       2393.5
 Item frequency                4.39        3.62        <.01      285253.5

   Overall, 59 participants reported that they prefer the context-aware application (CARS). This is significantly more than expected under a random distribution of answers, as a chi-squared test reveals (χ² = 30.38, with 2 df [degrees of freedom], p < .001).
   The test participants found that the CARS recommendations fitted significantly better to their preferences. Therefore, hypothesis 1 that the recommendations by a context-aware system are perceived as better is retained. Hypothesis 2 that the overall user satisfaction is improved can also be retained to a certain degree, as users were more satisfied with the CARS. The results might be less significant than expected as only six users rated items in context as an initial dataset. However, we wanted the dataset to be sparse as there are frequent changes to fashion collections.

5.   CONCLUSION AND FUTURE WORK
   In this work, a context-aware recommender system was developed and evaluated in a mobile shopping scenario. Our CARS is based on an active learning algorithm and uses a nearest neighbor algorithm. Compared to a system without context-awareness, the recommendations were perceived as significantly better in the CARS. Interestingly, the users did not attribute the better recommendation quality to the more context-aware recommendations but to better adaptability to their preferences and their clothing style, although the only difference from an algorithmic perspective is the context-awareness. It should be investigated in more detail whether context-awareness is only perceived subconsciously. The next step for this application would be to test it in an online experiment where real context-aware information is elicited. In a first approach the clothing data of some selected retailers would be enough to test this application online. In the future, we plan to conduct a user study where real context-aware information is elicited. A major challenge for context-aware applications is still to acquire context-aware data to train or tweak a context-aware algorithm. For this user study, selected users classified the contexts in which they would try the clothes on. As the users in the user survey also imagined these contexts, we expect no significant differences between the classification of the items and the imagined scenario in the user study. This approach might help in narrowing down the problem of acquiring relevant context data as a quick start for a context-aware application. However, it has to be evaluated how closely real contextual ratings can be estimated with this method. In order to adapt the existing approaches of estimating a rating to a yes or no decision, we had to develop the concept of an average context in which an item is selected. We believe that every context-aware recommender system relying on yes or no decisions might benefit from adapting its context incorporation by using our approach. We also aim to find out whether the results of this work can be transferred to other application scenarios, such as grocery shopping or leisure activity recommendation systems.

6.   REFERENCES
 [1] G. Adomavicius, L. Baltrunas, E. W. De Luca, T. Hussein, and A. Tuzhilin. 4th workshop on context-aware recommender systems (cars 2012). In RecSys, pages 349–350, 2012.
 [2] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 217–253. Springer, 2011.
 [3] S. S. Anand and B. Mobasher. Contextual recommendation. Lecture Notes in Artificial Intelligence, 4737:142–160, 2007.
 [4] L. Baltrunas, B. Ludwig, S. Peer, and F. Ricci. Context Relevance Assessment and Exploitation in Mobile Recommender Systems. Personal Ubiquitous Comput., 16(5):507–526, June 2012.
 [5] V. Bellotti et al. Activity-Based Serendipitous Recommendations with the Magitti Mobile Leisure Guide. Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems - CHI '08, pages 1157–1166, 2008.
 [6] A. K. Dey. Understanding and using context. Personal and Ubiquitous Computing, 5:4–7, 2001.
 [7] B. Lamche, U. Trottmann, and W. Wörndl. Active learning strategies for exploratory mobile recommender systems. ACM International Conference Proceeding Series, pages 10–17, 2014.
 [8] J. S. Lee and J. C. Lee. Context awareness by case-based reasoning in a music recommendation system. Lecture Notes in Computer Science, 4836:45–58, 2007.
 [9] F. Ricci. Mobile recommender systems. Information Technology & Tourism, 12(3):205–231, 2010.
[10] S. Savage, M. Baranski, N. E. Chavez, and T. Hollerer. I'm feeling loco: A location based context aware recommendation system. In Advances in Location-Based Services: 8th International Symposium on Location-Based Services, Lecture Notes in Geoinformation and Cartography. Springer, Vienna, Austria, 2011.
[11] W. Wörndl and B. Lamche. User interaction with context-aware recommender systems on smartphones. In icom, volume 14, pages 19–28, 2015.
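
Note on the V statistic of Section 4.2: the paper describes V as the sum of ranks assigned to differences with a positive sign. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes paired ratings, drops zero differences, and gives tied absolute differences their average rank; the paper does not state how ties or zeros were handled, so both choices are assumptions. The function names and the sample ratings are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values ascending from 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wilcoxon_v(first: Sequence[float], second: Sequence[float]) -> float:
    """Sum of the ranks of the positive paired differences (first - second)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    # Assumption: pairs with identical ratings are dropped before ranking.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, ranks) if d > 0)


# Hypothetical Likert ratings (1 = strongly agree ... 7 = strongly disagree).
baseline = [3, 2, 4, 5, 2]
cars = [2, 2, 1, 3, 4]
print(wilcoxon_v(baseline, cars))
```

For the hypothetical ratings above, the non-zero differences are 1, 3, 2 and -2, their absolute values receive the ranks 1, 4, 2.5 and 2.5, and the script prints 7.5. Swapping the arguments yields 2.5, so the two sums add up to n(n+1)/2 = 10 for the n = 4 remaining pairs.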