=Paper=
{{Paper
|id=None
|storemode=property
|title=Context-Dependent Recommendations with Items Splitting
|pdfUrl=https://ceur-ws.org/Vol-560/paper16.pdf
|volume=Vol-560
|dblpUrl=https://dblp.org/rec/conf/iir/BaltrunasR10
}}
==Context-Dependent Recommendations with Items Splitting==
Linas Baltrunas and Francesco Ricci, Free University of Bozen-Bolzano, Piazza Università 1, Bolzano, Italy (lbaltrunas@unibz.it, fricci@unibz.it)

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27–28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.

ABSTRACT

Recommender systems are intelligent applications that help on-line users to tackle information overload by providing recommendations of relevant items. Collaborative Filtering (CF) is a recommendation technique that exploits users' explicit feedback on items to predict the relevance of items not yet evaluated. In classical CF, users' ratings do not specify the contextual conditions under which the item was evaluated (e.g., the time when the item was rated or the goal of the consumption). But in some domains the context can heavily influence the relevance of the item, and this must be taken into account. This paper analyzes the behavior of a technique that deals with context by generating new items restricted to a contextual situation. The rating vectors of some items are split into two vectors containing the ratings collected in two alternative contextual conditions. Hence, each split generates two fictitious items that are used in the prediction algorithm instead of the original one. We evaluated this approach on semi-synthetic data sets, measuring precision and recall while using a matrix-factorization algorithm for generating rating predictions. We compared our approach to the previously introduced reduction based method. We show that item splitting can improve system accuracy. Moreover, item splitting leads to a better recall than the reduction based approach.

1. INTRODUCTION

The Internet, interconnecting information and business services, has made available to on-line users an overabundance of information and very large product catalogues. Hence, users trying to decide what information to consult or what products to choose may be overwhelmed by the number of options. Recommender systems are intelligent applications that try to solve the information overload problem by recommending relevant items to a user [2, 11]. Here an item is usually descriptive information about a product such as a movie, a book or a place of interest. Recommender systems are personalized Information Retrieval systems where users make generic queries, such as "suggest a movie to be watched with my family tonight".

Collaborative Filtering (CF) is a recommendation technique that emulates a simple and effective social strategy called "word-of-mouth" and is now largely applied in the "social" web. For example, amazon.com recommends items that the user could be interested in buying, and delicious.com recommends links that were tagged by similar users with commonly used tags. CF recommendations are computed by leveraging historical log data of users' online behavior [12]. The relevance of an item is usually expressed and modeled by the explicit user rating: the higher the rating a user assigned to an item, the more relevant the item is for that user. CF assumes that the users' recorded ratings for items can help in predicting the ratings of like-minded users. We want to stress that this assumption is valid only to some extent. In fact, the user's general interests can be relatively stable, but the exact evaluation of an item can be influenced by many additional and varying factors. In certain domains the consumption of the same item can lead to extremely different experiences when the context changes [1, 4]. Therefore, the relevance of an item can depend on several contextual conditions. For instance, in a tourism application the visiting experience at a beach in summer is strikingly different from the same visit in winter (e.g., during a conference meeting). Here context plays the role of query refinement, i.e., a context-aware recommender system must try to retrieve the most relevant items for a user given the knowledge of the current context. However, most CF recommender systems do not distinguish between these two experiences, thus providing poor recommendations in certain situations, i.e., when the context really matters.
Context-aware recommender systems are a new area of research [1]. The classical context-aware reduction based approach [1] extended the classical CF method by adding to the standard dimensions of users and items new ones representing contextual information. Here recommendations are computed using only the ratings made in the same context as the target one. For each contextual segment, e.g., a sunny weekend, the algorithm checks (using cross-validation) whether predictions generated using only the ratings of that segment are more accurate than those generated using the full data set. The authors use a hierarchical representation of context; therefore, the exact granularity of the used context is searched (optimized) among those that improve the accuracy of the prediction. Similarly, in our approach we enrich the simple two-dimensional CF matrix with a model of the context comprising a set of features of either the user, the item, or the evaluation. We adopt the definition of context introduced by Dey, where "context is any information that can be used to characterize the situation of an entity" [8]. Here, the entity is an item consumption that can be influenced by contextual variables describing the state of the user and the item. In this paper we propose a new approach for using these contextual dimensions to pre-filter items' ratings. Actually, to be precise, the set of ratings for an item is not filtered but split into two subsets according to the value of a contextual variable, e.g., ratings collected in "winter" or in "summer" (the contextual variable is the season of the rating/evaluation). These two sets of ratings are then assigned to two new fictitious items (e.g., the beach in winter and in summer).

This paper extends the results presented in [5, 6]. Here we evaluate the same item splitting technique in a different set of experiments, namely we measure precision and recall, whereas previously we used MAE. Also, the nine semi-synthetic data sets are generated differently. Moreover, we extended our analysis by studying the behavior of item splitting with respect to various Information Gain thresholds.
2. ITEM SPLITTING

Our approach extends the traditional CF data model by assuming that each rating rui in an m × n users-items matrix is stored (tagged) together with some contextual information c(u, i) = (c1, ..., cn), cj ∈ Cj, describing the conditions under which the user experience was collected (cj is a nominal variable). The proposed method identifies items having significant differences in their ratings (see below the exact test criterion). For each one of these items, the algorithm splits its ratings into two subsets, creating two new artificial items with ratings belonging to these two subsets. The split is determined by the value of one contextual variable cj, i.e., all the ratings in a subset have been acquired in a context where the contextual feature cj took a certain value. So, for each item the algorithm looks for a contextual feature cj that can be used to split the item. It then checks whether the two subsets of ratings show some (statistically significant) difference, e.g., in the mean. If this is the case, the split is done and the original item in the ratings matrix is replaced by the two newly generated items. In the testing phase, the rating prediction for the split item is computed for one of the newly generated items. For example, assume that an item i has generated two new items i1 and i2, where i1 contains the ratings for item i acquired in the contextual condition cj = v, and i2 the ratings acquired in contexts where cj ≠ v; hence the two sets partition the original set of ratings. Now assume that the system needs to compute a rating prediction for the item i and user u in a context where cj = x. Then the prediction is computed for the item i1 if x = v, or for i2 if x ≠ v, and is returned as the prediction for i.

Figure 1 illustrates the splitting of one item. As input, the item splitting step takes an m × n rating matrix of m users and n items and outputs an m × (n + 1) matrix. The total number of ratings in the matrix does not change, but a new item is created. This step can be repeated for all the items having a significant dependency of their ratings on the value of one contextual variable. In this paper we focus on a simple application of this method where an item is split only into two items, using only one selected contextual variable. A more aggressive split of an item into several items, using a combination of features, could produce even more "specialized" items, but would potentially increase data sparsity. We note again that, for the same user and different items, one can in principle obtain ratings in different contexts, as in our context model the context depends on the rating. Therefore, items i1 and i2 could overlap, i.e., could both be rated by the same user in different contextual conditions. However, such situations are not very common.

Figure 1: Item splitting of a single item i: its ratings are divided between two new items i1 and i2, and the n-item matrix becomes an (n + 1)-item matrix.

We conjecture that the splitting could be beneficial if the ratings within each newly obtained item are more homogeneous, or if they are significantly different across the new items coming from a split. One way to accomplish this is to define an impurity criterion t [7]. So, if there are some candidate splits s ∈ S, which divide i into i1 and i2, we choose the split s that maximizes t(i, s) over all possible splits in S. A split is determined by selecting a contextual variable and a partition of its values into two sets. Thus, the space of all possible splits of item i is defined by the context model C. In this work we analyzed the tIG impurity criterion. tIG(i, s) measures the information gain (IG), also known as Kullback-Leibler divergence [10], given by s to the knowledge of the item i rating: tIG(i, s) = H(i) − (H(i1) Pi1 + H(i2) Pi2), where H(i) is the Shannon entropy of the item i rating distribution and Pi1 is the proportion of ratings that i1 receives from item i. To ensure the reliability of this statistic, we compute it only for splits that could potentially generate items each containing 4 or more ratings. Thus, the algorithm never generates items with fewer than 4 ratings in their profile.
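To make the split-selection step concrete, the following is a minimal Python sketch of the tIG criterion as described above. It is not the authors' implementation; the data layout (a list of (rating, context) pairs per item) and the function names are illustrative assumptions. A candidate split partitions the values of one nominal contextual feature into two sets, and a split is accepted only if both resulting items keep at least 4 ratings and the information gain exceeds the threshold.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(ratings):
    """Shannon entropy of a rating distribution (ratings in {1, ..., 5})."""
    counts = Counter(ratings)
    n = len(ratings)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, child1, child2):
    """t_IG(i, s) = H(i) - (H(i1) * P_i1 + H(i2) * P_i2)."""
    p1, p2 = len(child1) / len(parent), len(child2) / len(parent)
    return entropy(parent) - (entropy(child1) * p1 + entropy(child2) * p2)

def candidate_splits(values):
    """Binary partitions of the observed values of one nominal feature."""
    values = sorted(set(values))
    for r in range(1, len(values) // 2 + 1):
        for left in combinations(values, r):
            yield set(left), set(values) - set(left)

def best_split(item_ratings, min_ratings=4, ig_threshold=0.01):
    """item_ratings: list of (rating, context) pairs; context is a dict of
    nominal contextual features.  Returns (feature, left_values, gain) for
    the best accepted split, or None if no split qualifies."""
    best = None
    features = {f for _, ctx in item_ratings for f in ctx}
    for feature in features:
        observed = [ctx[feature] for _, ctx in item_ratings if feature in ctx]
        for left, right in candidate_splits(observed):
            r1 = [r for r, ctx in item_ratings if ctx.get(feature) in left]
            r2 = [r for r, ctx in item_ratings if ctx.get(feature) in right]
            if len(r1) < min_ratings or len(r2) < min_ratings:
                continue  # never generate an item with fewer than 4 ratings
            gain = information_gain([r for r, _ in item_ratings], r1, r2)
            if gain > ig_threshold and (best is None or gain > best[2]):
                best = (feature, left, gain)
    return best

# Toy example: one item whose ratings clearly depend on the "season" feature.
profile = ([(5, {"season": "summer"})] * 4 + [(4, {"season": "summer"})] * 2 +
           [(2, {"season": "winter"})] * 4 + [(1, {"season": "winter"})] * 2)
print(best_split(profile))   # -> ('season', {'summer'}, ~1.0)
```

In the toy example the ratings depend strongly on the season, so the split into a "summer" and a "winter" item is selected with an information gain of about one bit.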
3. EXPERIMENTAL EVALUATION

We tested the proposed method on nine semi-synthetic data sets with ratings in {1, 2, 3, 4, 5}. The data sets were generated from the Yahoo! Webscope movies data set (Webscope v1.0, http://research.yahoo.com/), which contains 221K ratings for 11,915 movies by 7,642 users. The semi-synthetic data sets were used to analyze item splitting when varying the influence of the context on the user ratings. The original Yahoo! data set contains user age and gender features. We used 3 age groups: users below 18 (u18), between 18 and 50 (18to50), and above 50 (a50). We modified the original Yahoo! data set by replacing the gender feature with a new artificial feature c ∈ {0, 1} that was randomly assigned the value 1 or 0 for each rating. This feature c represents a contextual condition that could affect the rating. We randomly chose α·100% of the items from the data set, and then from these items we randomly chose β·100% of the ratings to modify. We increased (decreased) the rating value by one if c = 1 (c = 0) and if the rating value was not already 5 (1). For example, if α = 0.9 and β = 0.5, the corresponding synthetic data set has 90% of altered item profiles, each containing 50% of changed ratings. We generated nine semi-synthetic data sets varying α ∈ {0.1, 0.5, 0.9} and β ∈ {0.1, 0.5, 0.9}. So, in these data sets the contextual condition influences the rating value more strongly as α and β increase.
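The perturbation procedure just described can be sketched as follows. This is an illustrative reconstruction: the data layout and names are assumptions, and the random choices are obviously not those used to build the nine published data sets.

```python
import random

def perturb(ratings, alpha, beta, seed=0):
    """ratings: dict item_id -> list of (user_id, rating) pairs.
    Tags every rating with a random binary feature c and, for beta*100% of
    the ratings of alpha*100% of the items, shifts the rating by +1 when
    c == 1 and by -1 when c == 0, staying inside the 1..5 scale."""
    rng = random.Random(seed)
    items = list(ratings)
    altered_items = set(rng.sample(items, int(alpha * len(items))))
    tagged = {}                      # item_id -> list of (user_id, rating, c)
    for item, profile in ratings.items():
        to_alter = set()
        if item in altered_items:
            k = int(beta * len(profile))
            to_alter = set(rng.sample(range(len(profile)), k))
        new_profile = []
        for pos, (user, r) in enumerate(profile):
            c = rng.randint(0, 1)
            if pos in to_alter:
                r = min(5, r + 1) if c == 1 else max(1, r - 1)
            new_profile.append((user, r, c))
        tagged[item] = new_profile
    return tagged

# Toy usage: 90% of the item profiles, 50% of their ratings altered.
data = {item: [(user, 3) for user in range(20)] for item in range(100)}
synthetic = perturb(data, alpha=0.9, beta=0.5)
```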
In this paper we used matrix factorization (FACT) as the rating prediction technique. We used the algorithm implemented and provided by Timely Development (http://www.timelydevelopment.com). FACT uses 60 factors, and the other parameters are set to the values optimized for another data set (Netflix); this might not be the best setting, but all the system variants that we compared used the same settings. To evaluate the described methods we used 5-fold cross-validation and measured precision and recall. The usage of precision and recall in recommender systems needs some clarification. These measures, in their purest sense, are impossible to compute, as they would require knowledge of the rating (relevance) of every item and user combination [9]. Usually there are thousands of candidate items to recommend (11K in our case), and only for a small percentage of them (typically less than 1%) do we know the true user evaluation. Herlocker et al. [9] proposed to approximate these measures by computing the prediction just for the user × item pairs that are present in the ratings data set, and to consider items worth recommending (relevant items) only if the user rated them 4 or 5. We computed the measures on the full test set of each fold, while the models were trained on the training set. Please refer to [5] for additional experiments. These include the evaluation of other impurity criteria, the performance of the proposed method on the original Yahoo! data set, and experiments using other prediction methods, such as user-based CF, while computing the Mean Absolute Error (MAE).

3.1 Context-aware Prediction Methods

To understand the potential of item splitting on a context-dependent set of ratings, we tested this approach on the semi-synthetic data sets described earlier, i.e., with the gender feature replaced by a new contextual variable that does influence the ratings. The baseline method is FACT when no contextual information is considered. It is compared with the context-aware reduction based approach [1] and with our item splitting technique. Figure 2 shows the comparison of the three methods for the nine semi-synthetic data sets. For each data set we computed precision and recall. We considered an item worth recommending if the algorithm made a prediction greater than or equal to 4. For all nine data sets the algorithm splits an item if any split leads to an IG bigger than 0.01. This small IG threshold value led to good results in our previous experiments [6], and it allows the algorithm to split up to 15% of the items (depending on the data set). In Subsection 3.3 we report results obtained when choosing bigger values, which typically decrease the impact of item splitting.

Figure 2: Comparison of the contextual pre-filtering methods (No Context, Reduction, Item-Split): (a) precision and (b) recall for the nine semi-synthetic data sets (10%/50%/90% of items × 10%/50%/90% of ratings altered).

As expected, the smaller the impact of the contextual feature c, the smaller the improvement of the performance measures obtained by the methods that use the context. In fact, item splitting improved the performance of the baseline method for four data sets: α ∈ {0.5, 0.9}, β ∈ {0.5, 0.9}. The highest improvement in precision, 9.9%, was observed for the data set α = 0.9, β = 0.9, where most items and most ratings were influenced by the artificial contextual feature. Increasing the values of α and β, i.e., increasing the number of items and ratings that are correlated with the value of the context feature, decreased the overall precision and recall of the baseline method. We conjecture that the contextual condition plays the role of noise added to the data, even if it is clearly not noise but a simple functional dependency on a hidden variable. In fact, FACT cannot exploit the additional information brought by this feature and cannot effectively deal with its influence.

The reduction based approach increased precision by 1.3%, and only for the α = 0.9, β = 0.9 data set. This is the data set where the artificial contextual feature has the highest influence on the ratings and 90% of the items are modified. In [1] the authors optimized MAE when searching for the contextual segments where the context-dependent prediction improves the default one (no context). Here, we searched for the segments where precision and recall are improved, and we used all better-performing segments to make the predictions. For example, Figure 2(a) reports the precision of the reduction based approach. To conduct this experiment, the algorithm first sought (optimized) the contextual segments where precision is improved (using a particular split of the training and test data). Then, when it has to make a rating prediction, it uses either only the data in one of these segments, i.e., if the prediction is for a user-item combination falling in one of the found segments, or all the data, i.e., if the rating falls in a contextual condition where no improvement over the baseline was found. Note that in all three data sets with α = 0.5, β ∈ {0.1, 0.5, 0.9}, the results are similar to those of the baseline approach. In these cases the reduction based approach does consider the segments generated using the artificial feature. However, the data set was constructed in such a way that half of the items have no rating dependency on the artificial feature, and no benefit is observed.

These experiments show that both context-aware pre-filtering approaches can outperform the baseline FACT CF method when the context influences the ratings. It is worth noting that item splitting is computationally cheaper and performed better than the reduction based approach. Note also that accuracy could depend on the particular baseline prediction algorithm, i.e., FACT in our experiments. However, we chose FACT as it is currently largely used, and in our previous experiments it outperformed the traditional user-based CF method [5].
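For comparison, the segment-selection and prediction-routing logic of the reduction based experiment described above can be sketched as follows. This is only a schematic reading of the procedure, not the implementation of [1]; `train_model` and `precision_of` are stand-ins for the underlying CF algorithm (FACT in the paper) and its evaluation, and are assumptions of this sketch.

```python
def reduction_based(train, validation, segments, train_model, precision_of):
    """Schematic contextual pre-filtering in the spirit of the reduction
    based approach: keep a segment-specific model only when it beats the
    model trained on all data, and route predictions accordingly.

    train / validation: lists of (user, item, rating, context) tuples.
    segments: predicates context -> bool (e.g. lambda c: c["c"] == 1)."""
    full_model = train_model(train)
    baseline = precision_of(full_model, validation)
    kept = []
    for segment in segments:
        seg_train = [t for t in train if segment(t[3])]
        seg_valid = [t for t in validation if segment(t[3])]
        if not seg_train or not seg_valid:
            continue
        seg_model = train_model(seg_train)
        # keep the segment only if restricting the data improves precision
        if precision_of(seg_model, seg_valid) > baseline:
            kept.append((segment, seg_model))

    def predict(user, item, context):
        for segment, model in kept:
            if segment(context):
                return model(user, item)   # segment-specific prediction
        return full_model(user, item)      # otherwise fall back to all data
    return predict

# Trivial stand-ins, only to make the sketch executable: a "model" predicting
# the mean training rating, and precision at a fixed recommendation threshold.
def train_model(data):
    mean = sum(r for _, _, r, _ in data) / len(data)
    return lambda user, item: mean

def precision_of(model, data, threshold=4):
    recommended = [r for u, i, r, _ in data if model(u, i) >= threshold]
    return sum(r >= 4 for r in recommended) / len(recommended) if recommended else 0.0
```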
3.2 Precision Versus Recall

In this section we illustrate the precision/recall curves for the three selected methods. For these experiments we used the three data sets with α = 0.9, β ∈ {0.1, 0.5, 0.9}. As was done in the previous experiment, we set the IG threshold to 0.01. For the reduction based approach we optimized precision. The results can be seen in Figure 3. The left figure shows the results for the α = 0.9, β = 0.5 data set and the right figure for α = 0.9, β = 0.9. We skip the α = 0.9, β = 0.1 data set, as for this data set all three methods perform similarly to each other. Each curve was computed by varying the threshold at which a recommendation is made. For example, all methods obtained the highest precision when recommending only the items that were predicted with rating 5; in this case, we do not recommend the items that were predicted with a lower rating. Note that we always count a recommendation as relevant if the user rated the item 4 or 5. We set the threshold to the values {1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5}. Note that the previous experiment (see Figure 2) was done with the recommendation threshold equal to 4. The recall is equal to 1 if we recommend all the items, i.e., those predicted with a rating of 1 or higher. Even at this level of recall, the precision is more than 70%. This can be explained by the high fraction of high ratings in the data set.

Figure 3: Precision/recall curves of No Context, Reduction and Item-Split for the 90%i-50%r and 90%i-90%r data sets.

Recommender systems usually try to improve precision. Even with recall as small as 0.01, we could still recommend too many items for a user to consume, i.e., approximately 119 items in our data set. Interestingly, as we can see, it is also much harder to make precise recommendations than to obtain high recall. The curves for all three methods flatten when approaching a precision of 0.97. At this point we recommend only the items that were predicted with rating 5. This is the maximum possible rating predicted by FACT, and precision cannot be improved further by varying the threshold at which a recommendation is made. We also observe that the item splitting method achieves a higher maximum precision than the other methods. When α = 0.9 and β = 0.9, the highest precision value of item splitting improves over the baseline method by 7%; the improvement when α = 0.9 and β = 0.5 is 2.7%. This experiment also gives valuable insights into the behavior of the reduction based approach. We see that at each level of the recommendation threshold it shows a higher recall value than the other two methods. At the highest level of precision, the reduction based approach is close to item splitting and gives an improvement in precision of 6.1% for the α = 0.9, β = 0.9 data set and of 1.3% for the α = 0.9, β = 0.5 data set. However, the precision/recall curve of the reduction based approach is always below that of item splitting.

In conclusion, considering both precision and recall, we see that the two context-aware recommendation methods yield quite similar results. More noticeably, both methods outperform the baseline CF, which does not take context into account.
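Under Herlocker et al.'s approximation [9] (evaluate only the user-item pairs present in the test set, with relevant meaning rated 4 or 5), curves like those in Figure 3 can be produced by sweeping the recommendation threshold. A minimal sketch, with illustrative names:

```python
def precision_recall_curve(predictions,
                           thresholds=(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)):
    """predictions: list of (predicted_rating, true_rating) pairs for the
    user-item pairs present in the test set.  An item is relevant if the
    user actually rated it 4 or 5; it is recommended if its predicted
    rating reaches the current threshold."""
    relevant = sum(true >= 4 for _, true in predictions)
    curve = []
    for t in thresholds:
        recommended = [true for pred, true in predictions if pred >= t]
        if not recommended or not relevant:
            continue
        hits = sum(true >= 4 for true in recommended)
        curve.append((t, hits / len(recommended), hits / relevant))
    return curve

# Toy usage with made-up predictions (FACT produces them in the paper).
pairs = [(4.8, 5), (4.2, 4), (3.9, 5), (3.1, 2), (2.0, 1), (4.6, 3)]
for threshold, precision, recall in precision_recall_curve(pairs):
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```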
3.3 Item Splitting for Various IG Thresholds

To better understand the item splitting method we further analyzed the prediction process. We looked at the number of items the algorithm splits and at which attribute the split was performed on. For this purpose we varied the item splitting threshold parameter. For this experiment we used the tIG impurity measure and the three data sets with α = 0.9, β ∈ {0.1, 0.5, 0.9}. A summary of the results is shown in Figure 4.

Figure 4: Item splitting behavior for different IG thresholds: (a)-(c) number of items split using the artificial and the age feature for the 90%i-10%r, 90%i-50%r and 90%i-90%r data sets; (d) precision and (e) recall versus the IG threshold.

Figures 4(a), 4(b) and 4(c) show the number of splits that the item splitting algorithm performs when varying the IG threshold for the three considered data sets. When using α = 0.9, β = 0.1, the algorithm chooses the artificial feature approximately twice as often as the age feature. More precisely, when the threshold is IG = 0.2, item splitting splits 101.8 items (on average over the 5 folds); the artificial feature was chosen 69.8 times and the age feature 32 times. When the influence of the artificial feature increases, a higher proportion of items is split using the artificial feature. For the α = 0.9, β = 0.9 data set and IG = 0.2, it splits 576.8 items using the artificial feature and 29.8 using the age feature. Note that, although IG favors attributes with many possible values [10], item splitting chooses the attribute having the larger influence on the rating. We further observe that the number of split items is not large: for all three data sets we split no more than 2050 items (17%). This low number can be explained by looking at the size of the item profiles. Note that in the considered data sets the average number of ratings per item is 18.5. The algorithm splits an item only if each newly generated item has at least 4 ratings; therefore, an item must have a minimum of 8 ratings to be considered for splitting. Lowering the minimum number of ratings in the item profile could cause unreliable computation of the statistics and was observed to decrease the overall performance.
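Counts like those reported in Figures 4(a)-(c) can be collected with a simple threshold sweep. The sketch below is purely illustrative and assumes a `best_split` routine like the one sketched after Section 2, returning the chosen feature, its value partition and the information gain, or None.

```python
from collections import Counter

def split_counts(items, best_split, thresholds=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """items: dict item_id -> list of (rating, context) pairs.
    Returns, per IG threshold, how many items get split and which
    contextual feature was chosen for the split."""
    summary = {}
    for t in thresholds:
        chosen = Counter()
        for item, profile in items.items():
            split = best_split(profile, ig_threshold=t)
            if split is not None:
                chosen[split[0]] += 1        # e.g. 'artificial' or 'age'
        summary[t] = {"items_split": sum(chosen.values()),
                      "by_feature": dict(chosen)}
    return summary
```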
Figures 4(d) and 4(e) show the precision and recall measures for the three data sets. We observe that item splitting is beneficial only when the context (i.e., the artificial feature here) has a high influence on the rating. The best performance for the α = 0.9, β = 0.1 data set, both for recall and precision, is obtained when no items are split. Each split of an item also affects the predictions for the items that are not split. Splitting an item is equivalent to creating two new items and deleting one; therefore, it modifies the data set. When CF generates a prediction for a target user-item pair, all the other items' ratings, including those in the new items coming from some split, are used to build that prediction. In [5] we observed that we can increase the performance on split items, but at the same time the decrease of performance on the untouched items can cancel any benefit. When α = 0.9, β = 0.5 the situation is different: we observe that splitting more items leads to an increase in precision and a decrease in recall. Finally, for α = 0.9, β = 0.9 splitting more items increases both precision and recall, which are maximal when the IG threshold is equal to 0.1. In conclusion, we could regard item splitting as a more dynamic version of the reduction based approach: the split is done for each item separately, using an external measure (such as IG) to decide whether the split is needed. Using the IG criterion, splitting items is beneficial when the context highly influences the ratings.

4. CONCLUSIONS AND FUTURE WORK

This paper evaluates a contextual pre-filtering technique for CF, called item splitting. Based on the assumption that certain items may have different evaluations in different contexts, we proposed to use item splitting to cope with this. The method is compared with a classical context-aware pre-filtering approach [1], which uses an extensive search to find the contextual segments that improve the baseline prediction. We observed that, despite the increased data sparsity, item splitting is beneficial when some contextual feature separates the item ratings into two more homogeneous rating groups. However, if the contextual feature is not influential, the splitting technique sometimes produced a minor decrease in precision and recall. Item splitting outperforms the reduction based context-aware approach when the FACT CF method is used. Moreover, the method is more time and space efficient and could be used with large context-enriched databases.

The method we proposed can be extended in several ways. For instance, one can try to split the users (not the items) according to the contextual features, in order to represent the preferences of a user in different contexts by using various parts of the user profile. Another interesting problem is to find a meaningful item splitting in continuous contextual domains such as time or temperature. Here, the splitting is not easily predefined but has to be searched for in the continuous space. Finally, item splitting could ease the task of explaining recommendations. The recommendation can be made for the same item in different contexts, and the contextual condition on which the item was split could be mentioned as a justification of the recommendation. For example: we recommend going to the museum instead of the beach because it will be raining today. We would also like to extend the evaluation of the proposed algorithm. First of all, we want to use real-world context-enriched data. Moreover, we want to evaluate precision and recall on top-N recommendation lists. Finally, we want to develop a solution able to deal with missing contextual values.
aware pre-filtering approach [1] which uses extensive search- Springer-Verlag, Berlin, Heidelberg, 2007. ing to find the contextual segments that improve the base- [5] L. Baltrunas and F. Ricci. Context-based splitting of line prediction. As a result we observed that despite the item ratings in collaborative filtering. In L. D. increased data sparsity, item splitting is beneficial, when Bergman, A. Tuzhilin, R. Burke, A. Felfernig, and some contextual feature separates the item ratings into two L. Schmidt-Thieme, editors, RecSys, pages 245–248. more homogeneous rating groups. However, if the contex- ACM, 2009. tual feature is not influential the splitting technique some- [6] L. Baltrunas and F. Ricci. Context-dependent items times produced a minor decrease of the precision and re- generation in collaborative filtering. In call. Item-splitting outperforms reduction based context- G. Adomavicius and F. Ricci, editors, Proceedings of aware approach when FACT CF method is used. Moreover, the 2009 Workshop on Context-Aware Recommender the method is more time and space efficient and could be Systems, 2009. used with large context-enriched data bases. [7] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. The method we proposed can be extended in several ways. Stone. Classification and Regression Trees. For instance one can try to split the users (not the items) Statistics/Probability Series. Wadsworth Publishing according to the contextual features in order to represent Company, Belmont, California, U.S.A., 1984. the preferences of a user in different contexts by using vari- ous parts of the user profile. Another interesting problem is [8] A. K. Dey. Understanding and using context. Personal to find a meaningful item splitting in continuous contextual Ubiquitous Comput., 5(1):4–7, February 2001. domains such as time or temperature. Here, the splitting [9] J. L. Herlocker, J. A. Konstan, L. G. Terveen, John, is not easily predefined but have to be searched in the con- and T. Riedl. Evaluating collaborative filtering tinuous space. Finally, item splitting could ease the task of recommender systems. ACM Transactions on explaining recommendations. The recommendation can be Information Systems, 22:5–53, 2004. made for the same item in different context. The contextual [10] J. R. Quinlan. C4.5: Programs for Machine Learning condition on which the item was split could be mentioned (Morgan Kaufmann Series in Machine Learning). as justifications of the recommendations. For example, we Morgan Kaufmann, 1 edition, January 1993. recommend you to go to the museum instead of going to the [11] P. Resnick and H. R. Varian. Recommender systems. beach as it will be raining today. We would also like to ex- Communications of the ACM, 40(3):56–58, 1997. tend our evaluation of the proposed algorithm. First of all, [12] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. we want to use real world context-enriched data. Moreover, Collaborative filtering recommender systems. In The we want to evaluate precision and recall at top-N recommen- Adaptive Web, pages 291–324. Springer Berlin / dation list. At the end, we want to develop a solution to be Heidelberg, 2007. able to deal with missing contextual values.