                  Exploiting Distributional Semantics Models for
                  Natural Language Context-aware Justifications
                            for Recommender Systems
               Cataldo Musto                                         Giuseppe Spillo                Marco de Gemmis
           University of Bari, Italy                             University of Bari, Italy        University of Bari, Italy
           cataldo.musto@uniba.it                             giuseppe.spillo@studenti.uniba.it marco.degemmis@uniba.it

                                           Pasquale Lops                                 Giovanni Semeraro
                                       University of Bari, Italy                        University of Bari, Italy
                                       pasquale.lops@uniba.it                         giovanni.semeraro@uniba.it

ABSTRACT
In this paper we present a methodology to generate context-aware natural language justifications supporting the suggestions produced by a recommendation algorithm. Our approach relies on a natural language processing pipeline that exploits distributional semantics models to identify the most relevant aspects for each different context of consumption of the item. Next, these aspects are used to identify the most suitable pieces of information to be combined in a natural language justification. As information source, we used a corpus of reviews. Accordingly, our justifications are based on a combination of reviews’ excerpts that discuss the aspects that are particularly relevant for a certain context.

In the experimental evaluation, we carried out a user study in the movies domain in order to investigate the validity of the idea of adapting the justifications to the different contexts of usage. As shown by the results, all these claims were supported by the data we collected.

Author Keywords
Recommender Systems, Explanation, Natural Language Processing, Opinion Mining

INTRODUCTION
Recommender Systems (RSs) [19] are now recognised as a very effective means to support users in decision-making tasks [20]. However, as the importance of such technology in our everyday lives grows, it is fundamental that these algorithms support each suggestion through a justification that allows the user to understand the internal mechanisms of the recommendation process and to more easily discern among the available alternatives.

To this end, several attempts have recently been devoted to investigating how to introduce explanation facilities in RSs [16] and to identifying the most suitable explanation styles [4]. Despite such a huge research effort, none of the methodologies currently presented in literature diversifies the justifications based on the different contextual situations in which the item will be consumed. This is a clear issue, since context plays a key role in every decision-making task, and RSs are no exception. Indeed, just as the mood or the company (friends, family, children) can direct the choice of the movie to be watched, a justification that aims to convince a user to enjoy a recommendation should contain different concepts depending on whether the user is planning to watch a movie with her friends or with her children.

In this paper we fill this gap by proposing an approach to generate a context-aware justification that supports a recommendation. Our methodology exploits distributional semantics models [5] to build a term-context matrix that encodes the importance of terms and concepts in each context of consumption. Such a matrix is used to obtain a vector space representation of each context, which is in turn used to identify the most suitable pieces of information to be combined in a justification. As information source, we used a corpus of reviews. Accordingly, our justifications are based on a combination of reviews’ excerpts that discuss with a positive sentiment the aspects that are particularly relevant for a certain context. Beyond its context-aware nature, another distinctive trait of our methodology is the fact that we generate post-hoc justifications that are completely independent from the underlying recommendation models and completely separated from the step of generating the recommendations.

The contributions of the article can be summarized as follows: (i) we propose a methodology based on distributional semantics models and natural language processing to automatically learn a vector space representation of the different contexts in which an item can be consumed; (ii) we design a pipeline that exploits distributional semantics models to generate context-aware natural language justifications supporting the suggestions returned by any recommendation algorithm;

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
IntRS ’20 - Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, September 26, 2020, Virtual Event
The rest of the paper is organized as follows: first, in Section 2 we provide an overview of related work. Next, Section 3 describes the main components of our workflow and Section 4 discusses the outcomes of the experimental evaluation. Finally, conclusions and future work of the current research are provided in Section 5.

RELATED WORK
The current research borrows concepts from review-based explanation strategies and distributional semantics models. In the following, we discuss relevant related work and emphasize the hallmarks of our methodology.

Review-based Explanations. According to the taxonomy discussed in [3], our approach can be classified as a content-based explanation strategy, since the justifications we generate are based on descriptive features of the item. Early attempts in the area rely on the exploitation of tags [24] and features gathered from knowledge graphs [11]. With respect to classic content-based strategies, the novelty of the current work lies in the use of review data to build a natural language justification. In this research line, Chen et al. [2] analyze users’ reviews to identify relevant features of the items, which are presented on an explanation interface. Differently from this work, we do not rely on a fixed set of static aspects: we let the explanation algorithm identify the most relevant concepts and aspects for each contextual setting. A similar attempt was also proposed in [1]. Moreover, as previously emphasized, a trait that distinguishes our approach from such literature is the adaptation of the justification to the different settings in which the item is consumed. The only work exploiting context in the justification process has been proposed by Misztal et al. in [9]. However, differently from our work, they did not diversify the justifications of the same item across the different contextual settings in which the item is consumed, since they just adopt features inspired by context (e.g., "I suggest you this movie since you like this genre in rainy days") to explain a recommendation.

Distributional Semantics Models. Another distinctive trait of the current work is the adoption of distributional semantics models (DSMs) to build a vector space representation of the different contextual situations in which an item can be consumed. Typically, DSMs rely on a term-context matrix, where rows represent the terms in the corpus and columns represent contexts of usage. For the sake of simplicity, we can imagine a context as a fragment of text in which the term appears, such as a sentence, a paragraph or a document. Every time a particular term is used in a particular context, this information is encoded in the matrix. One of the advantages of adopting DSMs is that they can learn a vector space representation of terms in a totally unsupervised way. These methods recently inspired techniques in the area of word embeddings, such as WORD2VEC [8], and contextual word representations [21]. Even if some attempts at evaluating RSs based on DSMs already exist [13, 12, 14], in our work we used DSMs to build a vector space representation of the different contextual dimensions. To the best of our knowledge, the usage of DSMs for justification purposes is a completely new research direction in the area of explanation.

METHODOLOGY
Our workflow to generate context-aware justifications based on users’ reviews is shown in Figure 1. In the following, we describe all the modules that compose the workflow.

Context Learner. The first step is carried out by the CONTEXT LEARNER module, which exploits DSMs to learn a vector space representation of the contexts. Formally, given a set of reviews $R$ and a set of $k$ contextual settings $C = \{c_1, \dots, c_k\}$, this module generates as output a matrix $C_{n,k}$ that encodes the importance of each term $t_i$ in each contextual setting $c_j$. In order to build such a representation, we first split all the reviews $r \in R$ into sentences. Next, let $S$ be the set of previously obtained sentences; we manually annotated a subset of these sentences in order to obtain a set $S' = \{s_1, \dots, s_m\}$, where each $s_i$ is labeled with one or more contextual settings, based on the concepts mentioned in the review. As an example, a review including the sentence ’a very romantic movie’ is annotated with the context company=partner, while the sentence ’perfect for a night at home’ is annotated with the context day=weekday. After the annotation step, a sentence-context matrix $A_{m,k}$ is built, where each $a_{s_i,c_j}$ is equal to 1 if the sentence $s_i$ is annotated with the context $c_j$ (that is to say, it mentions concepts that are relevant for that context), 0 otherwise.

Next, we run tokenization and lemmatization algorithms [7] over the sentences in $S$ to obtain a lemma-sentence matrix $V_{n,m}$. In this case, $v_{t_i,s_j}$ is equal to the TF-IDF of the term $t_i$ in the sentence $s_j$, where IDF is calculated over all the annotated sentences. In order to filter out non-relevant lemmas, we kept only nouns and adjectives in the matrix $V$. Nouns were chosen due to previous research [15], which showed that descriptive features of an item are usually represented using nouns (e.g., service, meal, location, etc.). Similarly, adjectives were included since they play a key role in catching the characteristics of the different contextual situations (e.g., romantic, quick, etc.). Moreover, we also decided to extract combinations of nouns and adjectives (bigrams), such as romantic location, since they can be very useful to highlight specific characteristics of the item.

In the last step of the process, the vocabulary matrix $V_{n,m}$ and the annotation matrix $A_{m,k}$ are multiplied to obtain our lemma-context matrix $C_{n,k}$, which represents the final output returned by the CONTEXT LEARNER module. Each $c_{i,j}$ encodes the importance of term $t_i$ in the context $c_j$. The whole process carried out by this component is described in Figure 2.

Given such a representation, two different outputs are obtained. First, we can directly extract the column vectors $\vec{c}_j$ from matrix $C$, each of which is the vector space representation of the context $c_j$ based on DSMs. It should be pointed out that such a representation perfectly fits the principles of DSMs, since contexts discussed through the same lemmas will share a very similar vector space representation. Conversely, a poor overlap will result in very different vectors. Moreover, for each column, lemmas may be ranked and those having the highest TF-IDF scores may be extracted. In this way, we obtain a lexicon of lemmas that are relevant for a particular contextual setting, and this can be useful to empirically validate the effectiveness
                           Figure 1: Workflow to generate Context-aware Justifications by Exploiting DSMs

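$$
\underbrace{\begin{bmatrix}
v_{1,1} & v_{1,2} & \cdots & v_{1,m} \\
v_{2,1} & v_{2,2} & \cdots & v_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
v_{n,1} & v_{n,2} & \cdots & v_{n,m}
\end{bmatrix}}_{V_{n,m}}
\times
\underbrace{\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,k} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,k}
\end{bmatrix}}_{A_{m,k}}
=
\underbrace{\begin{bmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,k} \\
c_{2,1} & c_{2,2} & \cdots & c_{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n,1} & c_{n,2} & \cdots & c_{n,k}
\end{bmatrix}}_{C_{n,k}}
$$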

                     Figure 2: Building a lemma-context matrix C by exploiting distributional semantics models
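
The product in Figure 2 can be illustrated with a minimal sketch. The toy sentences, the two contexts and the helper names (`tfidf_matrix`, `matmul`) are assumptions made only for illustration, and the plain `tf * log(m / df)` weighting is one common TF-IDF variant, not necessarily the one adopted in our implementation.

```python
import math
from collections import Counter

def tfidf_matrix(sentences, vocabulary):
    """Build V (n lemmas x m sentences) with tf * log(m / df) weights."""
    m = len(sentences)
    counts = [Counter(s) for s in sentences]
    # Document frequency: number of annotated sentences containing each lemma.
    df = {t: sum(1 for c in counts if t in c) for t in vocabulary}
    V = []
    for t in vocabulary:
        idf = math.log(m / df[t]) if df[t] else 0.0
        V.append([c[t] * idf for c in counts])
    return V

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Hypothetical annotated sentences, already lemmatized and POS-filtered.
sentences = [["romantic", "movie"], ["romantic", "end"], ["funny", "movie"]]
vocabulary = ["romantic", "funny", "movie", "end"]

# A (m sentences x k contexts); columns: company=partner, company=friends.
A = [[1, 0],
     [1, 0],
     [0, 1]]

V = tfidf_matrix(sentences, vocabulary)   # n x m
C = matmul(V, A)                          # n x k lemma-context matrix
```

In this toy example the row of `romantic` has a non-zero weight only in the company=partner column, while `movie`, which occurs in sentences annotated with both contexts, receives the same weight in both columns.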

of the approach. In Table 1, we anticipate some details of our experimental session and we report the top-3 lemmas for two different contextual settings starting from a set of movie reviews.

Ranker. Given a recommended item (along with its reviews) and given the context in which the item will be consumed (from now on, defined as ’current context’), this module has to identify the most relevant review excerpts to be included in the justification. To this end, we designed a ranking strategy that exploits DSMs and similarity measures in vector spaces to identify suitable excerpts: given a set of $n$ reviews discussing the item $i$, $R_i = \{r_{i,1}, \dots, r_{i,n}\}$, we first split each $r_i$ into sentences. Next, we processed the sentences through a sentiment analysis algorithm [6, 17] in order to filter out those expressing negative or neutral opinions about the item. The choice is justified by our focus on review excerpts discussing positive characteristics of the item. Next, let $c_j$ be the current contextual situation (e.g., company=partner); we calculate the cosine similarity between the context vector $\vec{c}_j$ returned by the CONTEXT LEARNER and a vector space representation of each sentence $\vec{s}_i$. The sentences having the highest cosine similarity w.r.t. the context of usage $c_j$ are selected as the most suitable excerpts and are passed to the GENERATOR.

Generator. Finally, the goal of the GENERATOR is to put together the compliant excerpts in a single natural language justification. In particular, we defined a slot-filling strategy based on the principles of Natural Language Generation [18]. Such a strategy is based on the combination of a fixed part, which is common to all the justifications, and a dynamic part that depends on the outputs returned by the previous steps. In our case, the top-1 sentence for each current contextual dimension is selected, and the different excerpts are merged by exploiting simple connectives, such as adverbs and conjunctions. An example of the resulting justifications is provided in Table 2.

EXPERIMENTAL EVALUATION
The experimental evaluation was designed to identify the best-performing configuration of our strategy, by varying the combinations of the parameters of the workflow (Research Question 1), and to assess how our approach performs in comparison to other methods (both context-aware and non-contextual) to generate post-hoc justifications (Research Question 2). To this end, we designed a user study involving 273 subjects (male=50%, degree or PhD=26.04%, age≥35=49.48%, already used a RS=85.4%) in the movies domain. Interest in movies was indicated as medium or high by 62.78% of the sample. Our sample was obtained through the availability sampling strategy, and it includes students, researchers in the area and people not skilled in computer science and recommender systems.

Experimental Design. To run the experiment, we deployed a web application¹ implementing the methodology described in Section 3. Next, as a first step, we identified the relevant contextual dimensions for each domain. Contexts were selected by carrying out an analysis of related work on context-aware recommender systems in the MOVIE domain. In total, we defined 3 contextual dimensions, that is to say, mood (great, normal), company (family, friends, partner) and level of attention (high, low). To collect the data necessary to feed our web application, we selected a subset of 300 popular movies (according to IMDB data) discussed in more than 50 reviews in the Amazon Reviews dataset². This choice is motivated by our need for a large set of sentences discussing the item in each contextual setting. These data were processed by exploiting lemmatization, POS-tagging and sentiment analysis algorithms available in CoreNLP³ and Stanford Sentiment Analysis algorithm⁴ .

¹
² http://jmcauley.ucsd.edu/data/amazon/links.html - Only the reviews available in the ’Movies and TV’ category were downloaded.
³ https://stanfordnlp.github.io/CoreNLP/
⁴ https://nlp.stanford.edu/sentiment/
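
The ranking step carried out by the RANKER module can be sketched as follows. This is only an illustration under stated assumptions: the sentence vector is a simple bag-of-lemmas over the same vocabulary as the context vector (the text does not fix this representation), the sentiment labels are assumed to be already available, and the function names, the toy sentences and the context weights are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sentence_vector(lemmas, vocabulary):
    """Assumed representation: lemma counts over the shared vocabulary."""
    return [float(lemmas.count(t)) for t in vocabulary]

def rank_excerpts(sentences, context_vector, vocabulary, top_n=1):
    """Keep positive sentences and return the top_n closest to the context."""
    positive = [s for s in sentences if s["sentiment"] == "positive"]
    scored = [(cosine(sentence_vector(s["lemmas"], vocabulary), context_vector),
               s["text"]) for s in positive]
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored[:top_n]]

# Hypothetical vocabulary and context vector for company=partner.
vocabulary = ["romantic", "funny", "plot"]
partner_vector = [0.9, 0.1, 0.3]

# Hypothetical review sentences with precomputed lemmas and sentiment.
sentences = [
    {"text": "it has a very romantic end",
     "lemmas": ["romantic", "end"], "sentiment": "positive"},
    {"text": "the plot is boring",
     "lemmas": ["plot", "boring"], "sentiment": "negative"},
    {"text": "a funny movie",
     "lemmas": ["funny", "movie"], "sentiment": "positive"},
]

best = rank_excerpts(sentences, partner_vector, vocabulary, top_n=1)
```

Here the negative sentence is discarded before ranking, and the sentence mentioning `romantic` is returned because its vector is the closest to the company=partner context vector.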
                                           Attention=high                             Attention=low
                Unigrams             engaging, attentive, intense                  simple, smooth, easy
                Bigrams        intense plot, slow movie, life metaphor    easy vision, simple movie, simple plot

Table 1: Top-3 lemmas returned by the CONTEXT LEARNER module for two different contextual settings in the MOVIE domain
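
A lexicon such as the one in Table 1 can be read off the lemma-context matrix by ranking the lemmas of each column. The sketch below assumes a tiny matrix with made-up weights and a hypothetical helper `top_lemmas`; it only illustrates the ranking, not the actual scores we obtained.

```python
def top_lemmas(C, vocabulary, context_index, k=3):
    """Return the k lemmas with the highest weight in one column of C."""
    scored = [(row[context_index], lemma) for row, lemma in zip(C, vocabulary)]
    # Sort by decreasing weight; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [lemma for weight, lemma in scored[:k] if weight > 0]

# Illustrative weights only; columns: attention=high, attention=low.
vocabulary = ["engaging", "intense", "simple", "easy"]
C = [[0.9, 0.0],
     [0.7, 0.1],
     [0.0, 0.8],
     [0.1, 0.6]]

high_attention = top_lemmas(C, vocabulary, context_index=0, k=2)
low_attention = top_lemmas(C, vocabulary, context_index=1, k=2)
```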

        Movie                                                          Justification
                                    You should watch ’Stranger than Fiction’. It is a good movie to watch with your
                                       partner because it has a very romantic end. Moreover, plot is very intense.
                                       You should watch ’Stranger than Fiction’. It is a good movie to watch with
                           friends since the film crackles with laughter and pathos and it is a classy sweet and funny movie.

Table 2: Context-aware justifications for the MOVIE domain. Automatically extracted review excerpts are reported in
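
The slot-filling strategy behind justifications such as those in Table 2 can be sketched as a fixed template whose slots are filled with the selected excerpts. The template wording is modeled on the first example of Table 2, while the function name and the list of connectives are assumptions made for illustration.

```python
def generate_justification(title, context_phrase, excerpts):
    """Fill a fixed template with the top excerpts for the current context."""
    if not excerpts:
        raise ValueError("at least one excerpt is required")
    # Fixed part, common to all the justifications.
    text = (f"You should watch '{title}'. It is a good movie to watch "
            f"{context_phrase} because {excerpts[0]}.")
    # Dynamic part: further excerpts are merged through simple connectives.
    connectives = ["Moreover", "In addition"]
    for i, excerpt in enumerate(excerpts[1:]):
        text += f" {connectives[i % len(connectives)]}, {excerpt}."
    return text

justification = generate_justification(
    "Stranger than Fiction",
    "with your partner",
    ["it has a very romantic end", "plot is very intense"],
)
```

Keeping the fixed part separate from the excerpts means the same template can be reused for any context, and only the context phrase and the ranked excerpts change.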

tool. Some statistics about the final dataset are provided in Table 3.

In order to compare different configurations of the workflow, we designed several variants obtained by varying the vocabulary of lemmas. In particular, we compared the effectiveness of simple unigrams, of bigrams and of their merge. In the first case, we encoded in our matrix just single lemmas (e.g., service, meal, romantic, etc.) while in the second we stored combinations of nouns and adjectives (e.g., romantic location). Due to space reasons, we cannot provide more details about the lexicons we learnt, and we suggest referring again to Table 1 for a qualitative evaluation of some of the resulting representations. Our representations based on DSMs were obtained by starting from a set of 1,905 annotations for the movie domain, produced by three annotators by adopting a majority vote strategy. To conclude, each user involved in the experiment carried out the following steps:

1. Training, Context Selection and Generation of the Recommendation. First, we asked the users to provide some basic demographic data and to indicate their interest in movies. Next, each user indicated the context of consumption of the recommendation, by selecting a context among the different contextual settings we previously indicated (see Figure 3-a). Given the current context, a suitable recommendation was identified and presented to the user. As recommendation algorithm we used a content-based recommendation strategy exploiting users’ reviews.

2. Generation of the Justification. Given the recommendation and the current context of consumption, we ran our pipeline to generate a context-aware justification of the item adapted to that context. In this case, we designed a between-subject protocol. In particular, each user was randomly assigned to one of the three configurations of our pipeline and the output was presented to the user along with the recommendation (see Figure 3-b). Clearly, the user was not aware of the specific configuration he was interacting with.

3. Evaluation through Questionnaires. Once the justification was shown, we asked the users to fill in a post-usage questionnaire. Each user was asked to evaluate transparency, persuasiveness, engagement and trust of the recommendation process through a five-point scale (1=strongly disagree, 5=strongly agree). The questions the users had to answer follow those proposed in [23]. Due to space reasons, we cannot report the questions and we suggest interacting with the web application to fill in the missing details.

4. Comparison to baselines. Finally, we compared our method to two different baselines in a within-subject experiment. In this case, all the users were provided with two different justification styles (i.e., our context-aware justifications and a baseline) and we asked the users to choose the one they preferred. As for the baselines, we focused on other methodologies to generate post-hoc justifications and we selected (i) a context-aware strategy to generate justifications, which is based on a set of manually defined relevant terms for each context; (ii) a method to generate non-contextual review-based justifications that relies on the automatic identification of relevant aspects and on the selection of compliant review excerpts containing such terms. Such an approach partially replicates that presented in [10].

Discussion of the Results. Results of the first experiment, which allows us to answer Research Question 1, are presented in Table 4. The values in the table represent the average scores provided by the users for each of the previously mentioned questions. As for the movie domain, results show that the overall best results are obtained by using a vocabulary based on unigrams and bigrams. This first finding provides us with an interesting outcome, since most of the strategies to generate explanations are currently based on single keywords and aspects. Conversely, our experiment showed that both adjectives and couples of co-occurring terms are worth encoding, since they catch more fine-grained characteristics of the item that are relevant in a particular contextual setting. Overall, the results we obtained confirmed the validity of the approach. Beyond the increase in TRANSPARENCY, high evaluations were also noted for the PERSUASION and ENGAGEMENT metrics. This outcome confirms how the identification of relevant reviews’ excerpts can lead to satisfying justifications. Indeed, differently from feature-based justifications, which typically rely on very popular and well-known characteristics of the movie, such as the actors or the director, more specific aspects of the items
                       #Items      #Reviews    #Sentences     #Positive Sent.      Avg. Sent./Item       Avg. Pos. Sent./Item
           MOVIES       307         153,398     1,464,593        560,817              4,770.66                 1,826.76

                                                  Table 3: Statistics of the dataset

                (a) Context Selection                                           (b) Recommendation and Justification

                                           Figure 3: Interaction with the web application.

emerge from users’ reviews.

Next, in order to answer Research Question 2, we compared the best-performing configurations emerging from Experiment 1 to two different baselines. The results of these experiments are reported in Table 5, which shows the percentage of users who preferred our context-aware methodology based on DSMs to both the baselines. In particular, the first comparison allowed us to assess the effectiveness of a vector space representation of contexts based on DSMs with respect to a simple context-aware justification method based on a fixed lexicon of relevant terms, while the second comparison investigated how valid was the idea of diversifying the justifications based on the different contextual settings in which the item is consumed. As shown in the table, our approach was the preferred one in both the comparisons. It should be pointed out that the gaps are particularly large when our methodology is compared to a non-contextual baseline. In this case, we noted a statistically significant gap (p ≤ 0.05) for all the metrics, with the exception of trust. This suggests that diversifying the justifications based on the context of consumption is particularly appreciated by the users. This confirms the validity of our intuition, which led to a completely new research direction in the area of justifications for recommender systems.

CONCLUSIONS AND FUTURE WORK
In this paper we presented a methodology that exploits DSMs to build post-hoc context-aware natural language justifications supporting the suggestions generated by a RS. The hallmark of this work is the diversification of the justifications based on the different contextual settings in which the items will be consumed, which is a new research direction in the area. As shown in our experiments, our justifications were largely preferred by users. This confirms the effectiveness of our approach and paves the way to several future research directions, such as the definition of personalized justifications as well as the generation of hybrid justifications that combine elements gathered from user-generated content (such as the reviews) with descriptive characteristics of the items. Finally, we will also evaluate to what extent these justifications can explain the behavior of complex and non-scrutable models such as those based on complex deep learning techniques [22].

REFERENCES
 [1] Shuo Chang, F Maxwell Harper, and Loren Gilbert Terveen. 2016. Crowd-based Personalized Natural Language Explanations for Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 175–182.
 [2] Li Chen and Feng Wang. 2017. Explaining Recommendations based on Feature Sentiments in Product Reviews. In Proceedings of the 22nd International Conference on Intelligent User Interfaces. ACM, 17–28.
 [3] Gerhard Friedrich and Markus Zanker. 2011. A taxonomy for generating explanations in recommender systems. AI Magazine 32, 3 (2011), 90–98.
 [4] Fatih Gedikli, Dietmar Jannach, and Mouzhi Ge. 2014. How should I explain? A comparison of different explanation types for recommender systems. International Journal of Human-Computer Studies 72, 4 (2014), 367–382.
 [5] Alessandro Lenci. 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics 20, 1 (2008), 1–31.
                               Metrics / Configuration     Unigrams     Bigrams     Uni+Bigrams
                                   Transparency              3.38         3.81          3.64
                                     Persuasion              3.56         3.62          3.54
                                    Engagement               3.54         3.72          3.70
                                         Trust               3.44         3.66          3.61

Table 4: Results of Experiment 1 for the MOVIE domain. The best-performing configuration is reported in bold and underlined
                                     vs. Context-aware Static Baseline        vs. Non-Contextual Baseline
                 Metrics / Choice    CA+DSMs    Baseline    Indifferent       CA+DSMs    Baseline    Indifferent
                   Transparency       52.38%      38.10%      19.52%         53.21%     34.47%     12.32%
                    Persuasion        54.10%      36.33%      19.57%         55.17%     32.33%     12.50%
                   Engagement         49.31%      39.23%      11.56%         44.51%     32.75%     22.74%
                       Trust          42.86%      39.31%      17.83%         42.90%     42.11%     14.99%

Table 5: Results of Experiment 2, comparing our approach (CA+DSMs) to a context-aware baseline that does not exploit DSMs
(CA Static) and to a non-contextual baseline that exploits users’ reviews (review-based). The configuration preferred by the higher
percentage of users is reported in bold.

