An Investigation on the Impact of Natural Language
on Conversational Recommendations
Discussion Paper

Andrea Iovine¹, Fedelucio Narducci², Marco de Gemmis¹ and Giovanni Semeraro¹
¹ University of Bari Aldo Moro, Via E. Orabona 4, 70125, Bari, Italy
² Politecnico di Bari, Via E. Orabona 4, 70125, Bari, Italy


Abstract
In this paper, we investigate the combination of Virtual Assistants and Conversational Recommender
Systems (CoRSs) by designing and implementing a framework named ConveRSE, for building chatbots
that can recommend items from different domains and interact with the user through natural language.
A user experiment was carried out to understand how natural language influences both the cost of
interaction and recommendation accuracy of a CoRS. Experimental results show that natural language
can indeed improve user experience, but some critical aspects of the interaction should be mitigated
appropriately.

Keywords
Conversational Recommender Systems, Natural Language Processing, Conversational Agents,
Information Retrieval


1. Introduction
The rise of Virtual Assistants (VA) has generated a significant shift in the way people interact
with their devices. VAs allow users to perform everyday tasks (e.g. booking tickets, memorizing
appointments) using only voice or text. They have also been proposed as a platform for
delivering personalized recommendations [1] through the development of a Conversational
Recommender System (CoRS). CoRSs acquire the user profile via an interactive, human-like
dialogue [2], which can be easily adapted to a natural language interface.
   In this paper, we present an investigation on the introduction of natural language in a CoRS,
and its effect in terms of quality of the recommendations and user experience [3]. For this
purpose we developed ConveRSE, a domain-independent framework for the development of
CoRSs. We conducted a user study involving three different domains (movies, books, music), in
which participants interacted with the system and evaluated the recommendations.
   Our contributions are summarized as follows: (i) we propose a domain-independent framework
for the development of Conversational Recommender Systems that supports natural language
interaction; (ii) we perform a user study on the effect of different interaction modalities
on the interaction cost and recommendation quality of a CoRS. This extended abstract presents
work that was previously published in Decision Support Systems [4].

IIR 2021 – 11th Italian Information Retrieval Workshop, September 13–15, 2021, Bari, Italy
Email: andrea.iovine@uniba.it (A. Iovine); fedelucio.narducci@poliba.it (F. Narducci); marco.degemmis@uniba.it
(M. d. Gemmis); giovanni.semeraro@uniba.it (G. Semeraro)
ORCID: 0000-0002-4169-6724 (A. Iovine)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

2. Related Work
A Conversational Recommender System (CoRS) is defined as a system that provides recommen-
dations to users via a multi-turn dialogue [5]. CoRSs are characterized by the fact that they
acquire the user profile in an iterative fashion. The system can interact with users by asking
them to rate some items, and in turn they can influence the outcome of the recommendation by
providing feedback on the suggested items. Traditional recommender systems, on the other
hand, require that all user information is provided before generating a recommendation [6].
   The idea of combining Virtual Assistants and recommender systems was
introduced in [7], which highlights the technological gap between the two systems. The authors
argue that a VA can improve the recommendation process because it can learn the users’
evolving, diverse and multi-aspect preferences. We investigate this claim by proposing the
integration of a chat-based interface into a CoRS.
   CoRSs have been developed using many different input and output modalities, such as forms
[8, 9, 6, 10] and voice/text [11, 12, 13, 14]. CoRSs also differ based on the preference elicitation
strategy, such as constraint-based [8, 11], critiquing-based [6, 15, 16], or strategies that rely on
acquiring pairwise preferences [17]. In this paper, we propose a system that features a user-
driven preference elicitation strategy, which can build the profile via natural language messages
that are directly provided by users. We also compare its effectiveness against a system-driven,
question-answer elicitation strategy.

3. Workflow and System Architecture
Interaction with ConveRSE is divided into three main steps, which are also found in [5]: (i)
preference elicitation; (ii) generation of recommendations; (iii) acquisition of user feedback.
These steps are repeated until a satisfactory recommendation is generated. Preference elicitation
is the most important step in any CoRS [5], and this is no exception in ConveRSE. In particular,
this step is largely driven by users, who interact with the CoRS by talking about the items
and the properties that they like or dislike (e.g. "I like The Matrix", "I love Sylvester Stallone, but
I hate Rocky"), as shown in Figure 2. This increases flexibility and allows the recommender
system to focus on the features that matter to users, compared to a system-driven interface
that proactively proposes the items to evaluate. After a recommendation has been generated,
users can provide additional feedback on it (e.g. "I don’t like this movie", "I like it but I don’t like
the genre"). This feedback is integrated into the user profile, and exploited to improve further
recommendations.
   The architecture of ConveRSE is shown in Figure 1. In particular, ConveRSE implements the
following components:
   Natural Language Understanding (NLU): It is in charge of understanding the user’s utterance.
It performs three tasks: (i) intent recognition, which classifies the action or request
expressed in the message (e.g. providing a preference, requesting a recommendation), (ii) entity
recognition, which extracts items (e.g. movies, actors, directors) that are mentioned in the
message, and (iii) sentiment analysis, which then assigns a sentiment score to each item. Intent
recognition is implemented using Google Dialogflow¹, Sentiment Analysis is performed using
the Stanford CoreNLP² Sentiment Tagger, while Entity Recognition is developed in-house.

Figure 1: ConveRSE architecture (User; NLU: Intent Recognition, Entity Recognition, Sentiment Analysis; Dialogue Manager: Dialogue State Tracking, Dialogue Policy, Response Generation; Recommender System; Response)
Figure 2: Interaction with ConveRSE

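
To make the three NLU tasks concrete, the following is a minimal, rule-based sketch in Python. It is not the ConveRSE implementation, which relies on Google Dialogflow, the Stanford CoreNLP Sentiment Tagger and an in-house entity recognizer. The entity list, the sentiment word lists, the intent labels and all function names are assumptions introduced only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy resources (assumptions): a real system would use trained
# models and a knowledge base instead of these hand-written lists.
KNOWN_ENTITIES = {
    "the matrix": "movie",
    "rocky": "movie",
    "sylvester stallone": "actor",
}
POSITIVE = {"like", "love", "enjoy"}
NEGATIVE = {"hate", "dislike"}
NEGATIONS = ("don't", "do not", "didn't")


@dataclass
class NLUResult:
    intent: str         # e.g. "provide_preference" (hypothetical label)
    preferences: dict   # entity name -> sentiment score in {-1, 0, +1}


def clause_sentiment(clause: str) -> int:
    """Return +1, -1 or 0 for a single lower-cased clause."""
    words = set(re.findall(r"[a-z']+", clause))
    score = 0
    if words & POSITIVE:
        score = 1
    if words & NEGATIVE:
        score = -1
    # A negation flips the polarity ("don't like" becomes negative).
    if score and any(neg in clause for neg in NEGATIONS):
        score = -score
    return score


def understand(text: str) -> NLUResult:
    """Toy intent recognition, entity recognition and per-entity sentiment."""
    normalised = text.lower().replace("\u2019", "'")
    preferences = {}
    # Split into clauses so that each entity gets the sentiment of its own clause.
    for clause in re.split(r"\bbut\b|[,.;]", normalised):
        score = clause_sentiment(clause)
        for name in KNOWN_ENTITIES:
            if re.search(r"\b" + re.escape(name) + r"\b", clause):
                preferences[name] = score
    if "recommend" in normalised or "suggest" in normalised:
        intent = "request_recommendation"
    elif preferences:
        intent = "provide_preference"
    else:
        intent = "unknown"
    return NLUResult(intent, preferences)
```

For the example utterance "I love Sylvester Stallone, but I hate Rocky", this sketch assigns a positive score to the actor and a negative score to the movie, because each entity is scored within its own clause.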
   Dialogue Manager (DM): It supervises the interaction process by coordinating the activity
of all other components. It performs three tasks: (i) Dialogue State Tracking, which keeps track
of all information exchanged with the user and updates it accordingly; (ii) Dialogue Policy, which
selects the best action to perform based on the current intent and the dialogue state (e.g.
generate a recommendation, ask for clarification); and (iii) Response Generation, which generates a
textual response by filling a template with contextual information. This component is developed
in-house.
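
The three Dialogue Manager tasks can be sketched as follows. This is a simplified illustration under stated assumptions, not the in-house component: the state fields, action names, templates and the threshold of three preferences (borrowed from the protocol of the user study) are all hypothetical choices.

```python
from dataclasses import dataclass, field

# Assumption: recommend only after three preferences, mirroring the user study.
MIN_PREFERENCES = 3

# Hypothetical response templates, filled with contextual information.
TEMPLATES = {
    "ask_more": "Got it. Tell me about {missing} more thing(s) you like or dislike.",
    "clarify": "Which one did you mean: {options}?",
    "recommend": "Based on your profile, here are some suggestions.",
    "fallback": "Sorry, I did not understand. Could you rephrase?",
}


@dataclass
class DialogueState:
    profile: dict = field(default_factory=dict)               # entity -> sentiment
    pending_disambiguation: list = field(default_factory=list)


def track_state(state, intent, preferences, candidates=()):
    """Dialogue State Tracking: merge new information into the state."""
    if intent == "provide_preference":
        state.profile.update(preferences)
    # Candidate entities that still need to be disambiguated by the user.
    state.pending_disambiguation = list(candidates)
    return state


def choose_action(state, intent):
    """Dialogue Policy: pick the next action from the intent and the state."""
    if state.pending_disambiguation:
        return "clarify"
    if intent not in ("provide_preference", "request_recommendation"):
        return "fallback"
    if len(state.profile) < MIN_PREFERENCES:
        return "ask_more"
    return "recommend"


def generate_response(state, action):
    """Response Generation: fill the template chosen by the policy."""
    return TEMPLATES[action].format(
        missing=MIN_PREFERENCES - len(state.profile),
        options=", ".join(state.pending_disambiguation),
    )
```

In this sketch the policy is a fixed set of rules; a learned policy could replace `choose_action` without changing the other two functions.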
   Recommender System: It handles all functions related to profile building and generation
of suggestions. In particular, ConveRSE uses a graph-based recommendation algorithm based
on PageRank with Priors [18]. It exploits a knowledge graph extracted from Wikidata
[19], in which both items and their properties are represented as nodes in the graph. This
component is also responsible for generating explanations, exploiting the connections between
the recommended item and the items in the user profile.
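
The idea of PageRank with Priors on a knowledge graph can be illustrated with a short Python sketch. It is a plain power iteration on a tiny hand-made graph, not the ConveRSE recommender: the edge list is toy data rather than a Wikidata extraction, only liked items are used as priors, and the function names, damping factor and iteration count are assumptions.

```python
def pagerank_with_priors(graph, priors, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform teleport (prior) vector.

    graph  : dict mapping each node to the list of its neighbours
    priors : dict mapping nodes to non-negative prior weights
    """
    nodes = list(graph)
    total = sum(priors.get(n, 0.0) for n in nodes)
    if total > 0:
        prior = {n: priors.get(n, 0.0) / total for n in nodes}
    else:
        # Without prior mass, fall back to the uniform teleport vector.
        prior = {n: 1.0 / len(nodes) for n in nodes}

    rank = dict(prior)
    for _ in range(iterations):
        # Teleport step: restart on the prior nodes with probability 1 - damping.
        new_rank = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Dangling node: send its mass back to the prior distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * prior[m]
        rank = new_rank
    return rank


def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


# Toy knowledge graph (assumption): items and their properties are both nodes.
EDGES = [
    ("The Matrix", "Keanu Reeves"),
    ("John Wick", "Keanu Reeves"),
    ("John Wick", "action film"),
    ("First Blood", "action film"),
    ("First Blood", "Sylvester Stallone"),
    ("Rocky", "Sylvester Stallone"),
]
ITEMS = {"The Matrix", "John Wick", "First Blood", "Rocky"}


def recommend(graph, liked, items, k=5):
    """Rank the items that are not yet in the profile by their PageRank score."""
    scores = pagerank_with_priors(graph, {item: 1.0 for item in liked})
    candidates = [n for n in items if n not in liked]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:k]
```

Calling `recommend(build_graph(EDGES), {"The Matrix"}, ITEMS)` ranks John Wick first, since it shares a property node with the liked item. The same shared node is the kind of connection that an explanation could point to.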

¹ https://dialogflow.cloud.google.com/
² https://stanfordnlp.github.io/CoreNLP/

4. Experimental Evaluation
We created three instances of ConveRSE that are able to generate recommendations for different
domains: movies, books and music, with 15,954, 7,592 and 12,926 recommendable items,
respectively. In addition, we implemented three different interaction modes for the CoRS:
(i) Natural Language (NL), in which users express their preferences in the form of short text
messages, and all the steps described in Section 3 are performed via text; (ii) Buttons, which uses
a traditional system-driven approach for profile elicitation, in which the CoRS proposes a set of
popular items, and users express a rating by pressing buttons; (iii) Mixed, which is an extension
of the NL interface, in which users can answer certain system questions using buttons (e.g.
when multiple entities in the knowledge base match an item mentioned in the user utterance).
Therefore, there are nine configurations in total.

                                Movies                    Books                     Music
                                NL      Buttons  Mixed    NL      Buttons  Mixed    NL      Buttons  Mixed
 Number of Questions (NQ)       2.86    24.49    2.8      2.09    15.25    1.29     4.52    18.39    2.63
 Interaction Time (sec) (IT)    681.69  659.78   444.7    437.44  272.35   237.69   545.87  688.41   318.52
 Time per Question (sec) (TPQ)  28.35   14.06    16.24    17.13   10.2     10.2     19.04   24.28    12.36
 Query Density (QD)             0.84    0.74     0.75     0.65    0.75     0.78     0.57    0.77     0.72
 Accuracy                       0.45    0.56     0.63     0.57    0.46     0.69     0.55    0.54     0.65
 Mean Average Precision (MAP)   0.36    0.43     0.55     0.52    0.4      0.69     0.46    0.44     0.58

Table 1
Results of the user experiment

   We performed three within-subjects experiments (one for each domain), which involved
50 people for the movie domain, 55 for the book domain, and 54 for the music domain. The
results of the experiment will answer the following Research Questions: RQ1: Can natural
language improve a Conversational Recommender System in terms of cost of interaction?; RQ2:
Can natural language improve a Conversational Recommender System in terms of quality of the
recommendations?
   During the experiment, participants were briefly instructed on how to use the system. Then,
they performed all steps described in Section 3. After providing at least three preferences,
they received a set of five recommended items; for each one, they could accept it, reject it,
or provide more complex feedback. During the experiment, we collected several
metrics related to the interaction cost and recommendation accuracy. For the interaction cost,
we recorded the number of questions (NQ) asked by the system, the average time needed to answer
each question (TPQ), the total Interaction Time (IT), and the Query Density (QD) [20], which
measures the average number of new concepts (i.e. entities) introduced in each utterance. For
the recommendation quality, we measured the Accuracy and the Mean Average Precision (MAP).
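
As an illustration of how such metrics could be computed from interaction logs, the sketch below uses common textbook definitions. These are assumptions, since the exact formulas are given in [4] and [20]: Accuracy is taken as the fraction of recommended items that the user accepted, Average Precision is computed over the ranked list with binary accept/reject feedback, and Query Density is computed per dialogue. The function names and input formats are hypothetical.

```python
def accuracy(feedback):
    """Fraction of recommended items that were accepted (list of booleans)."""
    return sum(feedback) / len(feedback) if feedback else 0.0


def average_precision(feedback):
    """Average of precision@k over the ranks k at which an item was accepted."""
    hits = 0
    precisions = []
    for k, accepted in enumerate(feedback, start=1):
        if accepted:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(sessions):
    """Mean of the Average Precision over all recommendation sessions."""
    if not sessions:
        return 0.0
    return sum(average_precision(s) for s in sessions) / len(sessions)


def query_density(utterance_entities):
    """Average number of new entities introduced per user utterance.

    utterance_entities: one collection of entity names per utterance, in order.
    """
    seen = set()
    new_concepts = 0
    for entities in utterance_entities:
        fresh = set(entities) - seen   # entities not mentioned before
        new_concepts += len(fresh)
        seen |= fresh
    return new_concepts / len(utterance_entities) if utterance_entities else 0.0
```

For example, a list of five recommendations in which the first and third items are accepted gives an Accuracy of 2/5 and an Average Precision of (1/1 + 2/3) / 2 under these definitions.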
   Results are shown in Table 1. We can observe that the NL and Mixed configurations ask
fewer questions than the button-based one. Also, the NL configuration requires
longer IT and TPQ than the Mixed configuration in all domains, and the Mixed configuration
obtains the lowest IT in every domain. This suggests
that an interaction based entirely on natural language can become inefficient in specific cases,
for example, when a disambiguation of the user input is required. Integrating a button-based
interface for these cases leads to reduced typing time and fewer mistakes, which dramatically
reduces the interaction cost. We also observe that the Mixed interaction mode obtains the best
Accuracy and MAP in all domains, which suggests that it allows users to express
their preferences more effectively, thus improving the quality of the suggestions.

5. Conclusion
In this paper, we presented an experimental study on the effect of introducing natural language
interaction into a CoRS. Although a dialogue in natural language has the potential to improve
interaction cost and quality of recommendations of a CoRS, a purely NL-based interface poses
some issues that need to be addressed. Specifically, when the user has to choose among a set of
possible options, the integration of buttons drastically improves user experience.

References
 [1] D. Rafailidis, The technological gap between virtual assistants and recommendation
     systems, arXiv preprint arXiv:1901.00431 (2018).
 [2] M. Jugovac, D. Jannach, Interacting with recommenders: Overview and research directions,
     ACM Trans. Interact. Intell. Syst. 7 (2017) 10:1–10:46.
 [3] S. Moller, K.-P. Engelbrecht, C. Kuhnel, I. Wechsung, B. Weiss, A taxonomy of quality of
     service and quality of experience of multimodal human-machine interaction, in: Quality
     of Multimedia Experience, 2009. QoMEx 2009. International Workshop on, IEEE, 2009, pp.
     7–12.
 [4] A. Iovine, F. Narducci, G. Semeraro, Conversational recommender systems and natural
     language: A study through the ConveRSE framework, Decision Support Systems (2020)
     113250. doi:10.1016/j.dss.2020.113250.
 [5] D. Jannach, A. Manzoor, W. Cai, L. Chen, A Survey on Conversational Recommender
     Systems, arXiv preprint arXiv:2004.00646 (2020).
 [6] T. Mahmood, F. Ricci, Improving recommender systems with adaptive conversational
     strategies, in: Proceedings of the 20th ACM conference on Hypertext and hypermedia,
     ACM, 2009, pp. 73–82.
 [7] D. Rafailidis, Y. Manolopoulos, Can Virtual Assistants Produce Recommendations?, in: Pro-
     ceedings of the 9th International Conference on Web Intelligence, Mining and Semantics,
     2019, pp. 1–6.
 [8] M. Goker, C. Thompson, The adaptive place advisor: A conversational recommendation
     system, in: Proceedings of the 8th German Workshop on Case Based Reasoning, Citeseer,
     2000, pp. 187–198.
 [9] J. Zhang, P. Pu, A Comparative Study of Compound Critique Generation in Conversational
     Recommender Systems, in: V. P. Wade, H. Ashman, B. Smyth (Eds.), Adaptive Hypermedia
     and Adaptive Web-Based Systems, Lecture Notes in Computer Science, Springer, Berlin,
     Heidelberg, 2006, pp. 234–243. doi:10.1007/11768012_25.
[10] L. W. Dietz, S. Myftija, W. Wörndl, Designing a conversational travel recommender
     system based on data-driven destination characterization, in: ACM RecSys workshop on
     recommenders in tourism, 2019, pp. 17–21.
[11] C. A. Thompson, M. H. Goker, P. Langley, A Personalized System for Conversational
     Recommendations, Journal of Artificial Intelligence Research 21 (2004) 393–428. URL:
     https://jair.org/index.php/jair/article/view/10374. doi:10.1613/jair.1318.
[12] Y. Sun, Y. Zhang, Conversational Recommender System, arXiv:1806.03277 [cs] (2018). URL:
     http://arxiv.org/abs/1806.03277, arXiv: 1806.03277.
[13] H. Mori, Y. Chiba, T. Nose, A. Ito, Dialog-Based Interactive Movie Recommendation:
     Comparison of Dialog Strategies, volume 82, Springer International Publishing, 2018, pp.
     77–83.
[14] J. Habib, S. Zhang, K. Balog, IAI MovieBot: A Conversational Movie Recommender System,
     Proceedings of the 29th ACM International Conference on Information & Knowledge
     Management (2020) 3405–3408. URL: http://arxiv.org/abs/2009.03668.
     doi:10.1145/3340531.3417433, arXiv: 2009.03668.
[15] L. Chen, P. Pu, Critiquing-based recommenders: survey and emerging trends, User
     Modeling and User-Adapted Interaction 22 (2012) 125–150. URL:
     http://link.springer.com/10.1007/s11257-011-9108-6. doi:10.1007/s11257-011-9108-6.
[16] G. Wu, K. Luo, S. Sanner, H. Soh, Deep Language-based Critiquing for Recommender
     Systems, in: Proceedings of the 13th ACM Conference on Recommender Systems, RecSys
     ’19, ACM, New York, NY, USA, 2019, pp. 137–145. URL:
     http://doi.acm.org/10.1145/3298689.3347009. doi:10.1145/3298689.3347009,
     event-place: Copenhagen, Denmark.
[17] K. Christakopoulou, F. Radlinski, K. Hofmann, Towards Conversational Recommender
     Systems, in: Proceedings of the 22nd ACM SIGKDD International Conference on
     Knowledge Discovery and Data Mining - KDD ’16, ACM Press, San Francisco,
     California, USA, 2016, pp. 815–824. URL: http://dl.acm.org/citation.cfm?doid=2939672.2939746.
     doi:10.1145/2939672.2939746.
[18] T. H. Haveliwala, Topic-sensitive PageRank: A context-sensitive ranking algorithm for
     web search, IEEE Transactions on Knowledge and Data Engineering 15 (2003) 784–796.
[19] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications
     of the ACM 57 (2014) 78–85. Publisher: ACM New York, NY, USA.
[20] J. Glass, J. Polifroni, S. Seneff, V. Zue, Data collection and performance evaluation of
     spoken dialogue systems: The MIT experience, in: Sixth International Conference on
     Spoken Language Processing, 2000.