Towards a theoretical formalization of conversational recommendation

Tommaso Di Noia¹, Francesco Maria Donini², Dietmar Jannach³, Fedelucio Narducci¹ and Claudio Pomo¹

¹ Politecnico di Bari, via Orabona, 4, 70125 Bari, Italy
² Università degli Studi della Tuscia, via Santa Maria in Gradi, 4, 01100 Viterbo, Italy
³ University of Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt am Wörthersee, Austria

MICROS@CIKM'22: Proceedings of the CIKM Workshop on Mixed-Initiative ConveRsatiOnal Systems, October 21, 2022, Atlanta, USA

Abstract
Tools that interact vocally with users are becoming increasingly popular in the market, boosting the interest of industry and academia in them. In such environments, conversational recommender systems succeed in guiding users in situations of information overload. Through multiple interactions with users, such systems ask questions, filter the catalog in a personalized manner, and suggest items that are of potential interest to the consumer. In this context, conversational efficiency in terms of the number of required interactions often plays a fundamental role. This work introduces a theoretical and domain-independent approach to support the efficiency analysis of a conversational recommendation engine. Observations from an empirical analysis align with our theoretical findings.

1. Introduction and Motivation

System-generated recommendations have become a common feature of modern online services such as e-commerce sites, media streaming platforms, and social networks. In many cases, the suggestions made by the underlying recommender systems are personalized according to the user's tastes, needs, and preferences. In the most prominent applications of recommender systems, user preferences are estimated based on past user behaviors. However, there are several application domains where no past interaction logs are available or where the user's needs and preferences might differ each time the user interacts with the service (e.g., restaurant recommendation for a party or a romantic dinner). In such application settings, a multi-turn, interactive recommendation process is required, where the system's goal is to learn about the user's preferences to the extent that appropriate recommendations can be made. Conversational Recommender Systems (CRS) support such processes, and these systems have received increased attention in recent years [1, 2, 3, 4, 5, 6, 7].

The preference elicitation process in such settings can be implemented in different ways, ranging from predefined fill-out forms to natural language interfaces—see Jannach et al. [5] for an overview. In that context, a specific goal when designing a CRS is to minimize the effort for users by asking as few questions as possible, i.e., to increase the efficiency of the dialog [8, 9, 10, 11, 12, 13, 14].

Today, research in the general area of recommender systems, and specifically in the area of CRS, is almost entirely empirical [15, 16, 17]. Such empirical studies are certainly important and insightful. However, little is known about the theoretical aspects of the underlying interactive recommendation processes. Unfortunately, theoretical questions regarding, e.g., the computational complexity of determining a good or the best interaction strategy cannot be answered without a formal characterization of the overall problem.

With this work, we address this research gap and provide a theoretical model of conversational recommendation. The model is designed in a domain-independent way and aims to cover a wide range of realistic application scenarios. A conversational recommendation process is modeled as a sequence of states, where state transitions correspond to common user intents and conversational moves [18, 19, 20] that can be found in the literature. Since our model is agnostic about the application domain and the algorithm that is used to select and rank the objects for recommendation (i.e., the recommendation algorithm), it serves as a basis to analyze important theoretical properties of CRS.

The main contribution of this work¹ is the study of the computational complexity of finding an efficient conversational strategy in terms of the number of dialog turns. In particular, we demonstrate that: (i) the problem of finding an efficient conversational strategy in terms of the number of dialog turns is NP-hard, but in PSPACE; (ii) some specific factors of the item catalog influence the complexity of the problem; (iii) for a special class of catalogs, the upper bound lowers to POLYLOGSPACE. From a practical perspective, our analysis leads to the observation that the efficiency of a conversation strategy is tied to the characteristics of the catalog. Observations from an empirical analysis on datasets based on MovieLens-1M support these theoretical considerations.

¹ An extended version of this work is available in Di Noia et al. [21].

2. Model Description

In our theoretical framework we assume a retrieval-based item filtering approach, which is commonly used in critiquing-based and constraint-based approaches to recommendation [22, 23]. In critiquing approaches, users are presented with a recommendation early in the dialogue and can then apply pre-defined critiques on the recommendations, e.g., "less $$" [5]. In analogy to database-oriented approaches, we therefore use the term "query" when referring to positive user preferences. Negative preferences are modeled as constraints on disliked item features. The retrieved items are then ranked according to any type of information, e.g., the popularity of certain items. In order to carry out a general analysis, in our approach we abstract from the details of this ranking.

To model the conversational recommendation process, we rely on the notion of the state of a conversation, and on the transformations this state can be subject to, depending on the interaction. For example, each preference expressed by the user leads to a change of the state of the conversation and may also imply a change in the set of recommendable items. This formalization through conversation states ultimately serves as a basis to study the efficiency of conversational strategies. The most efficient conversational strategies will minimize the number of states that must be passed through to reach an end.

In our model, we mainly deal with the system-driven part of a conversation, where a conversation consists of a sequence of interactions.² The system can perform one of the following actions:³

1. ask the user to fill in (provide) a value for a particular feature that is under-specified so far (e.g., the item color);
2. ask the user to enlarge a too narrow choice for a feature value (e.g., to change the price limit);
3. ask for changing a feature value (e.g., from green to red for the color feature);
4. upon rejection of one or more items, ask the user whether there is a specific feature value of these items she dislikes (e.g., a specific color).

The user can react to the above system prompts with one of the following interactions:

(a) given one or more recommendations, the user can accept one of them, or reject them;
(b) the user can state that she dislikes every item where a feature is filled with a particular value (e.g., "I don't like green cellphones").

The formalization of the interaction process described by Di Noia et al. [21] allows us to establish the results mentioned above.

² In constraint-based and critiquing-based systems the recommender system usually drives the conversation in an initial preference elicitation phase. In typical implementations of such systems, the user can however also take the initiative and, for example, request recommendations at any time or proactively revise their preferences.
³ We note that in the area of Knowledge Representation, slot unfilling (e.g., retracting any requirement about colors) could be considered a special case of knowledge contraction [24], while slot change (e.g., from green to red) is a form of revision [25].

3. Empirical Analysis

3.1. Experimental Design

One main theoretical result is that the chosen conversation strategy (protocol) not only impacts the efficiency of a CRS, but that the efficiency also depends on the characteristics of the item catalog, e.g., in terms of the number of available item features and the number of distinct values. We devised an in-vitro (offline) experiment using two protocols to empirically validate this result.

Protocols. A CRS may support two different ways (protocols) of how users can reject a recommendation made by the system:

P1 - the user rejects the recommendation and the CRS does not ask the user to provide a specific reason, i.e., a reason that refers to a disliked feature value. Examples of such more unspecific feedback—if any feedback is given at all—could be, "I don't want to go to the Green Smoke restaurant" or "I don't want to see the movie American Beauty" (for some reason, but I cannot explain this to a system);

P2 - the user rejects the recommendation and the CRS asks for a specific item characteristic (i.e., feature value) she does not like at all, for example, the green color for cellphones, sea-view restaurants, a particular movie director, etc. We assume that a user will truthfully answer such questions.
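As an illustration (this sketch is ours and not part of the paper's formal model; all identifiers are hypothetical), the two rejection protocols can be rendered as catalog-update rules over items represented as feature-value dictionaries:

```python
def reject_p1(catalog, rejected):
    """P1: no reason is given; only the rejected items are removed."""
    return [item for item in catalog if item not in rejected]

def reject_p2(catalog, rejected, feature, value):
    """P2: the user names a disliked feature value; every item
    carrying that value is removed as well."""
    return [item for item in catalog
            if item not in rejected and item.get(feature) != value]

phones = [
    {"color": "green", "brand": "A"},
    {"color": "green", "brand": "B"},
    {"color": "red", "brand": "A"},
]
# P1 removes only the shown item; P2 ("I don't like green cellphones")
# also removes every other green item in the same step.
after_p1 = reject_p1(phones, [phones[0]])                    # two items left
after_p2 = reject_p2(phones, [phones[0]], "color", "green")  # one item left
```

A single P2 answer can thus prune many items at once, which is the mechanism behind the intuition that catalogs with many few-valued features benefit most from feature-based critiques.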
Hypotheses. Based on our theoretical results, we formulate two hypotheses, where the difference lies in the characteristics of the item catalog.

H1 We do not expect a strong difference in terms of efficiency between P1 and P2 when the items in the catalog have few features with a large number of distinct values.

H2 We do expect a strong difference in terms of efficiency between P1 and P2 when the items in the catalog have several features with few distinct values.

Experiment Specifics. In our experiment, we simulate the above-mentioned protocols and vary the underlying item catalog as an independent variable.

Efficiency Metric. As commonly done in the literature [14, 17], we use the number of questions (NQ) the CRS asks before the user accepts a recommendation as an efficiency measure. Fewer questions indicate higher interaction efficiency.

Dataset and Catalog Description. We rely on the widely used MovieLens-1M (ML-1M) dataset for our experiment, which we enrich with item features using DBpedia [26]. The resulting dataset comprises 3,308 items with 279 unique features. From this dataset, we create two versions to test our hypotheses.

• Itemset1 (IS1) has only a few features, but with a larger number of distinct values. It is designed to support H1 (we do not expect a strong difference in terms of efficiency between P1 and P2 when the items in the catalog have few features with a large number of distinct values).

• Itemset2 (IS2), in contrast, has a larger number of features, but each of them only has a few distinct values; it is designed to support H2 (we expect a strong difference in terms of efficiency between P1 and P2 when the items in the catalog have several features with few distinct values).

Specifically, to make the datasets sufficiently different and such that they reflect the characteristics described in our research hypotheses, IS1 and IS2 have 4 and 10 features per item, respectively. For each feature of IS1 there are from 1,500 to 2,500 distinct values, whereas there are about 100 values on average for IS2. To better focus on the main goals of our experiment, we make the simplifying assumption that all features can have only one value. Accordingly, we replaced set-valued features (e.g., the movie cast) with a single value randomly chosen among them.

Simulation Procedure. We simulate the part of a conversation between a user and a CRS where the system drives the interaction by asking the user about preferred item features and making recommendations for items.⁴ We assume that the simulated user has certain pre-existing preferences regarding item features and truthfully responds to system questions about these preferences. When provided with a recommendation, the user either rejects it, which means that the dialog continues, or accepts it, and the dialog ends. The CRS in our simulation implements one of the described conversation strategies, P1 or P2.

Note that in our experiment we simulate a user cold-start situation, i.e., we are not taking any long-term user profile into account during the dialog. In order to simulate the response of a user, we first select a set of positively-rated items (PRI) for each user. This set consists of those items in the dataset that the user has rated with a value that is greater than or equal to their average rating in the MovieLens dataset. We use this set PRI for two purposes. First, we simulate a dialog for each element 𝐼 of PRI as an "ideal" item (that the user will accept). Second, we use the items in PRI to determine the pre-existing preferences of a user and simulate their answers to the questions posed by the system. Therefore, if the user previously liked action and romantic movies, the set of pre-existing preferences contains only these values.

When the simulated dialog with a defined ideal item 𝐼 starts, the system will ask a question on a feature, e.g., "What is your favorite genre?". The simulated user will then respond by choosing a value from the set of values for that feature occurring in PRI. In our simulation, the user cannot answer with a value that is not present in any recommendable object. After each user answer, the set of recommendable items 𝒮 is updated by the CRS according to her answer. A recommendation is shown when the system has no more questions to ask. This situation may occur when: (i) preferences on all the features have been expressed, or (ii) only one item in the catalog is consistent with the user preferences. The user rejects the recommendation when 𝐼 is not present in the list of recommended items. If the recommendation is rejected, the recommended items are removed from the catalog and the system starts posing questions to the user again. It is noteworthy that, since we may have more than one item in the catalog described by the feature values selected by the user during the dialog, the final recommendation may also contain a set of items.

In case of rejection, protocols P1 and P2 lead to different system reactions. In the case of P1, the system starts querying the user again, beginning with a first randomly selected feature; the values selected during the previous interactions are discarded. In protocol P2, in contrast, the user declares one of the feature values of the recommended items as disliked.

When the recommendation succeeds—i.e., when the ideal item 𝐼 is in the list of recommendations—the dialog is successfully ended, and the simulation continues with a new dialog for another user and/or target item. The simulation ends when a dialog has been simulated for each element in PRI for every user.

⁴ Other parts of the conversations may include greetings or chit-chat. For a catalog of common user intents in CRS, see [18].
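The simulation procedure above can be condensed into a small loop that counts NQ for a single dialog. The sketch below is our own simplified rendering, not the authors' implementation: all names are hypothetical, and the unspecified question-selection and answer policies are stubbed with random choices.

```python
import random

def simulate_dialog(catalog, features, prefs, ideal, protocol):
    """Count the questions (NQ) asked before the simulated user accepts
    a recommendation containing the ideal item. Items are feature->value
    dicts; prefs maps each feature to the user's liked values (from PRI)."""
    items = list(catalog)                      # current recommendable set S
    nq = 0
    while True:
        remaining = list(features)
        shown = list(items)
        # ask questions until no features are left or only one item remains
        while remaining and len(shown) > 1:
            f = remaining.pop(random.randrange(len(remaining)))
            nq += 1
            # the user answers with a liked value that still matches
            # at least one recommendable item
            options = [v for v in prefs[f] if any(it[f] == v for it in shown)]
            if not options:                    # no liked value applies
                continue
            answer = random.choice(options)
            shown = [it for it in shown if it[f] == answer]
        if ideal in shown:
            return nq                          # recommendation accepted
        if protocol == "P1":
            # no reason given: drop only the shown items, then restart
            # questioning with previously selected values discarded
            items = [it for it in items if it not in shown]
        else:
            # P2: the user names a disliked feature value of a shown item
            f = next(f for f in features
                     if any(it[f] != ideal[f] for it in shown))
            bad = next(it[f] for it in shown if it[f] != ideal[f])
            items = [it for it in items if it[f] != bad]

# Toy catalog (hypothetical data): every feature combination exists once.
features = ["genre", "director", "decade"]
catalog = [{"genre": g, "director": d, "decade": y}
           for g in ("action", "drama")
           for d in ("X", "Y")
           for y in ("1990s", "2000s")]
ideal = catalog[0]
prefs = {"genre": {"action", "drama"}, "director": {"X", "Y"},
         "decade": {"1990s", "2000s"}}
nq_p1 = simulate_dialog(catalog, features, prefs, ideal, "P1")
nq_p2 = simulate_dialog(catalog, features, prefs, ideal, "P2")
```

Averaging such runs over all ideal items of all users yields NQ statistics of the kind reported in Section 3.2: each P2 rejection removes a whole slice of the catalog, while each P1 rejection removes only the items just shown.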
Figure 1: (a) the average number of Questions per Configuration; (b) the maximum number of Questions per Configuration. (Bar chart; recovered values: average NQ for IS1: P1 72.64, P2 67.45; for IS2: P1 1,023.75, P2 166.34; maximum NQ for IS1: P1 1,588, P2 524; for IS2: P1 6,671, P2 1,152.)

3.2. Results and Discussion

We applied protocols P1 and P2 both for itemset IS1 and IS2, and we counted the Number of Questions (NQ) required to reach the test item in each configuration. Figure 1 summarizes the results. As expected from the theoretical analysis, with IS1 we observe minor differences between the two protocols in terms of the number of required questions asked by the system. More specifically, P1 needs 72.64 questions on average with IS1, whereas P2 needs 67.45. The difference is however huge for IS2, where P1 needs more than 1,000 questions on average to reach the test item, while P2 requires around 166 questions (Fig. 1a). Hence, we can confirm that when the items in the catalog have many features with a smaller number of distinct values, the efficiency of P2 grows drastically compared to P1. Also the maximum number of questions confirms this difference in efficiency for IS1 and IS2 (Fig. 1b).

We note that NQ in absolute terms is very large—even in the best combination (P2 with IS1), the number of questions is close to 70, which sounds too high for practical applications. Recall, however, that in this experiment we implemented a worst-case scenario, and our experiment used an unrealistic setting on purpose. In our scenario the recommendation task is deliberately difficult:

• for each dialog, there is only one test item (true positive);
• the CRS works in a cold-start condition without any user profile;
• the CRS does not implement a cut-off on the number of questions to ask the user.

In conclusion, our experimental evaluation results align with our theoretical findings, thus providing support for our research hypotheses H1 and H2. In other words, our simulation confirmed what was foreseen by the theoretical analysis: the difference between protocol P1 and protocol P2 shows up clearly only in a dataset with many features with a small set of different values.

4. Summary and Outlook

With this work, we contribute to a better understanding of the theoretical properties of conversational recommendation problems, and we specifically address questions related to the computational complexity of finding efficient dialog strategies. One main insight of our theoretical analysis—which was also confirmed by an in-vitro experiment—is that when designing an efficient conversation strategy, we must always consider the characteristics of the item catalog. More specifically, we demonstrated that when a few features with a large number of distinct values characterize the items in the catalog, the critiquing strategy based on asking the user about a disliked characteristic of the recommended item does not give any significant advantage in terms of user effort. Conversely, when the catalog is composed of items with several features with a few distinct values, a critiquing strategy based on item features can drastically reduce the user effort for reaching a liked recommendation. On a more general level, we hope that our work might help to stimulate more theory-oriented research in this area, leading to a better understanding of the foundational properties of this essential class of interactive AI-based systems. In future research, we will investigate the explicit consideration of individual long-term user preferences in the interactive recommendation process.

Acknowledgement

The authors acknowledge partial support from the projects PASSEPARTOUT, ServiziLocali2.0, Smart Rights Management Platform, BIO-D, and ERP4.0.

References

[1] K. Christakopoulou, F. Radlinski, K. Hofmann, Towards conversational recommender systems, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 815–824.
[2] T. Shen, Z. Mai, G. Wu, S. Sanner, Distributional contrastive embedding for clarification-based conversational critiquing, in: Proceedings WWW '22, 2022, pp. 2422–2432.
[3] K. Zhou, W. X. Zhao, S. Bian, Y. Zhou, J.-R. Wen, J. Yu, Improving conversational recommender systems via knowledge graph based semantic fusion, in: Proceedings KDD '20, 2020, pp. 1006–1014.
[4] C. Gao, W. Lei, X. He, M. de Rijke, T.-S. Chua, Advances and challenges in conversational recommender systems: A survey, AI Open 2 (2021) 100–126.
[5] D. Jannach, A. Manzoor, W. Cai, L. Chen, A survey on conversational recommender systems, ACM Comput. Surv. 54 (2021) 1–26.
[6] W. Cai, Y. Jin, L. Chen, Impacts of personal characteristics on user trust in conversational recommender systems, in: Proceedings CHI '22, 2022, pp. 489:1–489:14.
[7] F. Radlinski, K. Balog, F. Diaz, L. Dixon, B. Wedin, On natural language user profiles for transparent and scrutable recommendation, in: Proceedings SIGIR '22, 2022, pp. 2863–2874.
[8] G. Adomavicius, Y. Kwon, New recommendation techniques for multicriteria rating systems, IEEE Intelligent Systems 22 (2007) 48–55. doi:10.1109/MIS.2007.58.
[9] P. Viappiani, C. Boutilier, Regret-based optimal recommendation sets in conversational recommender systems, in: RecSys '09, 2009, pp. 101–108.
[10] T. Mahmood, F. Ricci, Improving recommender systems with adaptive conversational strategies, in: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, 2009, pp. 73–82.
[11] J. Reilly, J. Zhang, L. McGinty, P. Pu, B. Smyth, A comparison of two compound critiquing systems, in: IUI '07, 2007, pp. 317–320.
[12] P. Grasch, A. Felfernig, F. Reinfrank, ReComment: Towards critiquing-based recommendation with speech interaction, in: Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, 2013, pp. 157–164.
[13] F. Narducci, M. de Gemmis, P. Lops, G. Semeraro, Improving the user experience with a conversational recommender system, in: International Conference of the Italian Association for Artificial Intelligence, Springer, 2018, pp. 528–538.
[14] A. Iovine, F. Narducci, G. Semeraro, Conversational recommender systems and natural language: A study through the ConveRSE framework, Decision Support Systems 131 (2020) 113250.
[15] A. Iovine, P. Lops, F. Narducci, M. de Gemmis, G. Semeraro, An empirical evaluation of active learning strategies for profile elicitation in a conversational recommender system, Journal of Intelligent Information Systems (2021) 1–26.
[16] V. Bellini, G. M. Biancofiore, T. Di Noia, E. Di Sciascio, F. Narducci, C. Pomo, GUapp: A conversational agent for job recommendation for the Italian public administration, in: EAIS, 2020, pp. 1–7.
[17] D. Jannach, Evaluating conversational recommender systems, Artificial Intelligence Review (2022). doi:10.1007/s10462-022-10229-x.
[18] W. Cai, L. Chen, Predicting user intents and satisfaction with dialogue-based conversational recommendations, in: UMAP '20, 2020, pp. 33–42.
[19] R. Reichman, Getting Computers to Talk Like You and Me, MIT Press, Cambridge, Massachusetts, 1985.
[20] D. Jannach, Finding preferred query relaxations in content-based recommenders, in: Intelligent Techniques and Tools for Novel System Architectures, Springer, 2008, pp. 81–97.
[21] T. Di Noia, F. M. Donini, D. Jannach, F. Narducci, C. Pomo, Conversational recommendation: Theoretical model and complexity analysis, Information Sciences, in press (2022). doi:10.1016/j.ins.2022.07.169.
[22] A. Felfernig, G. Friedrich, D. Jannach, M. Zanker, Constraint-based recommender systems, in: Recommender Systems Handbook, Springer, 2015, pp. 161–190.
[23] D. Jannach, Advisor Suite – A knowledge-based sales advisory system, in: ECAI '04, 2004, pp. 720–724.
[24] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, M. Mongiello, Concept abduction and contraction for semantic-based discovery of matches and negotiation spaces in an e-marketplace, Electron. Commer. Res. Appl. 4 (2005) 345–361.
[25] P. Gärdenfors, Belief revision and knowledge representation, in: Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge, 1996, p. 117.
[26] T. Di Noia, V. C. Ostuni, P. Tomeo, E. Di Sciascio, SPrank: Semantic path-based ranking for top-N recommendations using linked open data, ACM Trans. Intell. Syst. Technol. 8 (2016) 9:1–9:34.