                               Conversational Recommendations
                                Using Model-based Reasoning
Oliver A. Tazl and Alexander Perko and Franz Wotawa¹


¹ Graz University of Technology, Austria, email: {oliver.tazl, perko, wotawa}@ist.tugraz.at

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Chatbots as conversational recommenders have gained increasing importance over the years. The chatbot market offers a variety of applications for research and industry alike. In this paper, we discuss an implementation that supports the use of our recommendation algorithm during chatbot communication. The program eases communication and improves the underlying recommendation flow. In particular, the implementation makes use of our model-based reasoning approach for improving the user experience during a chat, i.e., in cases where user configurations cause inconsistencies. The approach deals with such issues by removing inconsistencies in order to generate a valid recommendation. In addition to the underlying definitions, we demonstrate our implementation along use cases from the tourism domain.

1 INTRODUCTION

Recommender systems aim to lead users in a helpful and individualized way to interesting or useful items drawn from a large space of possible options. Recommender systems may utilize knowledge bases for guiding users through the whole process of finding the right recommendation, i.e., a recommendation that satisfies the user's requirements, needs, or expectations. Most recently, conversational agents like chatbots have gained importance because they offer, at least in principle, a well-known and ideally more intuitive interface for human users, i.e., either textual or speech interaction.

In previous work [17] we introduced the basic foundations and principles behind a chatbot-based recommender system that interacts with users in a smart way, being capable of finding contradictions during the conversation and of efficiently pruning the overall recommendation process by selecting the right questions to ask the user. The basic principles behind our approach rely on classical model-based reasoning. In case the conversation leads to an inconsistent state, e.g., caused by contradictions between user requirements and the recommendation knowledge base, the chatbot is able to react and to resolve the issue. For this purpose, the chatbot asks the user which requirements to retract in order to eliminate the inconsistencies. In the case where the chatbot has far too many solutions to present effectively to the user, the system makes use of an entropy-based approach for selecting those requirements or attributes that have to be fixed in order to reduce the number of possible solutions. When using entropy, the number of steps necessary to reach a solution can be substantially reduced.

This paper is a direct successor of our previous work, in which we report on an implementation of our chatbot approach. In particular, we discuss the implementation details, present experiences gained, and finally introduce the results of an evaluation of the implementation. The evaluation is based on a case study from the tourism domain, i.e., a scenario where a user wants to book a hotel in a certain city. The obtained results show that the proposed chatbot approach is applicable and beneficial for the intended purpose. Furthermore, we gained experience with the limitations of the approach. For example, it seems that entropy is not always the best measure for selecting the questions to be answered, and further research is needed.

The main contributions of this paper can be summarized as follows:

1. An implementation of an algorithm that is based on model-based diagnosis and Shannon's information entropy to solve recommendation problems, and
2. the evaluation of the system with synthetic and real-world data sets.

The remainder of this paper is organized as follows: In the next section we give an overview of our algorithmic approach. Afterwards, we present the implementation of the algorithms and show evaluation results in greater detail. Finally, we discuss related research and conclude the paper.

2 FOUNDATIONS AND ALGORITHM

In our previous work [17], we introduced the algorithm EntRecom, which utilizes model-based diagnosis, in particular the ConDiag algorithm [18], and a method that applies Shannon's information entropy [23]. To be self-contained, we briefly recapitulate the underlying definitions and EntRecom. We first formalize the inconsistent requirements problem by exploiting the concepts of Model-Based Diagnosis (MBD) [1, 20] and constraint solving [2].

The inconsistent requirements problem requires information on the item catalog (i.e., the knowledge base of the recommendation system) and the current customer's requirements. Note that the knowledge base of the recommender may be consistent with the customer's requirements (i.e., the customer's query), in which case an appropriate number of recommendations can be offered. The recommendation system then shows the recommendations to the customer and no further algorithms have to be applied. Otherwise, if no solutions to the recommendation problem are available, the minimal set of requirements that caused the inconsistency with the knowledge base has to be identified and offered to the user as an explanation for not finding any recommendation. The user can in this case adapt, i.e., relax, the requirement(s). Here we borrow an idea from MBD and introduce abnormal modes for the given requirements, i.e., we use Ab predicates stating whether a requirement i should be assumed valid (¬Ab_i) or not (Ab_i) in a particular context. The Ab values for the requirements are set by the model-based diagnosis algorithm so that the assumptions together with the requirements and the knowledge base are consistent. In the following, we define the inconsistent requirements problem and its solutions.
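Concretely, this Ab-guarded encoding can be sketched using Z3's Python bindings, which the implementation described in Section 3 builds on. The hotel attributes, values, and variable names below are purely illustrative assumptions:

```python
from z3 import And, Bool, Implies, Not, Or, Solver, String, StringVal

# Toy item catalog: two hypothetical hotel entries as OR-connected clauses.
stars, city = String("stars"), String("addr:city")
kb = Or(And(city == StringVal("Graz"), stars == StringVal("3")),
        And(city == StringVal("Graz"), stars == StringVal("4")))

# Customer requirements, each guarded by an Ab selector: if Ab_i is false,
# requirement i must hold; if Ab_i is true, the requirement is retracted.
reqs = [city == StringVal("Graz"), stars == StringVal("5")]
ab = [Bool(f"ab_{i}") for i in range(len(reqs))]

s = Solver()
s.add(kb)
for ab_i, req in zip(ab, reqs):
    s.add(Implies(Not(ab_i), req))

print(s.check([Not(a) for a in ab]))   # unsat: all requirements assumed valid
print(s.check([Not(ab[0]), ab[1]]))    # sat: retracting the stars requirement
```

In this toy example, Γ = {stars = "5"} is a cardinality-minimal set of inconsistent requirements in the sense of the definitions that follow.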
More formally, we state the inconsistent requirements problem as follows:

Definition 1 (Inconsistent Requirements Problem). Given a tuple (KB, REQ) where KB denotes the knowledge base of the recommender system, i.e., the item catalog, and REQ denotes the customer requirements. The Inconsistent Requirements Problem arises when KB together with REQ is inconsistent. In this case we are interested in identifying those requirements that are responsible for the inconsistency.

A solution or explanation to the inconsistent requirements problem can be easily formalized using the analogy with the definition of diagnosis from Reiter [20]. We first introduce a modified representation of (KB, REQ) comprising (KB_D, REQ), where KB_D comprises KB together with rules of the form Ab_R for each requirement R in REQ. The solution to the Inconsistent Requirements Problem can now be defined using the modified representation as follows:

Definition 2 (Inconsistent Requirements). Given a modified recommendation model (KB_D, REQ). A subset Γ ⊆ REQ is a valid set of inconsistent requirements iff KB_D ∪ {¬Ab_R | R ∈ REQ \ Γ} ∪ {Ab_R | R ∈ Γ} is satisfiable.

A set of inconsistent requirements Γ is minimal iff no other set of inconsistent requirements Γ′ ⊂ Γ exists. A set of inconsistent requirements Γ is minimal with respect to cardinality iff no other set of inconsistent requirements Γ′ with |Γ′| < |Γ| exists. From here on we assume minimal cardinality sets when using the term minimal sets.

The second problem occurring during a recommendation session is the availability of a too large number of recommendations, which has to be narrowed down to a reasonable size. This too-many-recommendations problem can again be solved using ideas borrowed from model-based diagnosis. In diagnosis, we face the similar problem of obtaining too many diagnoses because too few observations are known. The corresponding problem in the case of recommendation is that we have far too few requirements from the user. Hence, we have to ask the user to add more information in order to reduce the number of available solutions. In model-based diagnosis, Shannon's information entropy [23] is used to come up with the observations, and in our case the requirements, that should be known in order to reduce the recommendations as fast as possible.

Algorithm 1 provides recommendations in the context of chatbots, making use of diagnosis and Shannon's information entropy computation. EntRecom converts the available knowledge into a corresponding constraint model and checks its consistency. If the knowledge is inconsistent, the algorithm tries to find requirements that can be retracted by the user in order to get rid of the inconsistency. Afterwards, EntRecom searches for the best requirement to be set by the user in order to reduce the number of solutions, if necessary. The algorithm stops when reaching a set of recommendations that has a cardinality of less than n.

Algorithm 1 EntRecom(KB_D, REQ, n)
Input: A modified knowledge base KB_D, a set of customer requirements REQ, and the maximum number of recommendations n
Output: All recommendations S
 1: Generate the constraint model CM from KB_D and REQ
 2: Call CSolver(CM) to check consistency and store the result in S
 3: if S = ∅ then
 4:   Call MiREQ(CM, |REQ|) and store the inconsistent requirements in IncReqs
 5:   Call askUser(IncReqs) and store the answer in AdaptedReqs
 6:   CM = KB ∪ (REQ \ IncReqs ∪ AdaptedReqs)
 7:   go to Step 2
 8: end if
 9: while |S| > n do
10:   Call GetBestEntrAttr(A_S) and store the result in a
11:   A_S = A_S \ a
12:   Call askUser(a) and store the answer in v_a
13:   S = R(S, v_a)
14: end while
15: return S

With the provided algorithms, a chatbot for recommendations can be built that is able to deal with inconsistent as well as missing requirements in a more or less straightforward way, making use of previously invented algorithms.
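For illustration, the following is a minimal Python rendering of the Algorithm 1 control flow. All helper functions (build_constraint_model, csolver, mi_req, ask_user_adapt, ask_user_value, attribute_map, get_best_entr_attr) are placeholders for the components described above, not the actual implementation:

```python
def ent_recom(kb_d, reqs, n):
    """Sketch of Algorithm 1: return at most n recommendations."""
    cm = build_constraint_model(kb_d, reqs)    # Step 1
    s = csolver(cm)                            # Step 2: consistent solutions
    while not s:                               # Steps 3-8: repair inconsistency
        inc_reqs = mi_req(cm, len(reqs))       # minimal inconsistent requirements
        adapted = ask_user_adapt(inc_reqs)     # user relaxes or replaces them
        reqs = (reqs - inc_reqs) | adapted
        cm = build_constraint_model(kb_d, reqs)
        s = csolver(cm)
    attrs = attribute_map(s)                   # A_S: attribute -> values in S
    while len(s) > n:                          # Steps 9-14: narrow down S
        a = get_best_entr_attr(attrs)          # attribute with highest entropy
        attrs.pop(a)
        v = ask_user_value(a)                  # user fixes a value for a
        s = [r for r in s if r.get(a) == v]    # restrict S, i.e., R(S, v_a)
    return s                                   # Step 15
```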
3 IMPLEMENTATION AND EVALUATION

For the user, the interaction with the recommender system within the chatbot starts when he or she formulates a query for a search within the domain. The natural language processing framework Rasa (see https://rasa.com/) parses this query and passes it on to the EntRecom algorithm. The recommender algorithm then searches a previously constructed and preprocessed knowledge base. Depending on the satisfiability of the generated constraint model, results are presented to the user. If necessary, the user is asked follow-up questions regarding his or her requirements until we have results to return to the user.

Figure 1: A Chatbot-Based Recommender as Bridge between User Input and Results from a Database

When the query can be answered, EntRecom exits and the user can continue interacting with the Rasa-based chatbot.

As mentioned before, we chose the tourism domain, more specifically searching for a hotel. The process of searching for a hotel in an iterative conversation helps us to see the capabilities and shortcomings of the algorithm.

3.1 Framework and Data Preprocessing

For our tests, we chose the community-curated data from Open Street Maps (OSM). We use the Python bindings for Microsoft's Z3 framework as a constraint solver. For language processing and basic interaction with the user, we utilize the Rasa framework. Rasa is used for natural language understanding by extracting the user's intent and requirements, but it can also respond immediately with an answer (without calling EntRecom) when it detects the user's intent to chitchat.

In a first step, we query the OSM API for hotels in a predefined region. The size of the region is a major factor for performance. Because of this, it is reasonable to let the user select a region on initialization.

Figure 2: Data from Open Street Maps

To construct our internal knowledge base, we process the exported data set in the following way: As some attributes do not add human-readable information or tend to mislead users, we have to filter them out. This is especially necessary because some classes of attributes have a severe impact on later steps in our recommender system. For example, categories like "fixme" (an annotation from an OSM user) would give the user no advantage in a real-world scenario and, in this case, would lead to unwanted recommendations. We use a whitelist filter for the attributes of every entry in the data set. Furthermore, not every entry in the OSM data has the same fields, which is why we maintain a list of all attributes in a data set. If an entry does not include a certain attribute, we add it with a value of False. This leaves us with a "normalized" data set.
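A minimal sketch of this normalization step; the whitelist entries below are hypothetical:

```python
# Hypothetical whitelist; the real filter lists the attributes considered useful.
WHITELIST = {"name", "addr:city", "stars", "cuisine", "smoking",
             "wheelchair", "swimming-pool", "tourism", "amenity"}

def normalize(entries):
    """Whitelist-filter OSM entries and pad missing attributes with False."""
    filtered = [{k: v for k, v in e.items() if k in WHITELIST} for e in entries]
    all_attrs = set().union(*(e.keys() for e in filtered))  # attributes in this set
    for e in filtered:
        for attr in all_attrs:
            e.setdefault(attr, False)  # the entry lacks this attribute
    return filtered
```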
Figure 3: Preprocessed Data Set

After our preprocessing step, every entry in the dictionary has the same number of attributes, and we add uniform clauses to our constraint model. An exemplary normalized clause looks like this: (amenity = "none" ∧ addr:city = "Graz" ∧ name = "Parkhotel Graz" ∧ cuisine = "none" ∧ ... ∧ smoking = "isolated" ∧ wheelchair = "limited" ∧ swimming-pool = "none" ∧ stars = "4" ∧ tourism = "hotel"). Regarding data types, we choose between two different approaches when creating the Z3 constraint model: the first is data type selection for every attribute in the domain; the second is string translation for every value, regardless of the specific class of an attribute. In our implementation, every attribute and value is translated to a Z3 string. While this may result in slower runtimes, we ensure flexibility, as data types may vary across attributes. Usually, the user is asked to select a region of interest; for our tests, we use a data set of a specific size. This is a very critical and, depending on the size of the test set, time-consuming step. Therefore, we try to do it only once, when the user initially specifies an area. This Z3 constraint model, consisting of OR-connected uniform clauses of Z3 strings, is our main knowledge base within EntRecom and shall be referred to as kb, with subsets thereof denoted S, in the remainder of this paper.
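A sketch of how such an OR-connected model over Z3 strings can be generated from the normalized entries; this is a simplified illustration under our own naming, not the verbatim implementation:

```python
from z3 import And, Or, String, StringVal

def build_kb(entries):
    """One Z3 string constant per attribute, one AND clause per entry;
    the knowledge base kb is the OR over all entry clauses."""
    attrs = sorted(entries[0].keys())  # uniform after normalization
    consts = {a: String(a) for a in attrs}
    clauses = [And([consts[a] == StringVal(str(e[a])) for a in attrs])
               for e in entries]
    return Or(clauses), consts
```

A user requirement then becomes an additional equality constraint on top of the returned formula, such as consts["stars"] == StringVal("4").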
The first interaction the user has with our chatbot implementation is handled via the trained natural language understanding (NLU) model within Rasa. If the user's intent is to search within the domain "hotels", our recommender is called via a webhook, and the parameters are passed over. This call to our internal API is the only interaction between EntRecom and the chatbot framework (Rasa, in our case), which leads to very low coupling and a high degree of flexibility. The parameters represent the NLU's interpretation of the user's query and give us our initial set of requirements.
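On the Rasa side, such a hand-over is typically realized as a custom action running on Rasa's action server. The following sketch is our own illustration; the action name, the entity handling, and the ent_recom_api call are assumptions, not the paper's code:

```python
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

class ActionSearchHotels(Action):
    def name(self) -> Text:
        return "action_search_hotels"  # hypothetical action name

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # The NLU's interpretation of the query becomes the initial requirements.
        reqs = {e["entity"]: e["value"]
                for e in tracker.latest_message.get("entities", [])}
        results = ent_recom_api(reqs)  # hypothetical call to the internal API
        dispatcher.utter_message(text=f"Found {len(results)} hotels.")
        return []
```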
3.2 Recommender Algorithm

In this section, we describe the implementation of the previously introduced algorithm EntRecom.

Before the algorithm is ready to use, we have to prepare our data set, generate the constraint model, and interpret the user's intent. Then we enter the EntRecom implementation. As described before, if the query is satisfiable for our knowledge base and the given maximum number of results, we return our recommendations and exit EntRecom: a maximum of n hotels retrieved from the knowledge base is presented to the user, with n representing a preselected maximum number of results.

However, as stated in [17], we are confronted with two potential problems at this point. Given a knowledge base kb, a maximum number of results n, and a user-defined set of requirements REQ:

• We could get too many results to present in a meaningful way. In this case, the function GetBestEntrAttr is called.
• We could get no results at all. In this scenario, we call the function MiREQ.

Figure 4: User Interface. (a) SAT, (b) MiREQ, (c) GetBestEntrAttr

3.2.1 GetBestEntrAttr

This part of the algorithm is called when the query is satisfiable but the number of results does not lie within n. Therefore, we have to add further constraints to our model. Because we want to occupy as little of the user's time as possible, we have to select the additional constraints efficiently. This is done by choosing the category from the domain that best splits the current subset S of the data set. The criterion for our selection is Shannon's information entropy [22]:

H(X) = −Σ_i P(x_i) log(P(x_i))

To apply entropy as a measure, we have to restructure the data slightly. A_S represents this restructured version of S, which maps every attribute in the domain to all values of its occurrences in S. This is realized with Python dictionaries of the form {"attribute1": ["value1", "value2"], "attribute2": ["value1"], ...}. After computing the number of occurrences of every value for a specific attribute, we calculate the entropy for every attribute. The attribute with the highest entropy splits the data set S most effectively. As mentioned above, some attributes may lead to unwanted recommendations. One reason for this is the appearance of seemingly random values when they are viewed out of context. These could be attributes used internally within OSM, for instance. Fields like "source" often have seemingly random values like "survey" or "Kleine Zeitung". The same pseudo-randomness can be observed with attributes like "housenumber", which can take an arbitrary integer value. Such values appear in many data points and, when observed in isolation, do not contribute any information with regard to splitting the data set. When computing the information entropy based on attributes lacking spatial ordering or clustering properties, we are not dividing the data set strategically but randomly. This is why we chose to exclude them from our knowledge base beforehand. If several attributes occur with equal entropies in the data set, we can randomly select one of these categories, as all of them split the data equally well. In the last step, the user is asked to select one value for this category. The user's selection is added to the set of requirements and EntRecom is called again.
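A small sketch of this selection over the restructured dictionary A_S; the helper names are ours:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) = -sum_i P(x_i) * log(P(x_i)) of a list of values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def get_best_entr_attr(a_s):
    """Return the attribute whose value distribution has maximal entropy,
    i.e., the attribute that best splits the current subset S."""
    return max(a_s, key=lambda attr: entropy(a_s[attr]))

# Example: "stars" is selected, since "tourism" is constant (entropy 0).
print(get_best_entr_attr({"stars": ["3", "4", "4", "5"],
                          "tourism": ["hotel", "hotel", "hotel", "hotel"]}))
```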
3.2.2 MiREQ

This part of the algorithm is called when the query is not satisfiable for our knowledge base KB with the user-defined set of requirements REQ. Following [17], we state the Inconsistent Requirements Problem. As a countermeasure, we have to soften the query to get results. To achieve this, we have to find the inconsistencies in REQ, i.e., a subset Γ thereof. For all sets of inconsistent requirements Γ, KB ∪ {¬Ab_R | R ∈ REQ \ Γ} ∪ {Ab_R | R ∈ Γ} is satisfiable, with Ab being a Boolean variable for selecting and deselecting a requirement R to be considered [18]. We implemented this by checking models with combinatoric variations of subsets of REQ. This means that we evaluate a constraint model consisting of our knowledge base and requirements with varying values for Ab. Given the cardinality of Γ, we iterate over all possible distributions of Ab with |REQ| − |Γ| considered requirements. Because we want to preserve as much of the user's initial query as possible, we want to find a minimal set of inconsistent constraints. As proposed in [17], to find such a minimal set of inconsistencies, we have to obtain a constraint model that is satisfiable for the smallest possible cardinality of Γ. For this, we repeat the process of checking combinatoric variations of subsets with an increasing number of assumed inconsistencies, starting with one. We do this until we reach |REQ|, in which case there are no consistent requirements. When a satisfiable constraint model is found, we are able to retrieve all inconsistent requirements in the form of the unconsidered requirements. With this subset of REQ, we return from MiREQ. After the MiREQ function returns, we ask the user for his or her preference for dropping one of the previously defined requirements within the minimal set of inconsistencies. While the current implementation assures that we find minimal sets of inconsistent requirements, in future implementations we hope to be able to make use of the unsat-core functionality of Z3 for this task. This is expected to significantly improve performance.
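A sketch of this cardinality-increasing search, reusing the Ab-guarded encoding from Section 2; function and variable names are ours:

```python
from itertools import combinations
from z3 import Solver, Not, sat

def mi_req(kb, guarded_reqs, ab):
    """Return the indices of a cardinality-minimal set of inconsistent
    requirements; guarded_reqs[i] is Implies(Not(ab[i]), req_i)."""
    n = len(ab)
    for k in range(1, n + 1):                  # assume k requirements inconsistent
        for gamma in combinations(range(n), k):
            s = Solver()
            s.add(kb, *guarded_reqs)
            # Requirements outside gamma must hold; those in gamma are retracted.
            assumptions = [ab[i] if i in gamma else Not(ab[i]) for i in range(n)]
            if s.check(assumptions) == sat:
                return list(gamma)
    return list(range(n))                      # no consistent subset exists
```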
3.3 Evaluation

We developed our tests concentrating on the algorithm itself and the data structures needed for its execution. The data preprocessing was not the focus of these tests. For our tests with synthetic test data this is especially true, as they use an adapted version of EntRecom without user interaction, as depicted in Figure 5. The goal of our experiments is to show potential for optimizations of the implementation, as well as to prove the versatility of the algorithm proposed in [17].

In our tests, we used both synthetic and real-world data. For benchmarking and basic experiments, we mainly used synthesized data, while real-world data, specifically from Open Street Maps, is important to ensure the flexibility and robustness of the implementation in real-world scenarios.

Figure 5: Focus of Tests with Synthetic Data
3.3.1 Real-World Test Data

As described before, we use data from Open Street Maps for our tests. First, we want to observe how the algorithm copes with real-world data and which degree of data preprocessing and filtering is necessary. Furthermore, we developed the rules for generating our synthetic test data based on the exported data from OSM.

For this part of our evaluation, we tested the algorithm empirically for usable results and also ran automated tests for exhaustive testing on the data.

3.3.2 Synthetic Test Data

Our modus operandi for generating and conducting tests with synthetic data can be described as follows:

1.) Definition of attribute classes and generation of data sets
2.) Definition of requirement sets and classification of them within a testing matrix
3.) Performing speed tests following the testing matrix

To evaluate our results for the performance test, we order tests within a three-dimensional testing matrix. The first axis of this matrix represents the size of the test set, the second axis represents the number of attributes, and the last axis represents the number of allowed results n. The value at every position in the matrix represents the sub-function called within our recommender algorithm.
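The matrix can be pictured as a simple mapping; the axis values and the helper functions below are our own illustrative assumptions:

```python
# Sketch of the three-dimensional testing matrix: (test-set size, number of
# attributes, allowed results n) -> sub-function a classification run ends in.
matrix = {}
for size in (100, 1_000, 10_000):      # axis 1: size of the test set (assumed)
    for n_attrs in (5, 10, 20):        # axis 2: number of attributes (assumed)
        for n in (1, 5, 10):           # axis 3: allowed results n (assumed)
            reqs = sample_requirements(n_attrs)  # hypothetical helper
            # classify returns 'SAT', 'GetBestEntrAttr', or 'MiREQ'
            matrix[(size, n_attrs, n)] = classify(reqs, size, n)
```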
We randomly generated data sets of different sizes and with different numbers of attributes for our performance tests. Because data within the domains "hotels" and "cars" have several distinct properties, we set up our test data following certain rules. To achieve this, we categorized the attributes depending on the data type of their corresponding values. Furthermore, we added complementary classes based on certain characteristics of the domain.

This results in the following basic classes of attributes based on their data type:

a) Boolean values
b) Integer values
c) Float values
d) String values

Additionally, we added the following supplementary classes:

e) Often reappearing parts in string values (e.g. names)
f) Frequently recurring string values (e.g. city, manufacturer)
g) Restricted numbers (e.g. star-rating, number of seats)
h) Dates (e.g. registration date)
i) Ranges (e.g. distance-to-the-center, price-category)

ad a) Boolean values appear often in real-world data and are easy to process. This class represents attributes like internet-access or payment-credit-card in the tourism domain and esp or abs in the automotive domain.
ad b) Arbitrary integer values often appear as counter variables like visitors, likes, or 5-star-reviews.
ad c) Floating-point numbers occur in both domains, tourism and cars, in various forms. This class covers all attributes in the context of distance, like distance-to-the-center for a hotel or mileage for a car. It also stands for location attributes given in coordinates (longitude, latitude) and mean values like user-rating.
ad d) Strings are very versatile and therefore used in many ways in different sources. In many cases, strings contain informational value beyond what a number may cover. But strings are often used in places where the informative content is no higher than that of number-valued attributes and could be represented with Boolean values or numbers instead. This is true for several classes of attributes, like stars with values such as "4-star".
ad e) The name attribute is always a string and does not have to, but is likely to, include a domain-specific term in its value. For hotels, this would be "Hotel" or a synonym thereof. Because of these recurring terms and their omnipresence across all domain-specific data sources, we treat this attribute separately from the more general string class. When generating data sets, we introduce these terms into our samples regularly.
ad f) Fields like city or manufacturer contain frequently reappearing values, which makes them special regarding information entropy and therefore interesting to us.
ad g) Stars of a hotel can be represented with 1 to 5, which makes a reappearance in this category very probable.
ad h) As we want to simplify the selection process, we reduce dates to the year. This results in an integer value often constrained by an upper boundary, the current year, and a case-dependent lower boundary. Depending on the specific attribute, these values may recur frequently.
ad i) Ranges are special because they consist not of one but of two boundary values. For simplification reasons, we categorize ranges within the relative classes low, med, and high. Of course, these range classes are also very likely to recur.

Our synthetic data is randomly generated with respect to these classes of attributes and with varying occurrences of the different types.
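For illustration, a compact generator sketch that samples entries with respect to these attribute classes; all value pools and distributions here are our own assumptions:

```python
import random

CITIES = ["Graz", "Vienna", "Linz"]  # class f): frequently recurring strings
RANGES = ["low", "med", "high"]      # class i): relative range classes

def synth_entry(i):
    """One synthetic hotel entry covering the attribute classes a)-i)."""
    return {
        "internet-access": random.choice([True, False]),    # a) Boolean
        "visitors": random.randint(0, 10_000),               # b) integer counter
        "user-rating": round(random.uniform(1.0, 5.0), 1),   # c) float
        "description": f"text-{random.randint(0, 999)}",     # d) generic string
        "name": f"Hotel Sample {i}",                          # e) recurring term
        "addr:city": random.choice(CITIES),                   # f) recurring value
        "stars": random.randint(1, 5),                        # g) restricted number
        "built": random.randint(1900, 2019),                  # h) date as year
        "price-category": random.choice(RANGES),              # i) range class
    }

data_set = [synth_entry(i) for i in range(1000)]
```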
3.4 Results

The results are split into two parts. First, we discuss the results on the synthetic data, followed by the proof-of-concept results with real-world data.

Using the synthetic data, we define sets of requirements to test the algorithm on. These sets belong to one of the following classes, relative to the knowledge base (i.e., the test data) and the given n they are tested with:

i) Satisfiable with n or fewer results. This leads to a direct return from the EntRecom algorithm, presenting us the results S the constraint solver found.
ii) Satisfiable with more than n results. In this case, we have to call GetBestEntrAttr, which calculates the entropies for all attributes. Another interaction with the algorithm is needed, as we have to select a value for the attribute with the highest informational content.
iii) Not satisfiable. In this case, we have to call MiREQ, which iteratively chooses subsets of REQ until it finds the largest satisfiable subset. Its complement is the set of inconsistent requirements δs. Again, another interaction through AskUser is necessary: now we have to select a requirement out of δs that we want to keep in REQ; the other constraints are dropped.

To classify the requirement sets for every test set according to the three classes from above, we use an adapted version of EntRecom, which does not return results or perform any recursive calls. The purpose of this classification is to be able to identify test results with regard to the sub-functions called within EntRecom. The class of REQ (either i, ii, or iii) for a certain test configuration represents the value of the three-dimensional testing matrix.

After classification, we perform our tests on the generated data. A typical test scenario starts with the user's first interaction with the chatbot environment, which, in turn, results in setting up the knowledge base. Regularly, the user is asked to select a region of interest; for our tests, we use a data set of a specific size. Then a set of requirements and a maximum for the expected results are chosen. We now perform a test for every class in the testing matrix and obtain results in the form of execution times.

For a fixed n of 5 and tests performed with one to five requirements in the sets, we plot the function calls of MiREQ and GetBestEntrAttr on increasingly sized test sets. In case we obtain a result for our query that lies within n, our constraint solver CSolver is the only function called.

Figure 6: Classification Based on Function Calls within EntRecom

As can be seen, the function calls of MiREQ decrease, while the calls of GetBestEntrAttr increase, for larger test sets. While this basic trend also holds for real data, the impact is not as drastic, as users tend to perform queries with fewer than five requirements initially.

The algorithm was also used with real-world data. For this, we inserted the data from OSM into the knowledge base. We interacted with the textual, chat-like web interface in the intended way and got the correct results from the algorithm. The algorithm returned a result set within a few iterations, and these results fit the user's specification.

These experiments also show that there is a problem of attribute relevance. The attributes that have a high entropy, and thus reduce the result set, are not necessarily relevant for users. Our tests show, e.g., that the attribute wheelchair, which is part of accessibility, has a high entropy but will not be interesting for a large share of users. This issue has to be addressed in an upcoming version of the algorithm.

4 RELATED WORK

The application of model-based reasoning, and especially of model-based diagnosis, in the field of recommender systems is not novel. For example, papers like [4, 11, 19] compute the minimal sets of faulty requirements, i.e., the requirements that should be changed in order to find a solution. In these papers, the authors rely on the existence of minimal conflict sets for computing the diagnoses of inconsistent requirements. Felfernig et al. [4] present an algorithm that calculates personalized repairs for inconsistent requirements. The algorithm combines concepts of MBD with a collaborative problem-solving approach to improve the quality of repairs in terms of prediction accuracy. In [19], the concept of representative explanations is introduced. This concept follows the idea of generating diversity in alternative diagnoses; informally, constraints that occur in conflicts should also be included in the diagnoses presented to the user. Jannach [11] proposes to determine preferred conflicts "on demand" and to use a general-purpose, fast conflict detection algorithm for this task, instead of computing all minimal conflicts within the user requirements in advance.

Papers that deal with the integration of diagnosis and constraint solving are [3] and [24, 25], which propose a diagnosis algorithm for tree-structured models. The approach is generally applicable due to the fact that all general constraint models can be converted into an equivalent tree-structured model using decomposition methods, e.g., hypertree decomposition [7, 8]. [26] provides more details regarding the coupling of decomposition methods and the diagnosis algorithms for tree-structured models. In addition, [21] generalized the algorithms of [3] and [24]. In [15] the authors also propose the use of constraints for diagnosis, where conflicts are used to drive the computation. In [6], which is maybe the earliest work describing the use of constraints for diagnosis, the authors introduce the use of constraints for computing conflicts under the correctness assumptions. For this purpose they developed the concept of constraint propagation. Despite the fact that all of these algorithms use constraints for modeling, they mainly focus on the integration of constraint solving for conflict generation, which is different from our approach. For the presentation of recommendation tasks as constraint satisfaction problems, we refer to [12].

Human-chatbot communication represents a broad domain. It covers technical aspects as well as psychological and human perspectives. Contributions like [9, 30] show several ways of implementing chatbots in different domains. Wallace [30] demonstrates an artificial intelligence robot based on a natural language interface (A.L.I.C.E.) that extends ELIZA [31], which is based on an experiment of Alan M. Turing in 1950 [29]. This work describes how to create a robot personality using AIML, an artificial intelligence modelling language, to pretend intelligence and self-awareness.

Sun et al. [27] introduced a conversational recommendation system based on unsupervised learning techniques. The bot was trained on successful order conversations between users and real human agents.

Papers like [5, 10, 13, 32] address the topics of user acceptance and experience. In [32] a pre-study shows that users infer the authenticity of a chat agent from two different categories of cues: agent-related cues and conversation-related cues. To get an optimal conversational result, the bot should provide a human-like interaction. Questions of conversational UX design raised by [5] and [16] demonstrate the need to rethink user interaction altogether.

The topic of recommender systems with conversational interfaces is addressed in [14], where an adaptive recommendation strategy was
learned using reinforcement learning methods. In [28], the authors proposed a deep reinforcement learning framework to build personalized conversational recommendation agents. In this work, a recommendation model trained from conversational sessions and rankings is also presented.

5 CONCLUSION AND FUTURE WORK

In this paper, we presented an implementation and an evaluation of EntRecom, an algorithm using model-based diagnosis and Shannon's information entropy. In our tests, we used both synthetic and real-world data. For benchmarking and basic experiments, we mainly used synthesized data, while real-world data, specifically from Open Street Maps, is important to ensure the flexibility and robustness of the implementation in real-world scenarios. We also showed the performance for different data sets and revealed open issues, like the relevance of the chosen attributes, which is already a starting point for future work. Another important step will be a user study to evaluate the acceptance of the algorithm.

ACKNOWLEDGEMENTS

Research presented in this paper was carried out as part of the AS-IT-IC project that is co-financed by the Cooperation Programme Interreg V-A Slovenia-Austria 2014-2020, European Union, European Regional Development Fund.

REFERENCES

[1] Johan de Kleer and Brian C. Williams, 'Diagnosing multiple faults', Artificial Intelligence, 32(1), 97–130, (1987).
[2] Rina Dechter, Constraint Processing, Morgan Kaufmann, 2003.
[3] Yousri El Fattah and Rina Dechter, 'Diagnosing tree-decomposable circuits', in Proceedings 14th International Joint Conf. on Artificial Intelligence, pp. 1742–1748, (1995).
[4] Alexander Felfernig, Gerhard Friedrich, Monika Schubert, Monika Mandl, Markus Mairitsch, and Erich Teppan, 'Plausible repairs for inconsistent requirements', in IJCAI International Joint Conference on Artificial Intelligence, pp. 791–796, (January 2009).
[5] Asbjørn Følstad and Petter Bae Brandtzæg, 'Chatbots and the new world of HCI', interactions, 24(4), 38–42, (June 2017).
[6] Hector Geffner and Judea Pearl, 'An Improved Constraint-Propagation Algorithm for Diagnosis', in Proceedings 10th International Joint Conf. on Artificial Intelligence, pp. 1105–1111, (1987).
[7] Georg Gottlob, Nicola Leone, and Francesco Scarcello, 'Hypertree Decomposition and Tractable Queries', in Proc. 18th ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS-99), pp. 21–32, Philadelphia, PA, (1999).
[8] Georg Gottlob, Nicola Leone, and Francesco Scarcello, 'A comparison of structural CSP decomposition methods', Artificial Intelligence, 124(2), 243–282, (December 2000).
[9] B. Graf, M. Krüger, F. Müller, A. Ruhland, and A. Zech, 'Nombot - simplify food tracking', volume 30-November-2015, pp. 360–363, (2015).
[10] Jennifer Hill, W. Randolph Ford, and Ingrid G. Farreras, 'Real conversations with artificial intelligence: A comparison between human-human online conversations and human-chatbot conversations', Computers in Human Behavior, 49, 245–250, (2015).
[11] Dietmar Jannach, 'Finding preferred query relaxations in content-based recommenders', IEEE Intelligent Systems, 109, 81–97, (April 2008).
[12] Dietmar Jannach, Markus Zanker, and Matthias Fuchs, 'Constraint-based recommendation in tourism: A multiperspective case study', Journal of IT and Tourism, 11, 139–155, (2009).
[13] A. Khanna, M. Jain, T. Kumar, D. Singh, B. Pandey, and V. Jha, 'Anatomy and utilities of an artificial intelligence conversational entity', pp. 594–597, (2016).
[14] Tariq Mahmood, Francesco Ricci, and Adriano Venturini, 'Learning adaptive recommendation strategies for online travel planning', Information and Communication Technologies in Tourism 2009, 149–160, (2009).
[15] Jakob Mauss and Martin Sachenbacher, 'Conflict-driven diagnosis using relational aggregations', in Working Papers of the 10th International Workshop on Principles of Diagnosis (DX-99), Loch Awe, Scotland, (1999).
[16] R.J. Moore, R. Arar, G.-J. Ren, and M.H. Szymanski, 'Conversational UX design', volume Part F127655, pp. 492–497, (2017).
[17] Iulia Nica, Oliver A. Tazl, and Franz Wotawa, 'Chatbot-based tourist recommendations using model-based reasoning', in ConfWS, (2018).
[18] Iulia Nica and Franz Wotawa, 'ConDiag - computing minimal diagnoses using a constraint solver', (2012). International Workshop on Principles of Diagnosis; conference date: 31-07-2012 through 03-08-2012.
[19] Barry O'Sullivan, Alexandre Papadopoulos, Boi Faltings, and Pearl Pu, 'Representative explanations for over-constrained problems', 1, (July 2007).
[20] Raymond Reiter, 'A theory of diagnosis from first principles', Artificial Intelligence, 32(1), 57–95, (1987).
[21] Martin Sachenbacher and Brian C. Williams, 'Diagnosis as semiring-based constraint optimization', in European Conference on Artificial Intelligence, pp. 873–877, (2004).
[22] C. E. Shannon, 'A Mathematical Theory of Communication', The Bell System Technical Journal, 27(3), 379–423, (1948).
[23] C. E. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, 27, 379–623, (1948).
[24] Markus Stumptner and Franz Wotawa, 'Diagnosing Tree-Structured Systems', in Proceedings 15th International Joint Conf. on Artificial Intelligence, Nagoya, Japan, (1997).
[25] Markus Stumptner and Franz Wotawa, 'Diagnosing tree-structured systems', Artificial Intelligence, 127(1), 1–29, (2001).
[26] Markus Stumptner and Franz Wotawa, 'Coupling CSP decomposition methods and diagnosis algorithms for tree-structured systems', in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pp. 388–393, Acapulco, Mexico, (2003).
[27] Y. Sun, Y. Zhang, Y. Chen, and R. Jin, 'Conversational recommendation system with unsupervised learning', pp. 397–398, (2016).
[28] Yueming Sun and Yi Zhang, 'Conversational recommender system', in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, pp. 235–244, New York, NY, USA, (2018). ACM.
[29] Alan M. Turing, Computing Machinery and Intelligence, 23–65, Springer Netherlands, Dordrecht, 2009.
[30] R.S. Wallace, The anatomy of A.L.I.C.E., 2009.
[31] J. Weizenbaum, 'ELIZA - a computer program for the study of natural language communication between man and machine', Communications of the ACM, 9(1), 36–45, (1966).
[32] N.V. Wünderlich and S. Paluch, 'A nice and friendly chat with a bot: User perceptions of AI-based service agents', (2018).