Integration of a Semantic Storytelling Recommender
System in Speech Assistants
María González-García1,∗,†, Julián Moreno-Schneider1,†, Malte Ostendorff1,† and Georg Rehm1,†
1 Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH), Alt-Moabit 91c, 10559 Berlin, Germany


Abstract
Speech assistants, or smart voice assistants, are now used by nearly every customer service and in
almost every area of society, where they perform different tasks depending on the domain. In this
work, we present a semantic storytelling approach that combines a question answering (QA) system
with a recommender system and is applied in the context of speech assistants. Our contribution is
to provide additional information that is semantically related to the user’s request and to the
answer of the QA system. Apart from the underlying technology, we have developed a prototypical
graphical user interface (a chatbot) to demonstrate the functionality, and we have carried out
tests and evaluations to check the performance of the approach.

Keywords
Semantic Storytelling, Speech Assistant, Recommender System


1. Introduction
Speech assistants, or smart voice systems, have become very popular in recent years and can be
found in increasingly diverse and specific areas. Hardly any customer service operates without
this type of system, either partially (with human intervention in some part of the process) or
fully (completing the process automatically and independently). In recent years, some approaches
have introduced artificial intelligence (AI) techniques to automate parts of the process, or the
entire process, and thus avoid costly, resource-consuming techniques. There are plenty of examples
of this kind of AI voice assistant in different areas: in the garment industry [1], H&M and
Sephora offer chatbots for recommendation; news portals and television networks offer assistants
such as the CNN chatbot; and so do airline companies [2] such as KLM Royal Dutch Airlines,
Lufthansa and Ryanair.
   All these voice assistants have the peculiarity that, although they use AI, their functionality
is limited to a specific domain or to a small amount of information (specific databases of the
companies themselves), and they achieve very good results within these limited areas. This trend
has continued in recent years and has generalized across domains and situations; that is, voice
assistants such as

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’23 Workshop, Dublin
(Republic of Ireland), 2-April-2023
∗ Corresponding author.
maria.gonzalez_garcia@dfki.de (M. González-García); julian.moreno_schneider@dfki.de (J. Moreno-Schneider);
malte.ostendorff@dfki.de (M. Ostendorff); georg.rehm@dfki.de (G. Rehm)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


Google, Amazon Alexa or Apple Siri have become systems that integrate seamlessly into
people’s daily lives thanks to their strong ability to answer specific questions. However, they
are still not capable of adequately reasoning out a complete story based on previous searches
carried out by users. At this point, ChatGPT1 comes into play. This AI chatbot, launched by
OpenAI, was built on top of OpenAI’s GPT-3 family of large language models and has been
fine-tuned using supervised and reinforcement learning techniques. ChatGPT can accomplish
different tasks such as text generation, text completion, QA and summarization. These models
improve on previous approaches, but coherent story generation is still not fully achieved.
   This is where our system comes in. Its main objective is to offer the user extra information
that is semantically related to a specific answer previously obtained (also automatically) from a
question answering (QA) system. In summary, the main contributions are: i) we develop a specific
approach for semantic storytelling composed of two modules, a QA system and a recommender
system; ii) we provide additional information based on the semantic relations that exist between
two documents; iii) we release the whole code of our approach, which is available at GitLab2.


2. Related Work
The way a story is told directly influences its final result. As introduced in
[3], the term semantic storytelling refers to the automatic (or semi-automatic) generation of
stories, where a story is considered a natural language text containing a complete, correct and
unambiguous story.
   As mentioned above, our semantic storytelling approach is composed of a QA system and a
recommender system; we have therefore analyzed some works related to these topics. For
example, the recommender system used in this work is based on the work of [4], which builds an
automatic classifier for semantic relations between text segments, designs custom annotation
guidelines, annotates a dataset and performs evaluations with Transformer language models.
Our work complements this classifier by building a complete system together with the QA system.
   Other examples of such systems are described below. [5] presents an
interactive recommendation approach that recommends movies and visualizes them in an
animated comic strip fashion. In [6], a storytelling approach is followed to build a music
recommender system using the structured and linked data offered by the Semantic Web. Another
example is the work of [7], which describes a new approach that recommends cultural tourism
paths by taking advantage of knowledge about the user and item profiles. Finally, the work of [8]
provides a tailor-made story based on the context in which the user is located. These works
develop recommender systems based on user preferences; in contrast, our recommender
system relies on the semantic relations between documents to recommend an answer.


3. Semantic Storytelling Recommender System
The main goal of our system is to help users obtain, through speech assistants, additional
information related to concrete answers. This section describes the technical details of our system,
1 https://openai.com/blog/chatgpt
2 https://gitlab.com/speaker-projekt/chatbotdemo


which is mainly composed of two modules, Question Answering (QA) and Recommender
System, as shown in Figure 1.


Figure 1: Semantic Storytelling Recommender System Architecture.


3.1. Question Answering
The QA module is a version of an existing English-language system that we adapted to work with
documents in German. As a basis for this development, we have used the Haystack3
system, a well-established QA framework whose modules include recent technologies
in the field of Information Retrieval, such as sparse (BM25, TF-IDF) or dense (transformer-based)
retrievers, and in the field of Information Extraction, such as FARMReader (based on
roberta-base4 ). Haystack provides a large number of predefined and trained QA
modules that can be used directly to implement a QA pipeline. The predefined pipelines work
exclusively in English, so we had to adapt them for German, using German-specific components
in the indices where the information is stored and the German ELECTRA base model5 .
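
As a framework-independent illustration of the sparse retrieval step named above, the following minimal sketch ranks documents with Okapi BM25. It is not Haystack's implementation: the whitespace tokenizer, the parameter values k1 and b, the function names and the sample sentences are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real index would use a German analyzer."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Smoothed IDF variant that never becomes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, documents, top_k=2):
    """Return the top_k documents ranked by BM25 score, best first."""
    scores = bm25_scores(query, documents)
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


docs = [
    "Berlin ist die Hauptstadt von Deutschland",
    "Paris ist die Hauptstadt von Frankreich",
    "Der Rhein ist ein Fluss in Europa",
]
top = retrieve("Hauptstadt Deutschland", docs, top_k=1)
```

In a full pipeline, the passages returned by such a retriever would be handed to a reader model that extracts the answer span; that step is omitted here.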

3.2. Recommender System
Let us assume the following situation: a user has asked a concrete question 𝑄, and the QA
module has already provided a suitable answer 𝐴 for it. The goal of this module is to identify
and suggest new content for the user that could be semantically related to the question and
the answer. To accomplish this goal, the module is composed of an offline processing step and an
online processing step, which are described below (Figure 2).


Figure 2: Architecture of the Recommender System Module.


3 https://haystack.deepset.ai/
4 https://huggingface.co/deepset/roberta-base-squad2
5 https://huggingface.co/deepset/gelectra-base


Offline Processing: Generation of the Semantic Relations Database. In this part of the
process, we first retrieve all the source documents from the dataset (every document is identified by
its position in the database). Second, each document is paired, one by one, with
the rest of the documents, and the semantic relation between the two documents of each pair is
predicted. Following [4], 12 semantic relations are considered: none, identity, equivalence, causal,
contrast, temporal, conditional, description, attribution, fulfillment, summary and purpose.
Finally, each document and the semantic relations between all documents are stored.
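
The offline step described above can be sketched as a loop over document pairs. This is a minimal illustration under stated assumptions: `toy_classifier` is a hypothetical stand-in for the trained relation classifier of [4], the in-memory dictionary stands in for the actual database, and treating pairs as ordered (source, target) is our reading of the description rather than a confirmed detail.

```python
from itertools import permutations

# The 12 relation labels listed in the text (following [4]).
RELATIONS = (
    "none", "identity", "equivalence", "causal", "contrast", "temporal",
    "conditional", "description", "attribution", "fulfillment", "summary", "purpose",
)


def toy_classifier(source, target):
    """Hypothetical stand-in for the trained relation classifier of [4]."""
    if source == target:
        return "identity"
    if "because" in target.lower():
        return "causal"
    return "none"


def build_relation_db(documents, classify=toy_classifier):
    """Predict a relation for every ordered pair of distinct documents.

    Documents are identified by their position in the input list, as in the text.
    Returns a dict mapping (source_id, target_id) to a relation label.
    """
    relation_db = {}
    for src, tgt in permutations(range(len(documents)), 2):
        label = classify(documents[src], documents[tgt])
        if label not in RELATIONS:
            raise ValueError(f"unknown relation label: {label}")
        relation_db[(src, tgt)] = label
    return relation_db


documents = [
    "The river flooded the old town.",
    "Streets were closed because the river flooded.",
    "The museum opens at ten.",
]
relation_db = build_relation_db(documents)
```

With n documents, this loop makes n(n-1) classifier calls, which is one reason to run it offline rather than at query time.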

Online Processing: Finding the Related Segments. The information stored during the offline
processing, together with the QA answer and its context, allows us to look for the semantic
relations between the response of the QA service and the documents of our dataset. Moreover, if the
language of the request is German, the documents of the dataset are translated by a machine
translation module before the semantic relations are looked for. This extends the recommender
system module to German, at least until an annotated German dataset has been built.
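
One possible reading of the online step is sketched below: the context of the QA answer is compared with each stored document, and German text is translated first because the relation classifier is assumed to be English-only. Both `toy_translate` and `toy_classifier` are hypothetical placeholders (a two-word glossary and a keyword rule), not the machine translation module or the model used in the system.

```python
def toy_translate(text):
    """Hypothetical stand-in for the German-to-English translation module."""
    glossary = {"weil": "because", "fluss": "river"}
    return " ".join(glossary.get(word, word) for word in text.lower().split())


def toy_classifier(source, target):
    """Hypothetical stand-in for the English relation classifier."""
    return "causal" if "because" in target.split() else "none"


def find_related(answer_context, documents, language="en",
                 translate=toy_translate, classify=toy_classifier):
    """Return (doc_id, relation) pairs for stored documents related to the answer."""
    if language == "de":
        # Translate first, since the classifier is assumed to be English-only.
        answer_context = translate(answer_context)
        documents = [translate(doc) for doc in documents]
    related = []
    for doc_id, doc in enumerate(documents):
        relation = classify(answer_context, doc)
        if relation != "none":
            related.append((doc_id, relation))
    return related


stored = ["Die Strasse ist gesperrt weil der Fluss gestiegen ist", "Das Museum ist offen"]
hits = find_related("Der Fluss ist gestiegen", stored, language="de")
```

Calling the same function with `language="en"` on the German text finds no relation in this toy setup, which illustrates why the translation step precedes the prediction.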


4. Graphical User Interface (GUI): Chatbot
The graphical user interface (see Figure 3) has been designed as a simple chatbot interface with
voice input capabilities in addition to the usual text input. The extra information provided
by the semantic storytelling service is displayed as a visually distinct answer.


Figure 3: Chatbot (Front-End) (Higher resolution images available at
https://gitlab.com/speaker-projekt/chatbotdemo/-/tree/main/media).


   The system supports two interaction modes: text and voice. Textual input is
the most common and intuitive mode, and the input can be used directly in all subsequent
processes. For voice input, the user records a message that is transcribed by an
automatic speech recognition (ASR) system, specifically the Hensoldt Analytics Speech-to-text
for English6 (available on the European Language Grid platform7 ). The chatbot then shows
the transcribed text so that the user can check whether it has been recognized properly. If so,
the chatbot behaves as if the text had been typed. Otherwise, the user can rewrite the question.

6 https://live.european-language-grid.eu/catalogue/tool-service/20891/overview/
7 https://www.european-language-grid.eu


   Once the text is available, it is processed by an intent recognition module. This module
uses simple rules to classify the query into five classes: GREETINGS, GOODBYE, ASSERTION,
THANKS and QUESTION. If the intent is classified as GREETINGS,
GOODBYE, ASSERTION or THANKS, the response of the chatbot is one of those shown in
Table 1. Otherwise, if the intent is classified as QUESTION, the chatbot calls the QA
module and, if required, the recommender system module. In this case, the chatbot shows
two examples as additional information, each containing the type of semantic relation between the
QA response and a document of the dataset, together with the document itself. Moreover, if there are
more semantic relations, the chatbot also shows the number and the type of these relations
and offers the possibility of downloading all this information as a JSON file.
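
The way the chatbot could assemble this additional information is sketched below: two relations are shown as examples, the remaining ones are summarized by number and type, and the full list is serialized as JSON for download. The function name and all field names are hypothetical; the actual response format of the system is not specified here.

```python
import json
from collections import Counter


def build_recommendation(relations, shown=2):
    """Split predicted relations into displayed examples and a summary.

    `relations` is a list of (relation_type, document_text) pairs.
    """
    examples = [{"relation": rel, "document": doc} for rel, doc in relations[:shown]]
    remaining = relations[shown:]
    return {
        "examples": examples,
        "remaining_count": len(remaining),
        "remaining_by_type": dict(Counter(rel for rel, _ in remaining)),
        # Full list, serialized for the download option.
        "download": json.dumps(
            [{"relation": rel, "document": doc} for rel, doc in relations],
            ensure_ascii=False,
        ),
    }


relations = [
    ("causal", "Document A"),
    ("temporal", "Document B"),
    ("causal", "Document C"),
    ("contrast", "Document D"),
]
payload = build_recommendation(relations)
```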

Table 1
Requests and responses associated with four of the classes considered in our system.
     Intent       Input                                                                       Response
     GREETINGS    Hello; Hi; Hey                                                              Hello; Hi; Welcome
     GOODBYE      Bye; Good bye; See you later                                                Have a nice time; Bye; See you soon
     ASSERTION    Yes; Yes, please; Of course                                                 What can I help you with?; What is your question?
     THANKS       Thanks; Thank you; That’s helpful; Awesome; Great; No, thanks; No thanks    Happy to help!; Any time; You’re welcome; My pleasure
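
The rule-based intent recognition behind Table 1 can be sketched as a phrase lookup. The inputs and responses are copied from the table; the normalization, the exact-match strategy and the random choice of a response are assumptions, since the actual rules of the module are not given here.

```python
import random
import string

# Inputs and responses copied from Table 1; the matching strategy is an assumption.
INTENT_INPUTS = {
    "GREETINGS": ["Hello", "Hi", "Hey"],
    "GOODBYE": ["Bye", "Good bye", "See you later"],
    "ASSERTION": ["Yes", "Yes, please", "Of course"],
    "THANKS": ["Thanks", "Thank you", "That's helpful", "Awesome", "Great",
               "No, thanks", "No thanks"],
}
INTENT_RESPONSES = {
    "GREETINGS": ["Hello", "Hi", "Welcome"],
    "GOODBYE": ["Have a nice time", "Bye", "See you soon"],
    "ASSERTION": ["What can I help you with?", "What is your question?"],
    "THANKS": ["Happy to help!", "Any time", "You're welcome", "My pleasure"],
}


def normalize(text):
    """Lowercase, unify apostrophes, strip punctuation and extra whitespace."""
    text = text.lower().replace("’", "'")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


# Lookup table from normalized input phrase to intent label.
PHRASE_TO_INTENT = {
    normalize(phrase): intent
    for intent, phrases in INTENT_INPUTS.items()
    for phrase in phrases
}


def classify_intent(text):
    """Return the intent of a known phrase; anything else is a QUESTION."""
    return PHRASE_TO_INTENT.get(normalize(text), "QUESTION")


def respond(text, rng=random):
    """Return a canned response, or None when the QA module should be called."""
    intent = classify_intent(text)
    if intent == "QUESTION":
        return None
    return rng.choice(INTENT_RESPONSES[intent])
```

Treating every unmatched input as QUESTION keeps the rules small, at the cost of sending unrecognized small talk to the QA module.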


5. Experiments
The semantic storytelling approach is evaluated through several experiments,
which aim to explore the suitability of the approach and to help us understand
what we can achieve in the long run. Two datasets have been used in our
experiments: the dataset described in [4] for training the recommender system, and the German
News Dataset8 for evaluating the performance of the recommender system in German.

5.1. Experiment for the recommendation model
Two experts have evaluated the performance of the recommendation model. Because
the German News Dataset is too large to evaluate in full, we used a sample of it that allows us to
analyze 1026 semantic relations between pairs of documents (or parts of a document). In this regard,
it is important to note that we have only used the most relevant semantic relation between each
pair of documents. Besides, since the recommender system uses an English model and we
want to test its performance in German, we also used a machine translation module,
specifically the Argos Translate library9 . Once the documents are translated, we predict the
semantic relations between them and then calculate the accuracy of the predictions (see Table 2).


8 https://www.kaggle.com/datasets/pqbsbk/german-news-dataset
9 https://github.com/argosopentech/argos-translate


Table 2
Predicted semantic relations and their accuracy.
      Semantic relation      None    Causal    Temporal    Equivalence    Contrast    Overall
      Number                  812        75          69             57          13       1026
      True                    632        45          41             31           2        751
      False                   180        30          28             26          11        275
      Accuracy               0.78      0.60        0.59           0.54        0.15       0.73
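
The accuracy values in Table 2 can be recomputed from the counts, as in the short sketch below (variable names are ours). It also shows that the value in the last column, 0.73, is the pooled accuracy over all 1026 predictions (751/1026); the unweighted mean of the five per-relation accuracies would be about 0.53.

```python
# Counts copied from Table 2: relation -> (number of predictions, correct predictions).
COUNTS = {
    "none": (812, 632),
    "causal": (75, 45),
    "temporal": (69, 41),
    "equivalence": (57, 31),
    "contrast": (13, 2),
}

per_relation = {rel: correct / total for rel, (total, correct) in COUNTS.items()}

total_predictions = sum(total for total, _ in COUNTS.values())
total_correct = sum(correct for _, correct in COUNTS.values())

# Pooled (micro-averaged) accuracy: every prediction counts equally.
overall_accuracy = total_correct / total_predictions
# Unweighted (macro) mean of the per-relation accuracies, for comparison.
macro_accuracy = sum(per_relation.values()) / len(per_relation)
```

Because the none class accounts for 812 of the 1026 predictions, the pooled figure is dominated by it.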


   A qualitative analysis of the predicted semantic relations yields several
remarks worth mentioning. As expected, the sample set is unbalanced: the number
of none relations is much larger than that of all the other relations combined
(812 > 214). Only 5 of the 12 semantic relations recognized in [4] have been identified.
In general, however, considering the errors that can be propagated from the translations and
the fact that we are using a sample set, the outcomes seem promising.

5.2. Other experiments
Apart from this experiment, the QA module (working in German) and the semantic storytelling
approach through the GUI have also been evaluated. For the QA evaluation, we
used the GermanQuAD dataset [9]. We evaluated the Retriever, obtaining Recall: 0.972 and
Mean Average Precision: 0.877, and the Reader (using the gelectra model), obtaining
Top-N Accuracy: 97.42, Exact Match: 72.42 and F1-Score: 90.35. Although we are still
intensively testing and evaluating the GUI and the way the semantic storytelling integrates
into it, we have been able to test both and verify their functionality. Future plans include
further user satisfaction evaluations.
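
For readers unfamiliar with the Reader metrics, the sketch below shows the usual SQuAD-style definitions of Exact Match and token-level F1 for a single prediction. It is an illustration only: the normalization is simplified to lowercasing and whitespace splitting, the example strings are invented, and the evaluation scripts actually used for the figures above are not specified here.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a simplified normalization)."""
    return text.lower().split()


def exact_match(prediction, reference):
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("im Jahr 1988", "1988")
f1 = f1_score("im Jahr 1988", "1988")
```

Dataset-level scores are obtained by averaging these per-question values, which explains why F1 is typically higher than Exact Match: partially overlapping answers still earn credit.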


6. Conclusions and Future Work
We have developed a storytelling approach based on a QA system and a recommender
system. Our first experiment with the recommender model showed that, thanks to the automatic
translation module, it is possible to use the model without having to train it with new data in
different languages. Due to time constraints, we could not include in this work the evaluations
of the complete semantic storytelling approach through the GUI. These evaluations
are our short-term future work. In the long term, we plan to compare our results with a
recommender system that uses a model trained on German data (which we plan to annotate).
Besides, the end-to-end integration of all components, currently under development, is foreseen,
as well as the adaptation of the approach to new domains.


Acknowledgments
This work has received funding from the German Federal Ministry for Economic Affairs and
Climate Action (BMWK) through the project SPEAKER (no. 01MK19011).


References
[1] L. Syarova, Chatbot usage in e-retailing and the effect on customer satisfaction, 2022. URL:
    http://essay.utwente.nl/92080/.
[2] S. D. Sarol, M. F. Mohammad, N. A. A. Rahman, Mobile Technology Application in Aviation:
    Chatbot for Airline Customer Experience, Springer Nature Singapore, Singapore, 2023, pp.
    59–72. URL: https://doi.org/10.1007/978-981-19-6619-4_5. doi:10.1007/978-981-19-6619-4_5.
[3] G. Rehm, K. Zaczynska, J. Moreno, Semantic storytelling: Towards identifying storylines in
    large amounts of text content, in: Text2Story@ECIR, 2019, pp. 63–70.
[4] M. Raring, M. Ostendorff, G. Rehm, Semantic relations between text segments for semantic
    storytelling: Annotation tool - dataset - evaluation, in: Proceedings of the Thirteenth
    Language Resources and Evaluation Conference, European Language Resources Association,
    Marseille, France, 2022, pp. 4923–4932. URL: https://aclanthology.org/2022.lrec-1.526.
[5] K. Wegba, A. Lu, Y. Li, W. Wang, Interactive storytelling for movie recommendation through
    latent semantic analysis, in: 23rd International conference on intelligent user interfaces,
    2018, pp. 521–533.
[6] S. Baumann, R. Schirru, B. Streit, Towards a storytelling approach for novel artist recom-
    mendations, in: International Workshop on Adaptive Multimedia Retrieval, Springer, 2011,
    pp. 1–15.
[7] M. Casillo, M. D. Santo, M. Lombardi, R. Mosca, D. Santaniello, C. Valentino, Recommender
    systems and digital storytelling to enhance tourism experience in cultural heritage sites,
    in: 2021 IEEE International Conference on Smart Computing (SMARTCOMP), 2021, pp.
    323–328. doi:10.1109/SMARTCOMP52413.2021.00067.
[8] F. Clarizia, F. Colace, M. Lombardi, F. Pascale, A context aware recommender system for
    digital storytelling, in: 2018 IEEE 32nd International Conference on Advanced Information
    Networking and Applications (AINA), 2018, pp. 542–549. doi:10.1109/AINA.2018.00085.
[9] T. Möller, J. Risch, M. Pietsch, GermanQuAD and GermanDPR: Improving non-English question
    answering and passage retrieval, CoRR abs/2104.12741 (2021). URL: https://arxiv.org/abs/2104.12741.
    arXiv:2104.12741.

