<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integration of a Semantic Storytelling Recommender System in Speech Assistants</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>María González-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julián Moreno-Schneider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malte Ostendorff</string-name>
          <email>malte.ostendorff@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Rehm</string-name>
          <email>georg.rehm@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH), Alt-Moabit 91c</institution>
          ,
          <addr-line>10559 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Speech assistants, or smart voice assistants, are now used by practically every customer service and in almost every area of society, where they perform different tasks depending on the domain. In this work, we present a semantic storytelling approach that combines a question answering (QA) system with a recommender system in the context of speech assistants. Our contribution is to provide additional information that is semantically related to the user's request and to the answer of the QA system. Apart from the underlying technology, we have developed a prototypical graphical user interface (a chatbot) to demonstrate the functionality, and we have carried out tests and evaluations to check the performance of the approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Storytelling</kwd>
        <kwd>Speech Assistant</kwd>
        <kwd>Recommender System</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Speech assistants, or smart voice systems, have become very popular in recent
times and can be found in increasingly diverse and specific areas. Customer services that do not
use this type of system are practically non-existent, whether they use it partially (with human
intervention in some part of the process) or totally, completing the process fully automatically.
In recent years, some approaches have introduced artificial
intelligence techniques to automate parts of the process, or the entire process, and thus avoid costly, resource-consuming
techniques. There are plenty of examples of this kind of artificial intelligence voice assistant in
different areas: in the garment industry [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], H&amp;M and Sephora offer chatbots
for recommendation; news portals and television networks offer assistants such as the CNN chatbot; and
airline companies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] such as KLM Royal Dutch Airlines, Lufthansa and Ryanair do likewise.
      </p>
      <p>All these voice assistants have the peculiarity that, although they use AI, their functionality
is limited to a specific domain or to a small amount of information (specific databases of the
companies themselves), and they achieve very good results in these limited areas. In recent years
this trend has extended to more general domains and situations: voice assistants such as
Google, Amazon Alexa or Apple Siri have become systems capable of integrating perfectly into
people’s daily lives thanks to their great ability to answer specific questions. However, they
are still not capable of adequately reasoning out a complete story based on previous searches
carried out by users. At this point, ChatGPT<sup>1</sup> comes into play. This AI chatbot, launched by
OpenAI, was built on top of OpenAI’s GPT-3 family of large language models and has been
fine-tuned using supervised and reinforcement learning techniques. ChatGPT can accomplish
different tasks such as text generation, text completion, QA and summarization. These models
improve on previous approaches, but coherent story generation is still not fully achieved.</p>
      <p>This is where our system comes in. Its main objective is to offer the user extra information that is semantically
related to a specific answer previously obtained (also automatically) from a
question answering (QA) system. In summary, our main contributions are: i) we develop a specific
approach for semantic storytelling composed of two modules, a QA system and a recommender
system; ii) we provide additional information based on the semantic relations that exist between
two documents; iii) we release the whole code of our approach, which is available on GitLab<sup>2</sup>.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The way a story is told directly influences the final result of the story. As introduced in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the term semantic storytelling refers to the automatic (or semi-automatic) generation of
stories, where a story is considered a natural language text containing a complete, correct and
unambiguous story.
      </p>
      <p>
        As mentioned above, our semantic storytelling approach is composed of a QA system and a
recommender system; we have therefore analyzed some works related to these topics. For
example, the recommender system used in this work is based on the work of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which builds an
automatic classifier for semantic relations between text segments, designs custom annotation
guidelines, annotates a dataset and performs evaluations with Transformer language models.
Our work complements this classifier by building a complete system together with the QA system.
      </p>
      <p>
        Other examples of such systems are described below. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] presents an
interactive recommendation approach that recommends movies and visualizes them in an
animated comic strip fashion. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a storytelling approach is followed to build a music
recommender system using the structured and linked data offered by the Semantic Web. Another
example is the work of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which describes a new approach that recommends cultural-tourism
paths by taking advantage of user and item profile knowledge. Finally, the work of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
provides a tailor-made story based on the context in which the user is located. These works
develop recommender systems based on user preferences; in contrast, our recommender
system relies on the semantic relations between documents to make its recommendations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Semantic Storytelling Recommender System</title>
      <p>The main goal of our system is to assist users in obtaining additional information about concrete
answers through speech assistants. This section describes the technical details of our system,
which is mainly composed of two modules: Question Answering (QA) and Recommender
System, as shown in Figure 1.<fn><label>1</label><p>https://openai.com/blog/chatgpt</p></fn><fn><label>2</label><p>https://gitlab.com/speaker-projekt/chatbotdemo</p></fn></p>
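      <p>As a rough illustration of how the two modules could fit together, the following Python sketch chains a QA step and a recommendation step. The names <monospace>StorytellingResult</monospace>, <monospace>answer_with_story</monospace>, <monospace>qa</monospace> and <monospace>recommend</monospace> are hypothetical and are not taken from the released code; both modules are passed in as plain callables so that the sketch stays independent of any concrete implementation.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: a QA module maps a question to an answer string, and a
# recommender maps (question, answer) to (relation, document) suggestions.
QAModule = Callable[[str], str]
Recommender = Callable[[str, str], List[Tuple[str, str]]]


@dataclass
class StorytellingResult:
    question: str
    answer: str
    related: List[Tuple[str, str]] = field(default_factory=list)


def answer_with_story(question: str, qa: QAModule,
                      recommend: Recommender) -> StorytellingResult:
    """Run the QA module first, then enrich its answer with related content."""
    answer = qa(question)
    related = recommend(question, answer)
    return StorytellingResult(question, answer, related)
```
]]></preformat>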
      <sec id="sec-3-1">
        <title>3.1. Question Answering</title>
        <p>The QA module adapts an existing system that worked in English so that it can
work with documents in German. As a basis for this development, we have used Haystack<sup>3</sup>,
a well-established QA framework whose modules cover recent technologies
in the field of Information Retrieval, such as sparse (BM25, TF-IDF) or dense
(transformer-based) retrievers, and in the field of Information Extraction, such as FARMReader (based on
roberta-base<sup>4</sup>). Haystack provides a large number of predefined and trained QA
modules that can be used directly to implement a QA pipeline. The predefined pipelines work
exclusively in English, so we had to adapt them to German by using German-specific components
in the indices where the information is stored and the German ELECTRA base model<sup>5</sup>.</p>
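        <p>To make the sparse retrieval step more concrete, the sketch below scores documents against a query with the standard Okapi BM25 formula. It is a self-contained toy and not Haystack's implementation: the function name <monospace>bm25_scores</monospace>, the whitespace tokenizer, the parameter defaults <monospace>k1 = 1.5</monospace> and <monospace>b = 0.75</monospace>, and the three example sentences are assumptions chosen for illustration.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented example documents, used only to exercise the function.
docs = [
    "Berlin ist die Hauptstadt von Deutschland",
    "Paris ist die Hauptstadt von Frankreich",
    "Der Rhein fließt durch Köln",
]
scores = bm25_scores("Hauptstadt Deutschland", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```
]]></preformat>
        <p>In a retriever-reader pipeline, the highest-scoring documents would then be passed to the reader, which extracts the answer span.</p>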
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Recommender System</title>
        <p>
          Let us assume the following situation. A user has asked a concrete question, and the QA
module has already provided a suitable answer for it. The goal of this module is to identify
and suggest new content for the user that could be semantically related to the question and
the answer. To accomplish this goal, the module is composed of an offline processing step and an
online processing step, which are described below (Figure 2).
        </p>
        <p>
          <bold>Offline Processing: Generation of the Semantic Relations Database.</bold> In this part of the
process, we first get all the origin documents from the dataset (every document is identified by
its position in the database). Second, each document is combined one by one with
the rest of the documents to predict the semantic relation between each pair. According
to [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], 12 semantic relations (none, identity, equivalence, causal, contrast, temporal, conditional,
description, attribution, fulfillment, summary and purpose) are considered.
Finally, each document and the semantic relations between all documents are stored.
        </p>
        <p>
          <bold>Online Processing: Finding the Related Segments.</bold> The information stored during the offline
processing, together with the QA answer and the context, allows us to look for the semantic
relations between the response of the QA service and the documents of our dataset. Moreover, if the
language of the request is German, the documents of
the dataset are translated by a machine translation module before the semantic relations are looked up.
This extends the recommender system module to German, at least until an annotated German dataset is built.
          <fn><label>3</label><p>https://haystack.deepset.ai/</p></fn>
          <fn><label>4</label><p>https://huggingface.co/deepset/roberta-base-squad2</p></fn>
          <fn><label>5</label><p>https://huggingface.co/deepset/gelectra-base</p></fn>
        </p>
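        <p>The offline and online steps can be sketched as follows. The trained relation classifier is replaced here by a hypothetical stand-in, <monospace>toy_relation</monospace>, which only detects identical or overlapping texts; <monospace>build_relation_db</monospace> and <monospace>related_documents</monospace> are likewise illustrative names rather than functions from the released code. The sketch further assumes that relations are predicted for ordered pairs and that the context of the QA answer is itself a document of the dataset, so that the online step reduces to a lookup.</p>
        <preformat><![CDATA[
```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

RelationFn = Callable[[str, str], str]


def toy_relation(source: str, target: str) -> str:
    """Hypothetical stand-in for the trained relation classifier."""
    if source == target:
        return "identity"
    shared = set(source.lower().split()) & set(target.lower().split())
    return "description" if len(shared) >= 2 else "none"


def build_relation_db(docs: List[str],
                      predict: RelationFn) -> Dict[Tuple[int, int], str]:
    """Offline step: predict a relation for every ordered pair of documents.

    Documents are identified by their position in the list.
    """
    return {(i, j): predict(docs[i], docs[j])
            for i, j in permutations(range(len(docs)), 2)}


def related_documents(context_id: int, docs: List[str],
                      db: Dict[Tuple[int, int], str]) -> List[Tuple[str, str]]:
    """Online step: look up stored relations that start at the context document."""
    return [(rel, docs[j]) for (i, j), rel in db.items()
            if i == context_id and rel != "none"]


# Invented example documents, used only to exercise the functions.
docs = [
    "the storm closed the airport",
    "the airport was closed by the storm on monday",
    "a new bakery opened",
]
db = build_relation_db(docs, toy_relation)
suggestions = related_documents(0, docs, db)
```
]]></preformat>
        <p>Storing the pairwise predictions once keeps the online step cheap, at the cost of a number of classifier calls that grows quadratically with the size of the dataset.</p>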
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Graphical User Interface (GUI): Chatbot</title>
      <p>The graphical user interface (see Figure 3) has been designed as a simple chatbot interface with
voice input capabilities in addition to the usual text input. The extra information provided
by the semantic storytelling service is displayed as a visually distinct answer.</p>
      <p>The system supports two interaction modes: text and voice. Textual input is
the most common and intuitive mode, and the input can be used directly in all subsequent
processes. With voice input, the user records a message that is transcribed by an
automatic speech recognition (ASR) system, specifically the Hensoldt Analytics Speech-to-text
for English<sup>6</sup> (available on the European Language Grid platform<sup>7</sup>). The chatbot then shows
the text so that the user can check whether it has been recognized properly. If so, the chatbot behaves as if
the text had been typed. Otherwise, it offers the possibility of rewriting the question.<fn><label>6</label><p>https://live.european-language-grid.eu/catalogue/tool-service/20891/overview/</p></fn><fn><label>7</label><p>https://www.european-language-grid.eu</p></fn></p>
      <p>Once the text is available, it is processed by an intent recognition module. This module
uses simple rules to classify the query into five classes: GREETINGS, GOODBYE, ASSERTION,
THANKS and QUESTION. If the intent is GREETINGS, GOODBYE, ASSERTION or THANKS,
the chatbot replies with one of the responses shown in
Table 1. If the intent is QUESTION, the chatbot calls the QA
module and, if required, the recommender system module. In this case, the chatbot shows
two examples as additional information, each containing the type of semantic relation between the
QA response and a document of the dataset, and the document itself. Moreover, if there are
more semantic relations, the chatbot also shows their number and types
and offers the possibility of downloading all this information as a JSON file.</p>
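      <p>A rule-based intent recognizer of this kind can be sketched in a few lines. The keyword patterns below are purely hypothetical, since the concrete rules of the chatbot are not listed here; only the five class names are taken from the text. The sketch treats every utterance that matches no rule as an ASSERTION, which is also an assumption.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical keyword rules; the rules of the actual chatbot are not
# described in the text. Order matters: the first matching rule wins.
INTENT_RULES = [
    ("GREETINGS", re.compile(r"\b(hello|hi|hallo|good (morning|evening))\b")),
    ("GOODBYE", re.compile(r"\b(bye|goodbye|see you|tschüss)\b")),
    ("THANKS", re.compile(r"\b(thanks|thank you|danke)\b")),
    ("QUESTION", re.compile(r"\?\s*$|^(who|what|when|where|why|how|which)\b")),
]


def classify_intent(text: str) -> str:
    """Map a user utterance to one of the five intent classes."""
    normalized = text.strip().lower()
    for intent, pattern in INTENT_RULES:
        if pattern.search(normalized):
            return intent
    # Anything that matches no rule is treated as a plain statement.
    return "ASSERTION"
```
]]></preformat>
      <p>Only utterances classified as QUESTION would be forwarded to the QA module; the other four classes can be answered from a fixed table of responses.</p>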
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>
        We evaluate the semantic storytelling approach with several experiments
that explore its suitability and help us understand
what we can achieve in the long run. Two different datasets have been used in our
experiments: the dataset described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for training the recommender system, and the German
News Dataset<sup>8</sup> for evaluating the performance of the recommender system in German.
      </p>
      <sec id="sec-5-1">
        <title>5.1. Experiment for the recommendation model</title>
        <p>Two experts have evaluated the performance of the recommendation model. Because
the German News dataset is too large to evaluate in full, we used a sample of it that allows us to analyze
1026 semantic relations between pairs of documents (or parts of a document). It
is important to remark that we have only used the most relevant semantic relation between
two documents. Besides, since the recommender system uses an English model and we
want to test its performance on German, we also used a machine translation module,
namely the Argos Translate library<sup>9</sup>. Once the documents are translated, we predict the
semantic relations between them and then calculate the accuracy of the predictions (see Table 2).<fn><label>8</label><p>https://www.kaggle.com/datasets/pqbsbk/german-news-dataset</p></fn><fn><label>9</label><p>https://github.com/argosopentech/argos-translate</p></fn></p>
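        <p>The evaluation procedure (translate, predict, compare with the experts' judgement) can be sketched as follows. The <monospace>translate</monospace> and <monospace>predict</monospace> arguments are hypothetical callables standing in for the machine translation module and the relation classifier, and the sketch assumes that the experts' judgements are available as one label per document pair. The toy data is invented purely to exercise the code and is unrelated to the reported results.</p>
        <preformat><![CDATA[
```python
from typing import Callable, List, Tuple


def relation_accuracy(pairs: List[Tuple[str, str]],
                      gold: List[str],
                      translate: Callable[[str], str],
                      predict: Callable[[str, str], str]) -> float:
    """Translate each document pair, predict its relation and return the
    fraction of predictions that agree with the expert labels."""
    if len(pairs) != len(gold):
        raise ValueError("pairs and gold labels must have the same length")
    correct = 0
    for (source, target), label in zip(pairs, gold):
        predicted = predict(translate(source), translate(target))
        correct += predicted == label
    return correct / len(gold)


# Invented stand-ins: lowercasing plays the role of translation, and the
# "classifier" only distinguishes identical from non-identical texts.
toy_pairs = [("A", "a"), ("A", "b"), ("c", "d")]
toy_gold = ["identity", "none", "causal"]
accuracy = relation_accuracy(
    toy_pairs, toy_gold,
    translate=str.lower,
    predict=lambda s, t: "identity" if s == t else "none",
)
```
]]></preformat>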
        <p>
          A qualitative analysis of the predicted semantic relations yields several
remarks worth mentioning. As expected, the sample set is unbalanced: the number
of none relations is much larger than that of all other represented semantic relations together
(812 &gt; 214, out of 1026). Only 5 of the 12 semantic relations recognized in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] have been identified.
In general, however, considering the errors that can be propagated from the translations and
the fact that we are using a sample set, the outcomes seem promising.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Other experiments</title>
        <p>
          Apart from this experiment, the QA module (working in German) and the semantic storytelling
approach through the GUI have also been evaluated. Regarding the QA evaluation, we have
used the GermanQuAD dataset [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and we have evaluated the Retriever, obtaining Recall: 0.972 and
Mean Average Precision: 0.877, and the Reader (using the gelectra model), obtaining Top-N-Accuracy: 97.42,
Exact Match: 72.42 and F1-Score: 90.35. Although we are still
intensively testing and evaluating the GUI and the way the semantic storytelling integrates
into it, we have been able to verify the functionality of both. Future plans include
further user satisfaction evaluations.
        </p>
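        <p>Reader evaluations on extractive QA datasets are commonly reported with exact match and token-level F1. The sketch below shows how these two metrics are usually computed for a single prediction; the normalization (lowercasing and whitespace tokenization) is a simplifying assumption and may differ from the one used by the evaluation framework.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def exact_match(prediction: str, truth: str) -> int:
    """1 if both answers are identical after lowercasing and trimming, else 0."""
    return int(prediction.strip().lower() == truth.strip().lower())


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```
]]></preformat>
        <p>For instance, the prediction "im Jahr 1989" against the reference "1989" is not an exact match, but it still receives partial credit under F1 because one of its three tokens is correct.</p>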
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>We have developed a storytelling approach that combines a QA system and a recommender
system. Our first experiment with the recommender model showed that, thanks to the automatic
translation module, it is possible to use the model without having to train it on new data in
different languages. Due to time constraints, we could not include in this work the evaluation
of the complete semantic storytelling approach through the GUI. This evaluation
is our short-term future work. In the long term, we plan to compare our results with a
recommender system that uses a model trained on German data (which we plan to annotate).
Besides, the end-to-end integration of all components currently under development is foreseen,
as well as the adaptation of the approach to new domains.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has received funding from the German Federal Ministry for Economic Affairs and
Climate Action (BMWK) through the project SPEAKER (no. 01MK19011).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Syarova</surname>
          </string-name>
          ,
          <article-title>Chatbot usage in e-retailing and the effect on customer satisfaction</article-title>
          ,
          <year>2022</year>
          . URL: http://essay.utwente.nl/92080/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Sarol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A. A.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <source>Mobile Technology Application in Aviation: Chatbot for Airline Customer Experience</source>
          , Springer Nature Singapore, Singapore,
          <year>2023</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>72</lpage>
          . URL: https://doi.org/10.1007/978-981-19-6619-4_5. doi:10.1007/978-981-19-6619-4_5.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zaczynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moreno-Schneider</surname>
          </string-name>
          ,
          <article-title>Semantic storytelling: Towards identifying storylines in large amounts of text content</article-title>
          ,
          <source>in: Text2Story@ECIR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <article-title>Semantic relations between text segments for semantic storytelling: Annotation tool - dataset - evaluation</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>4923</fpage>
          -
          <lpage>4932</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.526.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wegba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Interactive storytelling for movie recommendation through latent semantic analysis</article-title>
          ,
          <source>in: 23rd International conference on intelligent user interfaces</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>521</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Baumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schirru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Streit</surname>
          </string-name>
          ,
          <article-title>Towards a storytelling approach for novel artist recommendations</article-title>
          , in: International Workshop on Adaptive Multimedia Retrieval, Springer,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Casillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Santo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lombardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mosca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santaniello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Valentino</surname>
          </string-name>
          ,
          <article-title>Recommender systems and digital storytelling to enhance tourism experience in cultural heritage sites</article-title>
          ,
          <source>in: 2021 IEEE International Conference on Smart Computing (SMARTCOMP)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>328</lpage>
          . doi:10.1109/SMARTCOMP52413.2021.00067.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Clarizia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Colace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lombardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pascale</surname>
          </string-name>
          ,
          <article-title>A context aware recommender system for digital storytelling</article-title>
          ,
          <source>in: 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>542</fpage>
          -
          <lpage>549</lpage>
          . doi:10.1109/AINA.2018.00085.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Risch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pietsch</surname>
          </string-name>
          ,
          <article-title>GermanQuAD and GermanDPR: Improving non-English question answering and passage retrieval</article-title>
          ,
          <source>CoRR</source>
          abs/2104.12741 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2104.12741. arXiv:2104.12741.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>