<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Vienna, Austria
alena@coreon.com (A. Vasilevich); michael@coreon.com (M. Wetzel); Sedlbauer@wirtschaftsagentur.at
(G. Sedlbauer); hubmer@wirtschaftsagentur.at (K. Hubmer)
https://www.coreon.com/ (M. Wetzel); https://wirtschaftsagentur.at/ (G. Sedlbauer)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Language-Agnostic Knowledge Graphs for Smarter Multilingual Chatbots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alena Vasilevich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Wetzel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Sedlbauer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerstin Hubmer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Coreon GmbH</institution>
          ,
          <addr-line>Rungestrasse 20, 10179 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vienna Business Agency</institution>
          ,
          <addr-line>Mariahilfer Strasse 20, 1070 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>CEFAT4Cities targets the development of multilingual cross-border e-Government services, facilitating the conversion of natural-language administrative procedures into machine-readable data. We showcase the integration of CEFAT4Cities results into SmartBot, a prototype of a multilingual chatbot developed for the Vienna Business Agency (VBA) within the scope of the project. SmartBot makes VBA's services discoverable in a user-friendly way, targeting topics such as starting a new business and finding relevant grants among hundreds of funding opportunities. It is driven by multilingual AI that contains the results of CEFAT4Cities workflows, integrated into its domain knowledge along with multilingual domain-specific vocabularies and represented in a language-agnostic knowledge graph in Coreon. Thanks to the integrated multilingual knowledge system (MKS), SmartBot is able to infer connections between language-agnostic concepts and handle terms previously unseen by the bot's language model.</p>
      </abstract>
      <kwd-group>
        <kwd>Chatbots</kwd>
        <kwd>knowledge management</kwd>
        <kwd>terminology management</kwd>
        <kwd>knowledge graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Tools and data provided by public-sector and private organizations still tend to be
institutionally fragmented. The fragmentation of the European e-government fabric triggered the
emergence of interoperability solutions that unify and simplify interaction between cross-border
and cross-sector services. EuroVoc<sup>1</sup> and ISA2<sup>2</sup> are among such interoperability solutions, fostering
uniformity within the technical, semantic, organizational, and legal layers across the EU [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The
existing standards for Public Sector Information (PSI) provision supply instruments to describe
e-Government services in a uniform way. Yet they remain mostly unexploited and often lack
user-centric design, let alone multilingual functionality that would support the official linguistic
diversity of the EU [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. The CEFAT4Cities project (2020-2022)<sup>3</sup> targets this challenge of interaction
between EU residents, businesses, and public services, aiming to speed up the adoption of
multilingual cross-border e-Government services. Its main objective is a software layer that
facilitates the conversion of natural-language administrative procedures into machine-readable
data (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for details). Integrating its output into existing EU resources, such as ISA2 and
CPSV<sup>4</sup>, which describe public services and associated life and business events [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we created an
open linked data repository uniting concepts relevant for businesses and citizens.
      </p>
      <p>
        In this paper, we showcase how this resource is leveraged in a prototype of a real-life chatbot
application, SmartBot, developed for the Vienna Business Agency (VBA)<sup>5</sup> within the scope of the
CEFAT4Cities project. Lately, chatbots have started to emerge in various fields, featuring use cases
like information retrieval, service discoverability, customer service, and administrative
workflows [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>
        ]. Since dialogue is a natural way of interaction between humans, conversational
agents designed to mimic this behaviour have the potential to increase the efficiency of public
services. In our case, SmartBot’s goal is to automate VBA’s services and make them discoverable in a
user-friendly way, targeting topics such as starting a new business in Vienna and helping users
find relevant grants among hundreds of funding opportunities for companies of various scale.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Chatbot’s architecture</title>
      <p>
        Footnotes:
        <sup>4</sup> https://ec.europa.eu/isa2/solutions/core-public-service-vocabulary-application-profile-cpsv-ap_en;
        <sup>5</sup> https://viennabusinessagency.at/;
        <sup>6</sup> https://github.com/RasaHQ/;
        <sup>7</sup> https://www.coreon.com/
      </p>
      <p>
utterances. The dialogue management component predicts the next action in a conversation
based on the context. Rasa SDK handles all of our custom code: it is organised as custom
actions that search databases, make API calls, trigger a handover of the conversation to a human,
etc. Rasa Open Source is therefore adjustable to developers’ needs, featuring straightforward
integration and data control [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        On top of Rasa, our architecture features an integration with Coreon Multilingual Knowledge
System (MKS) [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. MKS is a semantic knowledge repository composed of concepts
linked via relations. Following the semantic web standards, it caters for visual discovery, access,
drafting, and re-usability of any assets, organised in language-agnostic knowledge graphs. Since
the linking is performed at the concept level, we can abstract from language-specific terms and
model structured knowledge for phenomena that reflect the non-deterministic nature of the
human language, such as word sense ambiguity, synonymy, homonymy, and multilingualism.
Linking per concept ensures smooth maintenance of relations without additional data clutter:
relation edges are independent of labels, terms, and other metadata. Concept-level linking thus helps
exchange information among the systems involved and ensures that its precise meaning is understood and
preserved among all parties, in any language.
      </p>
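      <p>The concept-level linking described above can be illustrated with a minimal sketch. It is not Coreon's actual data model: the class name, field names, concept IDs, and sample terms are all assumptions made for illustration. It only shows that relations attach to concept IDs, so adding a term or a whole language leaves the relation edges untouched.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """A language-agnostic node: relations attach to the ID, never to a term."""
    concept_id: str
    terms: Dict[str, List[str]] = field(default_factory=dict)  # language code -> terms
    broader: Set[str] = field(default_factory=set)   # IDs of parent concepts
    related: Set[str] = field(default_factory=set)   # IDs of associated concepts


def add_term(concept: Concept, lang: str, term: str) -> None:
    """Attach a new label; the relation sets are left untouched."""
    concept.terms.setdefault(lang, []).append(term)


# Illustrative data only: IDs and terms are invented for this sketch.
money = Concept("c-money", {"de": ["Geld"], "en": ["money"]})
funds = Concept("c-funds", {"de": ["Geldmittel"], "en": ["funds"]}, broader={"c-money"})

add_term(funds, "de", "Kohle")  # a new slang synonym
add_term(funds, "fr", "fonds")  # a new language
```
      </preformat>
      <p>After both calls, the only edge of the hypothetical concept c-funds is still its single parent link to c-money.</p>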
    </sec>
    <sec id="sec-3">
      <title>3. Leveraging language-agnostic knowledge graphs</title>
      <p>A big part of any chatbot’s implementation is associated with domain data. In our case,
smooth cooperation in knowledge transfer is facilitated by MKS: VBA domain experts used it
to model their domain knowledge, populating and curating it as a graph (see the Data Curation
side in Figure 1). The repository also incorporates the interoperability layer and
public-service-themed multilingual vocabularies. Aside from easy knowledge drafting, four concrete
challenges are tackled by incorporating language-agnostic knowledge graphs into
virtual assistants: i) multilingualism; ii) language-independent entity management; iii) enabling
semantic search; iv) dealing with homonymy and unseen terms.</p>
      <p>In the European context, multilingualism is a big asset, yet it also brings a
conceptual challenge: the kind of multilingualism served tends to heavily influence the architecture
and scalability of a solution. We decided to use individual NLU models per language, i.e.
keeping them language-specific, while making dialogue management (Stories) universal,
adding an extra layer of abstraction to maintain consistency in the bot’s behaviour across
languages. This implies that the core model should not have a single language-specific string among
its training data, but rather an abstraction for the representation of entities, such as
language-independent IDs. We therefore abstracted from entity maintenance in distinct languages, replacing
language-specific terms in the NLU training data with their unique Coreon concept IDs.
Maintaining entities in each language separately would be tedious and inconsistent, particularly
since the VBA domain knowledge is not static. Language-agnostic entities are also crucial for keeping
the Core module language-agnostic, abstracted from entity names in a specific language. Once
VBA decides to expand SmartBot’s capabilities with a new language, this method of
universal entities will ensure smooth model development and minimize the labeling effort.</p>
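      <p>One way to picture the replacement of language-specific terms with concept IDs is the sketch below. It is an assumption, not the project's pipeline: the term index, concept IDs, and entity type are invented, and the output uses Rasa's training-data entity annotation with a value field, which may or may not be the mechanism used in SmartBot. The point is that examples in different languages resolve to the same ID.</p>
      <preformat>
```python
import json
import re

# Hypothetical per-language term index: surface term -> (concept ID, entity type).
TERM_INDEX = {
    "de": {"startup": ("c-101", "company_type"),
           "kleinunternehmen": ("c-102", "company_type")},
    "en": {"startup": ("c-101", "company_type"),
           "small business": ("c-102", "company_type")},
}


def annotate(example: str, lang: str) -> str:
    """Wrap each known term in an entity annotation whose value is the concept ID."""
    index = TERM_INDEX[lang]
    # Longest terms first, so multi-word terms win over their substrings.
    terms = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def wrap(match):
        concept_id, entity_type = index[match.group(1).lower()]
        meta = json.dumps({"entity": entity_type, "value": concept_id})
        return "[" + match.group(1) + "]" + meta

    return pattern.sub(wrap, example)


en_example = annotate("I run a small business", "en")
de_example = annotate("Ich habe ein Kleinunternehmen", "de")
```
      </preformat>
      <p>Both annotated examples carry the value c-102, so the dialogue layer never sees a language-specific entity string.</p>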
      <p>The core goal of SmartBot is to serve the user relevant grant recommendations based on previously
provided input (see Figure 2 for a conversation snippet). This implies that the bot has to fetch
records with relevant VBA grants. To achieve this, we match the information drawn from the user’s input
that influences the funding outcome (e.g., intents, entities, and their types extracted by Rasa NLU)
against the grant entries.</p>
      <p>Since grant information was also imported into MKS and each grant entry is linked with relevant
entity types, we can leverage these relations between concepts in the repository. With this functionality,
SmartBot is able to fetch relevant funding entries even when terms extracted from the user’s input
do not explicitly appear among VBA funding entries: the bot navigates the parent and associative
relations of the extracted entity and infers whether any semantically close or connected concepts are
linked with specific funding entries. Ultimately, we cover the following scenario: given a VBA grant
for SMEs focusing on environmental protection, a user X, searching for grants for small businesses
doing roof planting or vertical gardening, and a user Y, looking for funding to support a startup that
calculates the CO2 footprint of businesses, would both land at the aforementioned grant.</p>
      <p>Figure 2: A demo dialogue snippet.</p>
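      <p>The navigation of parent and associative relations can be sketched as a bounded breadth-first search. This is a simplified illustration under stated assumptions: the graph fragment, concept names, grant identifier, and hop limit are hypothetical, and the real connector to MKS is not shown.</p>
      <preformat>
```python
from collections import deque

# Hypothetical graph fragment: concept ID -> neighbouring concept IDs
# (parent and associative relations merged for brevity).
NEIGHBOURS = {
    "roof_planting": ["urban_greening"],
    "co2_footprint": ["climate_protection"],
    "urban_greening": ["environmental_protection"],
    "climate_protection": ["environmental_protection"],
    "environmental_protection": [],
}
# Hypothetical links from concepts to grant entries.
GRANTS = {"environmental_protection": ["sme_environment_grant"]}


def find_grants(start, max_hops=3):
    """Return the grants linked to the nearest concept reachable within max_hops."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        concept, hops = queue.popleft()
        if concept in GRANTS:
            return GRANTS[concept]
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in NEIGHBOURS.get(concept, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return []


grants_for_x = find_grants("roof_planting")
grants_for_y = find_grants("co2_footprint")
```
      </preformat>
      <p>In this toy graph both users reach the same grant entry after two hops, although neither of their terms is attached to the grant directly. The hop limit is a design choice that keeps loosely related concepts from producing recommendations.</p>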
      <p>Unseen terms and homonymy are tackled by the KG in the same fashion. If users choose to
use terminology previously unknown to the model, SmartBot will first try to get its meaning
using the connector to Coreon rather than taking a standard fallback. Suppose a German-speaking user enquires
about the amount of money they can get from VBA and refers to money as Kohle, a slang term
homonymous with Kohle, "coal", a fossil fuel. The NLU model does not know this term, so the bot
makes an API request, searches for it in MKS, and finds two hits in two distinct concepts. The
first one belongs to the CO2 concept in a branch dealing with resource-saving and sustainability.
The second one is found among the synonyms for Geldmittel, denoting financial funds, which has
a more generic parent Geld, "money". Since quite a few terms of the concept Geldmittel are
known to the NLU model and the context of the conversation matches, the meaning
of Kohle is disambiguated for the chatbot; subsequently, SmartBot informs the user about the
amount of money they can qualify for.</p>
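      <p>The disambiguation step can be approximated by scoring each candidate concept by how many of its terms the NLU model already knows or the conversation has already mentioned. The sketch below is a stand-in heuristic, not the scoring used in SmartBot: the concept IDs, term lists, and both vocabularies are invented for illustration.</p>
      <preformat>
```python
# Hypothetical MKS hits for the unseen term "Kohle": concept ID -> its German terms.
HITS = {
    "c-co2": ["Kohle", "Kohlendioxid", "CO2"],
    "c-funds": ["Kohle", "Geldmittel", "Fördermittel", "Finanzierung"],
}
# Hypothetical vocabularies: terms known to the NLU model and terms seen in the dialogue.
KNOWN = {"geldmittel", "fördermittel"}
CONTEXT = {"finanzierung"}


def disambiguate(hits, known_terms, context_terms):
    """Pick the concept whose terms overlap most with NLU vocabulary and dialogue context."""
    def score(terms):
        lowered = {t.lower() for t in terms}
        return (len(lowered.intersection(known_terms))
                + len(lowered.intersection(context_terms)))

    best = max(hits, key=lambda cid: score(hits[cid]))
    # Without any supporting evidence, return None so the bot can use its fallback.
    return best if score(hits[best]) > 0 else None


chosen = disambiguate(HITS, KNOWN, CONTEXT)
```
      </preformat>
      <p>With these sample vocabularies the financial reading is selected, because none of the terms of the CO2 concept are known or present in the context.</p>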
      <p>Chatbots are becoming a turning point in the rationalization of business processes. Here we
investigated the technical feasibility and described the implementation of a prototype that can
support VBA and serve the needs of Vienna residents, catering to interaction in the language
of their choice and understanding the intents of their requests.</p>
      <p>Combining Rasa Open Source with reusable multilingual KG data, we delivered an intelligent
chatbot solution that is robust, extendable, and modular, a steady reference point for similar activities
that facilitate the provision of PSI. By adapting the chatbot interaction to the user’s needs, VBA
SmartBot automatically overcomes the language gap, contributing to the elevation of local
public services to the European scale and to red tape reduction.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This project utilises the results of CEFAT4Cities Action, funded by the European Commission’s
CEF Telecom programme under Grant 2019-EU-IA-0015.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bovalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peristeras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abecasis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.-M.</given-names>
            <surname>Abril-Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gattegno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karalopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sagias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Szekacs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wigard</surname>
          </string-name>
          ,
          <article-title>Promoting interoperability in Europe's e-government</article-title>
          ,
          <source>Computer</source>
          <volume>47</volume>
          (
          <year>2014</year>
          )
          <fpage>25</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tambouris</surname>
          </string-name>
          ,
          <article-title>Using chatbots and semantics to exploit public sector information</article-title>
          ,
          <source>EGOV-CeDEM-ePart 2018</source>
          (
          <year>2018</year>
          )
          <fpage>125</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.-J.</given-names>
            <surname>Mulder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Snijders</surname>
          </string-name>
          ,
          <article-title>Playing the telephone game in a multilevel polity: On the implementation of e-government services for business in the EU</article-title>
          ,
          <source>Government Information Quarterly</source>
          (
          <year>2020</year>
          )
          <fpage>101526</fpage>
          -
          <lpage>101534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
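          <string-name>
            <given-names>J.</given-names>
            <surname>Van den Bogaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Defauw</surname>
          </string-name>
          ,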
          <string-name>
            <given-names>S.</given-names>
            <surname>Szoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Everaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Van Winckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kramchaninova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardadym</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vanallemeersch</surname>
          </string-name>
          ,
          <article-title>CEFAT4Cities, a natural language layer for the ISA2 core public service vocabulary</article-title>
          ,
          <source>in: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fellow</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for citizen services and government</article-title>
          ,
          <source>Ash Cent. Democr. Gov. Innov. Harvard Kennedy Sch.</source>
          , no. August
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Stamatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerontas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tambouris</surname>
          </string-name>
          ,
          <article-title>On using chatbots and CPSV-AP for public service provision</article-title>
          ,
          <source>EGOV-CeDEM-ePart 2019</source>
          (
          <year>2019</year>
          )
          <fpage>133</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Adnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Alareeni</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for public sector: chatbots as a customer service representative</article-title>
          ,
          <source>in: International Conference on Business and Technology</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Linnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pelzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sultanow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Welter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>A reference architecture for on-premises chatbots in banks and public institutions</article-title>
          ,
          <source>in: INFORMATIK 2021</source>
          ,
          <publisher-name>Gesellschaft für Informatik</publisher-name>
          , Bonn,
          <year>2021</year>
          , pp.
          <fpage>1265</fpage>
          -
          <lpage>1281</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18420/informatik2021-106</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Hernandez</given-names>
            <surname>Mendez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Langen</surname>
          </string-name>
          ,
          <article-title>Evaluating natural language understanding services for conversational question answering systems</article-title>
          ,
          <source>in: Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</source>
          , Association for Computational Linguistics, Saarbrücken, Germany,
          <year>2017</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>185</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/W17-5522</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wetzel</surname>
          </string-name>
          ,
          <article-title>Multilinguale Taxonomien mit Coreon. Wissens- und Sprachmanagement in einer Lösung</article-title>
          ,
          <source>Rechte, Rendite, Ressourcen. Wirtschaftliche Aspekte des Terminologiemanagements</source>
          <volume>14</volume>
          (
          <year>2014</year>
          )
          <fpage>41</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <article-title>Metadaten für intelligenten Content</article-title>
          ,
          <source>Intelligente Information: Schriften zur Technischen Kommunikation</source>
          <volume>22</volume>
          (
          <year>2017</year>
          )
          <fpage>51</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>