<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PCC2018-Bot: A Telegram bot for “Palermo Capitale della Cultura 2018” events powered by Linked Open Data and Schema.org annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A. Lo Bue</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Machì</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Taibi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff1">ICAR-CNR, Palermo, Italy, icar.cnr.it, ing.antonino.lobue@gmail.com</aff>
        <aff id="aff2">ICAR-CNR, Palermo, Italy, alberto.machi@icar.cnr.it</aff>
        <aff id="aff3">ITD-CNR, Palermo, Italy, davide.taibi@itd.cnr.it</aff>
      </contrib-group>
      <abstract>
        <p>This paper describes a practice for the live social reuse of Schema.org annotations and Linked Open Data in the realm of events. The events of the “Palermo Capitale della Cultura 2018” initiative of the Ministry of Culture of Italy were semantically enhanced by interlinking the available open data with related information inferred from the Linked Open Data cloud (namely DBpedia and Geonames). The resulting dataset, stored as an RDF graph within a triplestore, was exposed via an API as a knowledge graph using the Schema.org vocabulary. The dataset was also made accessible to an event search assistant for tourists, implemented as a bot for the instant messaging application Telegram. This effort shows how plain open metadata can be powered by Linked Data and semantic vocabularies like Schema.org to become rich machine-understandable descriptions, usable by automatic bots to provide improved question answering experiences for the social user.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Data Integration</kwd>
        <kwd>Linked Open Data</kwd>
        <kwd>Telegram bot</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <sec id="sec-2-1">
        <title>Linked Open Data</title>
        <p>The evolution of the Web is strictly connected to the way users interact with it.
Nowadays, the potential users of web data are not only human beings but also software
services and software agents. For this reason, data should be published on the web
using standards and technologies that can be understood and processed
automatically.</p>
        <p>
          At present, the most popular Web applications, such as Facebook and YouTube,
offer Application Programming Interfaces (APIs) that allow software agents to access the
information they host. Semantic Web technologies provide an adequate technological
substrate for representing concepts and the relationships between
them through ontologies, and Linked Open Data has emerged as the natural
way to publish, integrate and link semantically described data. The information
available on the web is of different types and is published in heterogeneous
formats. Linked Open Data (LOD) aims to provide a technological substrate for
publishing structured data in a standardized format. The advantages of such an approach are
tangible, and it is increasingly common for data on the web to be published following
the LOD principles [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Just as the linking of pages marked the success of the
Web, LOD aims to connect datasets and the concepts they host,
providing information not only for humans but also for software agents.
        </p>
        <p>The SPARQL Query Language<sup>1</sup> is the standardized language for querying and retrieving
Semantic Web data stored as RDF<sup>2</sup> triples, thus facilitating access to LOD
resources. DBpedia<sup>3</sup> can be seen as the semantic version of Wikipedia; it is the core
of the Linked Open Data cloud and provides a main access point for semantic
enrichment.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Schema.org and bots</title>
        <p>Schema.org was launched in 2011 as the result of a joint effort by the big players in the
search engine field (Google, Microsoft, Yahoo! and Yandex), with the aim of defining
a shared vocabulary for common real-world concepts.</p>
        <p>
          Starting from the most generic top-level type, named
Thing, the main subtypes represent concepts such as
CreativeWork, Organization, Person, Place, Product and Event. Moreover, more specific
subtypes have been defined for popular domains such as
medicine<sup>4</sup> or education [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>According to the Web Data Commons<sup>5</sup> project, Schema.org markup is now
found on more than 39% of all Web pages.</p>
        <p>
          Recent studies report that over 2.5 billion people have installed an instant
messaging app on their mobile phones and that, already in 2015, interactions between people on the
Web were mediated by instant messaging apps more than by social networks. The most
popular instant messaging platforms include WhatsApp, Telegram and Viber [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Amongst them, Telegram is cross-platform and provides an API for
building chat-bots that interact with a single user or with a group of users [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Providing enriched contents on cultural events via bots</title>
      <p>“Palermo Capitale della Cultura 2018” is one of the initiatives of the Ministry of
Culture of Italy that support the coordination of cultural events in order to promote tourism
in a chosen city. This paper presents a bot, developed for the Telegram instant messaging
platform, that provides and shares rich information about the events of the initiative.
The bot enriches sparse open event descriptions with LOD data and offers the social
user an easy-to-use interface to browse them. The interface hides the complexity of
the queries required to build semantically useful descriptions according to the user's
spatiotemporal context.</p>
      <p><sup>1</sup> https://www.w3.org/TR/rdf-sparql-query/
<sup>2</sup> https://www.w3.org/RDF/
<sup>3</sup> http://wiki.dbpedia.org
<sup>4</sup> http://schema.org/docs/meddocs.html
<sup>5</sup> http://webdatacommons.org</p>
      <p>Figure 1 shows the overall system architecture. The system serializes information
about events in a knowledge graph containing not only the sparse open data published
on the official website<sup>6</sup>, but also enriched entities extracted from the Linked Data silos
and interlinked as described in Section 2.1. This approach, based on a knowledge
graph and enriched entities, gives users additional information that is
not included in the original website or in the other services and mobile apps that
are based on it.</p>
      <p>Data extracted from the PCC2018 web site are first imported into a Drupal CMS in
order to perform lexical cleaning of the data and to improve the efficiency of user
queries.</p>
      <p>A mapping module translates CMS data into triples and stores them in a Virtuoso
RDF triple-store. An interlinking module implemented via web services enriches the
resulting knowledge graph. A SPARQL endpoint answers semantic queries on the
graph.</p>
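      <p>The mapping step can be illustrated with a minimal sketch that turns one CMS record into N-Triples lines using Schema.org terms. The record layout, the base URI http://example.org/event/ and the field-to-property table are hypothetical and chosen only for illustration; they are not the schema of the system described here.</p>
      <preformat><![CDATA[
```python
"""Sketch: map a CMS event record to N-Triples using Schema.org terms.

The record layout, the base URI and the field-to-property table are
illustrative assumptions, not the schema used by the PCC2018 system.
"""

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/event/"  # hypothetical namespace

# Hypothetical mapping from CMS field names to Schema.org properties.
FIELD_TO_PROPERTY = {
    "title": SCHEMA + "name",
    "description": SCHEMA + "description",
    "start": SCHEMA + "startDate",
    "end": SCHEMA + "endDate",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record):
    """Return the N-Triples lines describing one CMS event record."""
    subject = "<%s%s>" % (BASE, record["id"])
    lines = ["%s <%s> <%sEvent> ." % (subject, RDF_TYPE, SCHEMA)]
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if value:  # skip missing or empty fields
            lines.append('%s <%s> "%s" .' % (subject, prop, escape_literal(value)))
    return lines
```
]]></preformat>
      <p>The resulting lines could then be loaded into a triple-store with its bulk loader or a SPARQL update; that step is omitted because it depends on the deployment.</p>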
      <p>Semantically enriched event descriptions are then re-imported into the CMS and delivered to a
Telegram bot through an API endpoint supporting field selection and range queries on
temporal and spatial data.</p>
      <p><sup>6</sup> http://palermocapitalecultura.it/</p>
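      <p>As a rough illustration of what field selection and range queries on temporal and spatial data mean in practice, the sketch below filters an in-memory list of events. The function name, the record fields and the bounding-box convention are assumptions made for this example; the actual endpoint is served by the CMS and its parameters are not described in this paper.</p>
      <preformat><![CDATA[
```python
"""Sketch: field selection plus temporal and spatial range filtering.

Event records, field names and the bounding-box convention are
illustrative assumptions; the real endpoint is served by the CMS.
"""
from datetime import date


def query_events(events, fields=None, date_from=None, date_to=None, bbox=None):
    """Filter events by date range and bounding box, then project fields.

    bbox is (min_lat, min_lon, max_lat, max_lon); dates are ISO strings.
    """
    selected = []
    for event in events:
        day = date.fromisoformat(event["startDate"])
        if date_from and day < date.fromisoformat(date_from):
            continue
        if date_to and day > date.fromisoformat(date_to):
            continue
        if bbox:
            min_lat, min_lon, max_lat, max_lon = bbox
            inside = (min_lat <= event["latitude"] <= max_lat
                      and min_lon <= event["longitude"] <= max_lon)
            if not inside:
                continue
        if fields:
            # Keep only the requested fields that the record actually has.
            event = {k: event[k] for k in fields if k in event}
        selected.append(event)
    return selected
```
]]></preformat>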
      <sec id="sec-3-1">
        <title>Semantic Enrichment</title>
        <p>Semantic enrichment is the process of transforming plain data
into structured data that contains machine-readable statements. This enrichment can
rely on ontologies or taxonomies of controlled terms with semantics defined by
the data owner or, following Linked Open Data principles, on the reuse of
machine-understandable vocabularies with metadata values defined by external data providers
(such as Europeana<sup>7</sup> or DBpedia<sup>8</sup>). Knowledge graphs published on the LOD cloud can be
traversed to extract references or descriptions of related entities.</p>
        <p>
          The main issue with semantic enrichment lies in automating the process,
so that enrichment can be applied to large volumes of data instead of relying on manual
annotation by domain experts. Interlinking rules, distance measure algorithms and
natural language processing techniques can support automated enrichment processes
and generate well-formed semantic data that exploit the LOD cloud [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ].
        </p>
        <p>
          In the context of this work, data enrichment was implemented using a mixed
approach including programmatic tagging via external services and federated SPARQL
queries to provide interlinking enrichment [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
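        <p>To semantically enrich event data, three specific enrichment techniques were used:</p>
        <list list-type="bullet">
          <list-item><p>Semantic named-entity recognition</p></list-item>
          <list-item><p>Geocoding enrichment</p></list-item>
          <list-item><p>Spatial interlinking</p></list-item>
        </list>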
        <p>The interlinking module was developed in Python and implements connector
interfaces between the external services and the triple-store.</p>
        <p>For example, the text describing the locality where the event takes place was geocoded
using the Google Geocode APIs<sup>9</sup>, then expressed using Schema.org relations (Address,
Administrative Areas, Latitude/Longitude coordinates) and finally converted back into a
plain-text address of the event place for the sake of simplicity.</p>
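        <p>A minimal sketch of this step is given below, assuming the geocoder output has already been reduced to a flat dictionary. The input keys (street, locality, region, country, lat, lng) and both function names are hypothetical; only the Schema.org types and properties (Place, PostalAddress, GeoCoordinates and their fields) come from the vocabulary itself.</p>
        <preformat><![CDATA[
```python
"""Sketch: express a geocoded location with Schema.org and flatten it back.

The input dictionary is a simplified, hypothetical geocoder result;
it is not the response format of any specific geocoding service.
"""


def place_to_schema_org(name, geocoded):
    """Build a JSON-LD style description of the event place."""
    return {
        "@context": "http://schema.org",
        "@type": "Place",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": geocoded["street"],
            "addressLocality": geocoded["locality"],
            "addressRegion": geocoded["region"],
            "addressCountry": geocoded["country"],
        },
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": geocoded["lat"],
            "longitude": geocoded["lng"],
        },
    }


def plain_text_address(place):
    """Convert the structured address back into a single display string."""
    addr = place["address"]
    parts = [
        addr["streetAddress"],
        addr["addressLocality"],
        addr["addressRegion"],
        addr["addressCountry"],
    ]
    return ", ".join(p for p in parts if p)  # drop empty components
```
]]></preformat>
        <p>Keeping the structured form in the graph and deriving the plain-text address only for display means the coordinates stay available for the spatial queries.</p>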
        <p>
          For the recognition of named entities, the textual contents of the “title” and
“description” fields of each event were sent as input text to the DBpedia Spotlight APIs<sup>10</sup>.
DBpedia Spotlight [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a tool for automatically annotating mentions of DBpedia
resources in texts. As output, the service returned, for each event,
an array of related "DBpedia entities", expressed as rdfs:seeAlso<sup>11</sup> statements.
        </p>
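        <p>The conversion of annotations into rdfs:seeAlso statements might look like the sketch below. It assumes the annotation result is a dictionary with a "Resources" list whose items carry "@URI" and "@similarityScore" keys, which is how Spotlight's JSON output is commonly shaped; the function name, the event URI and the score threshold are hypothetical, and the HTTP call itself is left out.</p>
        <preformat><![CDATA[
```python
"""Sketch: turn entity annotations into rdfs:seeAlso N-Triples.

The response shape ("Resources", "@URI", "@similarityScore") is an
assumption about the annotator output; adapt the keys if it differs.
"""

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def see_also_triples(event_uri, annotation_response, min_score=0.0):
    """Return one rdfs:seeAlso triple per distinct annotated entity."""
    seen = set()
    triples = []
    for resource in annotation_response.get("Resources", []):
        uri = resource["@URI"]
        # Scores may arrive as strings, so convert before comparing.
        score = float(resource.get("@similarityScore", 1.0))
        if uri in seen or score < min_score:
            continue
        seen.add(uri)
        triples.append("<%s> <%s> <%s> ." % (event_uri, RDFS_SEE_ALSO, uri))
    return triples
```
]]></preformat>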
        <p>The triples inferred by geocoding and named-entity recognition were used as source data for the
third type of enrichment, spatial interlinking, carried out via SPARQL federated queries<sup>12</sup> that merge facts
about the same event extracted from different sources of the Linked Open Data cloud.
In particular, the implemented SPARQL query infers nearby entities
(places, historical monuments, etc.) from DBpedia. A threshold on the Haversine geospatial distance
from the coordinates of the event place defines the effective region of interest
around it.</p>
        <p><sup>7</sup> https://pro.europeana.eu/page/linked-open-data
<sup>8</sup> https://wiki.dbpedia.org
<sup>9</sup> https://developers.google.com/maps/documentation/geocoding/intro
<sup>10</sup> http://spotlight.dbpedia.org
<sup>11</sup> https://www.w3.org/TR/rdf-schema/#ch_seealso
<sup>12</sup> https://www.w3.org/TR/sparql11-federated-query/</p>
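        <p>The distance test can be sketched in Python as follows. The Haversine formula is standard; the 0.5 km default threshold, the candidate record layout and the function names are assumptions for illustration, since the threshold actually used is not stated here, and in the described system the filter is evaluated inside the SPARQL query rather than in client code.</p>
        <preformat><![CDATA[
```python
"""Sketch: select entities within a Haversine distance threshold."""
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(event_coords, candidates, threshold_km=0.5):
    """Keep the candidates lying within threshold_km of the event place.

    threshold_km is an illustrative default, not the value used in the paper.
    """
    lat, lon = event_coords
    return [
        c for c in candidates
        if haversine_km(lat, lon, c["lat"], c["lon"]) <= threshold_km
    ]
```
]]></preformat>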
        <p>The semantically enriched events stored in the triple-store were mapped back to
the main application CMS and served to the Telegram bot via the CMS output API
endpoint.</p>
        <p>Table 1 shows the semantics and formats of event descriptors at the various steps of the
enrichment chain.</p>
      </sec>
      <sec id="sec-3-2">
        <title>The Telegram bot</title>
        <p>The Telegram Bot presented in this paper was designed to guide users in searching for
events organized in the framework of the “Palermo Capitale della Cultura 2018”
initiative. The interface simplifies the search process by reducing user
interaction, giving access to information about events matching the user's interest
in a minimum number of clicks. In particular, customized keyboards were designed to
help users select the most commonly used search options directly.</p>
        <p>The customized keyboard shown by the bot allows users to search along three
specific dimensions: temporal, spatial and categorical (Fig 2a).</p>
        <p>The spatial dimension allows users to search for events near their current
position. The Telegram Bot API allows the user's
coordinates to be transmitted to the bot by means of the request_location parameter associated with a keyboard
button. After the options along the three dimensions have been selected, a list of events matching
the user’s preferences is shown as a selection list (Fig. 2b).</p>
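        <p>A custom keyboard with a location button can be described as in the sketch below, which builds the reply_markup value for a sendMessage call. The keyboard, resize_keyboard, one_time_keyboard, text and request_location fields belong to the Telegram Bot API; the button label, the function names and the choice to serialize reply_markup as a JSON string (as needed for form-encoded requests) are assumptions of this example, and the HTTP request itself is omitted.</p>
        <preformat><![CDATA[
```python
"""Sketch: a Telegram reply keyboard that asks for the user's location."""
import json


def location_keyboard(label="Events near me"):
    """Return a ReplyKeyboardMarkup with one location-request button.

    The label is a hypothetical example, not the text used by the bot.
    """
    return {
        "keyboard": [[{"text": label, "request_location": True}]],
        "resize_keyboard": True,
        "one_time_keyboard": True,
    }


def send_message_payload(chat_id, text):
    """Build the parameters of a sendMessage call that shows the keyboard."""
    return {
        "chat_id": chat_id,
        "text": text,
        # Serialized as a JSON string, as expected in form-encoded requests.
        "reply_markup": json.dumps(location_keyboard()),
    }
```
]]></preformat>
        <p>When the user taps the button, the client sends a message carrying a location with latitude and longitude, which the bot can then pass on as the spatial part of the search.</p>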
        <p>[Figure 2: panels (a), (b) and (c)]</p>
        <p>Users can select an event from the list to access its description:
for each event the bot answers with a message containing the title, an image related to
the event and a brief description. Moreover, the bot can provide additional content
such as opening hours, description details and a list of nearby points of interest.</p>
        <p>The enriched knowledge graph is used to provide this additional information.
Specifically, the rdfs:seeAlso property is used to provide detailed information about the
entities related to the event, while the geonames:nearby property is used to provide
information about the points of interest located in the neighborhood of the place in
which the event takes place.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>Descriptions of more than 200 events (plus replicas) in the framework of the
initiative “Palermo Capitale della Cultura 2018” were enriched with full
geo-location information, completed with images related to the location or named subject,
and linked to Linked Data entities extracted via named-entity recognition
and geospatial inference rules. A compact semantic search GUI was provided
through a Telegram bot to allow users to easily search and share information.</p>
      <p>This effort shows how plain metadata (and Open Data) can be powered by Linked
Data and expressed through semantic vocabularies like Schema.org to become
rich machine-understandable data, used unambiguously by automatic bots to provide
improved question answering experiences for the social user.</p>
      <p>Topics of current research are the enhancement of the enrichment inference engine
and the upgrade of the bot to a conversational bot (chat-bot).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.,</given-names>
          </string-name>
          <article-title>The emerging web of linked data</article-title>
          .
          <source>Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications</source>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          , Dan Brickley, and
          <string-name>
            <given-names>Steve</given-names>
            <surname>Macbeth</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Schema.org: evolution of structured data on the web</article-title>
          .
          <source>Commun. ACM 59</source>
          ,
          <issue>2</issue>
          (
          <year>January 2016</year>
          ),
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . DOI: https://doi.org/10.1145/2844544
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sutikno</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Handayani</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiawan</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riyadi</surname>
            <given-names>M A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Much</surname>
            <given-names>I</given-names>
          </string-name>
          and
          <string-name>
            <surname>Subroto</surname>
            <given-names>I</given-names>
          </string-name>
          <year>2016</year>
          <article-title>WhatsApp, Viber and Telegram: which is the best for instant messaging?</article-title>
          <source>Int. J. of Electrical and Computer Eng. (IJECE)</source>
          <volume>6</volume>
          <fpage>909</fpage>
          -
          <lpage>14</lpage>
          http://doi.org/10.11591/ijece.v6i3.10271
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pereira</surname>
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Leveraging chatbots to improve self-guided learning through conversational quizzes</article-title>
          .
          <source>In Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM '16)</source>
          , Francisco José García-Peñalvo (Ed.). ACM, NY, USA,
          <fpage>911</fpage>
          -
          <lpage>918</lpage>
          . DOI: https://doi.org/10.1145/3012430.3012625
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dietze</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taibi</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barker</surname>
            <given-names>P</given-names>
          </string-name>
          , and
          <string-name>
            <surname>d'Aquin</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Analysing and Improving Embedded Markup of Learning Resources on the Web</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web Companion (WWW '17 Companion)</source>
          .
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>, <publisher-loc>Republic and Canton of Geneva, Switzerland</publisher-loc>
          ,
          <fpage>283</fpage>
          -
          <lpage>292</lpage>
          . DOI: https://doi.org/10.1145/3041021.3054160
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lo Bue</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machì</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Open Data Integration using SPARQL and SPIN - A Case Study for the Tourism Domain</article-title>
          .
          <source>AI*IA 2015: Artificial Intelligence and Human-Oriented Computing</source>
          , September 23-25, Ferrara, Italy (
          <year>2015</year>
          ). LNCS 9336, pp.
          <fpage>316</fpage>
          -
          <lpage>326</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lo Bue</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wecker</surname>
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuflik</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machì</surname>
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stock</surname>
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Providing Personalized Cultural Heritage Information for the Smart Region-A Proposed Methodology</article-title>
          .
          <source>In UMAP Workshops</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Simou</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chortaras</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kollias</surname>
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>Enriching and Publishing Cultural Heritage as Linked Open Data</article-title>
          . In:
          <string-name>
            <surname>Ioannides</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnenat-Thalmann</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papagiannakis</surname>
            <given-names>G</given-names>
          </string-name>
          . (eds)
          <source>Mixed Reality and Gamification for Cultural Heritage</source>
          . Springer, Cham
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Max Jakob, Andrés García-Silva, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>DBpedia spotlight: shedding light on the web of documents</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Semantic Systems</source>
          , Chiara Ghidini,
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          , Stefanie Lindstaedt, and Tassilo Pellegrini (Eds.). ACM, New York, NY, USA,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>