<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Annotation based automatic action processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elias Karle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dieter Fensel</string-name>
          <email>dieter.fensel@sti2.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Technology Institute, Universität Innsbruck</institution>
          ,
          <addr-line>Innsbruck</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Driven largely by search engine optimization, the amount of structured data on the web is growing rapidly. The main search engine providers promise a substantial increase in visibility when a web page's content is annotated with the schema.org vocabulary and thus provided as structured data. Besides its use by search engines, this data can be used in various other ways, for example for the automatic processing of annotated web services or actions. In this work we present an approach to consuming and processing schema.org annotated data on the web and outline what a best practice could look like.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic web</kwd>
        <kwd>semantic web services</kwd>
        <kwd>open data</kwd>
        <kwd>schema.org</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The introduction of intelligent personal assistants (IPAs), like Amazon's Echo,
Apple's Siri, Google's Allo or Microsoft's Cortana, is about to fundamentally
change the way we search for information or consume content on the web. With
the introduction of schema.org in 2011, the web received a de facto standard for
structuring content and making it machine-readable and machine-interpretable.
Annotating a website with schema.org is currently a common search engine
optimization (SEO) practice to increase visibility. Yet the semantically enriched
data can also be used by third-party software, which turns the website into a
database-like knowledge source.</p>
      <p>With this work-in-progress paper we present the idea of using schema.org
annotated data on the web to automatically process information and to execute
schema.org actions in the manner of semantic web services. Based on a
predefined set of websites (URLs of websites with similar content), our system collects
schema.org annotated data, stores it and provides the data through a likewise
semantically annotated API to third-party software, such as the aforementioned IPAs,
chatbots and the like.</p>
      <p>
        The approach consists of four steps: we (1) collect the data from a predefined set of websites and
(2) store it in a document store or graph database. In step (3) we define a
layer for data retrieval, and in step (4) we define the API for third-party consumption
and describe it with schema.org actions. As a showcase we use an
example from the tourism sector. Tourism is a convenient example because an
analysis showed that this sector is increasingly adopting schema.org
for hotels [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], events, points of interest (POIs) and the like, and because we know of
an almost fully annotated destination marketing organization platform [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
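      <p>The API description of step (4) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the interface of the system itself: the endpoint URL, the parameter name <monospace>query</monospace> and both helper functions are hypothetical. The sketch describes a search endpoint as a schema.org SearchAction with an EntryPoint target and shows how a client could turn that description into a request URL.</p>
      <preformat><![CDATA[
```python
import json
from urllib.parse import quote

# Hypothetical endpoint; the paper does not name a URL.
API_BASE = "https://api.example.org/search"


def describe_search_api(base_url):
    """Return a schema.org SearchAction describing a search endpoint."""
    return {
        "@context": "https://schema.org",
        "@type": "SearchAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": base_url + "?q={query}",
            "httpMethod": "GET",
        },
        # Declares that the {query} placeholder must be supplied by the client.
        "query-input": "required name=query",
    }


def build_request_url(action, query):
    """Fill the {query} placeholder of the action's URL template."""
    template = action["target"]["urlTemplate"]
    return template.replace("{query}", quote(query))


action = describe_search_api(API_BASE)
print(json.dumps(action, indent=2))
print(build_request_url(action, "hotel room Mayrhofen"))
```
]]></preformat>
      <p>Publishing the description as data lets a client such as an IPA discover the expected input without endpoint-specific code.</p>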
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Some of the steps mentioned in Section 1, such as crawling and data storage, are already well
established and discussed in the literature. Others, such as data retrieval and reconciliation, are not
sufficiently covered and are the subject of this paper's contribution. This section presents literature on both areas. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
the authors describe YARS2, a system for querying a structured data
graph built from information on various websites. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] the authors present
Swoogle which is, as the name suggests, a Google for the semantic web (from before Google
itself started operating in the field of the semantic web). In contrast to these two
systems, we do not aim to provide a holistic semantic web search engine but an
endpoint for domain-specific annotated data and annotated web services from
a predefined set of URLs, which we expose not through a user interface but
through an API. Very early attempts at extracting structured and semi-structured
data from the web can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where the authors created wrappers
based on templates to extract content from the web. Our approach targets
fully structured data in RDFa, Microdata or JSON-LD format. The data found
by the crawler will be translated to JSON-LD and, due to its JSON nature,
stored in the NoSQL database MongoDB. The advantages of using NoSQL over
classical RDBMSs are described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A comparison of MongoDB with Cassandra,
another widely used NoSQL database, can be found in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and supports the
choice of MongoDB for our use case. Another way to store
semantic data would be, due to its triple nature, a graph database as described in
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which we consider as a storage option for the next stage of this work. The challenges
of information retrieval, especially in large graph databases, are described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
and a solution for semantic data reconciliation can be found in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Even though
our approach does not use a graph database but the above-mentioned document
database (MongoDB), similar challenges occur and a lot can be learned from
these works. The identification of (web) services will use techniques
of semantic web service discovery as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The structured
data found on the web will be published through semantically annotated web services.
In contrast to the approaches in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], we use schema.org actions
as a lightweight semantic web service annotation language.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>
        This section presents the methodology of the four steps mentioned in the
introduction in more detail. The general idea is to collect the structured data
that is available on certain websites and to process the web services annotated
with schema.org automatically. Depending on the domain, those web
services can be the booking of a touristic service, like a hotel room or a ski course,
the purchase of a product in a web shop, or others. A precondition for working
with schema.org annotated data is an efficient and automated way for website
owners to publish it. We are also working on means for automatic annotation
publication (as described in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], amongst others), but that would exceed the
scope of this paper. The starting point of the process is a predefined collection
of URLs of websites containing schema.org annotated data. First we collect that
structured data and identify the type of data and the presence of web services
by web service discovery (see [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). The collection frequency
depends on the sort of data (static, dynamic or active data). To further process
the data and to perform reasoning, we store everything we find in a database.
In the first step we use the NoSQL database MongoDB because it stores
data in the form of JSON, which is the same format most of the annotations
in our use case are in. Here we have to avoid redundancies,
combine newly found entities with already existing entities in the graph by entity
resolution and form meaningful subgraphs. Once our store is populated,
we can query the data from the document store. The challenge here is to find
exactly the data the user is looking for, even though the data is heterogeneous
and might be incomplete and erroneous. The methods of information retrieval,
semantic reconciliation and heuristic classification offer promising approaches
and will be used here. To make the data accessible to third-party software, it
will be published through an application programming interface (API). This
interface will itself be described as a semantic web service with schema.org actions.
It takes search parameters as input and returns schema.org objects as results,
which can be used for further interaction with the API.
      </p>
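      <p>The collection and storage steps can be illustrated with a short sketch. It assumes annotations embedded as JSON-LD script elements only (RDFa and Microdata would need a dedicated parser), uses a plain Python dictionary as a stand-in for the document store instead of MongoDB, and reduces entity resolution to matching on identical <monospace>@id</monospace> values. All class and function names, as well as the sample page, are hypothetical.</p>
      <preformat><![CDATA[
```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of script elements typed application/ld+json."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed annotations
            # A script block may hold one object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_jsonld(html):
    """Return all JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


def merge_entities(store, items):
    """Naive entity resolution: entities sharing an @id are merged key by key."""
    for item in items:
        key = item.get("@id")
        if key is None:
            continue  # no identifier to match on; a real system needs more here
        store.setdefault(key, {}).update(item)
    return store


# Invented sample page standing in for a crawled website.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Hotel",
 "@id": "https://example.org/hotel/1", "name": "Example Hotel"}
</script>
</head></html>
"""
store = merge_entities({}, extract_jsonld(page))
print(store["https://example.org/hotel/1"]["name"])
```
]]></preformat>
      <p>Merging on <monospace>@id</monospace> alone only avoids exact duplicates. Entities that are described on several websites under different identifiers require the reconciliation methods discussed above.</p>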
    </sec>
    <sec id="sec-4">
      <title>Use Case</title>
      <p>
        As a showcase we take a domain with a high density of well and fully
annotated websites, the tourism sector, and within it the destination marketing organization
(DMO) of Mayrhofen (annotated during the work on [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). We collect the
structured data on the DMO's website, store it in MongoDB and build an Alexa skill
that is connected to our web service. When a user asks Alexa about an
available hotel room, the request is forwarded to our API. Our API then queries
our database and returns an answer containing a schema.org object with room
offers. The offer object contains further references to actions (API requests),
for example a reservation request for the room or a booking action. Those
requests are then forwarded to the source where our system initially found them, for
example the accommodation provider's internet booking engine. Our system thus
acts like a proxy, and the actual business logic stays with the accommodation
provider.
      </p>
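      <p>The following sketch illustrates how a client or the proxy could pick the executable action out of a returned offer. The offer, its price and all URLs are invented for illustration, and <monospace>find_action_target</monospace> is a hypothetical helper. It assumes that actions are attached through the schema.org <monospace>potentialAction</monospace> property with an EntryPoint target.</p>
      <preformat><![CDATA[
```python
# Hypothetical stored offer; all URLs and values are invented for illustration.
ROOM_OFFER = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "name": "Double room",
    "price": "120.00",
    "priceCurrency": "EUR",
    "potentialAction": [
        {
            "@type": "ReserveAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": "https://booking.example.org/reserve?room=42",
                "httpMethod": "POST",
            },
        }
    ],
}


def find_action_target(offer, action_type):
    """Return (method, url) of the first action of the given type, or None."""
    actions = offer.get("potentialAction", [])
    if isinstance(actions, dict):  # a single action may be given without a list
        actions = [actions]
    for action in actions:
        if action.get("@type") == action_type:
            target = action.get("target", {})
            # Default to GET when the annotation does not state a method.
            return target.get("httpMethod", "GET"), target.get("urlTemplate")
    return None


print(find_action_target(ROOM_OFFER, "ReserveAction"))
```
]]></preformat>
      <p>Because the target points at the provider's own endpoint, the proxy only has to relay the request and needs no knowledge of the booking logic behind it.</p>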
    </sec>
    <sec id="sec-5">
      <title>Conclusion &amp; Ongoing Work</title>
      <p>This work-in-progress paper describes a way to collect schema.org annotated
data from websites, store the data in a document or graph database and provide
the information through an API for the automatic consumption of content and execution
of actions. The ongoing work covers all four steps of the methodology in order to
develop this research idea into a working prototype for real-life
application.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Karle, E.,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toma</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Why are there more hotels in Tyrol than in Austria? Analyzing schema.org usage in the hotel domain</article-title>
          .
          <source>In: Information and Communication Technologies in Tourism 2016</source>
          . Springer (
          <year>2016</year>
          )
          <volume>99</volume>
          –
          <fpage>112</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Akbar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Karle, E.,
          <string-name>
            <surname>Panasiuk</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simsek</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toma</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Complete Semantics to empower Touristic Service Providers</article-title>
          . Preprint, accepted at ODBASE conference
          <year>2017</year>
          (
          <year>June 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>YARS2: A federated repository for querying graph structured data from the web</article-title>
          .
          <source>The Semantic Web</source>
          (
          <year>2007</year>
          )
          <volume>211</volume>
          –
          <fpage>224</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cost</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddivari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doshi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Swoogle: a search and metadata engine for the semantic web</article-title>
          .
          <source>In: Proceedings of the thirteenth ACM international conference on Information and knowledge management</source>
          ,
          <source>ACM</source>
          (
          <year>2004</year>
          )
          <volume>652</volume>
          –
          <fpage>659</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Querying semi-structured data</article-title>
          .
          <source>Database Theory, ICDT '97</source>
          (
          <year>1997</year>
          )
          <volume>1</volume>
          –
          <fpage>18</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Moniruzzaman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hossain</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>NoSQL database: New era of databases for big data analytics: classification, characteristics and comparison</article-title>
          .
          <source>arXiv preprint arXiv:1307.0191</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Abramova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernardino</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>NoSQL databases: MongoDB vs Cassandra</article-title>
          .
          <source>In: Proceedings of the international C* conference on computer science and software engineering</source>
          ,
          <source>ACM</source>
          (
          <year>2013</year>
          )
          <volume>14</volume>
          –
          <fpage>22</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Angles</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Querying RDF data from a graph database perspective</article-title>
          .
          <source>In: European Semantic Web Conference</source>
          , Springer (
          <year>2005</year>
          )
          <volume>346</volume>
          –
          <fpage>360</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Database and information-retrieval methods for knowledge discovery</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>52</volume>
          (
          <issue>4</issue>
          ) (
          <year>2009</year>
          )
          <volume>56</volume>
          –
          <fpage>64</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anaby-Tavor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trombetta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montesi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A framework for modeling and evaluating automatic semantic reconciliation</article-title>
          .
          <source>The VLDB Journal: The International Journal on Very Large Data Bases</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
          <volume>50</volume>
          –
          <fpage>67</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Klusch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fries</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sycara</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Automated semantic web service discovery with OWLS-MX</article-title>
          .
          <source>In: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems</source>
          ,
          <source>ACM</source>
          (
          <year>2006</year>
          )
          <volume>915</volume>
          –
          <fpage>922</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lausen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Bruijn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stollberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Enabling semantic web services: the web service modeling ontology</article-title>
          . Springer Science &amp; Business Media
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ankolekar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDermott</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McIlraith</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paolucci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Payne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>DAML-S: Web service description for the semantic web</article-title>
          .
          <source>The Semantic Web: ISWC 2002</source>
          (
          <year>2002</year>
          )
          <volume>348</volume>
          –
          <fpage>363</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Karle, E.,
          <string-name>
            <surname>Simsek</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>semantify.it, a platform for creation, publication and distribution of semantic annotations</article-title>
          . Preprint, accepted at SEMAPRO conference
          <year>2017</year>
          (
          <year>June 2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>