<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AGDLI: ArCo, GVP and DBpedia Linking Initiative</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Faralli</string-name>
          <email>stefano.faralli@unitelmasapienza.it</email>
          <aff>University of Rome UnitelmaSapienza, Italy</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lenzi</string-name>
          <aff>Sapienza University of Rome, Italy</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Velardi</string-name>
          <aff>Sapienza University of Rome, Italy</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We present the ArCo, GVP and DBpedia Linking Initiative (AGDLI), a research activity within the project SMARTOUR: intelligent platform for tourism, funded by the Italian Ministry of University and Research. Our initiative aims at linking ArCo's cultural entities to the well-known Getty Vocabulary Program (GVP) and DBpedia ontologies, with the main goal of providing a semantically rich representation of the Italian cultural heritage for tourism-related knowledge-based applications. In this paper, we describe the initiative, its current research developments and its outcomes.</p>
      </abstract>
      <kwd-group>
        <kwd>ArCo</kwd>
        <kwd>Getty Vocabularies</kwd>
        <kwd>DBpedia</kwd>
        <kwd>knowledge-based applications</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nowadays, we are observing an increasing number of novel semantically enabled
and knowledge-based applications. Hence, Linked Open Data are increasingly
gaining the attention of public administrations and industries all over the
world. In this paper<fn><p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p></fn>, we describe the ArCo, GVP and DBpedia Linking
Initiative (AGDLI). Our initiative is a research activity within the project SMARTOUR:
intelligent platform for tourism (see the Acknowledgements section). The
main goal of the initiative is to study semi-supervised methodologies to
generate semantically rich definitions of Italian cultural heritage entities, to be used
in different knowledge-based, tourism-related applications, such as recommender
systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and semantically enriched augmented reality tools for point-of-interest
discovery [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To this end, we decided to link the entities defined in ArCo (<ext-link ext-link-type="uri" xlink:href="http://wit.istc.cnr.it/arco/?lang=en">http://wit.istc.cnr.it/arco/?lang=en</ext-link>)
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with the concepts defined in the Getty Vocabulary Program (GVP, <ext-link ext-link-type="uri" xlink:href="https://www.getty.edu/research/tools/vocabularies/">https://www.getty.edu/research/tools/vocabularies/</ext-link>) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and DBpedia (<ext-link ext-link-type="uri" xlink:href="https://www.dbpedia.org/">https://www.dbpedia.org/</ext-link>) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] ontologies. ArCo is a state-of-the-art knowledge graph of the
Italian cultural heritage, which defines 169 million triples describing 820 thousand
cultural entities. In ArCo (see Figure 1), important properties, such as the type
(dc:type), are valued with literals, or are not linked to existing ontologies, as is the
case for authorship attributions (l0:Agent). The GVP is a top ontology on which the Art &amp;
Architecture Thesaurus® (AAT), the Getty Thesaurus of Geographic Names®
(TGN), and the Union List of Artist Names® (ULAN) vocabularies are based.
The AAT, TGN and ULAN vocabularies provide semantic definitions for
concepts useful for cataloging, documenting and retrieving information related to
art, architecture, and other material culture. By targeting both the GVP and
DBpedia ontologies, we can generate high-coverage links for ArCo entities
and their properties. This may considerably enrich the ArCo ontology, which
currently defines only 14 high-level classes with a depth of 4, whereas, e.g., the
AAT provides more than 55K domain-specific concepts divided into 8
facet taxonomies with an average height of 13 levels. In this paper, we describe
the current research outcomes and the future work of the AGDLI
initiative.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The linking initiative</title>
      <p>In Figure 1, we depict an excerpt of the ArCo schema. In this diagram, we
highlight some of the properties of ArCo CulturalProperty entities that we link to
the GVP ontology. Specifically, we are investigating semi-supervised
methodologies to automatically: i) mine and link the dc:type and rdfs:label properties
of CulturalProperty instances to the AAT; ii) link the cities of the addresses of
CulturalProperty instances to the TGN; iii) link the agents of authorship
attributions of CulturalProperty instances to the ULAN; iv) normalize the date
intervals of CulturalProperty instances into a machine-readable format<fn><p>The aim is to provide a ready-to-use resource for time-based tourism applications.</p></fn>, such
as the Open Date Range Format (ODRF) of the Dublin Core Collection Description (<ext-link ext-link-type="uri" xlink:href="http://www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf/">http://www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf/</ext-link>). Note that ArCo is in Italian, while the
GVP is mainly in English, which represents an additional challenge for our
linking initiative.</p>
      <p>In Figure 2, we depict an example of mining concepts from the textual descriptions
of ArCo entities and linking them to the corresponding concepts in the
AAT. In this task:</p>
      <p>1. we automatically translated the AAT terms from English into Italian. To this
end, we used the Google Translate API (<ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/translate/">https://cloud.google.com/translate/</ext-link>). Note that we preserved the original
Italian terminology when it was already provided by the AAT;</p>
      <fig id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>Excerpt of the ArCo schema. The highlighted properties of arco-arco:CulturalProperty entities (dc:type, rdfs:label, dc:date, the clvapit:City of arco-location:hasCulturalPropertyAddress, and the l0:Agent of arco-cd:hasAuthorshipAttribution) are linked to the AAT (gvp:Concept), the TGN (gvp:PhysPlaceConcept), the ULAN (gvp:PersonConcept), DBpedia (via owl:sameAs) and the DCCD ODRF.</p>
        </caption>
        <preformat>Prefixes
arco-arco: &lt;https://w3id.org/arco/ontology/arco/&gt;
arco-location: &lt;https://w3id.org/arco/ontology/location/&gt;
arco-cd: &lt;https://w3id.org/arco/ontology/context-description/&gt;
clvapit: &lt;https://w3id.org/italia/onto/CLV/&gt;
dbpedia: &lt;http://dbpedia.org/resource/&gt;
dc: &lt;http://purl.org/dc/elements/1.1/&gt;
gvp: &lt;http://vocab.getty.edu/ontology#&gt;
l0: &lt;https://w3id.org/italia/onto/l0/&gt;
owl: &lt;http://www.w3.org/2002/07/owl#&gt;
rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
skos: &lt;http://www.w3.org/2004/02/skos/core#&gt;</preformat>
      </fig>
      <sec id="sec-2-2">
        <p>2. we applied standard text pre-processing techniques (e.g., tokenization and
lowercasing) to the textual descriptions of ArCo entities. To this end, we adopted the
Stanford NLP API (<ext-link ext-link-type="uri" xlink:href="https://nlp.stanford.edu/software/">https://nlp.stanford.edu/software/</ext-link>);</p>
        <p>3. we automatically collected all the occurrences of Italian AAT terms in the
preprocessed rdfs:label and dc:type textual properties of ArCo entities. In
this step, we obtained, for each ArCo entity, a collection of links to AAT
concepts. Since the same term can be the skosxl:literalForm (<ext-link ext-link-type="uri" xlink:href="https://www.w3.org/TR/skos-reference/skos-xl.html#literalForm">https://www.w3.org/TR/skos-reference/skos-xl.html#literalForm</ext-link>)
of several AAT concepts, these links are ambiguous: each occurrence is linked to all the
AAT concepts sharing that literal form (see the example in Figure 2);</p>
        <p>4. since these tasks are error-prone, we manually refined the
translated AAT terms, fixing translation errors and adding synonyms and
singular, plural and hypernymous forms for terms occurring in the textual
properties of ArCo entities that were still unlinked;</p>
        <p>5. we repeated steps 3 and 4 until an adequate coverage was reached.</p>
        <p>To link ArCo's cities to TGN and DBpedia, we performed string matching with
the corresponding terms and entities, applying different similarity measures (e.g., a
string edit distance-based similarity). At the time of writing, we are investigating
effective methodologies for linking ArCo's l0:Agent instances to ULAN entities, and
for dc:date normalization.</p>
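        <p>Steps 2 and 3 of the procedure above can be illustrated with the following minimal Python sketch. It assumes a small, hypothetical Italian lexicon that maps each (translated or refined) AAT term to the identifiers of all the AAT concepts sharing that literal form; the identifiers are placeholders, not real AAT IDs, and a regular-expression tokenizer stands in for the Stanford NLP pipeline used in the initiative. Every occurrence of a lexicon term in an entity's preprocessed rdfs:label or dc:type text yields links to all matching concepts, so the output keeps the ambiguity described in step 3.</p>
        <preformat preformat-type="code">
```python
import re
from collections import defaultdict

# Hypothetical Italian AAT lexicon: each term maps to ALL the AAT concepts
# sharing that literal form. Identifiers are placeholders, not real AAT IDs.
IT_TERMS = {
    "dipinto": ["aat:EXAMPLE_1"],
    "olio su tela": ["aat:EXAMPLE_2"],
    "croce": ["aat:EXAMPLE_3", "aat:EXAMPLE_4"],  # ambiguous literal form
}


def preprocess(text):
    """Step 2 stand-in: lowercase and tokenize (not the Stanford NLP API)."""
    return re.findall(r"\w+", text.lower())


def candidate_links(entities, lexicon):
    """Step 3: map each entity to the AAT concepts whose terms occur in its texts.

    entities maps an entity ID to its rdfs:label / dc:type strings.
    """
    # Index terms by token tuple so that multiword terms can be matched.
    index = {tuple(preprocess(term)): ids for term, ids in lexicon.items()}
    max_len = max(len(key) for key in index)
    links = defaultdict(set)
    for entity, texts in entities.items():
        for text in texts:
            tokens = preprocess(text)
            for start in range(len(tokens)):
                for n in range(1, max_len + 1):
                    ids = index.get(tuple(tokens[start:start + n]))
                    if ids:
                        # Link to every concept with this literal form:
                        # the result stays ambiguous until disambiguation.
                        links[entity].update(ids)
    return dict(links)


if __name__ == "__main__":
    demo = {"arco:ExampleEntity": ["Dipinto", "Dipinto olio su tela con croce"]}
    print(sorted(candidate_links(demo, IT_TERMS)["arco:ExampleEntity"]))
    # ['aat:EXAMPLE_1', 'aat:EXAMPLE_2', 'aat:EXAMPLE_3', 'aat:EXAMPLE_4']
```
        </preformat>
        <p>For the example entity, the sketch returns both placeholder candidates of the ambiguous lexicon term croce. Matching token n-grams rather than raw substrings avoids spurious hits inside longer words.</p>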
      </sec>
    </sec>
    <sec id="sec-outcomes">
      <title>Current Outcomes and Conclusions</title>
      <p>In this paper, we introduced the AGDLI initiative. As a result, we obtained<fn><p>Resources are available under Creative Commons Attribution 4.0 International (CC BY 4.0) at <ext-link ext-link-type="uri" xlink:href="https://sites.google.com/unitelmasapienza.it/agdli/">https://sites.google.com/unitelmasapienza.it/agdli/</ext-link>.</p></fn>:</p>
      <list list-type="bullet">
        <list-item><p>the automatic translation into Italian of the 55K AAT terms;</p></list-item>
        <list-item><p>a total of 5.6M triples (skos:relatedMatch and skos:related) linking 98.2% (by dc:type) and 99.9% (by rdfs:label) of the arco-arco:CulturalProperty entities to candidate AAT concepts;</p></list-item>
        <list-item><p>a total of 6.6K triples (skos:relatedMatch) linking 86.3% of the clvapit:City instances to candidate TGN entities;</p></list-item>
        <list-item><p>4.7K novel owl:sameAs relations, which now link 100% of the clvapit:City instances to DBpedia.</p></list-item>
      </list>
      <p>As mentioned in the previous section, the next planned activities aim at both
linking ArCo's authorship attributions to ULAN entities and normalizing the
dc:date of CulturalProperty entities.</p>
      <p>Moreover, we plan to apply semi-supervised methodologies to
disambiguate the generated candidate links. For instance, the links to
AAT concepts could be refined with semi-supervised word sense disambiguation
approaches, while the matches with TGN candidates could be
disambiguated based on the distance between the geographical coordinates of the ArCo
and TGN entities.</p>
      <p>
        Further plans of the AGDLI initiative include, among others, the
application and investigation of knowledge graph completion methodologies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to link
isolated (unmatched) entities of the resulting graph, and the adoption of best
practices for continuous resource maintenance and deployment.
      </p>
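      <p>The string matching used to link ArCo cities to the TGN, and the coordinate-based disambiguation of TGN candidates planned above, could be combined as in the following minimal Python sketch. It is an illustration under stated assumptions, not the AGDLI implementation: candidates are generated with a normalized Levenshtein (edit distance) similarity, one possible instance of an edit distance-based measure, and ambiguous candidates are resolved by picking the one closest to the ArCo coordinates (arco-location:lat, arco-location:long) according to the haversine great-circle distance. The TGN records, their identifiers and coordinates, and the 0.8 similarity threshold are all hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import operator

# Hypothetical TGN records: (identifier, name, latitude, longitude).
# Identifiers and coordinates are illustrative placeholders.
TGN = [
    ("tgn:EXAMPLE_1", "Castelnuovo", 44.0, 10.0),
    ("tgn:EXAMPLE_2", "Castelnuovo", 42.0, 13.0),
    ("tgn:EXAMPLE_3", "Castelnovo", 45.0, 11.0),
    ("tgn:EXAMPLE_4", "Montalto", 43.0, 12.0),
]


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def candidates(city_name, records, threshold=0.8):
    """TGN records whose name similarity with the ArCo city reaches the threshold."""
    name = city_name.lower()
    # operator.ge(score, threshold) is true when score is at least threshold.
    return [rec for rec in records
            if operator.ge(similarity(name, rec[1].lower()), threshold)]


def disambiguate(city_name, lat, lon, records):
    """Return the candidate closest to the ArCo coordinates, or None."""
    found = candidates(city_name, records)
    if not found:
        return None
    return min(found, key=lambda rec: haversine_km(lat, lon, rec[2], rec[3]))


if __name__ == "__main__":
    best = disambiguate("Castelnuovo", 42.1, 13.1, TGN)
    print(best[0])  # tgn:EXAMPLE_2
```
      </preformat>
      <p>With the example coordinates (42.1, 13.1), the two exact homonyms and the near-miss spelling Castelnovo all pass the similarity filter, and the distance criterion selects tgn:EXAMPLE_2, about 14 km away. In practice, the threshold would have to be tuned against manually checked links.</p>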
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>This work was carried out within the research project "SMARTOUR: intelligent
platform for tourism", funded by the Italian Ministry of University and Research
with the Regional Development Fund of the European Union (PON Research and
Competitiveness 2007–2013).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. <string-name><surname>Auer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bizer</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Kobilarov</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Cyganiak</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Ives</surname>, <given-names>Z.</given-names></string-name>:
          <article-title>DBpedia: A nucleus for a web of open data</article-title>.
          In: <source>The Semantic Web</source>, pp. <fpage>722</fpage>–<lpage>735</lpage>.
          <publisher-name>Springer</publisher-name>, <publisher-loc>Berlin, Heidelberg</publisher-loc> (<year>2007</year>).
          https://doi.org/<pub-id pub-id-type="doi">10.1007/978-3-540-76298-0_52</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. <string-name><surname>Carriero</surname>, <given-names>V.A.</given-names></string-name>, <string-name><surname>Gangemi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mancinelli</surname>, <given-names>M.L.</given-names></string-name>, <string-name><surname>Marinucci</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Nuzzolese</surname>, <given-names>A.G.</given-names></string-name>, <string-name><surname>Presutti</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Veninata</surname>, <given-names>C.</given-names></string-name>:
          <article-title>ArCo: The Italian cultural heritage knowledge graph</article-title>.
          In: <source>The Semantic Web – ISWC 2019 – 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part II</source>.
          <series>Lecture Notes in Computer Science</series>, vol. <volume>11779</volume>, pp. <fpage>36</fpage>–<lpage>52</lpage>.
          <publisher-name>Springer</publisher-name> (<year>2019</year>).
          https://doi.org/<pub-id pub-id-type="doi">10.1007/978-3-030-30796-7_3</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. <string-name><surname>Chen</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Duan</surname>, <given-names>Z.</given-names></string-name>:
          <article-title>Knowledge graph completion: A review</article-title>.
          <source>IEEE Access</source> <volume>8</volume>, <fpage>192435</fpage>–<lpage>192456</lpage> (<year>2020</year>).
          https://doi.org/<pub-id pub-id-type="doi">10.1109/ACCESS.2020.3030076</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. <string-name><surname>Harpring</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Development of the Getty vocabularies: AAT, TGN, ULAN, and CONA</article-title>.
          <source>Art Documentation: Journal of the Art Libraries Society of North America</source> <volume>29</volume>(<issue>1</issue>), <fpage>67</fpage>–<lpage>72</lpage> (<year>2010</year>).
          <ext-link ext-link-type="uri" xlink:href="http://www.jstor.org/stable/27949541">http://www.jstor.org/stable/27949541</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. <string-name><surname>Ruta</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Scioscia</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>De Filippis</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Ieva</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Binetti</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Di Sciascio</surname>, <given-names>E.</given-names></string-name>:
          <article-title>A semantic-enhanced augmented reality tool for OpenStreetMap POI discovery</article-title>.
          <source>Transportation Research Procedia</source> <volume>3</volume>, <fpage>479</fpage>–<lpage>488</lpage> (<year>2014</year>).
          17th Meeting of the EURO Working Group on Transportation, EWGT2014, 2–4 July 2014, Sevilla, Spain.
          https://doi.org/<pub-id pub-id-type="doi">10.1016/j.trpro.2014.10.029</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. <string-name><surname>Zhang</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Lu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jin</surname>, <given-names>Y.</given-names></string-name>:
          <article-title>Artificial intelligence in recommender systems</article-title>.
          <source>Complex &amp; Intelligent Systems</source> <volume>7</volume>(<issue>1</issue>), <fpage>439</fpage>–<lpage>457</lpage> (Feb <year>2021</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>