AGDLI: ArCo, GVP and DBpedia Linking
Initiative
Stefano Faralli1[0000−0003−3684−8815] , Andrea Lenzi2[0000−0002−8997−9862] , and
Paola Velardi2[0000−0003−0884−1499]
1
University of Rome UnitelmaSapienza, Italy
stefano.faralli@unitelmasapienza.it
2
Sapienza University of Rome, Italy
{lenzi,velardi}@di.uniroma1.it
Abstract. We present the ArCo, GVP and DBpedia Linking Initiative
(AGDLI), a research activity within the project SMARTOUR: intelligent
platforms for tourism, funded by the Italian Ministry of University and
Research. Our initiative is aimed at linking ArCo’s cultural entities to the
well known Getty Vocabulary Program and DBpedia ontologies, with the
main goal of providing a semantically rich representation of the Italian
cultural heritage for tourism-related knowledge-based applications. In
this paper we provide a detailed description of the initiative and describe
the current research developments and outcomes.
Keywords: ArCo · Getty Vocabularies · DBpedia · knowledge-based
applications
1 Introduction
Nowadays, we are observing an increasing number of novel semantically-enabled
and knowledge-based applications. Hence, Linked Open Data are more and more
gaining the attention from public administrations and industries all over the
world. In this paper3 , we describe the ArCo, GVP and DBpedia Linking Ini-
tiative (AGDLI ). Our initiative is a research activity part of the SMARTOUR:
intelligent platform for tourism project (see Section Acknowledgements). The
main goal of the initiative is to study semi-supervised methodologies to gener-
ate semantically rich definitions of Italian cultural heritage entities, to be used
in different knowledge-based tourism related applications, such as recommender
systems [6] and semantically-enriched augmented reality tools for point of inter-
ests discovery [5]. To this end, we decided to link the entities defined in ArCo 4
[2] with the concepts defined in the Getty Vocabulary Program 5 (GVP ) [4] and
3
Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
4
http://wit.istc.cnr.it/arco/?lang=en.
5
https://www.getty.edu/research/tools/vocabularies/.
S. Faralli, A. Lenzi and P. Velardi
DBpedia [1]6 ontologies. ArCo is a state-of-the art knowledge graph of the Ital-
ian cultural heritage, which defines 169 million triples describing 820 thousand
cultural entities. In ArCo (see Figure 1), important properties - such as the type
(dc:type) - are valued with literals or not linked with existing ontologies e.g. au-
thorship attributions (I0:Agent). The GVP is a top ontology on which the Art &
Architecture Thesaurus® (AAT ), the Getty Thesaurus of Geographic Names®
(TGN ), and the Union List of Artist Names® (ULAN ) vocabularies are based
on. AAT, TGN and ULAN vocabularies provide semantic definitions for con-
cepts useful for cataloging, documenting and retrieving information related to
art, architecture, and other material culture. By targeting both the GVP and
DBpedia ontologies, we can generate, with high coverage, links for ArCo entities
and their properties. This may considerably enrich the ArCo ontology, which
currently defines only 14 high-level classes with a depth of 4, while e.g. the
AAT ontology provides more than 55K domain specific concepts divided in 8
facet taxonomies with an average height of 13 levels. In this paper, we provide
a description of the current research outcomes and future work of the AGDLI
initiative.
2 The linking initiative
In Figure 1, we depict an excerpt of the ArCo schema. In this diagram we high-
light some of the properties of ArCo CulturalProperty entities that we link to
the GVP ontology. Specifically, we are investigating semi-supervised method-
ologies to automatically: i) mine and link the dc:type and rdfs:label properties
of CulturalProperty instances to the AAT ; ii) link the cities of the addresses of
CulturalProperty instances to the TGN ; iii) link the agents of authorship at-
tributions of CulturalProperty instances to the ULAN ; iv) normalize the date
intervals of CulturalProperty instances into a machine readable format7 , such
as the Open Date Range Format 8 . We note that ArCo is in Italian, while the
GVP is mainly in English, which represents an additional challenge of our link-
ing initiative. In Figure 2, we depict an example of mining concepts from ArCo
entities’ textual descriptions and linking them to corresponding concepts in the
AAT. In this task:
1. we automatically translated from English to Italian the AAT terms. To this
end, we used the Google Translate API9 . Note that, we preserved the original
Italian terminology when already provided by the AAT;
6
https://www.dbpedia.org/.
7
This initiative’s aim is intended to provide a ready to use resource for time-based
tourism applications.
8
Dublin Core Collection Description: Open Date Range Format
http://www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf/.
9
https://cloud.google.com/translate/.
AGDLI: ArCo, GVP and DBpedia Linking Initiative
AAT
gvp:Concept DCCD
ODRF
*
skos:subject
dc:date
type label date
dc:type rdfs:label
dc:date
arco-arco:CulturalProperty
clvapit:hasGeometry arco-location:hasCulturalPropertyAddress arco-cd:hasAuthorshipAttribution
clvapit:Geometry clvapit:Address arco-cd:AuthorshipAttribution
arco-location:hasCoordinates clvapit:hasCity arco-cd:hasAttributedAuthor
arco-location:Coordinates clvapit:City l0:Agent
arco-location:long rdfs:label rdfs:label
arco-location:lat
lat long city author
Prefixes
arco-arco:
arco-location:
arco-cd: owl:sameAs
owl:sameAs owl:sameAs
clvapit:
dbpedia:
dc:
gvp:
owl:Class gvp:PhysPlaceConcept gvp:PersonConcept
l0:
owl:
rdfs: DBpedia ULAN
TGN
skos:
Fig. 1. A diagram representing an excerpt of ArCo ontology (green boxes) and the links
to external classes and properties (yellow boxes) our initiative is aimed to generate.
2. we applied standard text pre-processing techniques (e.g., tokenization, low-
ercasing) to ArCo entities’ textual descriptions. To this end, we adopted the
Stanford NLP API10 ;
3. we automatically collected all the occurrences of Italian AAT terms in ArCo
entities’ rdfs:label and dc:type resulting preprocessed textual properties. In
this step, for each ArCo’s entity, we obtained a collection of links with AAT
concepts. As a result, we obtained a collection of ambiguous links to all the
AAT concepts having the same skosxl:literalForm 11 (see the example of as
described in Figure 2,).
4. since these tasks are error-prone, we performed a manual refinement of the
translated AAT ’s terms, fixing translation errors and adding synonyms, sin-
gular, plural and hypernymous forms for terms occurring in the the textual
properties of missing linked ArCo’s entities;
5. we repeated steps 3 and 4 until an adequate coverage was reached.
10
https://nlp.stanford.edu/software/.
11
https://www.w3.org/TR/skos-reference/skos-xl.html#literalForm.
S. Faralli, A. Lenzi and P. Velardi
Fig. 2. An example of automatically mined and linked concepts form the
rdfs:label and the dc:type properties of the arco-arco:CulturalProperty described
at https://w3id.org/arco/resource/HistoricOrArtisticProperty/1500409235. Note that
both the singular form Italian words ”figura” (figure) and ”statua” (statue) were cor-
rectly linked to the corresponding English plural forms ”figures” and ”statues”.
To link ArCo’s cities to TGN and DBpedia we performed string matching12 with
the corresponding terms and entities. At the time of writing, we are investigating
on effective linking methodologies of ArCo’s I0:Agents with ULAN entities, and
on dc:date normalization.
3 Current Outcomes and Conclusions
In this paper, we introduced the AGDLI initiative. As a result, we obtained13 :
– the automatic translation in Italian of the 55K AAT terms;
– a total of 5.6 M triples (skos:relatedMatch and skos:related ) linking the 98.2%
(by dc:type) and the 99.9% (by rdfs:label ) of arco-arco:CulturalProperty en-
tities to candidate AAT concepts;
– a total of 6.6 K triples (skos:relatedMatch) linking the 86.3% of clvapit:City
instances to candidate TGN entities; iv) 4.7 K novel owl:sameAs relations,
now linking the 100% clvapit:City to DBpedia.
12
We applied different similarity measures e.g., string edit distance-based similarity.
13
Resources are available under Creative Commons Attribution 4.0 International (CC
BY 4.0) at https://sites.google.com/unitelmasapienza.it/agdli/.
AGDLI: ArCo, GVP and DBpedia Linking Initiative
As already introduced in Section 2, the next planned activities are aimed at both
linking ArCo’s authorship attributions to ULAN entities and normalizing the
CulturalProperty’s dc:date.
Moreover, we are planning to apply semi-supervised methodologies for the
disambiguation of the generated candidate links. For instance, generated links to
AAT concepts can be refined with semi-supervised word sense disambiguation
approaches, while the generated matches with TGN candidates can be disam-
biguated based on the distance between the geographical coordinates of ArCo
and TGN entities.
Further plans of the AGDLI initiative include, among others, the applica-
tion and investigation of knowledge graph completion methodologies [3] to link
isolated (unmatched) entities of the resulting graph, and the adoption of best
practices for continuous resource maintenance and deployment.
Acknowledgements
This work was carried out within the research project ”SMARTOUR: intelligent
platform for tourism” funded by the Italian Ministry of University and Research
with the Regional Development Fund of European Union (PON Research and
Competitiveness 2007-2013).
References
1. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia:
A nucleus for a web of open data. In: The Semantic Web. pp. 722–735. Springer
Berlin Heidelberg, Berlin, Heidelberg (2007). https://doi.org/10.1007/978-3-540-
76298-0 52
2. Carriero, V.A., Gangemi, A., Mancinelli, M.L., Marinucci, L., Nuzzolese, A.G.,
Presutti, V., Veninata, C.: Arco: The italian cultural heritage knowledge graph.
In: The Semantic Web - ISWC 2019 - 18th International Semantic Web Con-
ference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II.
Lecture Notes in Computer Science, vol. 11779, pp. 36–52. Springer (2019).
https://doi.org/10.1007/978-3-030-30796-7 3
3. Chen, Z., Wang, Y., Zhao, B., Cheng, J., Zhao, X., Duan, Z.: Knowl-
edge graph completion: A review. IEEE Access 8, 192435–192456 (2020).
https://doi.org/10.1109/ACCESS.2020.3030076
4. Harpring, P.: Development of the getty vocabularies: Aat, tgn, ulan, and cona. Art
Documentation: Journal of the Art Libraries Society of North America 29(1), 67–72
(2010), http://www.jstor.org/stable/27949541
5. Ruta, M., Scioscia, F., De Filippis, D., Ieva, S., Binetti, M., Di
Sciascio, E.: A semantic-enhanced augmented reality tool for open-
streetmap poi discovery. Transportation Research Procedia 3, 479–
488 (2014). https://doi.org/https://doi.org/10.1016/j.trpro.2014.10.029,
https://www.sciencedirect.com/science/article/pii/S2352146514001926, 17th
Meeting of the EURO Working Group on Transportation, EWGT2014, 2-4 July
2014, Sevilla, Spain
6. Zhang, Q., Lu, J., Jin, Y.: Artificial intelligence in recommender systems. Complex
& Intelligent Systems 7(1), 439–457 (Feb 2021)