=Paper=
{{Paper
|id=Vol-2769/51
|storemode=property
|title=The Archaeo-Term Project: Multilingual Terminology in Archaeology
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_51.pdf
|volume=Vol-2769
|authors=Giulia Speranza,Raffaele Manna,Maria Pia Di Buono,Johanna Monti
|dblpUrl=https://dblp.org/rec/conf/clic-it/SperanzaMBM20
}}
==The Archaeo-Term Project: Multilingual Terminology in Archaeology==
Giulia Speranza, Raffaele Manna, Maria Pia di Buono, Johanna Monti
UniOr NLP Research Group
“L’Orientale” University of Naples
Italy
{gsperanza, rmanna, mpdibuono, jmonti}@unior.it
Abstract

In this paper, we present the Archaeo-Term Project, along with one of its first efforts in enhancing multilingual access to archaeological data: a resource of archaeological terms made available within the framework of the YourTerm CULT project. The Archaeo-Term multilingual Glossary aims to enhance and promote the use of a terminological common ground across different languages. It is intended for scholars, experts in the field, translators and the general public alike. Its first release contains terms in Italian, English, German, Spanish and Dutch, together with PoS, definitions and other linguistic information. This paper presents the data and the methodology adopted to create the glossary, as well as the evaluation of the first results.

1 Introduction

Languages for Special Purposes (LSP) have their roots in the need to communicate specialised and technical knowledge within a restricted group of domain experts.

From a linguistic perspective, LSPs are mainly characterised by the use of specialised terminology. This terminology is usually monosemous, following the principle of clearly defining concepts and avoiding miscommunication, and it can often be opaque and unintelligible to laypeople (Gotti, 2008; Cabré, 1999; Faber and Rodríguez, 2012; Crystal, 1997).

For these reasons, it is often necessary to modulate specialised languages when oral or written communication takes place between experts and non-experts, in order to ease the didactic and informative functions of communication (Cortelazzo, 1994).

The language used in the domain of Cultural Heritage (CH) and its sub-domains, such as Archaeology, shares many features with other LSPs. These include technical terminology, terms of Greek and Latin origin, the re-semantisation of common words into specialised domains of knowledge and complex multiword expressions, to mention a few. Nonetheless, it has traditionally been less investigated than, for example, the language of medicine or law, which are considered soft disciplines too. As a consequence, except for a few felicitous examples (see Section 2), language resources, and especially terminological resources, are still needed in this domain.

Language resources such as glossaries, thesauri, dictionaries and term banks are invaluable sources for language experts, translators and learners, among others. Their development can often be demanding and time-consuming, especially when carried out manually. Specialised domain resources are even more challenging, because their creation also requires validation by experts in the domain of knowledge.

In this paper we present our work aimed at the creation of a multilingual glossary of archaeological terms, which is useful in many application scenarios, from Machine Translation (MT) to Natural Language Processing (NLP).

The remainder of the paper is organised as follows. Section 2 describes related work. Section 3 presents the aims of the Archaeo-Term Project and the creation of the multilingual glossary of archaeological terms. It describes the starting data used so far, namely the ICCD Thesaurus, and the methodology applied to extract multilingual data from the Getty AAT, and it concludes by illustrating the first results together with their evaluation. Finally, the paper ends with the conclusions and future work.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Related Work

As several scholars have pointed out (Wright et al., 2010; Melby, 2012), terminology work can be a heterogeneous activity involving different formats, data models and practices. Therefore, several standard formats have been developed to support the sharing and reuse of terminological resources, such as TermBase eXchange (TBX) (Melby, 2015).

More recently, with the spread of Semantic Web technologies, many language resources are being released in compliance with the Linked Open Data (LOD) principles. They use formalisms based on the Resource Description Framework (RDF), such as SKOS and OntoLex-Lemon, to represent glossaries, vocabularies and taxonomies (Chiarcos et al., 2013).

In the field of CH, several language resources, both monolingual and multilingual, have been released over the years. Among the multilingual resources, the most frequently referenced in this domain is the Art & Architecture Thesaurus (AAT)², developed and maintained by the Getty Research Institute. It is a multilingual thesaurus used to describe art, architecture, decorative arts, material culture and archival materials. It can be accessed through a web interface or via its LOD version (JSON, RDF, N3/Turtle, N-Triples), as well as in XML and as relational tables.

Another multilingual terminological project on CH is iDAI.vocab³, a controlled vocabulary specifically designed for archaeological terms. It is available in several languages and was developed by the German Archaeological Institute (DAI).

Many other glossaries and thesauri have been created as monolingual resources for cataloguing purposes. Examples for English are the vocabularies developed by FISH (Forum on Information Standards in Heritage)⁴ and maintained as LOD resources by Heritage Data⁵. For Italian, there are the thesauri and controlled vocabularies developed by the Italian Institute for Cataloguing (Istituto Centrale per il Catalogo e la Documentazione, ICCD)⁶.

In 2017, the ICCD also started the ArCo project⁷ together with the Institute of Cognitive Sciences and Technologies of the CNR (Istituto di Scienze e Tecnologie della Cognizione, ISTC). The project makes data from the General Catalogue of Cultural Heritage available according to the LOD principles (Carriero et al., 2019b; Carriero et al., 2019a).

Some glossaries are also released by museums or cultural institutions, such as the British Museum's Object Names Thesaurus⁸.

In the field of Cultural Heritage in general, and in archaeology in particular, it is worth mentioning the ARIADNE Project (Meghini et al., 2017). ARIADNE provides a portal that collects data and resources in order to overcome the fragmentation of archaeological data repositories of all types.

3 Archaeo-Term project

The Archaeo-Term project of the UNIOR NLP Research Group⁹ of the University of Naples "L'Orientale" is part of the YourTerm CULT initiative¹⁰. YourTerm CULT runs in partnership with the Terminology Without Borders programme, fostered by the Terminology Coordination Unit (TermCoord)¹¹ of the European Parliament's Directorate-General for Translation (DG TRAD). Among the different projects, YourTerm CULT is specifically designed to cover all aspects of culture.

The Archaeo-Term project was launched to fill the gap in an important field that takes us back to the roots of European culture and history, namely Archaeology.

Archaeological information is available in various sources (scientific papers, texts addressed to general audiences, websites, structured databases, etc.). The project aims to make this information more accessible by creating language resources useful for NLP and MT tasks across languages. Such resources will make it easier to structure and connect different types of knowledge bases, both structured databases and unstructured text collections.

² https://www.getty.edu/research/tools/vocabularies/aat/about.html
³ https://archwort.dainst.org/it/vocab/index.php
⁴ http://www.heritage-standards.org.uk/terminology/
⁵ https://www.heritagedata.org/blog/vocabularies-provided/
⁶ http://www.iccd.beniculturali.it/it/strumenti-terminologici
⁷ http://stlab.istc.cnr.it/stlab/project/arco/
⁸ http://terminology.collectionstrust.org.uk/British-Museum-objects/
⁹ https://sites.google.com/view/unior-nlp-research-group
¹⁰ https://yourterm.org/yourterm-cult/
¹¹ https://termcoord.eu/

Some scientific communities have felt the need to structure their knowledge by means of thesauri or ontologies. Even so, the scenario is still very fragmented, as noted by Felicetti et al. (2018). Nowadays, European archaeological documentation consists of a multifaceted series of information. Each of the national and international institutions active in this discipline produces it independently, with tools and methods that often differ greatly from one another. Thus, there is still a need to establish a terminological common core shared across languages.

In this scenario, the Archaeo-Term project aims to contribute to scientific cooperation and progress. It involves both academia and museums from different countries in the creation of a wide multilingual terminological resource for Archaeology. With this aim in mind, one of the first results of the project is a multilingual Glossary of archaeological terms. It is mainly intended to support the multilingual digitisation efforts of museums, but it is also useful to scholars, translators and the general public.

3.1 Data and Methodology

For the creation of the Archaeo-Term multilingual glossary, we start from the RDF/SKOS version of the Italian ICCD Thesaurus¹². This thesaurus is one of the best practices adopted by the Italian Ministry of Cultural Heritage (MiBAC) to publish institutional information as LOD, so that it can be easily found, reused and freely shared. It contains 1,059 Italian terms linked to the LOD version of the Getty AAT¹³ through the skos:closeMatch property, which points to the Getty URIs (Figure 1). This property links two similar concepts that can be used interchangeably in some information retrieval applications (cf. SKOS Recommendation, 18 August 2009). We choose to extract the information stored in the Getty AAT because it is a valuable and trustworthy resource, created by experts in the field.

Following the ICCD links to Getty AAT URIs, we build our multilingual glossary of archaeological terms. Each term comes with its corresponding definitions and sources in the other languages, namely English, Spanish, German and Dutch. Among the many languages available in the Getty AAT, we select these four because they show the best coverage in terms of linguistic equivalence (translations) of the Italian terms in the ICCD Thesaurus.

To do this, we query the Getty AAT SPARQL Endpoint¹⁴ to access term-related information. In detail, the querying process consists of a matching operation between the results of integrated queries to the AAT SPARQL Endpoint.

We first use a query that parses the ICCD resource and reads each URI referring to the corresponding English archaeological term. In the Getty AAT, English terms and the available corresponding terms in other languages are represented in two ways. Equivalent terms use the skos:prefLabel property¹⁵, and alternative terms use the skos:altLabel property¹⁶. For each URI, both properties carry a lexical value and the language tag associated with it.

Since we aim to extract corresponding terms in different languages, we then run a further query. For each URI and every available language, it extracts the equivalent archaeological terms and the alternative terms, each with its language tag.

In addition, we set up another query that reads the URIs and collects the corresponding definitions and sources along with their language tags (both contained in the skos:scopeNote property)¹⁷.

Running these queries over the ICCD URIs, we collect archaeological terms, definitions and sources as a first result. The queries make it possible to exploit the Getty AAT resource. However, irrespective of language tags, they also return every combination of each term value with each definition and source value (present in skos:scopeNote).
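The first step, reading the skos:closeMatch links out of the ICCD Thesaurus, can be sketched with Python's standard library. This is a minimal illustration, not the project's code. It assumes the RDF/XML file serialises each concept as a skos:Concept element carrying rdf:about, an Italian skos:prefLabel and one or more skos:closeMatch elements with rdf:resource. The function names, the example.org concept URIs and the placeholder AAT identifiers are hypothetical. The sketch also lists concepts with more than one closeMatch, which is one way to collect cases like letto ('Bed' vs 'Canopy Bed', discussed in Section 3.2) for manual review.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs used in RDF/XML serialisations of SKOS.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-concept excerpt: concept URIs and AAT identifiers are placeholders.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/iccd/letto">
    <skos:prefLabel xml:lang="it">letto</skos:prefLabel>
    <skos:closeMatch rdf:resource="http://vocab.getty.edu/aat/GENERIC_ID"/>
    <skos:closeMatch rdf:resource="http://vocab.getty.edu/aat/SPECIFIC_ID"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/iccd/stamnos">
    <skos:prefLabel xml:lang="it">stamnos</skos:prefLabel>
    <skos:closeMatch rdf:resource="http://vocab.getty.edu/aat/STAMNOS_ID"/>
  </skos:Concept>
</rdf:RDF>"""


def read_close_matches(rdf_xml: str) -> dict:
    """Map each ICCD concept URI to its Italian label and its AAT closeMatch URIs.

    Only <skos:Concept rdf:about="..."> elements are handled; other valid
    RDF/XML serialisations of the same triples would need a full RDF parser.
    """
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        italian = [el.text for el in concept.findall(f"{{{SKOS}}}prefLabel")
                   if el.get(XML_LANG) == "it"]
        concepts[concept.get(f"{{{RDF}}}about")] = {
            "label": italian[0] if italian else None,
            "close_matches": [el.get(f"{{{RDF}}}resource")
                              for el in concept.findall(f"{{{SKOS}}}closeMatch")],
        }
    return concepts


def needs_review(concepts: dict) -> list:
    """Italian labels linked to more than one AAT URI, sorted alphabetically."""
    return sorted(c["label"] for c in concepts.values() if len(c["close_matches"]) > 1)
```

On the sample, `needs_review(read_close_matches(SAMPLE_RDF))` returns `['letto']`. For the real file, an RDF library that understands every RDF/XML serialisation would be more robust than ElementTree.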
¹² https://github.com/ICCD-MiBACT/Standard-catalografici/blob/master/strumenti-terminologici/beni%20archeologici/ICCD_Thesaurus_definizione%20del%20bene_reperti%20archeologici.rdf
¹³ For the mapping process, see the ARIADNE project described in Felicetti et al. (2015).
¹⁴ http://vocab.getty.edu/sparql
¹⁵ https://www.w3.org/2012/09/odrl/semantic/draft/doco/skos_prefLabel.html
¹⁶ https://www.w3.org/2012/09/odrl/semantic/draft/doco/skos_altLabel.html
¹⁷ https://www.w3.org/2012/09/odrl/semantic/draft/doco/skos_scopeNote.html
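The label query described in Section 3.1 can be sketched as follows. The sketch assumes, as the text states, that AAT labels are language-tagged literals attached to each concept URI through skos:prefLabel and skos:altLabel. label_query builds one SPARQL query per AAT URI, restricted to the four target languages. rows_from_results flattens a response in the standard SPARQL 1.1 JSON results format into (uri, field, value, language) rows. Both function names and the sample response are hypothetical. Sending the query to http://vocab.getty.edu/sparql is left out so the sketch runs offline. The scope-note query is not shown, because the endpoint's modelling of definitions and sources should be checked against its documentation first.

```python
import json

# Languages selected for the glossary besides Italian (ISO 639-1 tags assumed).
TARGET_LANGS = ("en", "es", "de", "nl")


def label_query(aat_uri: str, langs=TARGET_LANGS) -> str:
    """SPARQL query for the preferred and alternative labels of one AAT concept."""
    lang_list = ", ".join(f'"{lang}"' for lang in langs)
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?field ?label WHERE {{
  VALUES ?field {{ skos:prefLabel skos:altLabel }}
  <{aat_uri}> ?field ?label .
  FILTER(lang(?label) IN ({lang_list}))
}}"""


def rows_from_results(aat_uri: str, results: dict) -> list:
    """Flatten SPARQL 1.1 JSON results into (uri, field, value, lang) tuples."""
    rows = []
    for binding in results["results"]["bindings"]:
        field = binding["field"]["value"].rsplit("#", 1)[-1]  # e.g. "prefLabel"
        label = binding["label"]
        rows.append((aat_uri, field, label["value"], label.get("xml:lang", "")))
    return rows


# Hypothetical response, shaped like the SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = json.loads("""{
  "head": {"vars": ["field", "label"]},
  "results": {"bindings": [
    {"field": {"type": "uri", "value": "http://www.w3.org/2004/02/skos/core#prefLabel"},
     "label": {"type": "literal", "value": "Relief", "xml:lang": "en"}},
    {"field": {"type": "uri", "value": "http://www.w3.org/2004/02/skos/core#prefLabel"},
     "label": {"type": "literal", "value": "Relieve", "xml:lang": "es"}}
  ]}
}""")
```

Keeping the language tag on every row is what makes the later per-language matching possible, since the query results themselves do not say which definition belongs to which term.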
To the best of our knowledge, the AAT provides no direct link between two kinds of values represented for the same URI. On one side are the term values in different languages (stored in skos:prefLabel and skos:altLabel). On the other are the literal values in different languages (definitions and sources in skos:scopeNote). Therefore, to build our multilingual glossary, we rely on a matching operation between URIs and language tags. The tags are those attached to term values (in skos:prefLabel and skos:altLabel) and to definitions and sources (both in skos:scopeNote).

In particular, we start from all the combinations of term values and literal values (definitions and sources) present for a URI in each language. We then apply a matching operation that, for the reference URI, selects only the terms, definitions and sources in the same language. For each archaeological term identified by a URI, this matching operation lets us recognise and organise the terms and literal values (definitions and sources) that pertain to the same language.

3.2 Results and Evaluation

Once the query steps are performed, we first replace the retrieved URIs with numeric IDs, to provide an identification code for each entry of our glossary. We then build a monolingual table for each of the languages mentioned above and a multilingual synoptic table.

For the monolingual tables, we automatically sort all retrieved data into separate tables based on the language tag of each term entry. For the multilingual synoptic table, we align the terms in the different languages on their shared ID.

In detail, the first release of the Glossary¹⁸ is organised as follows:
• For each language foreseen in the glossary (Italian, English, Spanish, German and Dutch), there is a dedicated monolingual table. Each table is named after the corresponding language locale (e.g., IT for Italian, EN for English) and contains nine fields (ID, Singular Term, Plural Term, Qualifier¹⁹, PoS, Alternative Terms, Alternative Terms Qualifier, Definition and Source), as shown in Figure 2.
• A multilingual synoptic table contains the singular terms in all languages, linked to one another by means of the IDs. This table aims to provide a comprehensive overview of the equivalent terms across the languages.

¹⁸ https://drive.google.com/file/d/1cKvZPd6bdh7lrZ6plj1gGKatWvopqFo4/view. The Glossary is released under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
¹⁹ The 'Qualifier' field, enclosed in brackets, indicates the subfield the term belongs to, thus allowing disambiguation in the case of homographs (e.g. Ax (weapon) vs. Ax (tool)).

During the evaluation phase, we noticed that 9 Italian terms had two equivalent English terms in the Getty AAT, marked by two closeMatch URIs to the AAT instead of just one.

A manual evaluation revealed that one URI leads to a more generic term and the other to a more specific one. For example, the Italian term letto is linked both to the Getty AAT 'Bed' (generic) and to 'Canopy Bed' (specific). In these cases, we choose to follow the URI pointing to the more generic term rather than the specific one, in accordance with the meaning of the Italian term. We opt for a manual evaluation because the phenomenon is rare. Alternatively, it could have been performed automatically using an external resource such as a dedicated dictionary.

Furthermore, the evaluation phase revealed a difference in the granularity of terms between the Italian ICCD Thesaurus and the other languages coming from the Getty AAT. The Italian terms are highly specific and fine-grained, whereas many of their equivalents in the other languages stand rather in a relation of hyperonymy/hyponymy. For example, the Italian Thesaurus contains several semantically and linguistically different types of reliefs, whose meanings change according to the adjective that follows. Thus Rilievo + storico, funerario, votivo could be rendered in English as historical, funerary, votive + Relief. Nonetheless, the English equivalent retrieved from the Getty AAT is always 'Relief', the Spanish one is always 'Relieve' and the Dutch one is always 'Reliëf'.

Finally, some terms in the different languages, as well as some definitions, are missing, and we plan to fill in the missing fields in the future. Table 1 shows the total number of terms for each language in the terminological database. Missing fields are due to data sparsity, since not every Italian term has an equivalent in all the other languages.
[Figure 1 content: ICCD entry for the term "stamnos". Definition (Italian original): "Recipiente capace, col collo breve, corpo espanso, alte spalle, due anse quasi orizzontali e spesso fornito di coperchio; serviva per contenere olio, vino e anche monete." (A capacious vessel with a short neck, an expanded body, high shoulders and two almost horizontal handles, often fitted with a lid; it was used to hold oil, wine and also coins.) See also: Vasellame metallico - ICCD [online] iccd.beniculturali.it/getFile.php?id=179 (5 March 2018); Dizionario oggetto (OGTD-OGTT): Vetri [online] iccd.beniculturali.it/getFile.php?id=175 (5 March 2018). Image taken from: http://www.metmuseum.org/toah/images/h2/h2_06.1021.178.jpg]
Figure 1: Sample of the Italian term entry "stamnos" in the ICCD RDF/SKOS formalism.
{| class="wikitable"
! ID !! Singular Term !! Plural Term !! Qualifier !! PoS !! Alternative Terms !! Alternative Terms Qualifier !! Definition !! Source
|-
| 18 || patera || paterae || (container) || Noun || pateras || (containers) || Ancient Roman containers in the form of a shallow bowl without handles, often with a base whose center is pushed up into the body; used for offering libations at religious ceremonies or for drinking. For similar ancient Greek containers, use "phialae." || Legacy Art & Architecture Thesaurus (AAT) data. Compiled without citing sources. Warranted by AAT staff. 1983-1995.
|-
| 140 || fish plate || fish plates || (ancient dish) || NP (Noun + Noun) || fish-plates || (ancient dishes) || Plates of a special form used by the ancient Greeks, having a central depression and sometimes a turned-down rim, used for serving fish. The central depression was used to collect the juice or sauce in which the fish was served. […] || J. Paul Getty Museum. [online] Los Angeles: J. Paul Getty Trust, 2-. http://www.getty.edu/art/collections/ (1 January 23).
|-
| 186 || tympanum || tympanums || (wall component) || Noun || tympan || (wall component) || Architectural elements comprising stone or masonry enclosed by an arch, usually supported by a lintel. Tympana are normally set above doors, but also occur in windows and wall arcades. They may be ornamented with sculptural or painted decoration. || Harris, Cyril M., ed. Dictionary of Architecture and Construction. New York: McGraw-Hill Book Co., 1975. &#124; Grove Art Online. Oxford University Press, 28-. http://www.oxfordartonline.com (1 July 28).
|-
| 521 || aryballos || aryballoi || (Greek vessels) || Noun || aryballas &#124; aryballes &#124; aribalos &#124; aribalo || (Greek vessels) || Relatively small ancient Greek vessels with a globular body, a short neck, a flat disk-shaped mouth with a small orifice, and a handle (or sometimes two) extending from the shoulder to the rim; used for holding oils, perfumes, and ointments. They are usually made of terracotta. Uses of the aryballoi included in funeral rituals and by athletes who wore them on their wrists, suspended by thongs or strings. || Cook, R. M. Greek Painted Pottery. London: Methuen and Co., Ltd., 1966.
|}
Figure 2: Example of the English monolingual table.
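Every glossary entry is anchored to one of the 1,059 Italian ICCD terms, so the per-language counts in Table 1 can also be read as coverage figures. The short computation below assumes one term per entry and language, which is a simplification. TERM_COUNTS copies the counts reported in Table 1, and coverage is a hypothetical helper.

```python
# Term counts per language locale, as reported in Table 1.
TERM_COUNTS = {"IT": 1059, "EN": 1026, "NL": 900, "ES": 593, "DE": 376}


def coverage(counts, pivot="IT"):
    """Percentage of pivot-language entries with a term in each other language (1 decimal)."""
    total = counts[pivot]
    return {lang: round(100 * n / total, 1) for lang, n in counts.items() if lang != pivot}
```

`coverage(TERM_COUNTS)` gives `{'EN': 96.9, 'NL': 85.0, 'ES': 56.0, 'DE': 35.5}`, which shows how much sparser the Spanish and German data are than the English and Dutch data.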
{| class="wikitable"
! Language !! Terms
|-
| Italian (IT) || 1059
|-
| English (EN) || 1026
|-
| Dutch (NL) || 900
|-
| Spanish (ES) || 593
|-
| German (DE) || 376
|}
Table 1: Number of terms for each language in the termbase.

4 Conclusions and future work

In this paper we present the Archaeo-Term Project, aimed at the creation of a multilingual glossary of archaeology. The Glossary results from extracting and merging data from two existing resources released according to the RDF data model: the RDF/SKOS version of the Italian ICCD Thesaurus and the LOD version of the multilingual Getty AAT.

The Archaeo-Term glossary is an ongoing project. As future steps, it will complete the missing data (terms, definitions, correspondences, examples, etc.) for English, Dutch, Spanish and German. It will also enlarge the glossary for the languages currently foreseen, through the semi-automatic extraction of terminology from specialised corpora and other existing glossaries.

Furthermore, we plan to extend the glossary to other languages, such as French, Swedish, Chinese and Russian.

We also plan to convert the results of the Archaeo-Term project into more formalised formats. These are the TBX format (TermBase eXchange), for use with CAT tools, and the OntoLex-Lemon model (McCrae et al., 2017), following the Linguistic Linked Open Data (LLOD) principles.

Finally, once a more complete version of the glossary is available, we plan to also publish it in a Research Infrastructure repository such as CLARIN.

Acknowledgments

This work has been partially supported by Programma Operativo Nazionale Ricerca e Innovazione 2014-2020 - Fondo Sociale Europeo, Azione I.2 "Attrazione e Mobilità Internazionale dei Ricercatori" Avviso D.D. n 407 del 27/02/2018 and by POR Campania FSE 2014-2020 "Dottorati di Ricerca a Caratterizzazione Industriale".

We would like to thank Michele Stefanile for his support as an expert in the domain of Archaeology. Authorship attribution is as follows: Giulia Speranza is the author of Sections 2 and 3.2, Raffaele Manna of Section 3.1, Maria Pia di Buono of Section 1, and Johanna Monti of Sections 3 and 4.

References

Maria Teresa Cabré. 1999. Terminology: Theory, Methods, and Applications, volume 1. John Benjamins Publishing.
Valentina Anita Carriero, Aldo Gangemi, Maria Letizia Mancinelli, Ludovica Marinucci, Andrea Giovanni Nuzzolese, Valentina Presutti, and Chiara Veninata. 2019a. ArCo ontology network and LOD on Italian cultural heritage. In ODOCH@CAiSE, pages 97–102.
Valentina Anita Carriero, Aldo Gangemi, Maria Letizia Mancinelli, Ludovica Marinucci, Andrea Giovanni Nuzzolese, Valentina Presutti, and Chiara Veninata. 2019b. ArCo: The Italian cultural heritage knowledge graph. In International Semantic Web Conference, pages 36–52. Springer.
Christian Chiarcos, John McCrae, Philipp Cimiano, and Christiane Fellbaum. 2013. Towards open data for linguistics: Linguistic linked data. In New Trends of Research in Ontologies and Lexical Resources, pages 7–25. Springer.
Michele Cortelazzo. 1994. Lingue speciali. La dimensione verticale. Padova.
David Crystal. 1997. The Cambridge Encyclopedia of Language, 2nd edition. New York.
Pamela Faber and Clara Inés López Rodríguez. 2012. 2.1 Terminology and specialized language. A Cognitive Linguistics View of Terminology and Specialized Language, 20:9.
Achille Felicetti, Ilenia Galluccio, Cinzia Luddi, Maria Letizia Mancinelli, Tiziana Scarselli, and Antonio Davide Madonna. 2015. Integrating terminological tools and semantic archaeological information: the ICCD RA schema and thesaurus. In EMF-CRM@TPDL, pages 28–43.
Achille Felicetti, Daniel Williams, Ilenia Galluccio, Douglas Tudhope, and Franco Niccolucci. 2018. NLP tools for knowledge extraction from Italian archaeological free text. In 2018 3rd Digital Heritage International Congress (DigitalHERITAGE) held jointly with 2018 24th International Conference on Virtual Systems & Multimedia (VSMM 2018), pages 1–8. IEEE.
Maurizio Gotti. 2008. Investigating Specialized Discourse. Peter Lang.
John P. McCrae, Julia Bosque-Gil, Jorge Gracia, Paul Buitelaar, and Philipp Cimiano. 2017. The OntoLex-Lemon model: development and applications. In Proceedings of eLex 2017 conference, pages 19–21.
Carlo Meghini, Roberto Scopigno, Julian Richards, Holly Wright, Guntram Geser, Sebastian Cuy, Johan Fihn, Bruno Fanini, Hella Hollander, Franco Niccolucci, et al. 2017. ARIADNE: A research infrastructure for archaeology. Journal on Computing and Cultural Heritage (JOCCH), 10(3):1–27.
Alan K. Melby. 2012. Terminology in the age of multilingual corpora. The Journal of Specialised Translation, 18:7–29.
Alan K. Melby. 2015. TBX: A terminology exchange format for the translation and localization industry. Handbook of Terminology, pages 393–424.
Sue Ellen Wright, Nathan Rasmussen, Alan K. Melby, and L. Warburton. 2010. TBX glossary: a crosswalk between termbase and lexbase formats. In Proceedings of the Developing, Updating and Coordinating Technologies, Dictionaries and Lexicons for Terminological Consistency Workshop.