Creating a Multilingual Terminological Resource using Linked Data: the case of Archaeological Domain in the Italian language

Giulia Speranza, Carola Carlino
UNIOR NLP Research Group
University of Naples “L’Orientale”
Naples, Italy
{gsperanza,ccarlino}@unior.it

Sina Ahmadi
Insight Centre for Data Analytics
National University of Ireland
Ireland, Galway
sina.ahmadi@insight-centre.org

Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

English. The lack of multilingual terminological resources in specialized domains constitutes an obstacle to the access and reuse of information. In the technical domain of cultural heritage and, in particular, archaeology, such an obstacle still exists for the Italian language. This paper presents an effort to fill this gap by collecting linguistic data using existing Collaboratively-Constructed Resources and those on the Web of linked data. The collected data are then used to linguistically enrich the ICCD Archaeological Finds Thesaurus, a monolingual Italian thesaurus. Our terminological resource contains 446 terms with translations in four languages and is publicly available in the Resource Description Framework (RDF) in the Ontolex-Lemon model.

1 Introduction

Multilingual domain-specific linguistic resources, such as thematic dictionaries and terminological resources (terminologies further in the text), are knowledge repositories providing information about terms and their semantic relationships in a specific domain and across languages. Currently, most European languages, including Italian, lack terminologies in the field of cultural heritage (Dong, 2017). Cultural heritage comprises the tangible and intangible objects that constitute the culture of each society, such as monuments but also songs, traditions and history (Doerr, 2009).

Given the expanding amount of cultural data on the Semantic Web and a plethora of publicly-available resources in various languages as Linked Open Data (LOD), the Web provides solutions for enhancing multilingualism in terminologies (Brugman et al., 2008). Nowadays, many Collaboratively-Constructed Resources (CCRs), or Collaborative Knowledge Bases (CKBs), such as Wiktionary¹ and Wikipedia², are created by decentralized communities of volunteers in different domains.

¹ https://www.wiktionary.org/
² https://www.wikipedia.org/

CCRs differ from Linguistic Knowledge Bases (LKBs), such as WordNet (Miller, 1995) and FrameNet (Baker et al., 1998), which are instead created by experts in specific fields with higher quality control. Some scholars, such as Müller and Gurevych (2008) and Hovy et al. (2013), pointed out several weaknesses of LKBs, such as the low coverage of domain-specific vocabulary, restriction to common vocabulary and the difficulty of continuous maintenance, which results in outdated data.

Moreover, despite the application of CCRs in various natural language processing (NLP) tasks (Zesch et al., 2008; Nakayama et al., 2008; Meyer and Gurevych, 2012), processing heterogeneous and often unstructured data linguistically requires syntactic, lexical and ontological information (Bouayad-Agha et al., 2012; Davies, 2009). This can be efficiently addressed thanks to the current advances in applying computational techniques to the disciplines of the humanities, known as digital humanities (DH), and to the accessibility of linguistic resources on the Web with movements such as the Linguistic Linked Open Data (LLOD) (Chiarcos et al., 2013).

Regarding the field of cultural heritage, multilingualism is still a challenge due to the tendency of experts to store terminologies monolingually (Vavliakis et al., 2012). We investigated some online multilingual terminologies such as the Getty Vocabularies³ (Baca and Gill, 2015), which contain thesauri in art, architecture and cultural objects, iDAI.vocab (the German Archaeological Institute archaeological vocabulary)⁴, the UNESCO Thesaurus⁵, the European Heritage Network thesauri⁶ and the Loterre Controlled Vocabulary in art and archaeology⁷. Among these resources, only the Art & Architecture Thesaurus (AAT) by Getty and the iDAI.vocab are exploitable due to a partial domain-specific similarity with our dataset; nevertheless, neither of them provides lexicographic descriptions of the terms.

³ https://www.getty.edu/research/tools/vocabularies/
⁴ https://archwort.dainst.org
⁵ http://vocabularies.unesco.org/browser/thesaurus/en/
⁶ https://www.coe.int/en/web/culture-and-heritage/herein-heritage-network
⁷ https://www.loterre.fr/skosmos/27X/

In this paper, we propose an approach for semi-automatically creating a multilingual terminology in the technical domain of archaeology and cultural heritage by enriching an existing Italian ontology with linguistic information. Our approach can be applied to any domain and language. Our case study is the archaeological thesaurus provided by the Central Institute for Catalogue and Documentation (ICCD) for describing archaeological finds in Italian (Felicetti et al., 2013). The enriched information is evaluated by annotators and then converted into the Ontolex-Lemon model in the Resource Description Framework (RDF). Our resource provides linguistic information for 446 Italian terms with translations in four languages.

2 Related Work

Leveraging resources on the Web for extracting and processing information is a common practice in NLP tasks (Lin and Katz, 2003; Cucerzan and Brill, 2004). Previous studies focusing on extracting data from CCRs showed that they are a valuable resource for collecting lexicographic data and promoting multilingualism (Kilgarriff and Grefenstette, 2001; Lin and Krizhanovsky, 2011).

Bourgonje et al. (2016) develop a platform for digital curation technologies using a Semantic Web layer which provides linguistic analysis and discourse information. This platform allows knowledge experts to create digital content and explore a collection of documents related to a specific domain. Project FREME (Dojchinovski et al., 2016) is a framework for multilingual and semantic enrichment of digital content where linguistic linked open data workflows are used along with linguistic and NLP ontologies. The EuroTermBank project (Vasiljevs et al., 2008) aims at improving the terminology infrastructure of the European languages by creating a centralized online terminology bank and collecting terminologies from various European institutions to facilitate the production, use and distribution of digital content and promote cultural diversity.

Dannélls et al. (2013) also focus on the domain of cultural heritage and use Wikipedia to retrieve translations for the task of text generation. Dong (2017) uses three multilingual semantic resources, GeoNames, DBpedia and Wiktionary, to enrich English information for Chinese Genealogical Linked Data in the field of cultural heritage. Declerck et al. (2012) use Wiktionary to expand a taxonomy of folk catalogue in English with multilingual translations.

Providing terminologies in Linked Data has also been addressed by previous researchers. Cimiano et al. (2015) present an approach for publishing and linking terminological resources using linked data principles. They provide a service for transforming term bases in TBX (TermBase eXchange), an open XML-based standard format for terminological data, to RDF using the lemon model. Similarly, McCrae et al. (2011) show the conversion of WordNet and Wiktionary data into the Lemon model. Sérasset (2015) focused on creating an RDF Lemon-based multilingual resource with data extracted from Wiktionary.

3 Case Study

The dataset used in this study is the Italian ICCD “RA Thesaurus per la descrizione dei reperti archeologici” (en. RA Thesaurus for the description of archaeological finds) published by the ICCD (Istituto Centrale per il Catalogo e la Documentazione) in collaboration with the Italian Ministry of Cultural Heritage and Activities (MiBAC).

The ICCD Thesaurus (Mancinelli, 2014) is an open monolingual Italian vocabulary (last updated in 2014), which was created with the final aim of regulating the terminology to be used to identify archaeological finds in Italy. The ICCD Thesaurus provides different levels for the representation of the terms: the first level indicates the object itself, e.g. colonna (en. column); other levels refer to the morphology, which indicates the type and shape of the object, e.g. colonna dorica (en. doric column), and to the part, which specifies the part of the object, e.g. base, capitello (en. base, capital). Furthermore, it is enriched with a short description and sometimes images of the object described. The ICCD Thesaurus is published as LOD on a dedicated platform⁸ and can be accessed through various formats.

⁸ http://dati.beniculturali.it/

Regarding archaeological finds, the Italian terminology in this field is composed of both technical terms and common vocabulary from everyday language. Technical terms may be perceived as more or less technical on a continuum: there are technical terms which might be so frequent, also in the common vocabulary, that their meaning is generally understood by the majority of literate people, e.g. capitello (en. capital), altare (en. altar), and less frequent terms used and known only by experts in the field, e.g. acroterio (en. acroterion), archivolto (en. archivolt). On the other hand, many common words are used to describe archaeological finds, e.g. bottiglia (en. bottle), collana (en. necklace), which, of course, sound more comprehensible also to non-experts.

A jargon, such as the language of archaeology, often reuses already-existing words instead of creating ad hoc new terms, assigning them a different meaning (Gotti, 1991; Scarpa, 2008; Gualdo and Telve, 2011). In fact, several examples of semantic redeterminations were registered in the ICCD Thesaurus, such as the word ghianda, which comes from the common vocabulary, where it has the general meaning of acorn, but, in the specialized domain, is used to identify a particular kind of projectile weapon, thus acquiring a totally different new meaning. Despite the precision expected of terminology, it is not rare to find homographs and polysemous words in specialized jargons as well. For example, the Italian word ara can be found in at least four different domains (ornithology, astronomy, metrology and archaeology) with different meanings but the same written form, as shown in Figure 1.

[Figure 1 diagram: ara (n) linked to Ornithology (Q44703), Astronomy (Q333), Metrology (Q394) and Archaeology (Q23498)]

Figure 1: An example of the Italian word ara (n) which can appear in various terminological domains.

Furthermore, for the specialized domain of archaeology, many analogies with the anatomical parts of the human body are observed, e.g. column foot and neck-amphora. In linguistics and rhetoric, this phenomenon is a figure of speech called catachresis, which is based on mixed metaphoric and metonymic expressions that allow an economic reuse of a previous lexicon.

In order to further specify the morphology or the function of a cultural object, many multi-word expressions (MWEs), mostly composed of Noun+Preposition+Noun, are also used in the Italian terminology, e.g. altare a mensa. There are also many compounds such as semicolonna and monoansata (respectively, half-column and one-handled in English). In addition, a conspicuous part of the domain-specific terminology either comes from Greek and Latin words (e.g. rhyton, cingulum) or presents Greek or Latin prefixoids, which contribute to making this specialized lexicon highly technical and even more difficult to understand. Finally, there are also some loan-words such as menhir and applique, which come from Breton and French, respectively.

4 Methodology

Given a list of terms in the source dataset, we first retrieve those concepts to which the term is associated on Wikidata, i.e. concepts with rdfs:label as a predicate and the term as an object, as follows:

    SELECT ?ConceptID {
      ?ConceptID rdfs:label "T"@it.
    }

which returns the IDs of the concepts associated with the term T.

Since a word can be used in various domains with different senses, it is possible to retrieve more than one concept for a term. Therefore, the relevance of the retrieved concepts to our terminological field is examined based on the semantic relationships, such as subclass-of, part-of and instance-of, between the retrieved concepts and those to which we assume that the terms are associated. Such concepts, henceforth referred to as gold concepts, are collected based on the knowledge of the experts in the domain and manual collection from Wikidata. The SPARQL query for this verification can be described as follows:

    ASK {
      wd:ConceptID (wdt:P361|wdt:P279|wdt:P31)+ wd:GoldConceptID.
    }

where wd:ConceptID and wd:GoldConceptID refer to the IDs of the retrieved concepts and the gold concepts, respectively. P279, P361 and P31 are the Wikidata properties for subclass-of, part-of and instance-of, respectively. A list of the gold concepts in the field of archaeology is provided in Appendix A.

Filtering the data retrieved from Wikidata enables us to disambiguate the terms based on the concepts. For instance, the Italian word calice appears as a label for several concepts such as wine glass, calyx and chalice, of which only the last is relevant to our terminological field and is therefore selected in this step. Following the collection of the candidate concepts, we retrieve the labels of the concepts in our target languages, namely English, French, German and Italian. The choice of the languages depended on our evaluation means. The retrieved terms are then enriched with linguistic information from Wiktionary. This process is illustrated in Figure 2.

[Figure 2 diagram; box labels: List of terms, SPARQL, Gold concepts, Filtering, Multilingual list of terms, Convert to OntoLex-Lemon, Semi-automatically linguistically-enriched terminology]

Figure 2: Terminological enrichment process

4.1 Conversion to OntoLex-Lemon

In recent years, there have been efforts to create specific data models providing support for representing linguistic data on the Semantic Web. OntoLex-Lemon (McCrae et al., 2017) is a model based on the Lexicon Model for Ontologies (lemon) which provides rich linguistic grounding for ontologies, such as the representation of morphological and syntactic properties of lexical entries. This model draws heavily on previous lexical data models, particularly LexInfo (Cimiano et al., 2011), LIR (Montiel-Ponsoda et al., 2008) and LMF (Francopoulo et al., 2006), with improvements such as being RDF-native, descriptive and modular, which justify its promising adaptability in linguistic resource management.

The previous step yields a tabular format of the lexicographic information, making it possible to convert the data semi-automatically into RDF triples in OntoLex-Lemon. Figure 3 illustrates the equivalent of the Italian entry ascia in the output terminology in RDF Turtle in Ontolex-Lemon. In addition to the linguistic information, each entry is linked to the original concept in the source dataset, i.e. ICCD, using the skos:concept property. Similarly, the Wikipedia page describing the term is provided using the ontolex:denotes property.

    :lexicon a lime:Lexicon;
        lime:entry :ascia ;
        lime:language .

    :ascia a ontolex:LexicalEntry,
            ontolex:Word ;
        ontolex:canonicalForm :form_ascia ;
        rdfs:label "ascia"@it ;
        lexinfo:partOfSpeech lexinfo:noun ;
        lexinfo:gender lexinfo:feminine .

    :form_ascia a ontolex:Form ;
        dct:language ;
        ontolex:writtenRep "ascia"@it ;
        lexinfo:number lexinfo:singular ;
        ontolex:sense :ascia_n_sense ;
        ontolex:denotes wd:Q2517447 ;
        dct:subject wd:Q382995 ;
        owl:sameAs dati:009000000004 .

    :trans a vartrans:Translation ;
        vartrans:source :ascia_n_sense ;
        vartrans:target frl:fr_herminette_sense .

Figure 3: The description of the term ascia in Ontolex-Lemon

In addition to the OntoLex-Lemon core model, we used the following modules:

• Linguistic Metadata (lime) to describe metadata at the level of the lexicon-ontology interface with information such as lexical entries and language.

• Syntax and Semantics (synsem) enables us to describe syntactic behaviour. We use syntactic frames to relate a lexical entry to one of its various syntactic roles, such as the canonical form of the word ascia.

• Lexinfo (lexinfo) (Cimiano et al., 2011) for describing relevant linguistic categories and properties, particularly part-of-speech (POS), gender and number.

• Variation and Translation (vartrans) is used to describe relations between lexical entries, particularly translations.

Among the 4000 terms provided in the source dataset, i.e. the ICCD Thesaurus, only 446 terms could be retrieved from Wikipedia. This can be due to the technicality of the source dataset, which is confined to Italian archaeological finds and therefore describes cultural objects which might not be present outside Italy. On the other hand, Wikidata is constantly being enriched and may have had incomplete data when the queries were run. With respect to Wiktionary, among the retrieved terms, 26 terms were available without linguistic descriptions such as part-of-speech (PoS) tags and gender. We observed that the majority of missing terms were of Latin or Greek etymology. As Wiktionary is a Collaboratively-Constructed Resource, a manual verification and completion of the retrieved data was carried out. Some of the erroneous data were due to homographs such as ancora and to polysemous terms which may belong to more than one grammatical category, such as piatto, meaning “plate” as a noun and “flat” as an adjective.

5 Conclusion

In this paper, we demonstrated the usage of LOD and CCRs in enriching terminological ontologies. As a case study, we used an ontology in Italian in the field of cultural heritage and archaeology to create multilingual terminologies. The results of the manual evaluation and implementation process show that leveraging such resources is a valid option for enriching ontologies linguistically. Nonetheless, since CCRs are created by a community effort, a manual verification was carried out for creating gold-standard datasets.

Finally, the effort of this study can be framed within the more general context of contributing to the implementation and advancement of the multilingual Web of Data and the LLOD movement. The multilingual resource that we are proposing can be used by several professional figures, among them lexicographers, translators, museum and exhibition experts, archaeologists and researchers.

Further experiments will concern retrieving MWEs, as we have not included them in the current study due to their scarce availability on Wikidata and Wiktionary. MWEs are a topic increasingly handled in NLP, and their processing is fundamental for NLP tasks ranging from POS tagging to Machine Translation to obtain better and more reliable results (Monti et al., 2018). We are also interested in creating gold concepts more efficiently, particularly using topic modelling techniques, and in integrating more resources, particularly ConceptNet (Liu and Singh, 2004), which contains many resources such as WordNets and DBpedia.

This project is openly available at https://github.com/sinaahmadi/sparql4respop.

Acknowledgments

We want to thank SmartApps for providing useful material and information for the realization of this project. This project has been partially supported by the PON Ricerca e Innovazione 2014/20 and the POR Campania FSE 2014/2020 funds. Sina Ahmadi is also supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731015.

References

Murtha Baca and Melissa Gill. 2015. Encoding multilingual knowledge systems in the digital age: the Getty vocabularies. NASKO, 42(4):232–243.

Collin F Baker, Charles J Fillmore, and John B Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pages 86–90. Association for Computational Linguistics.

Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, Marco Rospocher, Horacio Saggion, Luciano Serafini, and Leo Wanner. 2012. From ontology to NL: Generation of multilingual user-oriented environmental reports. In International Conference on Application of Natural Language to Information Systems, pages 216–221. Springer.

Peter Bourgonje, Julian Moreno-Schneider, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava. 2016. Towards a platform for curation technologies: enriching text collections with a Semantic Web layer. In European Semantic Web Conference, pages 65–68. Springer.

Hennie Brugman, Véronique Malaisé, and Laura Hollink. 2008. A common multimedia annotation framework for cross linking cultural heritage digital collections. In 6th International Conference on Language Resources and Evaluation (LREC 2008).

Christian Chiarcos, Philipp Cimiano, Thierry Declerck, and John P McCrae. 2013. Linguistic linked open data (LLOD). Introduction and overview. In Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data, pages i–xi.

Philipp Cimiano, Paul Buitelaar, John McCrae, and Michael Sintek. 2011. Lexinfo: A declarative model for the lexicon-ontology interface. Web Semantics: Science, Services and Agents on the World Wide Web, 9(1):29–51.

Philipp Cimiano, John P McCrae, Víctor Rodríguez-Doncel, Tatiana Gornostay, Asunción Gómez-Pérez, Benjamin Siemoneit, and Andis Lagzdins. 2015. Linked terminologies: applying linked data principles to terminological resources. In Proceedings of the eLex 2015 Conference, pages 504–517.

Silviu Cucerzan and Eric Brill. 2004. Spelling correction as an iterative process that exploits the collective knowledge of web users. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 293–300.

Dana Dannélls, Aarne Ranta, Ramona Enache, Mariana Damova, and Maria Mateva. 2013. Multilingual access to cultural heritage content on the Semantic Web. In Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 107–115.

Rob Davies. 2009. EuropeanaLocal: its role in improving access to Europe's cultural heritage through the European digital library. In Proceedings of IACH workshop at ECDL2009 (European Conference on Digital Libraries), Aarhus, September.

Thierry Declerck, Karlheinz Mörth, and Piroska Lendvai. 2012. Accessing and standardizing Wiktionary lexical entries for supporting the translation of labels in taxonomies for digital humanities. In Proceedings of LREC.

Martin Doerr. 2009. Ontologies for cultural heritage. In Handbook on ontologies, pages 463–486. Springer.

Milan Dojchinovski, Felix Sasaki, Tatjana Gornostaja, Sebastian Hellmann, Erik Mannens, Frank Salliau, Michele Osella, Phil Ritchie, Giannis Stoitsis, Kevin Koidl, Markus Ackermann, and Nilesh Chakraborty. 2016. FREME: Multilingual semantic enrichment with linked data and language technologies. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 4180–4183, Portorož, Slovenia, May. European Language Resources Association (ELRA).

Hang Dong. 2017. Enrichment of cross-lingual information on Chinese genealogical Linked Data. iConference 2017 Proceedings Vol. 2.

Achille Felicetti, Tiziana Scarselli, Maria Letizia Mancinelli, and Franco Niccolucci. 2013. Mapping ICCD archaeological data to CIDOC-CRM: the RA schema. A Mapping of CIDOC CRM Events to German Wordnet for Event Detection in Texts, 11.

Gil Francopoulo, Monte George, Nicoletta Calzolari, Monica Monachini, Nuria Bel, Mandy Pet, and Claudia Soria. 2006. Lexical markup framework (LMF). In International Conference on Language Resources and Evaluation-LREC 2006, page 5.

Maurizio Gotti. 1991. I linguaggi specialistici: caratteristiche linguistiche e criteri pragmatici. La Nuova Italia.

Riccardo Gualdo and Stefano Telve. 2011. Linguaggi specialistici dell’italiano. Carocci.

Eduard Hovy, Roberto Navigli, and Simone Paolo Ponzetto. 2013. Collaboratively built semi-structured content and artificial intelligence: The story so far. Artificial Intelligence, 194:2–27.

Adam Kilgarriff and Gregory Grefenstette. 2001. Web as corpus. In Proceedings of Corpus Linguistics 2001, pages 342–344. Corpus Linguistics. Readings in a Widening Discipline.

Jimmy Lin and Boris Katz. 2003. Question answering from the web using knowledge annotation and knowledge mining techniques. In Proceedings of the twelfth international conference on Information and knowledge management, pages 116–123. ACM.

Feiyu Lin and Andrew Krizhanovsky. 2011. Multilingual ontology matching based on Wiktionary data accessible via SPARQL endpoint. arXiv preprint arXiv:1109.0732.

Hugo Liu and Push Singh. 2004. ConceptNet: a practical commonsense reasoning tool-kit. BT technology journal, 22(4):211–226.

Maria Letizia Mancinelli. 2014. Strumenti terminologici. Scheda RA. Reperti archeologici. Thesaurus per la definizione del bene. Introduzione e indicazioni per l'uso. ICCD - Servizio beni archeologici.

John McCrae, Dennis Spohr, and Philipp Cimiano. 2011. Linking lexical resources and ontologies on the semantic web with Lemon. In Extended Semantic Web Conference, pages 245–259. Springer.

John P McCrae, Julia Bosque-Gil, Jorge Gracia, Paul Buitelaar, and Philipp Cimiano. 2017. The Ontolex-Lemon model: development and applications. In Proceedings of eLex 2017 conference, pages 19–21.

Christian M Meyer and Iryna Gurevych. 2012. Wiktionary: A new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography. na.

George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Johanna Monti, Ruslan Mitkov, Violeta Seretan, and Gloria Corpas Pastor. 2018. Multiword units in machine translation and translation technology. In Ruslan Mitkov, Johanna Monti, Violeta Seretan, and Gloria Corpas Pastor, editors, Multiword units in machine translation and translation technology, pages 1–38. John Benjamins Publishing Company.

Elena Montiel-Ponsoda, Guadalupe Aguado De Cea, Asunción Gómez-Pérez, and Wim Peters. 2008. Modelling multilinguality in ontologies. Coling 2008: Companion volume: Posters, pages 67–70.

Christof Müller and Iryna Gurevych. 2008. Using Wikipedia and Wiktionary in domain-specific information retrieval. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 219–226. Springer.

Kotaro Nakayama, Minghua Pei, Maike Erdmann, Masahiro Ito, Masumi Shirakawa, Takahiro Hara, and Shojiro Nishio. 2008. Wikipedia mining: Wikipedia as a corpus for knowledge extraction.

Federica Scarpa. 2008. La traduzione specializzata. Un approccio didattico professionale. Milano: Hoepli, 2nd edition.

Gilles Sérasset. 2015. DBnary: Wiktionary as a lemon-based multilingual lexical resource in RDF. Semantic Web, 6(4):355–361.

Andrejs Vasiljevs, Signe Rirdance, and Andris Liedskalnins. 2008. EuroTermBank: Towards greater interoperability of dispersed multilingual terminology data. In Proceedings of the First International Conference on Global Interoperability for Language Resources ICGL, pages 213–220.

Konstantinos N Vavliakis, Georgios Th Karagiannis, and Pericles A Mitkas. 2012. Semantic Web in cultural heritage after 2020. In Proceedings of the 11th International Semantic Web Conference (ISWC), Boston, MA, USA, pages 11–15.

Torsten Zesch, Christof Müller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In LREC, volume 8, pages 1646–1652.

Appendix A

    Concept                       Wikidata ID
    architecture                  Q12271
    archaeology                   Q10855079
    artificial physical object    Q8205328
    art                           Q735
    archaeological artifact       Q220659
    architectural element         Q391414
    architectural order           Q217175
    container                     Q987767
    vase                          Q191851
    clothing in ancient Greece    Q522648
    clothing in ancient Rome      Q2457980
    tool                          Q39546
    roof tile                     Q268547
    religious object              Q21029893
    visual artwork                Q4502142
    costume accessory             Q1065579
    sculpture                     Q860861
    religious object              Q21029893
    accessory                     Q362200
    building component            Q19603939
    bijou                         Q3575260

Table 1: Concepts used for disambiguation of Wikidata concepts (gold concepts)
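
The two SPARQL queries given in Section 4 (the label lookup and the relevance check against a gold concept) are templates in which T, ConceptID and GoldConceptID are placeholders. The following is a minimal sketch, in Python, of how such templates could be filled in for a given term. The function names (`escape_literal`, `label_query`, `relevance_query`) and the ID validation are hypothetical and are not taken from the project repository; the sketch only builds query strings and does not contact any endpoint.

```python
import re

# Wikidata property IDs named in Section 4: part-of, subclass-of, instance-of.
RELATION_PROPERTIES = ("P361", "P279", "P31")


def escape_literal(term: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return term.replace("\\", "\\\\").replace('"', '\\"')


def label_query(term: str, lang: str = "it") -> str:
    """Build the SELECT query returning every concept labelled `term`."""
    return (
        "SELECT ?ConceptID {\n"
        f'  ?ConceptID rdfs:label "{escape_literal(term)}"@{lang}.\n'
        "}"
    )


def _check_id(item_id: str) -> str:
    """Accept only identifiers of the form Q<number> (hypothetical guard)."""
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return item_id


def relevance_query(concept_id: str, gold_id: str) -> str:
    """Build the ASK query testing whether a concept reaches a gold concept
    through one or more part-of, subclass-of or instance-of links."""
    path = "|".join(f"wdt:{prop}" for prop in RELATION_PROPERTIES)
    return (
        "ASK {\n"
        f"  wd:{_check_id(concept_id)} ({path})+ wd:{_check_id(gold_id)}.\n"
        "}"
    )


if __name__ == "__main__":
    print(label_query("calice"))
    # IDs taken from Figure 3 and Appendix A, used here purely as an example.
    print(relevance_query("Q2517447", "Q39546"))
```

In this sketch the ASK query would be issued once per pair of candidate concept and gold concept; a candidate would be kept as soon as one gold concept yields a positive answer. That loop is left out because the paper does not describe how the queries were scheduled.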
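
Section 4 states that, once the candidate concepts are selected, their labels are retrieved in English, French, German and Italian. As a hedged illustration of that step, the sketch below groups labels by language from a result in the standard SPARQL JSON results layout. The variable name `label`, the function `labels_by_language` and the sample data are assumptions made for this example; the query that would produce such a result is not given in the paper.

```python
# Target languages listed in Section 4.
TARGET_LANGUAGES = ("en", "fr", "de", "it")


def labels_by_language(results, var="label", languages=TARGET_LANGUAGES):
    """Return {language: label} for the target languages.

    `results` is expected to follow the SPARQL JSON results layout, where
    each binding maps a variable name to a dict with "value" and, for
    language-tagged literals, "xml:lang". The first label seen for a
    language is kept; other languages are ignored.
    """
    labels = {}
    for binding in results.get("results", {}).get("bindings", []):
        cell = binding.get(var)
        if not cell:
            continue
        lang = cell.get("xml:lang")
        if lang in languages and lang not in labels:
            labels[lang] = cell["value"]
    return labels


# Made-up sample in the SPARQL JSON results layout, for illustration only.
SAMPLE = {
    "head": {"vars": ["label"]},
    "results": {
        "bindings": [
            {"label": {"type": "literal", "xml:lang": "it", "value": "calice"}},
            {"label": {"type": "literal", "xml:lang": "en", "value": "chalice"}},
            {"label": {"type": "literal", "xml:lang": "es", "value": "cáliz"}},
        ]
    },
}

if __name__ == "__main__":
    print(labels_by_language(SAMPLE))
```

Keeping only the first label per language is a simplification chosen for the sketch; a concept with several labels in one language would need a different policy.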
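
Section 4.1 notes that the tabular lexicographic data are converted semi-automatically into RDF triples. The sketch below shows one way a single row could be serialized into Turtle resembling a subset of Figure 3 (the lexical entry and its form only). The row fields, the function names `local_name` and `entry_to_turtle`, and the naming scheme for local identifiers are assumptions for illustration; prefix declarations and the remaining properties of Figure 3 are deliberately left out.

```python
import re


def local_name(lemma: str) -> str:
    """Derive a local identifier from a lemma (hypothetical naming scheme):
    lowercase it and replace every non-word character with an underscore."""
    return re.sub(r"\W", "_", lemma.strip().lower())


def entry_to_turtle(lemma: str, pos: str, gender: str, lang: str = "it") -> str:
    """Serialize one tabular row as an OntoLex-Lemon entry and its form.

    Only properties that appear in Figure 3 are emitted; `pos` and `gender`
    are expected to be LexInfo local names such as "noun" and "feminine".
    """
    name = local_name(lemma)
    literal = lemma.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f":{name} a ontolex:LexicalEntry, ontolex:Word ;",
        f"    ontolex:canonicalForm :form_{name} ;",
        f'    rdfs:label "{literal}"@{lang} ;',
        f"    lexinfo:partOfSpeech lexinfo:{pos} ;",
        f"    lexinfo:gender lexinfo:{gender} .",
        "",
        f":form_{name} a ontolex:Form ;",
        f'    ontolex:writtenRep "{literal}"@{lang} .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_turtle("ascia", "noun", "feminine"))
```

String templating is used here only to keep the example dependency-free; an RDF library would be a more robust choice for a full conversion, since it handles prefixes and escaping itself.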