=Paper= {{Paper |id=Vol-3161/poster11 |storemode=property |title=Integrating Terminological and Ontological Principles into a Lexicographic Resource (poster) |pdfUrl=https://ceur-ws.org/Vol-3161/poster11.pdf |volume=Vol-3161 |authors=Rute Costa,Ana Salgado,Margarida Ramos,Fahad Kahn,Sara Carvalho,Toma Tasovac,Bruno Almeida,Mohamed Khemakhem,Laurent Romary,Silva Raquel |dblpUrl=https://dblp.org/rec/conf/mdtt/CostaSRKCTAKRS22 }} ==Integrating Terminological and Ontological Principles into a Lexicographic Resource (poster)== https://ceur-ws.org/Vol-3161/poster11.pdf
Integrating Terminological and Ontological Principles into
a Lexicographic Resource


           Rute Costa1, Ana Salgado 1, 2, Margarida Ramos 1, Fahad Khan3, Sara Carvalho1,4, Toma Tasovac 5,
           Bruno Almeida1,6, Mohamed, Khemakhem 7, Laurent Romary 8, Raquel Silva 1

1
  CLUNL – Centro de Linguística da Universidade Nova de Lisboa, Lisboa, Portugal
2
  Academia das Ciências de Lisboa, Lisboa, Portugal
3
  CNR – Istituto di Linguistica Computazionale “Antonio Zampollo” Pisa, Italy
4
  CLLC – Cetnro de Línguas. Literaturas e Culturas, Aveiro. Portugal
5
  BCDH – Belgrade Center for Digital Humanities, Belgrade. Serbia
6
  ROSSIO - ROSSIO Infrastructure - Social Sciences, Arts and Humanities, Lisboa, Portugal
7
  ArcaScience. Paris, France
8
  ALMAnaCH – Automatic Language Modelling and ANAlysis & Compuatational Humanities, INRIA, Paris,
France

                                  Abstract
                                  In this paper we will present the research that is taking place at the NOVA CLUNL1 where an
                                  international team is working on a financed project MORDigital2. MORDigital’s goal is to
                                  encode the selected editions of Diccinario de Lingua Portugueza by António de Morais Silva
                                  (MOR), first published in 1789.

                                  Keywords 3
                                  dictionary, lexicography, digital humanities, standards




1. Introduction

         MORDigital’s ultimate goals are, on the one hand, to promote accessibility to cultural heritage
while fostering reusability and, on the other hand, to contribute towards a more significant presence of
lexicographic digital content in Portuguese through open tools and standards. MOR represents a
significant legacy, since it marks the beginning of Portuguese dictionaries, having served as a model
for all subsequent lexicographic production. The team follows a new paradigm in lexicography, which
results from the convergence between lexicography, terminology, computational linguistics, and
ontologies as an integral part of digital humanities and linked (open) data. In the Portuguese context,
this research fills a gap concerning searchable online retrodigitised dictionaries, built on current


1 https://clunl.fcsh.unl.pt/grupos_clunl/lexicologia-lexicografia-terminologia/
2
    https://www.fct.pt/apoios/projectos/consulta/vglobal_projecto?idProjecto=164850&idElemConcurso=14818

1st International Conference on “Multilingual digital terminology today. Design, representation formats and
management systems”, June 16 – 17, Padova, Italy
EMAIL: rute.costa@fcsh.unl.pt (A. 1); anacastrosalgado@gmail.com (A. 2); fahad.khan@ilc.cnr.it (A. 3) ; mvramos@fcsh.unl.pt (A.4) ;
sara.carvalho@ua.pt (A.5) ; ttasovac@humanistika.org (A.6) ; brunoalmeida@fcsh.unl.pt (A.7); mohamed.khemakhem@inria.fr (A.8)
laurent.romary@inria.fr (A.9) ; (raq.silva@fcsh.unl.pt A.10)
ORCID: 0000-0002-3452-7228 (A. 1); 0000-0002-6670-3564 (A. 2); 0000-0002-1551-7438 (A. 3) ; 0000-0001-7209-3806 (A.4) ; 0000-
0002-7501-5405 (A.5); 0000-0002-3919-993X (A.6); 0000-0002-5777-5574 (A.7); 0000-0003-3529-2990 (A.8); 0000-0002-0756-0508
(A.9); 0000-0002-0505-4863 (A.10)
                               © 2022 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Wor
    Pr
       ks
        hop
     oceedi
          ngs
                ht
                I
                 tp:
                   //
                    ceur
                       -
                SSN1613-
                        ws
                         .or
                       0073
                           g
                               CEUR Workshop Proceedings (CEUR-WS.org)
standards and methodologies which promote data sharing and harmonisation, namely TEI Lex-04 and
Ontolex-Lemon5. The team will further ensure the connection to other existing systems and lexical
resources, particularly in the Portuguese-speaking world.
        For this paper, after posing the theoretical background (terminology and lexicography) that
/underpins our methodology, we will present 4 interrelated tasks:

           1. Structuration of MOR’s digitised versions using GROBID-Dictionaries6, a specific software
           for the parsing, extraction and structuring of information extracted from dictionary text. In our
           case, the tool will be used to parse the constituent parts of each dictionary entry, which involves
           the preparation of a native encoding format that is compliant with the XML/TEI metamodel.
           2. Presentation of a systematic analysis of the Mathematical Sciences and Medical Sciences
           domains, their related domain labels [6], [1] and other mechanisms, such as the use of formulae
           present in the definition which identifies the specialised field of knowledge. We will propose a
           hierarchical organisation that constitutes the foundation of domain ontologies.
           3. Representation of the model in OWL resorting to Protégé7, a free, open-source ontology
           editor. This means each class or individual in the ontology will be assigned a URI (Universal
           Resource Identifier), used to reference the label present in each of the lexicographic entries in
           accordance – whenever possible – with the TEI schemas.
           4. Conversion of the TEI Lex-0 output of Task 4 into linked data using the RDF-based model
           Ontolex-Lemon; the conversion will be based on work already carried out in the scope of
           previous initiatives in rendering the two models more interoperable. The Ontolex-Lemon model
           has recently been extended by a lexicography module – lexicog8 –, which facilitates
           interoperability in modelling dictionaries as linked data.

At the end of the paper, we will discuss the results, highlighting the challenges that we faced.


2. Acknowledgements

   This paper is supported by the MORDigital – Digitalização do Diccionario da Lingua Portugueza
de António de Morais Silva [PTDC/LLT-LIN/6841/2020] project financed by the Portuguese National
Funding through the FCT – Fundação para a Ciência e Tecnologia.

3. References

         [1] R. Costa, S. Carvalho, A. Salgado, A. Simões, T. Tasovac (2020). Ontologie des marques
             de domaines appliquée aux dictionnaires de langue générale, in [éditeur : Xavier Blanco] La
             lexicographie en tant que méthodologie de recherche en linguistique Revue de Philologie
             Française et Romane – Langue(s) & Parole, n. 5 . Mons: Edition du CIPA. pp. 201–230.
             ISSN papier 2466-7757, ISSN numérique 2684-6691.
         [2] R. Costa, A. Salgado, B. Almeida (2021). SKOS as a key element for linking lexicography
             to digital humanities. Information Organization in Digital Humanities: A Global
             Perspective. Coll. Digital Research in the Arts and Humanities. [Editors: Koraljka Golub /
             Ying-Hsang Liu], Routledge, pp. 178–204. ISBN 97803675516.
         [3] R. Costa, A. Salgado, F. Khan, S. Carvalho, L. Romary, B. Almeida, M. Khemakhem. M.
             Ramos, R. Silva, T. Tasovac (2021). MORDigital: the advent of a new lexicographical
             Portuguese project. Electronic lexicography in the 21st century. Proceedings of the eLex
             2021 conference., Lexical Computing CZ s.r.o., Brno, Czech Republic, pp. 321–324. ISSN
             2533-5626.

4
  https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html
5
  https://www.w3.org/community/ontolex/
6
  https://github.com/MedKhem/grobid-dictionaries
7
  https://protege.stanford.edu/
8
  https://www.w3.org/2019/09/lexicog/
[4] F. Kahn, A. Salgado (2021). Modelling Lexicographic Resources Using CIDOC CRM,
    FRBRoo and Ontolex Lemon. In: A. Bikakis et al., eds., SWODCH 2021 – Semantic Web
    and Ontology Design for Cultural Heritage 2021. Proceedings of the International Joint
    Workshop on Semantic Web and Ontology Design for Cultural Heritage co-located with the
    Bolzano Summer of Knowledge 2021 (BOSK 2021). Bozen-Bolzano: CEUR-WS, pp. 1–
    12. ISSN 1613-0073.
[5] F. Khan, L. Romary, A. Salgado, J. Bowers, M. Khemakhem, T. Tasovac (2020). Modelling
    Etymology in LMF/TEI: The ‘Grande Dicionário Houaiss da Língua Portuguesa’ Dictionary
    as a Use Case. In: N. Calzolari et al., eds., LREC 2020 Conference Proceedings. Paris:
    ELRA, pp. 3172–3180. ISBN 979-10-95546-34-4.
[6] A. Salgado, R. Costa, (2019). Marcas temáticas en los diccionarios académicos ibéricos:
    estudio comparativo. RILEX: Revista sobre investigación léxicos, 2(2), pp. 37–63. e-ISSN
    2605-3136.
[7] A. Salgado, R. Costa, T. Tasovac (2019). Improving the consistency of usage labelling in
    dictionaries with TEI Lex-0. Lexicography: Journal of ASIALEX. e-ISSN 2197-4306.
[8] A. Salgado, R. Costa, T. Tasovac, A. Simões, Alberto (2019). TEI Lex-0 In Action:
    Improving the Encoding of the Dictionary of the Academia das Ciências de Lisboa. In: I.
    Kosem et al., eds., Electronic lexicography in the 21st century. Proceedings of the eLex
    2019 conference. 1–3 October 2019, Sintra, Portugal. Brno: Lexical Computing CZ, s.r.o.,
    pp. 417–433. ISSN 2533-5626.