<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrating Terminological and Ontological Principles into a Lexicographic Resource</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rute Costa</string-name>
          <email>rute.costa@fcsh.unl.pt</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Salgado</string-name>
          <email>anacastrosalgado@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margarida Ramos</string-name>
          <email>mvramos@fcsh.unl.pt</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fahad Khan</string-name>
          <email>fahad.khan@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Carvalho</string-name>
          <email>sara.carvalho@ua.pt</email>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toma Tasovac</string-name>
          <email>ttasovac@humanistika.org</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Almeida</string-name>
          <email>brunoalmeida@fcsh.unl.pt</email>
          <xref ref-type="aff" rid="aff5">5</xref>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Khemakhem</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurent Romary</string-name>
          <email>laurent.romary@inria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raquel Silva</string-name>
          <email>raq.silva@fcsh.unl.pt</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ALMAnaCH - Automatic Language Modelling and ANAlysis &amp; Computational Humanities, INRIA</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Academia das Ciências de Lisboa</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ArcaScience</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>BCDH - Belgrade Center for Digital Humanities</institution>
          ,
          <addr-line>Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>CLLC - Centro de Línguas, Literaturas e Culturas</institution>
          ,
          <addr-line>Aveiro</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>CLUNL - Centro de Linguística da Universidade Nova de Lisboa</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>CNR - Istituto di Linguistica Computazionale “Antonio Zampolli”</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>ROSSIO - ROSSIO Infrastructure - Social Sciences, Arts and Humanities</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present research under way at NOVA CLUNL (https://clunl.fcsh.unl.pt/grupos_clunl/lexicologia-lexicografia-terminologia/), where an international team is working on the funded project MORDigital (https://www.fct.pt/apoios/projectos/consulta/vglobal_projecto?idProjecto=164850&amp;idElemConcurso=14818). MORDigital's goal is to encode selected editions of the Diccionario da Lingua Portugueza by António de Morais Silva (MOR), first published in 1789.</p>
      </abstract>
      <kwd-group>
        <kwd>dictionary</kwd>
        <kwd>lexicography</kwd>
        <kwd>digital humanities</kwd>
        <kwd>standards</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>MORDigital’s ultimate goals are, on the one hand, to promote accessibility to cultural heritage
while fostering reusability and, on the other hand, to contribute towards a more significant presence of
lexicographic digital content in Portuguese through open tools and standards. MOR represents a
significant legacy, since it marks the beginning of Portuguese dictionaries, having served as a model
for all subsequent lexicographic production. The team follows a new paradigm in lexicography, which
results from the convergence between lexicography, terminology, computational linguistics, and
ontologies as an integral part of digital humanities and linked (open) data. In the Portuguese context,
this research fills a gap concerning searchable online retrodigitised dictionaries, built on current
standards and methodologies which promote data sharing and harmonisation, namely TEI Lex-0 and
Ontolex-Lemon. The team will further ensure the connection to other existing systems and lexical
resources, particularly in the Portuguese-speaking world.</p>
      <p>For this paper, after setting out the theoretical background (terminology and lexicography) that
underpins our methodology, we will present four interrelated tasks:</p>
      <list list-type="order">
        <list-item>
          <p>Structuring of MOR’s digitised versions using GROBID-Dictionaries, software designed for
parsing, extracting and structuring information from dictionary text. In our case, the tool will be
used to parse the constituent parts of each dictionary entry, which involves the preparation of a
native encoding format that is compliant with the XML/TEI metamodel.</p>
        </list-item>
        <list-item>
          <p>Presentation of a systematic analysis of the Mathematical Sciences and Medical Sciences
domains, their related domain labels [6], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and other mechanisms, such as formulae in the definition that identify the specialised field of
knowledge. We will propose a hierarchical organisation that constitutes the foundation of domain
ontologies.</p>
        </list-item>
        <list-item>
          <p>Representation of the model in OWL using Protégé, a free, open-source ontology editor. Each
class or individual in the ontology will thus be assigned a URI (Uniform Resource Identifier), used
to reference the label present in each of the lexicographic entries in accordance, whenever
possible, with the TEI schemas.</p>
        </list-item>
        <list-item>
          <p>Conversion of the TEI Lex-0 output of the preceding tasks into linked data using the RDF-based
model Ontolex-Lemon; the conversion will build on work already carried out in previous initiatives
to render the two models more interoperable. The Ontolex-Lemon model has recently been extended
by a lexicography module, lexicog, which facilitates interoperability in modelling dictionaries as
linked data.</p>
        </list-item>
      </list>
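      <p>To make the intended flow from an encoded entry to linked data more concrete, the following minimal Python sketch reads one TEI Lex-0-style entry and serialises it as Ontolex-Lemon triples in Turtle. It is an illustration under stated assumptions, not the project's conversion pipeline: the sample entry, its definition text, the example.org base URIs, the label-to-class mapping, and the choice of dct:subject to link a sense to a domain class are all hypothetical.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical base URIs; the project's real namespaces are not given in the paper.
LEXICON = "http://example.org/mor/"
DOMAINS = "http://example.org/mor/domain/"

# Hypothetical mapping from a printed domain label to an ontology class name.
DOMAIN_LABELS = {"Mat.": "MathematicalSciences", "Med.": "MedicalSciences"}

# Invented entry for illustration only; it is not transcribed from MOR.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="algebra">
  <form type="lemma"><orth>algebra</orth></form>
  <gramGrp><gram type="pos">noun</gram></gramGrp>
  <sense xml:id="algebra.1">
    <usg type="domain">Mat.</usg>
    <def>texto ilustrativo</def>
  </sense>
</entry>
"""

PREFIXES = [
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .",
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
    "@prefix dct: <http://purl.org/dc/terms/> .",
]


def literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle literal with basic escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def block(subject: str, props: list) -> str:
    """Serialise one subject with its predicate-object pairs."""
    return f"{subject} " + " ;\n    ".join(props) + " ."


def entry_to_turtle(tei_xml: str) -> str:
    """Convert a single TEI Lex-0-style entry into Ontolex-Lemon Turtle."""
    entry = ET.fromstring(tei_xml)
    entry_uri = f"<{LEXICON}{entry.get(XML_ID)}>"
    lemma = entry.findtext(
        "tei:form[@type='lemma']/tei:orth", default="", namespaces=TEI_NS
    ).strip()
    senses = entry.findall("tei:sense", TEI_NS)
    sense_uris = [f"<{LEXICON}{s.get(XML_ID)}>" for s in senses]

    entry_props = [
        "a ontolex:LexicalEntry",
        f"ontolex:canonicalForm [ ontolex:writtenRep {literal(lemma, 'pt')} ]",
    ]
    if sense_uris:
        entry_props.append("ontolex:sense " + ", ".join(sense_uris))
    blocks = [block(entry_uri, entry_props)]

    for sense, uri in zip(senses, sense_uris):
        props = ["a ontolex:LexicalSense"]
        definition = sense.findtext("tei:def", default="", namespaces=TEI_NS).strip()
        if definition:
            props.append(f"skos:definition {literal(definition, 'pt')}")
        # Resolve each printed domain label to a (hypothetical) ontology URI.
        for usg in sense.findall("tei:usg[@type='domain']", TEI_NS):
            cls = DOMAIN_LABELS.get((usg.text or "").strip())
            if cls:
                props.append(f"dct:subject <{DOMAINS}{cls}>")
        blocks.append(block(uri, props))

    return "\n".join(PREFIXES) + "\n\n" + "\n\n".join(blocks)


if __name__ == "__main__":
    print(entry_to_turtle(SAMPLE_ENTRY))
```
]]></preformat>
      <p>In this sketch the domain label printed in the entry is resolved to a URI rather than copied as a string, which mirrors the idea of referencing ontology classes from the lexicographic entries. A fuller conversion would also need to cover grammatical information and the dictionary's macrostructure, for which the lexicog module may be relevant.</p>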
      <p>At the end of the paper, we will discuss the results, highlighting the challenges that we faced.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Acknowledgements</title>
      <p>This paper is supported by the project MORDigital – Digitalização do Diccionario da Lingua
Portugueza de António de Morais Silva [PTDC/LLT-LIN/6841/2020], financed by Portuguese national
funds through the FCT – Fundação para a Ciência e a Tecnologia.</p>
    </sec>
    <sec id="sec-3">
      <title>3. References</title>
      <p>[4] F. Khan, A. Salgado (2021). Modelling Lexicographic Resources Using CIDOC CRM,
FRBRoo and Ontolex Lemon. In: A. Bikakis et al., eds., SWODCH 2021 – Semantic Web
and Ontology Design for Cultural Heritage 2021. Proceedings of the International Joint
Workshop on Semantic Web and Ontology Design for Cultural Heritage co-located with the
Bolzano Summer of Knowledge 2021 (BOSK 2021). Bozen-Bolzano: CEUR-WS, pp. 1–12.
ISSN 1613-0073.</p>
      <p>[5] F. Khan, L. Romary, A. Salgado, J. Bowers, M. Khemakhem, T. Tasovac (2020). Modelling
Etymology in LMF/TEI: The ‘Grande Dicionário Houaiss da Língua Portuguesa’ Dictionary
as a Use Case. In: N. Calzolari et al., eds., LREC 2020 Conference Proceedings. Paris:
ELRA, pp. 3172–3180. ISBN 979-10-95546-34-4.</p>
      <p>[6] A. Salgado, R. Costa (2019). Marcas temáticas en los diccionarios académicos ibéricos:
estudio comparativo. RILEX: Revista sobre investigaciones léxicas, 2(2), pp. 37–63. e-ISSN
2605-3136.</p>
      <p>[7] A. Salgado, R. Costa, T. Tasovac (2019). Improving the consistency of usage labelling in
dictionaries with TEI Lex-0. Lexicography: Journal of ASIALEX. e-ISSN 2197-4306.</p>
      <p>[8] A. Salgado, R. Costa, T. Tasovac, A. Simões (2019). TEI Lex-0 In Action:
Improving the Encoding of the Dictionary of the Academia das Ciências de Lisboa. In: I.
Kosem et al., eds., Electronic lexicography in the 21st century. Proceedings of the eLex
2019 conference. 1–3 October 2019, Sintra, Portugal. Brno: Lexical Computing CZ, s.r.o.,
pp. 417–433. ISSN 2533-5626.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Simões</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tasovac</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Ontologie des marques de domaines appliquée aux dictionnaires de langue générale</article-title>
          , in [éditeur : Xavier Blanco]
          <source>La lexicographie en tant que méthodologie de recherche en linguistique. Revue de Philologie Française et Romane - Langue(s) &amp; Parole, n. 5</source>
          . Mons: Edition du CIPA, pp.
          <fpage>201</fpage>
          -
          <lpage>230</lpage>
          . ISSN papier 2466-7757, ISSN numérique 2684-6691.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Almeida</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>SKOS as a key element for linking lexicography to digital humanities</article-title>
          .
          <source>Information Organization in Digital Humanities: A Global Perspective</source>
          . Coll. Digital Research in the Arts and Humanities. [Editors: Koraljka Golub / Ying-Hsang Liu], Routledge, pp.
          <fpage>178</fpage>
          -
          <lpage>204</lpage>
          . ISBN 97803675516.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khemakhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tasovac</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>MORDigital: the advent of a new lexicographical Portuguese project</article-title>
          .
          <source>Electronic lexicography in the 21st century. Proceedings of the eLex 2021 conference</source>
          . Brno, Czech Republic: Lexical Computing CZ s.r.o., pp.
          <fpage>321</fpage>
          -
          <lpage>324</lpage>
          . ISSN 2533-5626.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>