=Paper= {{Paper |id=Vol-3427/paper3 |storemode=property |title=Extracting Knowledge-rich Information From Definitions. A Corpus-based Approach to Building a Conceptual-based Terminological Resource |pdfUrl=https://ceur-ws.org/Vol-3427/paper3.pdf |volume=Vol-3427 |authors=Margarida Ramos,Rute Costa |dblpUrl=https://dblp.org/rec/conf/mdtt/RamosC23 }} ==Extracting Knowledge-rich Information From Definitions. A Corpus-based Approach to Building a Conceptual-based Terminological Resource== https://ceur-ws.org/Vol-3427/paper3.pdf
Extracting knowledge-rich information from definitions. A
corpus-based approach to building a conceptual-based
terminological resource
Margarida Ramos 1, Rute Costa 1
1 NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa, Avenida de Berna 26-C, 1069-061
Lisboa, Portugal

                               Abstract
                               This paper aims to describe a text-mining approach on a domain corpus (cork) within the theoretical
                               framework of the dual dimension of terminology to create a terminological dictionary and correlate
                               it with an ontology. We will make some considerations on (i) domain specificities; (ii) lexical
                               markers; (iii) automatic corpus processing using Sketch Engine; (iv) representation of lexical
                               networks using CmapTools; and (v) representation of the concept system using Protégé. The goal
                               of the ontology is to logically support the coherence and quality of the natural language definitions
                               contained in the terminological resource.

                               Keywords
                               terminology; definition; domain-ontology; knowledge-rich information; domain corpus;
                               terminological dictionary.

1. Introduction
    This paper aims to demonstrate a method for developing a terminological dictionary based on a
domain ontology. To this end, we will describe the methods used to capture specialised lexical and
conceptual knowledge from the corpus and use it to develop a dedicated ontology. The terminological
resource will consist of a linguistic description of the specialised concepts, based on the formal
definitions of the concepts that make up the cork ontology, the OntoCork [11].
    The method used in this paper is corpus-driven. The corpus was compiled based on rigorous criteria
specific to terminological work [10], where the specialised context of text production is a key element.
Accordingly, the corpus is composed of technical-explanatory and normative (standards) texts. For
corpus analysis, we used Sketch Engine (see footnote 2) to find and systematise lexical-semantic
relationships. During the corpus analysis process, we found two types of relevant knowledge-rich
information [6]: definitions and definitional contexts. Definitions are one of the components of the
glossary’s microstructure and can be found at the end of the normative texts. The purpose of these
definitions is to achieve a consensus among the members of the cork community. The definitional
contexts, on the other hand, are integral parts of the texts and have relevant specialised
lexical-semantic markers in their structure.
    Our method encompasses two stages:
    (i) From the linguistic analysis of the lexical markers, and the corresponding lexical-semantic
relations observed between the terms, we systematise the results into lexical maps using CmapTools
(see footnote 3).
    (ii) Based on the previous stage, we proceed to the conceptual analysis and subsequent formal
representation. The conceptual analysis grounds the identification of conceptual relations obtained by
interpreting the lexical-semantic relations observed between two terms. To infer conceptual relations
(such as the associative type) and to identify characteristics that will help us in the process of building
concept systems, we resort to deductive mechanisms employing the Aristotelian formula X=Y+DC
(species = genus + differential characteristic) to build OntoCork.

2nd International Conference on “Multilingual Digital Terminology Today. Design, Representation Formats and Management Systems”
(MDTT 2023), June 29–30, 2023, Lisbon, Portugal
EMAIL: mvramos@fcsh.unl.pt (M. Ramos); rute.costa@fcsh.unl.pt (R. Costa).
ORCID: 0000-0001-7209-3806 (M. Ramos); 0000-0002-3452-7228 (R. Costa)
© 2020 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
2 https://www.sketchengine.eu/
3 https://cmap.ihmc.us/

2. Domain corpus: cork

   The Cork Corpus was built up from texts produced within the cork industry. The internal and
external criteria [1],[8] used to build our specific-domain corpus are systematised in Table 1.

Table 1
Internal and external criteria of the cork corpus
 Criteria                                                 Purpose/description
 Degree of specialisation                                 Produced by experts and semi-experts
 Source validation                                        Entities recognised as an authority
 Type                                                     Technical-explanatory; normative
 Content adequacy                                         On cork/Cork stopper
 Synchronism (≤ 10 years)                                 Given the fast evolution of technology

     The corpus comprises 98 texts written in European Portuguese (see Figure 1).

[Figure 1 (pie chart of the corpora collection by text type): newsletters 18%; academic articles 16%;
theses 13%; standards 9%; books 8%; reports 8%; studies 7%; brochures 6%; decree-laws 6%;
specialised periodicals 4%; instruction manuals 4%; industrial guide 1%]
Figure 1: Corpora collection


    These texts were produced by experts from different organisations and in different domains related
to the cork industry. The texts were collected according to the following criteria: (1) texts produced by
and for the scientific community in the domain of cork; (2) texts produced by experts for quasi-experts;
and (3) texts produced by experts for non-experts.

3. Terminological data extraction

     For the 98 documents of the corpus, we obtained the quantitative data shown in Table 2.

Table 2
Quantitative data of the corpus
                                                          Total number
 Tokens                                                   1,712,652
 Words                                                    1,217,968
 Sentences                                                48,031

   For the corpus exploration and linguistic data analysis, we mainly focused on 43 texts produced in
two communicative settings, namely (i) expert–semi-expert and (ii) expert–quasi-expert/professional,
while the remaining 55 texts were used as a reference corpus [2] against which a given terminological
data extraction could be compared (see Figure 2). The corpus was processed using Sketch Engine, with
which we compiled, annotated, and queried the corpus by means of advanced searches in Corpus Query
Language (CQL), in which regular expressions (regex) are applied.

[Figure 2 (chart, n=98): communicative setting of text production. Legend: Scientific: expert–expert;
Regulatory: semi-expert–expert; Marketing: semi-expert–non-expert; Narrative-Informative:
semi-expert–non-expert; Economics: expert–semi-expert; Technical-explanatory & normative:
expert–quasi-expert/professional. Data labels shown in the chart: 4, 6, 16, 16, 43, 27, 29.]
Figure 2: Subcorpus under focus (43 texts) based on the communicative setting of text production

    Among the results presented in Table 2, the most frequent terms are “cortiça” [cork] (see footnote 4)
and “rolha” [stopper]. Given the high frequency of these two terms, we first analysed the contexts in
which they occur in the subcorpus (43 texts) using the Word Sketch function, with which we identified
some candidate terms (written in capital letters), such as “ROLHA COLMATADA” [colmated stopper]. We then
moved on to simple queries (concordances) to search for polylexical terms containing adjectives in their
pattern. Once the most common morphosyntactic structures of terms were identified, we decided to
improve our search for terms and definitions by employing advanced queries, namely through regex, so
that we could capture knowledge-rich contexts (KRC) [6], namely definitions (definitions found in context)
and definitional contexts (contexts explaining what the concept is, and thus valuable for understanding
and/or elaborating proper definitions).

3.1. Exploring the corpus with text mining methods

   Based on the patterns we have identified within definitional contexts, we explored the subcorpus
with advanced queries using regexes. For this paper, we will highlight two specific regexes that proved
productive not only in isolating lexical relations between terms, but also in finding definitional
contexts where the generic term is expanded in its syntax (see Table 3).

Table 3
Linguistic expressions commonly used by experts
        Definitional contexts (pt)                                Literal translation into English
    (1) Rolha que foi submetida a um tratamento químico           Stopper that was submitted to chemical treatment with
        com o objectivo de desinfectar e/ou homogeneizar          the aim of disinfecting and/or homogenising the colour
        a cor e/ou branquear.                                     and/or bleaching.
    (2) Rolha cuja superfície lateral foi submetida a uma         Stopper whose side surface was submitted to an abrasion
        operação de abrasão para a tornar cilíndrica ou           operation to make it cylindrical or to reduce its diameter.
        diminuir o seu diâmetro.


    The first regex has the following structure: "rolha"[tag="V.P.*SF"]. It is formulated to match
only the form “rolha” [stopper] immediately followed by any past participle in the feminine singular.
For the elaboration of this regex, we considered the linguistic expressions used repeatedly by the
experts, such as the past participle co-occurring with a term (see Table 4). This query returned 69 hits
and delivered the most productive patterns for identifying lexical markers, such as “X foi submetida a
Y” [X was submitted to Y], as well as terms whose morphosyntactic structures fall under our search
patterns, such as [Noun + Past Participle], e.g., “X acabada” [finished X] or “X terminada” [finalised
X], where X is a term and Y corresponds to a structure that has proved to be rich in knowledge
information, i.e. information provided by the experts that allows us to perceive their
conceptualisations [9].

4 Our translation
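The behaviour of the first regex can be illustrated outside Sketch Engine with a minimal Python sketch. It assumes a corpus already tokenised into (word, tag) pairs and an EAGLES-like tagset in which a tag such as VMP00SF marks a feminine singular past participle; the sample tokens, their tags and the function name are hypothetical and serve only as an illustration, not as the query engine used in this study.

```python
import re

# Hypothetical tagged tokens (word, tag). The tags are assumptions that follow
# an EAGLES-like scheme, where "VMP00SF" is a feminine singular past participle.
TOKENS = [
    ("a", "DA0FS0"),
    ("rolha", "NCFS000"),
    ("colmatada", "VMP00SF"),
    ("é", "VMIP3S0"),
    ("uma", "DI0FS0"),
    ("rolha", "NCFS000"),
    ("natural", "AQ0CS0"),
]

# Same tag expression as in the query "rolha"[tag="V.P.*SF"].
PARTICIPLE_FS = re.compile(r"V.P.*SF")


def find_rolha_participle(tokens):
    """Return (position, word, participle) for each 'rolha' + participle pair."""
    hits = []
    for i in range(len(tokens) - 1):
        word, _ = tokens[i]
        next_word, next_tag = tokens[i + 1]
        # The tag expression has to cover the whole tag value, hence fullmatch.
        if word == "rolha" and PARTICIPLE_FS.fullmatch(next_tag):
            hits.append((i, word, next_word))
    return hits


print(find_rolha_participle(TOKENS))
```

In this toy sample only the pair “rolha colmatada” is returned; “rolha natural” is skipped because the adjective tag does not fit the participle expression.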
    Considering the satisfactory results obtained, we decided to expand its formulation (regex 2):
"rolha"[(tag="D.*")|(tag="S.*")]?[tag="A.*"]?"cortiça"?[]{0,4}"rolha"[]{0,4}[tag="V.P.*SF"].
In this case, we want to match contexts in which the terms “rolha” [stopper] and “cortiça”
[cork] may co-occur with adjectives or past participles, or may be duplicated, in addition to the
functional forms. Out of the 55 hits matched by this regex, 48 were either a description or a definition.
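The token sequence described by regex 2 can be approximated in Python by encoding each token as " word/TAG" and translating every query position into a regular-expression group. This is only a sketch under stated assumptions: the first optional position is read as "a tag starting with D or a tag starting with S", the tagset is EAGLES-like, and the helper names (tok, cql_tag, find_contexts) and the sample sentence with its tags are hypothetical.

```python
import re

TAG_CHAR = r"[^/\s]"  # one character of a tag value (never '/' or a space)


def tok(word=r"\S+", tag=TAG_CHAR + "+"):
    """Regex for one encoded token ' word/TAG' (leading space included)."""
    return rf"(?: {word}/{tag})"


def cql_tag(expr):
    """Translate a tag expression so that '.' cannot cross a token boundary."""
    return expr.replace(".", TAG_CHAR)


DET_OR_PREP = "(?:" + cql_tag("D.*") + "|" + cql_tag("S.*") + ")"

PATTERN = re.compile(
    tok(word="rolha")
    + tok(tag=DET_OR_PREP) + "?"
    + tok(tag=cql_tag("A.*")) + "?"
    + tok(word="cortiça") + "?"
    + tok() + "{0,4}"
    + tok(word="rolha")
    + tok() + "{0,4}"
    + tok(tag=cql_tag("V.P.*SF"))
    + r"(?!\S)"  # the last tag must end where the token ends
)


def find_contexts(tokens):
    """Return the word sequences matched by the pattern."""
    encoded = "".join(f" {word}/{tag}" for word, tag in tokens)
    contexts = []
    for match in PATTERN.finditer(encoded):
        words = [item.rsplit("/", 1)[0] for item in match.group().split()]
        contexts.append(" ".join(words))
    return contexts


# Hypothetical tagged sentence, for illustration only.
SENTENCE = [
    ("rolha", "NCFS000"), ("de", "SPS00"), ("cortiça", "NCFS000"),
    ("natural", "AQ0CS0"), ("é", "VMIP3S0"), ("uma", "DI0FS0"),
    ("rolha", "NCFS000"), ("que", "PR0CN000"), ("foi", "VSIS3S0"),
    ("submetida", "VMP00SF"),
]

print(find_contexts(SENTENCE))
```

The two {0,4} gaps are what allow material such as “natural é uma” and “que foi” to intervene, which is why this formulation reaches definitional contexts that the first regex cannot.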
   From the whole set of descriptions or definitions semi-automatically extracted from the Cork
Corpus, we decided to select ten (10) definitions for linguistic and conceptual analysis (see Table 4).

Table 4
Ten (10) definitions to organise a typology of cork stoppers
 #   10 definitions (literal translations from pt)              10 definitions (pt) extracted from the Cork Corpus

     stopper                                                    rolha
     Product obtained from natural cork and / or                Produto obtido da cortiça natural e / ou de cortiça
 1   agglomerated cork, consisting of one or more               aglomerada, constituído por uma ou mais peças,
     pieces, intended to seal bottles or other containers       destinado a vedar garrafas ou outros recipientes e a
     and to preserve their contents. (5.1 - NORM)               preservar o seu conteúdo. (5.1 - NORM)
     STOPPER                                                    ROLHA
     piece of cork, usually cylindrical, conical or prismatic   peça de cortiça, em geral cilíndrica, troncocónica ou
     quadrangular, sometimes with rounded or                    prismática quadrangular, por vezes de arestas
 2   chamfered lateral edges, consisting of one or              laterais boleadas ou chanfradas, constituída por um
     several glued elements and intended to seal the            ou vários elementos colados e destinada a vedar os
     containers or contribute to their water tightness.         recipientes ou a contribuir para a sua
     (7.8 – TECH)                                               estanquicidade (7.8 – TECH)
     natural cork stopper                                       rolha de cortiça natural
     Stopper consisting entirely of natural cork                Rolha totalmente constituída por cortiça natural.
     Note: Natural cork stoppers that have been                 Nota: As rolhas naturais que tenham sido
 3
     submitted to the sealing operation (see 6.5.5) are         submetidas à operação de colmatagem (ver 6.5.5)
     commonly referred to as colmated natural stoppers.         são comummente designadas por rolhas naturais
     (5.5 – NORM)                                               colmatadas. (5.5 – NORM)
     colmated natural cork stopper                              rolha de cortiça natural colmatada
     The colmated natural cork stopper is a stopper             A rolha de cortiça natural colmatada é uma rolha
     made of natural cork in which its lenticels are filled     feita de cortiça natural em que são obturadas as
 4
     with a mixture of glues and cork powder from the           suas lenticelas com uma mistura de colas e pó de
     dimensional finishing processes of natural cork            cortiça proveniente dos acabamentos dimensionais
     stoppers. (6.1 – REP)                                      das rolhas de cortiça natural. (6.1 – REP)
     agglomerated cork stopper                                  rolha de cortiça aglomerada
     Stopper obtained by the agglutination of cork              Rolha obtida pela aglutinação de granulado de
     granules with a size between 0,25 mm and 8 mm,             cortiça com dimensão compreendida entre 0,25mm
 5
     with addition of binders, by means of extrusion or         e 8mm, com adição de ligantes, através de extrusão
     moulding and composed of at least 51% by weight            ou moldagem e composta, pelo menos, por 51 % de
     of cork granules. (5.5 – NORM)                             granulado de cortiça, em peso. (5.5 – NORM)
     agglomerated stopper:                                      rolha aglomerada:
 6   piece of agglomerated cork, obtained by extrusion          peça de cortiça aglomerada, obtida por extrusão ou
     or moulding (3.1 – STUD)                                   moldagem (3.1 – STUD)
     n+n stopper                                                rolha n+n
     Stopper formed by a body of agglomerated cork and          Rolha formada por um corpo de cortiça aglomerada
 7   “n” disks of natural cork glued to one or both ends.       e “n” discos de cortiça natural colados num ou em
                                                                ambos os topos.
     N.B.: In this designation, “n” indicates the number        Nota: Nesta designação, “n” indica o número de
     of disks used. (5.5 – NORM)                                discos utilizados. (5.5 – NORM)
     technical stopper                                          rolha técnica
     Technical stoppers are composed of a very dense            As rolhas técnicas são constituídas por um corpo de
     body of agglomerated cork with disks of natural            cortiça aglomerada, muito denso, com discos de
     cork glued to one end - or to both ends. Technical         cortiça natural colados no seu topo – ou em ambos
 8   stoppers with one disk on each end are called 1+1          os topos. As rolhas técnicas com um disco em cada
     technical stoppers; those with two disks of natural        topo são designadas rolhas técnicas 1+1. Com dois
     cork on each end are called 2+2 technical stopper;         discos de cortiça natural em cada topo chamam-se
     and those with two disks glued at only one of the          rolhas técnicas 2+2, e com dois discos em apenas
     ends are called 2+0 technical stoppers. (6.1 – REP)        um dos topos chamam-se rolhas técnicas 2+0. (6.1 –
                                                                REP)
     rounded stopper                                            rolha boleada
 9   Stopper whose edges of one or two ends were                Rolha cujas arestas de um ou dois topos foram
     rounded by abrasion. (5.5 – NORM)                          arredondadas, por abrasão. (5.5 – NORM)
     marked stopper                                             ROLHA MARCADA
 10  Stopper whose lateral surface or ends were marked          Rolha cuja superfície lateral ou topos foram
     in ink or by fire (7.6 – TECH)                             marcados a tinta ou a fogo. (7.6 – TECH)


   For this paper, we will consider only one definition, namely that of “rolha de cortiça natural”
[natural cork stopper] (see row 3 in Table 4), to demonstrate our linguistic and conceptual analysis.
However, instead of using the definitional statement written in Portuguese, we have decided to use its
literal translation into English for clarity.

Table 5
Linguistic analysis of the definition of natural cork stopper
 Concept
 natural cork stopper
 Definition in context
 stopper consisting entirely of natural cork

 Note: Natural cork stoppers that have been submitted to the sealing operation (see 6.5.5) are commonly
 referred to as colmated natural stoppers

 (Literal translation). Source: (Cork Corpus 5.5 – NORM)

 LINGUISTIC DIMENSION
 Analysis                     Lexical marker (LM)        Lexical-semantic relation  Interpretation
 natural cork stopper         ‘is a’ = Ø                 HYPERNYMY - HYPONYMY       stopper [GENERIC]
 [is a] stopper                                                                     natural cork stopper [SPECIFIC]
 natural cork stopper         ‘consisting entirely of’   HOLONYMY-MERONYMY          natural cork stopper [OBJECT]
 [consists entirely of]                                                             natural cork [STUFF]
 natural cork
 natural cork stopper         ‘submitted to’             HOLONYMY-MERONYMY          sealing operation [ACTIVITY]
 [is submitted to] the                                                              ? = [FEATURE]
 sealing operation
 colmated natural stopper     ‘commonly referred to as’  HYPERNYMY - HYPONYMY       natural cork stopper [GENERIC]
 [is a] natural cork stopper  same as = ‘is a’                                      colmated natural stopper [SPECIFIC]
 colmated natural stopper     results from = inferred    HOLONYMY-MERONYMY          sealing operation [ACTIVITY]
 [results from] the sealing   from ‘submitted to’                                   colmated = [FEATURE]
 operation

    Table 5 represents the first moment of our study, where we describe the deconstruction of the
definition and present its linguistic analysis. The aim is to analyse the lexical-semantic relations
between terms. The definition of natural cork stopper is given in the main sentence, followed by
some encyclopaedic information, namely the note. While the first sentence provides essential
information for understanding what a natural cork stopper is made of, the encyclopaedic information
conveys information about what the object is when submitted to a specific operation. The first
information that we obtain from the analysis is that a natural cork stopper “is a stopper”. In this
statement, “is a” is a lexical marker that relates term A “natural cork stopper” and term B “stopper”,
giving us a clear hypernym-hyponym relation, where “natural cork stopper” is the hyponym of the
hypernym “stopper”.
    In the second sentence, inserted as a note in the definition, another piece of information is obtained
from the analysis of the statement “natural cork stoppers that have been submitted to the sealing
operation”. Here, the lexical marker “submitted to” [submetidas à] relates the term “natural cork
stopper” [rolha natural], which we already know to denote an object, to the term “sealing operation”
[operação de colmatagem], which denotes an operation/activity. The interpretation of their
meanings allows us to infer that the lexical-semantic relation established is meronymy, subtype
[ACTIVITY-FEATURE] [5] (see Map 1 for the former, and Map 1.1 for the latter, in Figure 3).
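The pairing of lexical markers with lexical-semantic relations discussed above can be sketched as a simple lookup over the English literal translation of the definition. The marker-to-relation pairs are taken from the linguistic analysis of this definition; the dictionary, the function name and the string-matching strategy are hypothetical simplifications, since the analysis reported here relies on interpretation rather than on automatic matching.

```python
# Marker-to-relation pairs read off the linguistic analysis of the definition.
# The lookup itself is a hypothetical helper, not part of the method described.
LEXICAL_MARKERS = {
    "consisting entirely of": "HOLONYMY-MERONYMY",
    "submitted to": "HOLONYMY-MERONYMY",
    "commonly referred to as": "HYPERNYMY-HYPONYMY",
}

DEFINITION = (
    "Stopper consisting entirely of natural cork. "
    "Note: Natural cork stoppers that have been submitted to the sealing "
    "operation (see 6.5.5) are commonly referred to as colmated natural stoppers."
)


def find_markers(text):
    """Return (marker, relation) pairs in order of first occurrence in text."""
    lowered = text.lower()
    found = []
    for marker, relation in LEXICAL_MARKERS.items():
        position = lowered.find(marker)
        if position != -1:
            found.append((position, marker, relation))
    return [(marker, relation) for _, marker, relation in sorted(found)]


for marker, relation in find_markers(DEFINITION):
    print(f"{marker!r}: {relation}")
```

A lookup of this kind cannot recover the implicit marker ‘is a’ (Ø), nor can it decide the subtype of a relation, which is why the interpretation step remains manual.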




Figure 3: Lexical Map 1 and Lexical Map 1.1

4. The conceptual analysis
  The conceptual analysis corresponds to the second stage of the analysis of the definition in focus.
The differential characteristics found in this definition are expressed by /natural cork/, /natural/,
/colmated/ and /sealing operation/. The observations of this analysis are systematised in Table 6 and are
based on the lexical markers found in the linguistic analysis of the definition. At the same time, based
on the linguistic interpretation of the data, we extrapolated to conceptual relation identifiers.

Table 6
The conceptual analysis of the definition of natural cork stopper
 CONCEPTUAL DIMENSION
 Aristotelian formula (X=Y+DC): X [species] = Y [genus] + DC [differential characteristic]

 (1) Analysis:                        natural cork stopper [is a] stopper
     Conceptual relation identifier:  is_a [corresponds to LM ‘is a’]
     Conceptual relation:             SUBSUMPTION
     Interpretation:                  stopper [GENUS]; natural cork stopper [SPECIES]
     Transcription in X=Y+DC:         natural cork stopper [SPECIES] = stopper [GENUS] + DC ?
     Differential characteristics:    (none)

 (2) Analysis:                        natural cork stopper [is made of] natural cork
     Conceptual relation identifier:  has_substance [corresponds to LM ‘consisting entirely of’]
     Conceptual relation:             ASSOCIATIVE
     Interpretation:                  natural cork stopper [PRODUCT]; natural cork [RAW MATERIAL]
     Transcription in X=Y+DC:         natural cork stopper [SPECIES] = stopper [GENUS] + natural cork [DC]
     Differential characteristics:    /natural cork/

 (3) Analysis:                        natural cork stopper [is made of] natural cork
     Conceptual relation identifier:  has_substance [corresponds to LM ‘consisting entirely of’]
     Conceptual relation:             ASSOCIATIVE
     Interpretation:                  cork [MATTER]; natural [PROPERTY]
     Transcription in X=Y+DC:         natural cork [GENUS] = cork [GENUS] + natural [DC]
     Differential characteristics:    /natural/

 (4) Analysis:                        natural cork stopper [is submitted to] sealing operation
     Conceptual relation identifier:  has_process [corresponds to LM ‘submitted to’]
     Conceptual relation:             ASSOCIATIVE
     Interpretation:                  sealing operation = [PROCESS]; ? = [RESULT]
     Transcription in X=Y+DC:         ? [SPECIES] = natural cork stopper [GENUS] + sealing operation [DC]
     Differential characteristics:    /sealing operation/

 (5) Analysis:                        colmated natural stopper [is a] natural cork stopper
     Conceptual relation identifier:  is_a [corresponds to the LM ‘commonly referred to as’]
     Conceptual relation:             SUBSUMPTION
     Interpretation:                  natural cork stopper [GENUS]; colmated natural stopper [SPECIES]
     Transcription in X=Y+DC:         colmated natural stopper [SPECIES] = natural cork stopper [GENUS] + colmated [DC]
     Differential characteristics:    /colmated/


    As systematised in Table 6, we propose three conceptual relation identifiers, namely, (1)
has_substance, (2) is_a, and (3) has_process.
         (1) has_substance is expressed by the lexical marker “consisting entirely of” [totalmente
constituída por], which refers to the substance of the object. As the linguistic analysis showed,
the term “natural cork” points to the notion of substance, a material that a given object can be made of.
Since the natural cork stopper is an object made of a substance, we propose the conceptual relation
identifier has_substance to represent this semantic relation. The relation mirrors a pragmatic
association (e.g., a thematic connection by virtue of experience, or a dependency between
concepts established by proximity in time and space [3]) in which a stopper is a [PRODUCT]
obtained from a substance, more specifically a [RAW MATERIAL]. From this
information, we assume that an associative conceptual relation is in place, subtype PRODUCT – RAW
MATERIAL, in which stopper points to the meaning of PRODUCT, and natural cork points to the meaning
of RAW MATERIAL. This interpretation can be represented as follows: [stopper] PRODUCT has_substance
[natural cork] RAW MATERIAL.
The dichotomy PRODUCT – RAW MATERIAL has twofold importance at this point of the conceptual
analysis: on the one hand, it underpins the subtype of the associative relation, while on the other hand,
it is included in the Aristotelian formula [8],[6] known as X = Y + DC, where X=specific concept;
Y=genus; and DC=differential characteristics. The purpose of using such a formula is to identify, for
the task of concept modelling, the characteristics stated in the definition under analysis. In order to use
such a formula, one must first identify two concepts: the specific concept and its genus. [We will
develop this further in the paper].
         (2) is_a relation: natural cork stopper is the subordinate concept, which we have labelled
[SPECIES], and stopper is the superordinate concept, which we have labelled [GENUS]. This
assumption can be represented as: [natural cork stopper] SPECIES is_a [stopper] GENUS. Once the genus
and the species have been identified, we can insert these two elements in the formula X SPECIES =
Y GENUS + DC, where X = natural cork stopper and Y = stopper. Differential characteristics are inferred in
a second stage: considering that [stopper] PRODUCT has_substance [natural cork] RAW MATERIAL, we can
conclude that X [natural cork stopper] = Y [stopper] + DC [natural cork]. The first statement of the
definition conveys the information represented by the first interpretation above, with the dichotomy
[SPECIES-GENUS], which can be represented in the form of a conceptual map (see Figure 4). Conceptual
Map 1 is built by applying a differentiae dichotomy in which the differential characteristic /natural cork/
underlies one of the subdivision criteria5.
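
The decomposition X = Y + DC can be sketched as a small data structure. The Python fragment below is only an illustration under stated assumptions: the class name IntensionalDefinition and its fields are hypothetical and are not part of OntoCork or of any tool used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntensionalDefinition:
    """Hypothetical container for the formula X = Y + DC."""
    species: str         # X: the specific concept being defined
    genus: str           # Y: the superordinate concept
    differentiae: tuple  # DC: characteristics added to the genus

    def formula(self) -> str:
        # Render the definition in the bracketed notation used in the text.
        dc = " + ".join(f"{c} [DC]" for c in self.differentiae)
        return f"{self.species} [SPECIES] = {self.genus} [GENUS] + {dc}"


# Values taken from the analysis above: stopper is the genus and
# /natural cork/ is the differential characteristic.
natural_cork_stopper = IntensionalDefinition(
    species="natural cork stopper",
    genus="stopper",
    differentiae=("natural cork",),
)

print(natural_cork_stopper.formula())
```

Run as written, the sketch prints: natural cork stopper [SPECIES] = stopper [GENUS] + natural cork [DC].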




Figure 4: Conceptual Map 1 - two composition types of
 in CmapTools

Conceptual Map 1 (Figure 4) is the conceptual representation of the first statement of the definition,
from which we have inferred that a natural cork stopper is_a stopper. Two axes of analysis
are considered in this map: Substance and Parts (the ‘Parts’ axis was inferred from Definition 1; see
Table 4). The conceptual information represented here, namely the axes of analysis Substance and Parts,
whose underlying characteristics are /natural cork/, /mono piece/ and /multi piece/, will be some of
the coordinates for the elaboration of the formal description of the concept NaturalCorkStopper in
Protégé. Finally,  will help us to formally describe types of
 composed of several Parts – not only made of , but also of
 and . Here, the characteristics fall under the axis of analysis
‘Parts’ and are the coordinates for modelling multi-part concepts.

    5 According to (ISO/FDIS 1087), the “subdivision criterion [is the] type of characteristic according to which a superordinate concept is
divided into subordinated concepts.” (2019 (E), p. 5).
         (3) has_process: Following the same method, we analysed the note stating that a natural cork
stopper is submitted to /sealing operation/ and represented this information in a second
map (Figure 5). This piece of information grounds the conceptual relation identifier we have named
has_process.




Figure 5: Conceptual map of


    Conceptual Map 2 (Figure 5) represents the two sentences of the definition in focus.
Therefore, three axes of analysis are now considered: Substance, Parts, and Finishing Processes, to
which the characteristics /with sealing operation/ and /without sealing operation/ were added. As
represented in Conceptual Map 2, the characteristics /with sealing operation/ and /without sealing
operation/ led us to a different level of concept representation, i.e., the concept
, verbally designated by “colmated cork
stopper”, is a specialisation of , in turn, verbally designated by
“natural cork stopper”. Therefore, these two concepts should not be treated at the same level, nor should
they be defined in the same definitional context, either in natural language or in (semi)formal languages.
    The conceptual relations we have inferred from the analysis of the lexical markers observed in the
first five definitions (see Table 4) are summarised in Table 7.

Table 7
Overview of the conceptual relations inferred from lexical markers
Lexical marker              Conceptual relation   Conceptual     A typology of definitional texts governed by the DC
                            identifier            relation
‘is a’                      is_a                  SUBSUMPTION    stopper [SPECIES] = product [GENUS] + [any DC added to the genus]
‘commonly referred as’      is_a                  SUBSUMPTION    colmated natural stopper [SPECIES] = natural cork stopper [GENUS] + colmated [DC added to the genus]
‘is a’                      is_a                  SUBSUMPTION    colmated natural cork stopper [SPECIES] = stopper [GENUS] + [any DC added to the genus]
‘intended to’               has_function          ASSOCIATIVE    stopper [SPECIES] = product [GENUS] + to seal bottles [FUNCTION=DC]
‘obtained from’             has_raw_material      ASSOCIATIVE    stopper [SPECIES] = product [GENUS] + natural cork [SUBSTANCE=DC]
‘obtained from’             has_raw_material      ASSOCIATIVE    stopper [SPECIES] = product [GENUS] + agglomerated cork [SUBSTANCE=DC]
‘obtained from’             has_substance         ASSOCIATIVE    natural cork [SPECIES] = cork [GENUS] + natural [SUBSTANCE=DC]
‘obtained from’             has_substance         ASSOCIATIVE    natural cork [SPECIES] = cork [GENUS] + agglomerated [SUBSTANCE=DC]
‘intended to’               has_function          ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + to seal containers [FUNCTION=DC]
‘piece of’                  has_substance         ASSOCIATIVE    stopper [SPECIES] = piece [GENUS] + cork [SUBSTANCE=DC]
‘usually’                   has_shape             ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + cylindrical [SHAPE=DC]
‘usually’                   has_shape             ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + conical [SHAPE=DC]
‘usually’                   has_shape             ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + prismatic quadrangular [SHAPE=DC]
‘sometimes with’            has_process           ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + rounded edges [PROCESS=DC]
‘sometimes with’            has_process           ASSOCIATIVE    stopper [SPECIES] = piece of cork [GENUS] + chamfered edges [PROCESS=DC]
‘consisting entirely of’    has_substance         ASSOCIATIVE    natural cork stopper [SPECIES] = stopper [GENUS] + natural cork [SUBSTANCE=DC]
‘consisting entirely of’    has_substance         ASSOCIATIVE    natural cork [GENUS] = cork [GENUS] + natural [SUBSTANCE=DC]
‘submitted to’              has_process           ASSOCIATIVE    ? [SPECIES] = natural cork stopper [GENUS] + sealing operation [DC]
‘is made of’                has_raw_material      ASSOCIATIVE    colmated natural cork stopper [SPECIES] = stopper [GENUS] + natural cork [SUBSTANCE=DC]
‘is made of’                has_substance         ASSOCIATIVE    colmated natural cork stopper [SPECIES] = natural cork stopper [GENUS] + colmated [SUBSTANCE=DC]
‘its lenticels are filled’  has_process           ASSOCIATIVE    colmated natural cork stopper [SPECIES] = natural cork stopper [GENUS] + filled lenticels [PROCESS=DC]
‘results from’              has_process           ASSOCIATIVE    cork powder [SPECIES] = natural cork [GENUS] + dimensional finishing process [PROCESS=DC]
‘consisting of’             has_part              PARTITIVE      stopper [SPECIES] = product [GENUS] + one piece [PARTS=DC]
‘obtained from’             has_part              PARTITIVE      stopper [SPECIES] = product [GENUS] + several pieces [PARTS=DC]
‘consisting of’             has_part              PARTITIVE      stopper [SPECIES] = piece of cork [GENUS] + one element [PARTS=DC]
‘consisting of’             has_part              PARTITIVE      stopper [SPECIES] = piece of cork [GENUS] + several elements [PARTS=DC]


    As shown in Table 7, the differential characteristics (DC) can be any characteristic stated in a given
definition, following the formula of an intensional definition [4]: what is added to the
intension of the [GENUS] determines the place of the concept in the concept system.
The same holds for the associative relations, although more axes of analysis are involved.
Here, the DC carry a wider variety of semantic labels, namely [SUBSTANCE], [FUNCTION],
[PROCESS] and [SHAPE], given the many semantic relations identified between concepts.
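
The correspondence between lexical markers and candidate relation identifiers can be sketched as a simple lookup. The fragment below is a minimal illustration, not the procedure followed in this study, where the analysis was carried out by a human analyst with Sketch Engine. The dictionary copies only a subset of Table 7, the function name find_markers is hypothetical, and the test sentence in the usage note is invented for the example rather than quoted from the corpus.

```python
import re

# Marker -> candidate relation identifiers, copied from a subset of Table 7.
# A marker may map to several identifiers, so the analyst still has to
# choose the right one in context.
MARKER_RELATIONS = {
    "is a": ["is_a"],
    "commonly referred as": ["is_a"],
    "intended to": ["has_function"],
    "obtained from": ["has_raw_material", "has_substance", "has_part"],
    "consisting entirely of": ["has_substance"],
    "consisting of": ["has_part"],
    "submitted to": ["has_process"],
    "results from": ["has_process"],
}


def find_markers(definition: str) -> list[tuple[str, list[str]]]:
    """Return (marker, candidate relations) pairs in order of appearance."""
    text = definition.lower()
    hits = []
    for marker, relations in MARKER_RELATIONS.items():
        # Whole-word match, so that "is a" does not fire inside other words.
        match = re.search(r"\b" + re.escape(marker) + r"\b", text)
        if match:
            hits.append((match.start(), marker, relations))
    return [(marker, relations) for _, marker, relations in sorted(hits)]
```

For the invented sentence "stopper consisting entirely of natural cork, intended to seal bottles", find_markers returns the marker ‘consisting entirely of’ with has_substance, followed by ‘intended to’ with has_function. Ambiguous markers such as ‘obtained from’ return all three candidates, which reflects the need for interpretation noted above.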

5. Building the ontology

   For the task of building OntoCork, we used the editor Protégé [7]. OntoCork is an ontology in which
the concepts of the domain of cork are systematised through logical constructs. The descriptive domain
properties (conceptual relations) elaborated to develop the ontology are grounded in the five axes of
analysis that we have previously retained, namely Part, Substance, Shape, Finishing Process, and Function, as
systematised in Table 8.

Table 8
Five core conceptual relations of the ontology
    Axis of analysis      Format in Protégé   Type of conceptual relation
    FUNCTION              hasFunction         associative relation, subtype [OBJECT-FUNCTION]
    SUBSTANCE             IsMadeOf            associative relation, covering both subtypes [RAW MATERIAL – PRODUCT] and [MATTER/SUBSTANCE – PROPERTY]
    PARTS                 hasStructure        partitive relation [PART-WHOLE]
    FINISHING PROCESS     hasProcess          associative relation, within the subtype [PROCESS-RESULT]
    SHAPE                 hasShape            associative relation, subtype [OBJECT-SHAPE]
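
The mapping in Table 8 can be read as a lookup from axes of analysis to property names. The sketch below illustrates this reading only: the function to_triples is hypothetical, the filler names NaturalCork and MonoPiece are invented for the example, and the actual class descriptions in OntoCork are written in Protégé, not generated by this code.

```python
# Axis of analysis -> property name, as listed in Table 8.
AXIS_TO_PROPERTY = {
    "FUNCTION": "hasFunction",
    "SUBSTANCE": "IsMadeOf",
    "PARTS": "hasStructure",
    "FINISHING PROCESS": "hasProcess",
    "SHAPE": "hasShape",
}


def to_triples(concept: str, characteristics: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn {axis: value} pairs into (subject, property, value) triples."""
    triples = []
    for axis, value in characteristics.items():
        if axis not in AXIS_TO_PROPERTY:
            raise KeyError(f"unknown axis of analysis: {axis}")
        triples.append((concept, AXIS_TO_PROPERTY[axis], value))
    return triples


# The filler names below (NaturalCork, MonoPiece) are illustrative only.
triples = to_triples(
    "NaturalCorkStopper",
    {"SUBSTANCE": "NaturalCork", "PARTS": "MonoPiece"},
)
for triple in triples:
    print(triple)
```

Rejecting unknown axes keeps every characteristic tied to one of the five retained axes, which is the constraint the table expresses.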


   For this paper, we will present the description of the characteristics that build up the formal definition
of , a closure with a body-structure of 1  submitted to ,
in addition to the classification provided by the reasoner HermiT 6 as a  object (see
Figure 6).




6
    HermiT – a plugin reasoner of Protégé (http://www.hermit-reasoner.com/)
Figure 6: Concept description of ColmatedMonoPieceNaturalCorkStopper, in Protégé

   Figure 7 is the ontological representation of ColmatedMonoPieceNaturalCorkStopper in
Ontograf7, where we can observe several concepts systematised either vertically, in a hierarchical
dependency, or horizontally, in a pragmatic (associative) dependency, according to the differential
characteristics. For clarity, we have hidden the associative relations
between concepts that are not in focus in the following lines.




Figure 7: Ontological representation of ColmatedMonoPieceNaturalCorkStopper, in Ontograf

7 https://protegewiki.stanford.edu/wiki/OntoGraf

   As illustrated in Figure 7, ColmatedMonoPieceNaturalCorkStopper is a specification of
MonoPieceNaturalCorkStopperWithFinishingProcess. The subsumption relation is represented by
vertical blue arcs, and the associative relations are represented by horizontal dashed lines. The concepts
ColmatedMonoPieceNaturalCorkStopper and LenticelsColmation are linked by the associative
relation hasLenticelsColmationOperation, subtype [PROCESS-RESULT]. This conceptual relation is based on
the differential characteristic /with sealing operation/, which was drawn from the analysis of the
definition of natural cork stopper. Thus, hasLenticelsColmationOperation is the associative relation that
induces the specification of MonoPieceNaturalCorkStopperWithFinishingProcess by differentia.
Finally, it is also possible to see a hierarchical representation of FinishingProcesses, in which the
operation involved in the concept we have just described is the most specific concept of
this hierarchy. The interpretation of this subsumption is: LenticelsColmation is a kind of
QualityTreatment, which is a kind of SurfaceTreatment, which in turn is a kind of
Semi-finishingProcess, all of which are kinds of FinishingProcesses.
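
The subsumption chain just described can be sketched as a walk over direct is_a links. The fragment below is an illustration under stated assumptions: the dictionary records only the links named in the paragraph above, the function names ancestors and is_a are hypothetical, and the code does not reproduce what the HermiT reasoner does in Protégé.

```python
# Direct is_a links as read from the description of Figure 7.
PARENT = {
    "LenticelsColmation": "QualityTreatment",
    "QualityTreatment": "SurfaceTreatment",
    "SurfaceTreatment": "Semi-finishingProcess",
    "Semi-finishingProcess": "FinishingProcesses",
    "ColmatedMonoPieceNaturalCorkStopper": "MonoPieceNaturalCorkStopperWithFinishingProcess",
}


def ancestors(concept: str) -> list[str]:
    """Follow is_a links upwards and return every superordinate concept."""
    chain = []
    seen = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        if concept in seen:  # guard against accidental cycles in the data
            raise ValueError("cycle in is_a links")
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(sub: str, sup: str) -> bool:
    """True if sup is reachable from sub through is_a links."""
    return sup in ancestors(sub)


print(ancestors("LenticelsColmation"))
```

Because subsumption is transitive, is_a("LenticelsColmation", "FinishingProcesses") holds even though no direct link between the two concepts is stored.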


6. Conclusion

    With this research, we set out to explain the method used to build an ontology from human analyses
of linguistic data. The linguistic and conceptual levels are to be analysed in relation to one
another, but as distinct phenomena. Texts are vehicles for knowledge transfer. Analysing texts to extract
the characteristics of concepts, linguistically expressed by lexical markers pointing to lexical-semantic
relations, allowed us to capture the conceptual relations that are specific to the domain
through the formula X=Y+DC. As demonstrated in this study, we were able to propose a preliminary
conceptual organisation of the subject field. We have bridged three main aspects in our study: (i) the
classical aspects of Aristotelian logic; (ii) the methodology of our terminological work, where
characteristics play a fundamental role in the analysis or the drafting of intensional definitions; and (iii)
the formal definitions, for which we have used Protégé and the underlying Web Ontology Language
(OWL) [12] to formally describe the concepts of the domain in order to relate them via abstract syntaxes
and thus achieve formal reasoning, as concepts are consistently defined in a ‘reason-able’ ontology.
    In future work, we intend to model the conceptual and the linguistic information contained in the
resources we have developed, namely the ontology, the corpus, and a glossary (in progress) developed
with Lexonomy8, as linked data with the use of interoperable Linked Open Vocabularies9.


7. References

        [1] Atkins, S., Clear, J., & Ostler, N. (1992). Corpus Design Criteria. Literary and Linguistic
            Computing, 7(1), 1-16. doi:10.1093/llc/7.1.1
        [2] Baker, P., Hardie, A., & McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh:
            Edinburgh University Press.
        [3] ISO 704. (2009). Travail terminologique - Principes et méthodes. NF ISO 704, 1er tirage
            2009-12-P. La Plaine Saint-Denis: Association Française de Normalisation.
        [4] ISO/FDIS 1087. (2019 (E)). Terminology work and terminology science - Vocabulary.
            Switzerland: ISO.
        [5] L'Homme, M. C. (2004). La Terminologie: principes et techniques - Paramètres. Montréal,
            Canada: Les Presses de l'Université de Montréal.
        [6] Meyer, I. (2001). Extracting Knowledge-Rich contexts for terminography: a conceptual and
            methodological framework. In D. Bourigault, C. Jacquemin, & M.-C. L'Homme (Eds.), Recent
            Advances in Computational Terminology (Vol. 2, pp. 279-302). Amsterdam / Philadelphia:
            John Benjamins B.V.
        [7] Musen, M. A. (2015). The Protégé project: A look back and a look forward.
            doi:10.1145/2757001.2757003
        [8] Pearson, J. (1998). Terms in context. Amsterdam: John Benjamins B.V.
        [9] Pottier, B. (1992). Théorie et analyse en Linguistique (2, corrigée ed.). Paris: HACHETTE,
            Supérieur.
        [10] Ramos, M. (2020). Knowledge Organization and Terminology: application to Cork. Lisboa:
            NOVA FCSH; LISTIC - Université Savoie Mont Blanc. Retrieved from
            http://hdl.handle.net/10362/111722
        [11] Ramos, M. (2020). OntoCork. NOVA FCSH. doi:10.34619/a27q-1ryd
        [12] W3C. (2004). OWL Web Ontology Language Reference. (M. Dean, & G. Schreiber, Eds.)
            Retrieved from W3C Recommendation 10 February 2004: https://www.w3.org/TR/owl-ref/

8 https://github.com/elexis-eu/lexonomy
9 https://lov.linkeddata.es/dataset/lov/



