=Paper= {{Paper |id=Vol-1718/paper5 |storemode=property |title=Design of a Extraction System for Definitional Contexts from Biomedical Corpora |pdfUrl=https://ceur-ws.org/Vol-1718/paper5.pdf |volume=Vol-1718 |authors=César Aguilar,Olga Acosta |dblpUrl=https://dblp.org/rec/conf/ijcai/AguilarA16 }} ==Design of a Extraction System for Definitional Contexts from Biomedical Corpora == https://ceur-ws.org/Vol-1718/paper5.pdf
    Design of a Extraction System for Definitional Contexts from Biomedical Corpora
                                         César Aguilar and Olga Acosta
                            Pontificia Universidad Católica de Chile, Santiago de Chile

                                                  caguilara@uc.cl
                                    Cognitiva Latinoamérica, Santiago de Chile

                                               oacosta@cognitiva.la

Abstract

In this paper we report progress on the design of a methodology for extracting definitional contexts from Spanish biomedical corpora. The methodology relies on the following modules: (i) a term extractor based on a hybrid method, (ii) a set of verbs that configure the syntactic structure of a definitional context, and (iii) a chunker able to recognize the noun phrases that introduce a definition, considering the lexical relation of hyponymy/hypernymy, where the hyponym is the term defined and the hypernym is the Genus Term, which represents a conceptual category associated with that term.

1 Introduction

Given the overwhelming amount of biomedical knowledge recorded in physical and electronic texts, it is not surprising that there is currently an interest in developing semantic resources and tools to improve the search and classification of biomedical concepts. Projects such as Gene Ontology [Smith et al., 2005] or BioText Search Engine [Hearst et al., 2007] are good examples of systems capable of extracting and organizing concepts, taking into account lexical-semantic relationships expressed in natural language.

Most of these projects have been developed for English, given the large number of documents produced in that language. A paradigmatic example is PubMed, a search engine that primarily accesses the MEDLINE database of references and abstracts on biomedical topics. PubMed has been used in experiments on the automatic classification of concepts extracted from large corpora [Smith et al., 2005].

However, in Latin America, including Chile, there are no such projects in NLP. To fill this gap, we sketch here a method for extracting definitional contexts (abbreviated DCs), which are discursive structures that contain relevant information to define a term. A DC has at least three constituents: a term, a definition, and a verbal phrase that links the two. In addition, we can identify other linguistic or metalinguistic units whose function is to highlight the presence of a DC in a text, e.g., discursive and typographical patterns [Sierra et al., 2008; Acosta, Sierra and Aguilar, 2011]. An example is:

   [In general Discursive Pattern], the [paraprofessional workers Term + Typographical Pattern] [are defined as Verbal Phrase] [those persons who are engaged in the provision of social care or social services, but who do not have professional training or qualifications Definition]

In this example, the term paraprofessional workers is emphasized by the use of bold font; the verbal phrase are defined as links the term paraprofessional workers to the actual definition those persons who are engaged... The term, the verbal phrase and the definition are discursive units introduced by the pragmatic pattern in general.

We conceive our method around three central tasks:

   • Term extraction, which recognizes candidate terms using a hybrid method based on grammatical rules and stochastic techniques [Acosta, Aguilar and Infante, 2015].
   • The use of a set of verbs that head a specific kind of verbal phrase, called predicative phrases [Rothstein, 1983; Bowers, 1993; 2001], whose function is to link terms and definitions in a DC.
   • The identification of lexical relations, particularly hyponymy/hypernymy relations, in order to detect candidate analytical (or Aristotelian) definitions, following the method proposed by Hearst [1992], Wilks, Slator and Guthrie [1996], as well as Acosta, Sierra and Aguilar [2011; 2015].

Our paper is organized as follows: in section 2 we describe in more detail the extraction of DCs from specialized corpora, attending to the role of predicative phrases (henceforth, PrPs) as grammatical linkers between terms and definitions. Then, in section 3, we briefly explain our term extractor and show some results obtained when searching for biomedical terms in Spanish. In section 4, we describe a set of verbs that syntactically work as heads of PrPs and introduce analytical definitions in a DC. In section 5 we present the methodology employed to identify
hyponyms and hypernyms expressed in Spanish biomedical documents, specifically within DCs.

2 Extraction of DCs

The development of methods and electronic tools for extracting conceptual information from texts has become an important task in NLP, mainly in relation to computational lexicography [Wilks, Slator and Guthrie, 1996], terminology [Malaisé, Zweigenbaum and Bachimont, 2005] and, in recent years, the building of ontologies [Navigli and Velardi, 2004; Velardi, Faralli and Navigli, 2013]. Reviewing in detail the criteria used to perform this type of extraction, we can recognize three ideas in common:

   • Concepts are represented, in a natural language, by words, phrases or sentences. Thus, a definition is a linguistic structure useful for expressing this conceptual information [Sierra et al., 2008].
   • If definitions are linguistic representations of concepts, then it is possible to recognize regular patterns at the lexical, syntactic, semantic and discursive levels [Wilks, Slator and Guthrie, 1996].
   • Statistical methods and computational tools are used for searching and extracting these regular patterns in large corpora. The results are then evaluated in order to determine whether such patterns represent good or bad candidate definitions [Malaisé, Zweigenbaum and Bachimont, 2005].

In line with these works and ideas, Sierra et al. [2008] delineate a method for recognizing and extracting terms and definitions expressed in DCs. As mentioned before, terms, PrPs and definitions configure the core of a DC, because these units show a recurrent use in specialized documents. Additionally, discursive and typographical patterns can be seen as optional units whose function is to introduce or indicate a potential DC in a text. We can represent the relation between all these units in this scheme:

        Figure 1, constitutive units of a DC structure

Having in mind this scheme, our proposal for extracting DCs in biomedical texts considers the identification of the main units, that is: terms, PrPs and definitions. Each unit is analyzed by a particular module, and the integration of all modules configures the architecture of our extraction system.

3 Term Extraction

We have developed a methodology for extracting single-word and multi-word terms from text corpora, reported in Acosta, Aguilar and Infante [2015]. This methodology is supported by a hybrid approach, which includes both a linguistic and a statistical phase.

In the linguistic phase, the most frequent syntactic patterns are used to filter candidate terms while, at the same time, removing non-relevant words from these candidates. In the statistical phase, a corpus comparison approach is used to rank domain words [Kit and Liu, 2008]. A word occurring in both the reference and the domain corpus is ranked using the relative frequency ratio [Manning and Schütze, 1999]. Given that words closely related to a domain should have a higher occurrence probability in that domain than in a reference corpus, we regard comparison against a large reference corpus as an effective way of assigning relevance to domain words occurring in both corpora. If this ranking process is effective, the domain words will have higher weights than words not related to the domain.

For determining which word is a good candidate term, we consider the notions of termhood and unithood proposed by Kageura and Umino [1996]. Termhood is described as the degree to which a linguistic unit is related to domain-specific concepts. In contrast, unithood refers to the strength of syntagmatic combinations and collocations which can be recognized as potential candidate terms.

Thus, in the final stage, the word ranking can be used to extract multi-word candidate terms, so that words with high weights contribute to increasing the ranking of the noun phrases in which they occur (multi-word termhood). In the case of unithood, we consider it to be assured in part by a syntactic filter [Vivaldi and Rodríguez, 2007] and by the occurrence frequency of the noun phrase as a whole. Additionally, we propose implementing linguistic heuristics for automatically building a stopword list of non-relevant adjectives from the domain corpus. This is relevant since adjectives (primarily relational adjectives) have a compositional interpretation, so that traditional measures (e.g., mutual information) fail to capture the unithood of multi-word candidates.

We pay attention to the terms represented by noun phrases (NPs) whose modifier is a relational adjective, because such adjectives assign a set of properties derived from an entity. In biomedical terminology, relational adjectives are an important element for building specialized terms, e.g.: inguinal hernia, venereal disease, psychological disorder and others. For extracting these NPs with relational adjectives, we build a chunker that distinguishes the following patterns:

   [Chunk patterns not recoverable from the source text]
In these patterns, the RG, AQ and VAE tags correspond to adverbs, adjectives and the verb estar (Eng. to be), respectively. The tags [...] correspond to determiners, pronouns, punctuation signs and prepositions. The expression [...] is a restriction to reduce noise, since elements wrongly tagged as adjectives are extracted without this constraint. These tags are part of the annotation system proposed for FreeLing [Carreras et al., 2004], which we have employed for tagging two corpora:

   • A domain corpus composed of texts about human body diseases and related topics (surgeries, treatments, and so on) collected from MedlinePlus in Spanish. The size of this corpus is 1.2 million tokens.
   • A reference corpus formed by news and articles extracted from an online newspaper¹ from 2014. The size of this corpus is about 5 million tokens.

Using this chunker and these patterns we perform an experiment for identifying terms, comparing four measures proposed in the following works:

   • The log-likelihood ratio implemented by Gelbukh et al. [2010], abbreviated as LLR.
   • The word rank difference employed by Kit and Liu [2008], abbreviated RD.
   • The relative frequency ratio, considered by Manning and Schütze [1999], abbreviated RFR.
   • Finally, a binomial approximation using the standard normal distribution applied by Drouin [2003] for the TermoStat extraction system, abbreviated simply TS.

From a general point of view, an important step in our experiment is to eliminate noise from the terms by removing the non-relevant adjectives automatically obtained from the domain corpus, as well as those words whose relative frequency in the reference corpus is greater than in the domain corpus.

Once all the non-relevant adjectives have been detected, we generate a list that works as a filter for removing them, and then we can extract the NPs with relational adjectives.

Finally, once this filter was applied, we obtained a precision of 72.7% with the RFR measure and 70.5% with the RD measure for the first 1000 candidates detected.

Regarding global recall, we obtained approximately 73%, also for the first 1000 candidates. Tables 1 and 2 show the results of our experiment, contrasting precision and recall.

   ¹ La Jornada. Web site: www.lajornada.com.mx. Mexican newspaper with information available online.

Table 1, percentages of precision in the extraction of terms using the adjective filter taken from the reference corpus

   Candidates    LLR     RD      RFR     TS
   500           74.2    76.4    79      33.2
   1000          66.4    70.5    72.7    28.9
   1500          58.9    64.7    67.3    24.6
   2000          53.9    64.5    60.7    18.7
   2500          50.1    63.8    56.6    14.9
   3000          48.4    60.1    53.8    12.4
   3500          48.6    53.6    53.3
   4000          49.4    48.6    49.5
   4500          44.0    44.0    44.0
   5000          39.6    39.6    39.6

Table 2, percentages of recall in the extraction of terms using the adjective filter taken from the reference corpus

   Candidates    LLR     RD      RFR     TS
   500           16.5    17.0    17.5     7.4
   1000          29.5    31.3    32.3    12.8
   1500          39.2    43.1    44.8    16.4
   2000          47.8    57.3    53.9    16.6
   2500          55.6    70.8    62.8    16.6
   3000          64.4    80.1    71.6    16.6
   3500          75.5    83.3    82.9
   4000          87.6    86.3    87.8
   4500          87.8    87.8    87.8
   5000          87.8    87.8    87.8

4 DCs and PrPs

In the case of PrPs, according to the analysis reported by Sierra et al. [2008], as well as Aguilar, Acosta and Sierra [2010], these phrases configure the syntactic core of a DC. Syntactically, every PrP is structured around a relation X-is-a-subject-of/Y-is-a-predicate-of. This relation is regulated by a syntactic rule named the rule of predicate linking, proposed by Rothstein [1983]. This rule establishes a relation of saturation between the subject and the predicate, from which two basic conditions derive:

   I.    X is the subject of the predicate of Y, if X is linked to Y.
   II.   If Y is the predicate of X, then Y cannot be predicated of anything else other than X.

Following Rothstein's explanation, Bowers [1993, 2001] develops a simple model to describe the syntactic configuration of these phrases. The PrP is projected by a functional head, and its grammatical behaviour is similar to that of phrases such as the Inflectional Phrase (IP) or the Complementizer Phrase (CP).

Based on this description, we can infer two types of predicative phrases. The first is primary predication, i.e., predicative phrases formed by a subject to the left of the verb and a predicate located to the right of the verb:

   [Conjunctivitis [is [an inflammation of the conjunctiva of the eye NP] PrP] NP]
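The primary predication pattern described above (a subject to the left of the verb, a predicate to its right) can be illustrated with a small sketch. It is only a rough approximation under stated assumptions: the verb forms in PRIMARY_VERB_FORMS are a hypothetical hand-picked list (forms of ser and significar), the function name is illustrative, and a plain regular expression stands in for the chunk grammar over POS tags that the paper relies on.

```python
import re

# Hypothetical, hand-picked verb forms that may head a primary predication
# (forms of "ser" and "significar"); this list is an assumption for illustration.
PRIMARY_VERB_FORMS = ["es", "son", "significa", "significan"]

# Term (shortest possible) + verb form + definition, with an optional final period.
_PATTERN = re.compile(
    r"^(?P<term>.+?)\s+(?P<verb>" + "|".join(PRIMARY_VERB_FORMS) + r")\s+(?P<definition>.+?)\.?$",
    re.IGNORECASE,
)

def split_primary_predication(sentence):
    """Return (term, verb, definition) if the sentence fits Term + verb + Definition, else None."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("term"), match.group("verb"), match.group("definition")

if __name__ == "__main__":
    example = "La conjuntivitis es una inflamación de la conjuntiva del ojo."
    print(split_primary_predication(example))
```

A surface pattern of this kind overgenerates, since any sentence containing es or son would match; this is one reason why additional syntactic and semantic filtering of the candidates is needed.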
In contrast, a secondary predication integrates a subject in a pre-verbal position, and an object and its predicate, both after the verb. In this case, the predicate affects the object of the sentence:

   [Watson and Crick [define [the DNA [as a molecule [that carries the genetic instructions used in the development, functioning and reproduction of all known living organisms CP] PrP]NP]VP]IP]

A relevant difference between the two examples is the explicit mention of the author(s) of the definition in the DC. According to Aguilar, Acosta and Sierra [2010], it is possible to determine two specific patterns:

   (i)  A pattern that follows the sequence Term + PrP + Definition, which is recognized as a primary predication.
   (ii) Another pattern that follows the sequence Author + Term + PrP + Definition, which is recognized as a secondary predication.

Taking into account these kinds of PrPs, we can identify analytical definitions, assigning a specific syntactic pattern to their components, Genus Term and Differentia. Thus, in the case of definitions associated with primary predications, the pattern is:

Table 3, construction pattern for primary predication linked to analytical definition

   Definition                   Genus Term                         Differentia
   Analytical (Primary PrP)     Noun Phrase = Noun + {AdjP/PP}*    CP = Relative Pronoun + IP
                                                                   PP = Preposition + NP
                                                                   AdjP = Adjective + NP

In contrast, in the case of analytical definitions related to secondary predications, the construction pattern is:

Table 4, construction pattern for secondary predication linked to analytical definition

   Definition                   Adverb/Preposition    Genus Term                  Differentia
   Analytical (Secondary PrP)   Como                  NP = Noun + {AdjP/PP}*      CP = Relative Pronoun + IP
                                Por                                               PP = Preposition + NP
                                                                                  AdjP = Adjective + NP

The use of these PrP patterns for extracting terms and definitions has yielded good results. For example, Sierra et al. [2008], as well as Alarcón, Sierra and Bach [2008], explored specialized corpora about the human genome and medicine (among others), integrated into the BwanaNet system developed by the IULA-UPF², and they obtained a precision of around 0.58 and a recall of 0.83 for analytical definitions linked to verbs used in primary predications, such as ser (to be) and significar (to mean/to signify), and also to verbs used in secondary predications, such as concebir (to conceive), definir (to define), entender (to understand), identificar (to identify), etc. Considering the individual scores of these verbs, the most relevant are concebir (precision 0.71/recall 0.98) and definir (precision 0.84/recall 0.98), in contrast with others like entender (precision 0.36/recall 0.95) and identificar (precision 0.31/recall 0.90).

   ² For more information about BwanaNet, see the following link: http://bwananet.iula.upf.edu/index.htm

5 Hyponymy/hypernymy extraction

The results of the extraction of DCs using PrPs allow us to develop a method for recognizing analytical definitions, focusing on the detection of the Genus Term introduced by the verb that works as the head of a PrP. We approach this detection task taking into account the prototype theory proposed by Rosch and Lloyd [1978], applied to the description of categorization processes. Based on this theory, we can recognize a distinction between basic and subordinate categories. In the first case, single-word terms represented by nouns such as enfermedad (disease), corazón (heart), sistema (system), etc., represent basic categories. In the second case, multi-word terms represent subordinate categories: enfermedad venérea (venereal disease), paro cardiaco (heart attack), sistema nervioso (nervous system), and others.

We use this distinction (single-word versus multi-word) not only for identifying terms, but also hyponyms and hypernyms, attending to the role of relational adjectives and the preposition de (of/from). We formulate a set of possible term patterns recognizable in medical documents:

Table 5, Term patterns

   Pattern                                  Example
   Noun + Adjective (Spanish)               Enfermedad cardiovascular
   Adjective + Noun (English)               Cardiovascular disease
   Noun + Prepositional Phrase (Spanish)    Enfermedad de Alzheimer
                                            Alzheimer's disease
   Noun + Noun                              Diabetes mellitus
   Acronyms                                 VIH
                                            HIV
   Noun + Letter                            Vitamina A
                                            Vitamin A
   Letter + Noun                            H Pylori

In our experiments for finding hyponyms and hypernyms, we only consider relational adjectives [Acosta, Aguilar and Sierra, 2013; Acosta, Sierra and Aguilar, 2011; 2015], exploring a corpus of medical texts in Spanish, with a size of 1.3 million words, collected from MedlinePlus, the consumer health information site of the U.S. National Library of Medicine.

In order to identify patterns of NPs associated with hypernyms and hyponyms, we develop a heuristic based on the detection of relational adjectives. Thus, we
consider H as the set of all single-word hypernyms implicit in a corpus, and F as the set of the most frequent hypernyms in a set of candidate analytical definitions, obtained by establishing a specific frequency threshold m:

   \[ F = \{ x \mid x \in H,\ \mathrm{freq}(x) \geq m \} \]

On the other hand, NP is the set of noun phrases representing candidate categories:

   \[ NP = \{ np \mid \mathrm{head}(np) \in F,\ \mathrm{modifier}(np) \in \text{adjective} \} \]

Subordinate categories C of a basic level b are those holding:

   \[ C^{b} = \{ np \mid \mathrm{head}(np) \in F,\ \mathrm{modifier}(np) \in \text{relational-adjective} \} \]

where modifier(np) denotes the adjective that modifies a noun phrase np whose head is b. Returning to Rosch and Lloyd [1978], these subcategories show relevant differences with respect to a basic level of categorization.

6 Designing a system for DC extraction

In this section, we sketch our method for searching DCs, integrating into modules the tasks previously described.

6.1 Methodology

We focus our efforts on analytical definitions, assuming that such definitions are the best source for finding hyponymy/hypernymy relations. Our method starts by pre-processing a text corpus in order to tokenize it. Then we annotate this corpus with POS tags, using the TreeTagger [Schmid, 1994].

Once this is done, we employ syntactic and semantic filters for generating the first candidate analytical definitions. The syntactic filter consists of a chunk grammar that considers the verb characteristics of analytical definitions and their contextual patterns [Sierra et al., 2010], as well as the syntactic structure of the most common constituents, such as term, synonyms, and hypernyms.

The semantic phase, in turn, filters candidates by means of a list of noun heads indicating part-whole and causal relations, as well as empty heads that are semantically unrelated to the term defined. An additional step extracts terms and hypernyms from the candidate set.

In the case of the extraction of subordinate categories, we consider NPs with relational adjectives as modifiers of a term. Figure 2 shows this process:

        Figure 2, methodology for extracting subordinate categories

We obtain a set of NPs associated with relational adjectives, together with their frequencies. Then the NPs with hypernyms as head are selected, and we calculate the pointwise mutual information (PMI) for each combination. Given its use in collocation extraction, we select a PMI measure, and PMI thresholds are established in order to filter non-relevant (NR) information. We considered the normalized PMI measure proposed by Bouma [2009]:

   \[ \mathrm{NPMI}(x, y) = \frac{\ln \dfrac{p(x, y)}{p(x)\, p(y)}}{-\ln p(x, y)} \]

This normalized variant is chosen for two reasons: to use an association measure whose values have a fixed interpretation, and to reduce sensitivity to low frequencies of occurrence.

6.2 Corpus analysis and computational tools

As mentioned, our corpus consists of a set of medical documents, basically on human body diseases and related topics (surgeries, treatments, and so on), collected from MedlinePlus in Spanish. Additionally, we use the NLTK toolkit [Bird, Klein and Loper, 2009], an open-source Python library for text analysis, in order to create a chunk parser for searching candidate terms and hypernyms represented by NPs.

Integrating all the tasks described (the extraction of terms, the detection of PrPs associated with definitions, and the recognition of hyponyms/hypernyms), we conceive our methodology as the following sequence of steps:

   i)   Processing a corpus and inserting POS tags to start the extraction.
   ii)  Applying the syntactic and semantic filters to generate candidate DCs.
   iii) Confirming the quality of these candidates if: (a) they are linked to a term through a PrP, and (b) they introduce a hyponymy/hypernymy relation between the term and the Genus Term of a definition.

In Figure 3 we sketch our method:

        Figure 3, architecture of prototype system for extracting DCs
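The selection of subordinate categories described in Sections 5 and 6.1 (frequent hypernyms above a threshold m, NPs headed by those hypernyms, and a normalized PMI filter) can be sketched as follows. This is a minimal sketch under several assumptions, not the authors' implementation: the function names are illustrative, NPs are reduced to (head noun, adjective) pairs with raw counts, probabilities are estimated as relative frequencies over those pairs, and the NPMI definition used is the standard one from Bouma (PMI divided by the negative log of the joint probability). The thresholds in the usage example are arbitrary.

```python
import math
from collections import Counter

def frequent_hypernyms(hypernym_counts, m):
    """F = {x | x in H, freq(x) >= m}: hypernyms seen at least m times."""
    return {x for x, freq in hypernym_counts.items() if freq >= m}

def npmi(count_xy, count_x, count_y, total):
    """Normalized PMI from raw counts; ranges from -1 (never together) to 1 (always together)."""
    if count_xy == 0:
        return -1.0
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0  # degenerate case: -log p(x, y) would be zero
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)

def subordinate_categories(np_counts, hypernym_counts, m, threshold):
    """Keep (head, adjective) pairs whose head is a frequent hypernym and whose NPMI >= threshold.

    np_counts maps (head, adjective) pairs to their corpus frequency (assumed input format).
    """
    frequent = frequent_hypernyms(hypernym_counts, m)
    total = sum(np_counts.values())
    head_counts, adj_counts = Counter(), Counter()
    for (head, adj), count in np_counts.items():
        head_counts[head] += count
        adj_counts[adj] += count
    kept = {}
    for (head, adj), count in np_counts.items():
        if head not in frequent:
            continue  # head is not in F
        score = npmi(count, head_counts[head], adj_counts[adj], total)
        if score >= threshold:
            kept[(head, adj)] = score
    return kept

if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    np_counts = {
        ("enfermedad", "cardiovascular"): 10,
        ("enfermedad", "rara"): 10,
        ("condición", "rara"): 40,
        ("sistema", "nervioso"): 40,
    }
    hypernym_counts = {"enfermedad": 12, "sistema": 2, "condición": 1}
    print(subordinate_categories(np_counts, hypernym_counts, m=5, threshold=0.5))
```

In this toy example the adjective rara combines with several heads, so its association with enfermedad is no stronger than chance and the pair is filtered out, whereas cardiovascular occurs only with enfermedad and is kept.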
The architecture proposed here is an advance in the
identification of DCs. According to the results reported by
Acosta, Sierra and Aguilar [2015], precision and recall
increase significantly when the detection of hyponyms and
hypernyms is included, in comparison with the results
reported by Alarcón, Sierra and Bach [2008]:

               Table 6, Comparison of results

                                      Precision      Recall
Alarcón, Sierra and Bach [2008]         41%           46%
Acosta, Sierra and Aguilar [2015]       62%           58%

Hypernyms, as generic classes of a domain, are expected to
be related to a great number of modifiers, such as relational
adjectives that reflect categories more specific than the
hypernym (e.g., cardiovascular disease), or simply
descriptions sensitive to a specific context (e.g., rare
disease). Table 7 shows the hypernym enfermedad (Eng.
disease) and the subset of the 50 adjectives most closely
related to it according to their PMI values. In this example,
only 30 out of 50 (60%) are relevant relations. In total,
disease is related to 132 adjectives, of which 76 (58%) can
be considered relevant:

 Table 7, First 50 adjectives linked to the noun enfermedad

7   Final considerations

In this paper we have delineated a method for extracting DCs
from a biomedical corpus in Spanish. Based on our
preliminary results, we consider that we have achieved a
considerable improvement by taking into account the role of
hyponymy/hyperonymy relations as an important element
for validating authentic analytical definitions expressed in
DCs.
   This consideration allows us to observe a particular
relation between the syntactic structures and the
lexical-semantic information formulated in such definitions.
On the one hand, it is not enough to search for DCs based
only on syntactic sequences, although such structures can be
considered an interface for accessing that lexical-semantic
information.
   On the other hand, the task of recognizing hyponyms and
hypernyms in DCs can be an important step towards building
ontologies from text, in line with the model proposed by
Buitelaar, Cimiano and Magnini [2005]. The
hyponymy/hyperonymy relation allows us to infer a conceptual
hierarchy between terms (in our case, situated in a
biomedical domain), according to the categorization
formulated by experts of a specific area. Although it is
necessary to explore other lexical-semantic relations (e.g.,
synonymy or meronymy), we can start with the advances
achieved by our methodology, in order to implement our
prototype system as well as possible.

References

[Acosta, Sierra and Aguilar, 2011] Olga Acosta, Gerardo
   Sierra and César Aguilar. Extraction of Definitional
   Contexts using Lexical Relations. International Journal
   of Computer Applications, 34(6):46-53, November 2011.
[Acosta, Aguilar and Infante, 2015] Olga Acosta, César
   Aguilar and Tomás Infante. Reconocimiento de términos
   en español mediante la aplicación de un enfoque de
   comparación entre corpus. Linguamática, 7(2):19-34,
   December 2015.
[Acosta, Aguilar and Sierra, 2015] Olga Acosta, César
   Aguilar and Gerardo Sierra. Extracting definitional
   contexts in Spanish through the identification of
   hyponymy-hyperonymy relations. In Jan Žižka and
   František Dařena (eds.), Modern Computational Models
   of Semantic Discovery in Natural Language, pages 48-70.
   IGI Global, Hershey, Pennsylvania, USA, 2015.
[Aguilar, Acosta and Sierra, 2010] César Aguilar, Olga
   Acosta and Gerardo Sierra. Recognition and extraction
   of definitional contexts in Spanish for sketching a lexical
   network. In Thamar Solorio and Ted Pedersen (eds.),
   Proceedings of 1st young investigators workshop on
   computational approaches to languages of the Americas,
   pages 109-116, ACL Publications, Stroudsburg, USA,
   2010.
[Alarcón, Sierra and Bach, 2008] Rodrigo Alarcón, Gerardo
   Sierra and Carme Bach. ECODE: A Pattern Based
   Approach for Definitional Knowledge Extraction. In
   Elisenda Bernal and Janet DeCesaris (eds.), Proceedings
   of the XIII EURALEX International Congress, pages
   923-928, IULA-UPF, Barcelona, Spain, 2008.
[Bird, Klein and Loper, 2009] Steven Bird, Ewan Klein and
   Edward Loper. Natural Language Processing with
   Python. O'Reilly, Sebastopol, California, USA, 2009.
[Bowers, 1993] John Bowers. The syntax of predication.
   Linguistic Inquiry, 24(4):591-636, 1993.
[Bowers, 2001] John Bowers. Predication. In Mark Baltin
   and Chris Collins (eds.), The Handbook of
   Contemporary Syntactic Theory, pages 299-333.
   Blackwell, Oxford, UK, 2001.
[Buitelaar, Cimiano and Magnini, 2005] Paul Buitelaar,
   Philipp Cimiano and Bernardo Magnini. Ontology
   learning from text. IOS Press, Amsterdam, The
   Netherlands, 2005.
[Drouin, 2003] Patrick Drouin. Term extraction using non-
   technical corpora as a point of leverage. Terminology,
   9(1):99-115, 2003.
[Gelbukh et al., 2010] Alexander Gelbukh, Grigori Sidorov,
   Eduardo Lavin and Liliana Chanona. Automatic Term
   Extraction using log-likelihood based comparison with
   general reference corpus. In Christina Hopfe, Yacine
   Rezgui, Elisabeth Métais, Alun Preece and Haijiang Li
   (eds.), Natural Language Processing and Information
   Systems. LNCS, pages 248-255, Springer, Berlin, 2010.
[Hearst, 1992] Marti Hearst. Automatic acquisition of
   hyponyms from large text corpora. In Proceedings of the
   Fourteenth International Conference on Computational
   Linguistics, pages 539-545, Nantes, France, ACL
   Publications, 1992.
[Hearst et al., 2007] Marti Hearst, Anna Divoli, Harendra
   Guturu, Alex Ksikes, Preslav Nakov, Michael
   Wooldridge and Jerry Ye. BioText search engine:
   beyond abstract search. Bioinformatics,
   23(16):2196-2197, August 2007.
[Kageura and Umino, 1996] Kyo Kageura and Bin Umino.
   Methods of automatic term recognition: A review.
   Terminology, 3(2):259-289, 1996.
[Kit and Liu, 2008] Chunyu Kit and Xiaoyue Liu.
   Measuring mono-word termhood by rank difference via
   corpus comparison. Terminology, 14(2):204-229, 2008.
[Malaisé, Zweigenbaum and Bachimont, 2005] Véronique
   Malaisé, Pierre Zweigenbaum and Bruno Bachimont.
   Mining defining contexts to help structuring differential
   ontologies. Terminology, 11(1):21-53, 2005.
[Manning and Schütze, 1999] Chris Manning and Hinrich
   Schütze. Foundations of Statistical Natural Language
   Processing. MIT Press, Cambridge, Massachusetts,
   1999.
[Navigli and Velardi, 2004] Roberto Navigli and Paola
   Velardi. Learning Domain Ontologies from Document
   Warehouses and Dedicated Web Sites. Computational
   Linguistics, 30(2):151-179, 2004.
[Rosch and Lloyd, 1978] Eleanor Rosch and Barbara Lloyd.
   Cognition and categorization. Erlbaum, Hillsdale, New
   Jersey, 1978.
[Rothstein, 1983] Susan Rothstein. The syntactic forms of
   predication. Ph.D. Thesis, MIT, Cambridge,
   Massachusetts, 1983.
[Schmid, 1994] Helmut Schmid. Probabilistic Part-of-
   Speech Tagging Using Decision Trees. In Proceedings
   of International Conference of New Methods in
   Language. Manchester, UK, 1994. Web site:
   www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.
[Sierra et al., 2008] Gerardo Sierra, Rodrigo Alarcón, César
   Aguilar and Carme Bach. Definitional verbal patterns for
   semantic relation extraction. Terminology, 14(1):74-98,
   2008.
[Smith et al., 2005] Barry Smith, Werner Ceusters, Bert
   Klagges, Jacob Köhler, Anand Kumar, Jane Lomax,
   Chris Mungall, Fabian Neuhaus, Alan L Rector and
   Cornelius Rosse. Relations in biomedical ontologies.
   Genome Biology, 6(5):R46, 2005.
[Velardi, Faralli and Navigli, 2013] Paola Velardi, Stefano
   Faralli and Roberto Navigli. OntoLearn Reloaded: A
   Graph-based Algorithm for Taxonomy Induction.
   Computational Linguistics, 39(3):665-707, 2013.
[Vivaldi and Rodríguez, 2007] Jorge Vivaldi and Horacio
   Rodríguez. Evaluation of terms and term extraction
   systems: A practical approach. Terminology,
   13(2):225-248, 2007.
[Wilks, Slator and Guthrie, 1995] Yorick Wilks, Brian M.
   Slator and Louise M. Guthrie. Electric words. MIT
   Press, Cambridge, Massachusetts, 1995.