<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Design of an Extraction System for Definitional Contexts from Biomedical Corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>César Aguilar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Acosta</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper we present an overview of the design of a methodology for extracting definitional contexts from biomedical corpora in Spanish, taking into account a set of processes performed by the following modules: (i) a term extractor based on a hybrid method, (ii) a set of verbs that configure the syntactic structure of a definitional context, and (iii) a chunker able to recognize the noun phrases that introduce a definition, considering the lexical relation of hyponymy/hypernymy, where the hyponym is the term defined, and the hypernym is the Genus Term, which represents the conceptual category associated with that term.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>It is not surprising that, given the overwhelming amount of biomedical knowledge recorded in physical and electronic texts, there is currently an interest in developing semantic resources and tools oriented to improving the search and classification of biomedical concepts. Projects such as Gene Ontology [Smith et al., 2005] or the BioText Search Engine [Hearst et al., 2007] are good examples of systems capable of extracting and organizing concepts, taking into account lexical-semantic relationships expressed in natural language.</p>
      <p>Most of these projects have been developed for English, given the large number of documents produced in that language. A paradigmatic example is PubMed, a search engine that primarily accesses the MEDLINE database of references and abstracts on biomedical topics. PubMed has been used in experiments oriented to the automatic classification of concepts extracted from large corpora [Smith et al., 2005].</p>
      <p>However, in Latin America, including Chile, there are no such NLP projects. In order to fill this gap, we sketch here a method for extracting definitional contexts (abbreviated DCs), which are discursive structures that contain relevant information for defining a term. A DC has at least three constituents: a term, a definition, and a verbal phrase that links the two. Additionally, we can identify other linguistic or metalinguistic units whose function is to highlight the presence of a DC in a text, e.g. discursive and typographical patterns [Sierra et al., 2008; Acosta, Sierra and Aguilar, 2011]. An example is:
[In general Discursive Pattern], the [paraprofessional workers Term + Typographical Pattern] [are defined as Verbal Phrase] [those persons who are engaged in the provision of social care or social services, but who do not have professional training or qualifications Definition]
In this example, the term paraprofessional workers is emphasized by the use of bold font; the verbal phrase are defined as links the term paraprofessional workers to the actual definition (those persons who are engaged...). The term, the verbal phrase and the definition are discursive units introduced by the discursive pattern in general.</p>
      <p>We conceive our method as comprising three central tasks:</p>
      <p>A term extractor that recognizes term candidates using a hybrid method based on grammatical rules and stochastic techniques [Acosta, Aguilar and Infante, 2015].</p>
      <p>The use of a set of verbs that configure a specific kind of verbal phrase, called predicative phrases [Rothstein, 1983; Bowers, 1993; 2001], whose function is to link terms and definitions in a DC.</p>
      <p>
        The identification of lexical relations, particularly hyponymy/hyperonymy relations, in order to detect candidates for analytical (or Aristotelian) definitions, following the methods proposed by Hearst [
        <xref ref-type="bibr" rid="ref12">1992</xref>
        ], Wilks, Slator and Guthrie [1996], as well as Acosta, Sierra and Aguilar [2011; 2015].
      </p>
      <p>Our paper is organized as follows: in Section 2 we describe in more detail the extraction of DCs from specialized corpora, focusing on the role of predicative phrases (henceforth, PrPs) as the grammatical link between terms and definitions. Then, in Section 3, we briefly explain our term extractor and show some results generated by searching for biomedical terms in Spanish. In Section 4, we show and describe a set of verbs that syntactically work as heads of PrPs and introduce analytical definitions in a DC. In Section 5, we expose the methodology employed to identify hyponyms and hyperonyms expressed in biomedical Spanish documents, specifically those situated in DCs.</p>
    </sec>
    <sec id="sec-2">
      <title>Extraction of DCs</title>
      <p>The development of methods and electronic tools for extracting conceptual information from texts has become an important task in NLP, mainly related to computational lexicography [Wilks, Slator and Guthrie, 1996], terminology [Malaisé, Zweigenbaum and Bachimont, 2005] and, in recent years, the building of ontologies [Navigli and Velardi, 2004; Velardi, Faralli and Navigli, 2013]. Reviewing in detail the criteria used to perform this type of extraction, we can recognize three ideas in common:</p>
      <p>Concepts are represented, in natural language, by words, phrases or sentences. Thus, a definition is a linguistic structure useful for expressing this conceptual information [Sierra et al., 2008].</p>
      <p>If definitions are linguistic representations of concepts, then it is possible to recognize regular patterns at the lexical, syntactic, semantic and discursive levels [Wilks, Slator and Guthrie, 1996].</p>
      <p>The use of statistical methods and computational tools for searching and extracting these regular patterns in large corpora. The results are then evaluated in order to determine whether such patterns represent good or bad candidates for definitions [Malaisé, Zweigenbaum and Bachimont, 2005].</p>
      <p>In line with these works and ideas, Sierra et al. [2008] delineate a method for recognizing and extracting terms and definitions expressed in DCs. As we have mentioned before, terms, PrPs and definitions configure the core of a DC, because these units show recurrent use in specialized documents. Additionally, discursive and typographical patterns can be seen as optional units whose function is to introduce or indicate a potential DC in a text. We can represent the relation between all these units in the following scheme. Having this scheme in mind, our proposal for extracting DCs in biomedical texts considers the identification of the main units, that is: terms, PrPs and definitions. Each unit is analyzed by a particular module, and the integration of all modules configures the architecture of our extraction system.</p>
    </sec>
    <sec id="sec-3">
      <title>Term Extraction</title>
      <p>We have developed a methodology for extracting single-word and multi-word terms from text corpora, reported in Acosta, Aguilar and Infante (2015). This methodology is supported by a hybrid approach, which includes both a linguistic and a statistical phase.</p>
      <p>In the linguistic part, the most frequent syntactic patterns are used to filter candidate terms while, at the same time, removing non-relevant words from these candidates. In the statistical part, a corpus-comparison approach is used to rank domain words [Kit and Liu, 2008]. A word occurring in both the reference and the domain corpus is ranked using the relative frequency ratio [Manning and Schütze, 1999]. Given that words closely related to a domain should have a higher occurrence probability in that domain than in a reference corpus, we view a large reference corpus as an effective means of assigning relevance to domain words occurring in both corpora. If this ranking process is effective, the domain words will have higher weights than words not related to the domain.</p>
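      <p>To make the ranking step concrete, the corpus-comparison idea can be sketched as follows. This is an illustrative toy implementation under simple assumptions (whitespace-tokenized input, add-one smoothing for words unseen in the reference corpus), not the exact implementation reported in Acosta, Aguilar and Infante [2015]:</p>
```python
from collections import Counter

def rfr_rank(domain_tokens, reference_tokens):
    """Rank words by relative frequency ratio: words whose relative
    frequency in the domain corpus exceeds that in the reference
    corpus are promoted as domain-term candidates."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    n_dom, n_ref = len(domain_tokens), len(reference_tokens)
    scores = {}
    for w in dom:
        p_dom = dom[w] / n_dom
        # add-one smoothing so words unseen in the reference get a score
        p_ref = (ref[w] + 1) / (n_ref + len(ref))
        scores[w] = p_dom / p_ref
    return sorted(scores, key=scores.get, reverse=True)

# toy corpora: a medical "domain" sample versus a news "reference" sample
domain = "hernia inguinal hernia tratamiento cirugia hernia".split()
reference = "noticia gobierno tratamiento economia noticia".split()
ranking = rfr_rank(domain, reference)
print(ranking[0])  # prints "hernia"
```
      <p>Shared words such as tratamiento receive low scores because their relative frequency in the reference corpus is comparable to that in the domain corpus.</p>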
      <p>To determine whether a word is a good term candidate, we consider the notions of termhood and unithood proposed by Kageura and Umino [1996]. Termhood is described as the degree to which a linguistic unit is related to domain-specific concepts. In contrast, unithood refers to the strength of the syntagmatic combinations and collocations which can be recognized as potential term candidates.</p>
      <p>Thus, in the final stage, the word ranking can be used to extract multi-word candidate terms, so that words with high weights contribute to increasing the ranking of the noun phrases in which they appear (multi-word termhood). In the case of unithood, we consider it to be assured in part by a syntactic filter [Vivaldi and Rodríguez, 2007] and by the occurrence frequency of the noun phrase as a whole. Additionally, we propose implementing linguistic heuristics to automatically build a stopword list of non-relevant adjectives from the domain corpus. The latter is relevant since adjectives (primarily relational adjectives) have a compositional interpretation, so that traditional measures (e.g., mutual information) fail in the task of showing the unithood of multi-word candidates.</p>
      <p>We focus on terms represented by noun phrases (NPs) whose modifier is a relational adjective, because such adjectives assign a set of properties derived from an entity. In biomedical terminology, relational adjectives are an important element for building specialized terms, e.g.: inguinal hernia, venereal disease, psychological disorder, among others. For extracting these NPs with relational adjectives, we built a chunker that distinguishes the following patterns:
&lt;RG&gt;&lt;AQ&gt;
&lt;VAE&gt;&lt;AQ&gt;
&lt;D.*|P.*|F.*|S.*&gt;&lt;AQ&gt;&lt;NC&gt;
Here, the RG, AQ and VAE tags correspond to adverbs, adjectives and the verb estar (Eng. to be), respectively. The tags &lt;D.*|P.*|F.*|S.*&gt; correspond to determiners, pronouns, punctuation signs and prepositions. The expression &lt;D.*|P.*|F.*|S.*&gt; is a restriction to reduce noise, since elements wrongly tagged as adjectives are extracted without this constraint. These tags are part of the annotation system proposed for FreeLing (Carreras et al., 2004), which we have employed for tagging two corpora:</p>
      <p>A domain corpus composed of texts about human body diseases and related topics (surgeries, treatments, and so on), collected from MedlinePlus in Spanish. The size of this corpus is 1.2 million tokens.</p>
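      <p>The three tag patterns above can be matched with a simple sliding-window check over POS-tagged tokens. The sketch below assumes simplified (word, tag) pairs with FreeLing/EAGLES-style tags; it illustrates the pattern logic only, not our full chunker:</p>
```python
def match_patterns(tagged):
    """Return the word sequences whose POS tags match one of the three
    chunker patterns: RG+AQ, VAE+AQ, or (D|P|F|S)*+AQ+NC.
    `tagged` is a list of (word, tag) pairs."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    hits = []
    # two-tag patterns: adverb + adjective, estar + adjective
    for i in range(len(tags) - 1):
        if (tags[i], tags[i + 1]) in {("RG", "AQ"), ("VAE", "AQ")}:
            hits.append(words[i:i + 2])
    # three-tag pattern: det/pronoun/punctuation/preposition + adjective + common noun
    for i in range(len(tags) - 2):
        if tags[i][:1] in "DPFS" and tags[i + 1] == "AQ" and tags[i + 2] == "NC":
            hits.append(words[i:i + 3])
    return hits

# e.g. "la grave enfermedad", tagged FreeLing-style
print(match_patterns([("la", "DA"), ("grave", "AQ"), ("enfermedad", "NC")]))
```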
      <p>A reference corpus composed of news articles extracted from an online newspaper1 during 2014. The size of this corpus is about 5 million tokens.</p>
      <p>Using this chunker and these patterns, we performed an experiment to identify terms, comparing four measures proposed in the following works:</p>
      <p>The log-likelihood ratio implemented by Gelbukh et al. [2010], abbreviated as LLR.</p>
      <p>The word rank difference employed by Kit and
Liu [2008], abbreviated RD.</p>
      <p>The relative frequency ratio, considered by Manning and Schütze [1999], abbreviated as RFR.</p>
      <p>
        Finally, a binomial approximation using the standard normal distribution, applied by Drouin [
        <xref ref-type="bibr" rid="ref10">2003</xref>
        ] for the TermoStat extraction system, abbreviated simply as TS.
      </p>
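      <p>As an illustration of the first measure, a Dunning-style log-likelihood ratio for comparing a word's counts in the domain and reference corpora can be computed as follows. This is a textbook two-cell formulation for corpus comparison, not necessarily the exact variant of Gelbukh et al. [2010]; the counts in the example are made up:</p>
```python
import math

def llr(freq_dom, n_dom, freq_ref, n_ref):
    """Log-likelihood ratio comparing a word's observed counts in a
    domain corpus (freq_dom out of n_dom tokens) and a reference
    corpus (freq_ref out of n_ref tokens) against the expected counts
    under a single shared probability. Higher values mean a stronger
    deviation, i.e. a more domain-marked word."""
    expected_dom = n_dom * (freq_dom + freq_ref) / (n_dom + n_ref)
    expected_ref = n_ref * (freq_dom + freq_ref) / (n_dom + n_ref)
    score = 0.0
    if freq_dom > 0:
        score += freq_dom * math.log(freq_dom / expected_dom)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

# a word frequent in a 1.2M-token domain corpus but rare in a 5M-token reference
print(llr(300, 1_200_000, 10, 5_000_000))
```
      <p>A word distributed proportionally across both corpora scores close to zero, so ranking by LLR pushes domain-marked words to the top of the candidate list.</p>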
      <p>From a general point of view, an important step in our experiment is to eliminate noise from the term candidates by removing the non-relevant adjectives automatically obtained from the domain corpus, as well as those words whose relative frequency in the reference corpus is greater than in the domain corpus.</p>
      <p>Once all the non-relevant adjectives are detected, we generate a list used as a filter for removing them, and then we can extract the NPs with relational adjectives.</p>
      <p>Finally, once this filter was applied, we obtained a precision of around 72.7% with the RFR measure, and of 70.5% with the RD measure, specifically over the first 1000 candidates detected.</p>
      <p>In the case of global recall, we obtained approximately 73%, also over the first 1000 candidates. Tables 1 and 2 show the results of our experiment, contrasting precision and recall.</p>
      <p>1 La Jornada, a Mexican newspaper with information available online: www.lajornada.com.mx.</p>
      <p>In the case of PrPs, according to the analysis reported by Sierra et al. [2008], as well as by Aguilar, Acosta and Sierra [2010], these phrases configure the syntactic core of a DC. Syntactically, every PrP is structured around an X-is-a-subject-of/Y-is-a-predicate-of relation. This relation is regulated by a syntactic rule named the rule of predicate linking, proposed by Rothstein [1983]. This rule establishes a relation of saturation between the subject and the predicate, deriving two basic conditions:</p>
      <p>I. X is the subject of the predicate of Y, if X is
linked to Y.</p>
      <p>II. If Y is the predicate of X, then Y cannot be
predicated of anything else other than X.</p>
      <p>
        Following Rothstein’s explanation, Bowers [
        <xref ref-type="bibr" rid="ref7 ref8">1993, 2001</xref>
        ] develops a simple model to describe the syntactic configuration of these phrases. The PrP is mapped by a functional head, and its grammatical behaviour is similar to that of phrases such as the Inflectional Phrase (IP) or the Complementizer Phrase (CP).
      </p>
      <p>Based on this description, we can infer two types of predicative phrases. A primary predication is conformed by a subject to the left of the verb and a predicate located to the right of the verb:
[Conjunctivitis [is [an inflammation of the conjunctiva of the eye NP] PrP] NP]
In contrast, a secondary predication integrates a subject in pre-verbal position, plus an object and its predicate, both after the verb. In this case, the predicate affects the object of the sentence:
[Watson and Crick [define [the DNA [as a molecule [that carries the genetic instructions used in the development, functioning and reproduction of all known living organisms CP] PrP] NP] VP] IP]
A relevant difference between the two examples is the explicit mention of the author(s) of the definition in the DC. According to Aguilar, Acosta and Sierra [2010], it is possible to determine two specific patterns:
(i) A pattern that follows the sequence Term + PrP + Definition, which is recognized as a primary predication.
(ii) Another pattern that follows the sequence Author + Term + PrP + Definition, which is recognized as a secondary predication.</p>
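      <p>To illustrate how these two sequences can drive extraction, the sketch below matches English surface approximations of both patterns over the examples above. The verb lists and regular expressions are illustrative assumptions, not the grammar actually used in our system:</p>
```python
import re

# hypothetical surface patterns: primary predication (Term + PrP + Definition)
# and secondary predication (Author + verb + Term + "as" + Definition)
PRIMARY = re.compile(r"^(.+?)\s+(?:is|are)\s+(an?\s.+)$")
SECONDARY = re.compile(r"^(.+?)\s+(?:defines?|conceives?)\s+(.+?)\s+as\s+(.+)$")

def extract_dc(sentence):
    """Return the DC constituents if the sentence matches one of the
    two predication patterns, otherwise None."""
    m = SECONDARY.match(sentence)
    if m:
        return {"author": m.group(1), "term": m.group(2), "definition": m.group(3)}
    m = PRIMARY.match(sentence)
    if m:
        return {"term": m.group(1), "definition": m.group(2)}
    return None

print(extract_dc("Conjunctivitis is an inflammation of the conjunctiva of the eye"))
```
      <p>The secondary pattern is tried first so that sentences mentioning an author are not mistakenly reduced to the simpler Term + PrP + Definition shape.</p>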
      <p>Taking such kinds of PrPs into account, we can identify analytical definitions by assigning a specific syntactic pattern to their components, the Genus Term and the Differentia: one construction pattern for the definitions associated with primary predications, and a contrasting one for the analytical definitions related to secondary predications. The use of these PrP patterns for extracting terms and definitions has allowed good results to be reached. For example, Sierra et al. [2008], as well as Alarcón, Sierra and Bach [2008], explored specialized corpora about the human genome and medicine (among others), integrated into the BwanaNet system developed by the IULA-UPF2. They obtained a precision of around 0.58 and a recall of 0.83 for analytical definitions linked to verbs used in primary predications, such as ser (to be) and significar (to mean/to signify), as well as to verbs used in secondary predications, such as concebir (to conceive), definir (to define), entender (to understand), identificar (to identify), etc. Attending to the individual scores of these verbs, the most relevant are concebir (precision 0.71/recall 0.98) and definir (precision 0.84/recall 0.98), contrasting with others like entender (precision 0.36/recall 0.95) and identificar (precision 0.31/recall 0.90).</p>
      <p>2 For more information about BwanaNet, see the following link: http://bwananet.iula.upf.edu/index.htm</p>
    </sec>
    <sec id="sec-4">
      <title>Hyponymy/hyperonymy extraction</title>
      <p>The results of the extraction of DCs using PrPs allow us to develop a method for recognizing analytical definitions, focusing on the detection of the Genus Term introduced by the verb that works as the head of the PrP. We face this detection task taking into account the prototype theory proposed by Rosch and Lloyd [1978], applied to the description of categorization processes. Based on this theory, we can recognize a distinction between basic and subordinate categories: in the first case, single-word terms represented by nouns such as enfermedad (disease), corazón (heart), sistema (system), etc., represent basic categories, as opposed to the second case, where multi-word terms represent subordinate categories: enfermedad venérea (venereal disease), paro cardiaco (heart attack), sistema nervioso (nervous system), and others.</p>
      <p>We used this distinction (single-word versus multi-word) not only for identifying terms, but also hyponyms and hypernyms, considering the role of relational adjectives and the preposition de (of/from). We formulated a set of possible term patterns recognizable in medical documents. In our experiments for finding hyponyms and hypernyms, we only considered relational adjectives [Acosta, Aguilar and Sierra, 2013; Acosta, Sierra and Aguilar, 2011; 2015], exploring a corpus of medical texts in Spanish, with a size of 1.3 million words, collected from MedlinePlus.</p>
      <p>In order to identify the patterns of NPs associated with hypernyms and hyponyms, we developed a heuristic based on the detection of relational adjectives. Thus, we consider H as the set of all single-word hyperonyms implicit in a corpus, and F as the set of the most frequent hyperonyms in a set of candidate analytical definitions, obtained by establishing a specific frequency threshold m:</p>
      <p>F = {x | x ∈ H, freq(x) ≥ m}. On the other hand, NP is the set of noun phrases representing candidate categories:</p>
      <p>NP = {np | head(np) ∈ F, modifier(np) is an adjective}. Subordinate categories C of a basic level b are those holding:</p>
      <p>C_b = {np | head(np) ∈ F, modifier(np) is a relational adjective}, where modifier(np) represents an adjective modifier of a noun phrase np with head b. Returning to Rosch and Lloyd [1978], these subcategories show relevant differences with respect to the basic level of categorization.</p>
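      <p>The heuristic above can be sketched in a few lines. The noun-phrase pairs, frequency counts and the set of relational adjectives in the example are illustrative assumptions, not data from our corpus:</p>
```python
def subordinate_categories(noun_phrases, freq, m, relational):
    """Sketch of the heuristic above: F keeps the single-word
    hyperonym candidates whose frequency reaches the threshold m;
    the result (C_b) keeps the (head, modifier) noun phrases whose
    head is in F and whose modifier is a relational adjective."""
    F = {h for h, f in freq.items() if f >= m}
    return {(head, mod) for head, mod in noun_phrases
            if head in F and mod in relational}

# toy data: NP (head, modifier) pairs, head frequencies, and an assumed
# externally supplied set of relational adjectives
nps = [("enfermedad", "venerea"), ("enfermedad", "rara"), ("noticia", "local")]
freq = {"enfermedad": 40, "corazon": 12, "noticia": 2}
subs = subordinate_categories(nps, freq, m=10, relational={"venerea", "cardiaca"})
print(subs)  # {('enfermedad', 'venerea')}
```
      <p>Here enfermedad rara is discarded because rara is a qualifying, not a relational, adjective, while noticia local is discarded because its head does not reach the frequency threshold.</p>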
    </sec>
    <sec id="sec-5">
      <title>Design of a system for DC extraction</title>
      <p>In the following sections, we sketch our method for searching for DCs, integrating the tasks previously exposed into modules.</p>
    </sec>
    <sec id="sec-6">
      <title>Methodology</title>
      <p>We focus our efforts on analytical definitions, assuming that such definitions are the best source for finding hyponymy-hyperonymy relations. Our method starts by pre-processing a text corpus in order to tokenize it. Then we annotate this corpus with POS tags, using TreeTagger [Schmid, 1994].</p>
      <p>Once this is done, we employ syntactic and semantic filters for generating the first candidates for analytical definitions. The syntactic filter consists of a chunk grammar considering the verb characteristics of analytical definitions and their contextual patterns [Sierra et al., 2010], as well as the syntactic structure of the most common constituents, such as terms, synonyms and hypernyms.</p>
      <p>On the other hand, the semantic phase filters candidates by means of a list of noun heads indicating part-whole and causal relations, as well as empty heads semantically unrelated to the term defined. An additional step extracts terms and hypernyms from the candidate set.</p>
      <p>In the case of the extraction of subordinate categories, we consider NPs with relational adjectives as modifiers of a term. Figure 2 shows this process: we obtain a set of NPs associated with relational adjectives, together with their frequencies. Then, the NPs with hyperonyms as heads are selected, and we calculate the pointwise mutual information (PMI) for each combination. Given its use in collocation extraction, we select a PMI measure, where PMI thresholds are established in order to filter out non-relevant (NR) information. We consider the normalized PMI measure proposed by Bouma (2009). This normalized variant is motivated by two issues: the use of association measures whose values have a fixed interpretation, and a reduced sensitivity to low occurrence frequencies in the data.</p>
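      <p>For reference, Bouma's normalized PMI divides PMI(x, y) by -log p(x, y), so its values always fall in [-1, 1]: 1 when the two words only occur together, 0 under independence. A minimal sketch, with made-up probabilities:</p>
```python
import math

def npmi(p_xy, p_x, p_y):
    """Normalized PMI (Bouma, 2009): PMI(x, y) / -log p(x, y).
    Bounded in [-1, 1]; 1 means perfect co-occurrence, 0 independence."""
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)

# illustrative probabilities for a pair like (enfermedad, venerea)
print(round(npmi(0.001, 0.01, 0.001), 3))  # → 0.667
```
      <p>This fixed interpretation is what makes a single threshold usable across word pairs of very different frequencies, unlike raw PMI, whose upper bound grows as the pair becomes rarer.</p>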
    </sec>
    <sec id="sec-7">
      <title>Corpus analysis and computational tools</title>
      <p>As we have mentioned, our corpus is constituted by a set of medical documents, basically about human body diseases and related topics (surgeries, treatments, and so on), collected from MedlinePlus in Spanish. Additionally, we use the NLTK module [Bird, Klein and Loper, 2009], an open-source set of Python tools for analysing texts, in order to create a chunk parser for searching for candidate terms and hypernyms represented by NPs.</p>
      <p>Integrating all the tasks exposed above (the extraction of terms, the detection of PrPs associated with definitions, and the recognition of hyponyms/hypernyms), we conceive our methodology as the following sequence of steps:</p>
      <p>i) Processing the corpus and inserting POS tags in order to start the extraction.</p>
      <p>ii) Applying the syntactic and semantic filters in order to generate DC candidates.</p>
      <p>iii) Confirming the quality of these candidates if: (a) they contain a term linked to a PrP, and (b) they introduce a hyponymy/hyperonymy relation between the term and the Genus Term of a definition.</p>
      <p>In Figure 3 we sketch our method. The architecture proposed here is an advance in the identification of DCs. According to the results reported by Acosta, Sierra and Aguilar [2015], the levels of precision and recall increase significantly when the detection of hyponyms and hypernyms is included, in comparison to the results shown by Alarcón, Sierra and Bach [2008]. Hypernyms, as generic classes of a domain, are expected to be related to a great number of modifiers, such as relational adjectives reflecting categories more specific than the hyperonyms themselves (e.g., cardiovascular disease), or simply descriptions sensitive to a specific context (e.g., rare disease). In Table 7, we show the hypernym enfermedad (Eng. disease) and the subset of the 50 most related adjectives, taking into account their PMI values. In this example, only 30 out of 50 (60%) are relevant relations. In total, disease is related to 132 adjectives, of which 76 (58%) can be considered relevant.</p>
      <p>In this paper we have delineated a method for extracting DCs from biomedical corpora in Spanish. Based on our preliminary results, we consider that we have achieved a considerable improvement by taking into account the role of hyponymy/hyperonymy relations as an important element for validating authentic analytical definitions expressed in DCs.</p>
      <p>This consideration allows us to observe a particular relation between syntactic structures and the lexical-semantic information formulated in such definitions: it is not enough to search for DCs based only on syntactic sequences, although such structures can be considered an interface for accessing that lexical-semantic information.</p>
      <p>On the other hand, the task of recognizing hyponyms and hypernyms in DCs can be an important step for building ontologies based on textual information, in line with the model proposed by Buitelaar, Cimiano and Magnini [2005]. The hyponymy/hyperonymy relation allows us to infer a conceptual hierarchy between terms (in our case, situated in a biomedical domain), according to the categorization formulated by the experts of a specific area. Although it is necessary to explore other lexical-semantic relations (e.g. synonymy or meronymy), we can initially start from the advances achieved by our methodology, in order to implement our prototype system as well as possible.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Acosta, Sierra and Aguilar</source>
          , 2011]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Acosta</surname>
          </string-name>
          , Gerardo Sierra and
          <string-name>
            <given-names>César</given-names>
            <surname>Aguilar</surname>
          </string-name>
          .
          <article-title>Extraction of Definitional Contexts using Lexical Relations</article-title>
          .
          <source>International Journal of Computer Applications</source>
          ,
          <volume>34</volume>
          (
          <issue>6</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>November 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Acosta, Aguilar and Infante</source>
          , 2015]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Acosta</surname>
          </string-name>
          , César Aguilar and
          <string-name>
            <given-names>Tomás</given-names>
            <surname>Infante</surname>
          </string-name>
          .
          <article-title>Reconocimiento de términos en español mediante la aplicación de un enfoque de comparación entre corpus</article-title>
          .
          <source>Linguamática</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>December 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Acosta, Aguilar and Sierra</source>
          , 2015]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Acosta</surname>
          </string-name>
          , César Aguilar and
          <string-name>
            <given-names>Gerardo</given-names>
            <surname>Sierra</surname>
          </string-name>
          .
          <article-title>Extracting definitional contexts in Spanish through the identification of hyponymy-hyperonymy relations</article-title>
          .
          <source>In Jan Žižka and František Dařena (eds.)</source>
          ,
          <source>Modern Computational Models of Semantic Discovery in Natural Language</source>
          , pages
          <fpage>48</fpage>
          -
          <lpage>70</lpage>
          . IGI Global, Hershey, Pennsylvania, USA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Aguilar, Acosta and Sierra</source>
          , 2010]
          <string-name>
            <given-names>César</given-names>
            <surname>Aguilar</surname>
          </string-name>
          , Olga Acosta and
          <string-name>
            <given-names>Gerardo</given-names>
            <surname>Sierra</surname>
          </string-name>
          .
          <article-title>Recognition and extraction of definitional contexts in Spanish for sketching a lexical network</article-title>
          .
          <source>In Thamar Solorio and Ted Pedersen (eds.)</source>
          ,
          <source>Proceedings of 1st young investigators workshop on computational approaches to languages of the Americas</source>
          , pages
          <fpage>109</fpage>
          -
          <lpage>116</lpage>
          ,
          <string-name>
            <given-names>ACL</given-names>
            <surname>Publications</surname>
          </string-name>
          , Stroudsburg, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Alarcón, Sierra and Bach</source>
          , 2008]
          <string-name>
            <given-names>Rodrigo</given-names>
            <surname>Alarcón</surname>
          </string-name>
          , Gerardo Sierra and
          <string-name>
            <given-names>Carme</given-names>
            <surname>Bach</surname>
          </string-name>
          .
          <article-title>ECODE: A Pattern Based Approach for Definitional Knowledge Extraction</article-title>
          . In Elisenda Bernal and Janet DeCesaris (eds.),
          <source>Proceedings of the XIII EURALEX International Congress</source>
          , pages
          <fpage>923</fpage>
          -
          <lpage>928</lpage>
          , IULA-UPF, Barcelona, España,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Bird, Klein and Loper</source>
          , 2009]
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bird</surname>
          </string-name>
          , Ewan Klein and
          <string-name>
            <given-names>Edward</given-names>
            <surname>Loper</surname>
          </string-name>
          .
          <source>Natural Language Processing with Python. O'Reilly</source>
          , Sebastopol, California, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Bowers</source>
          , 1993]
          <string-name>
            <given-names>John</given-names>
            <surname>Bowers</surname>
          </string-name>
          .
          <article-title>The syntax of predication</article-title>
          ,
          <source>Linguistic Inquiry</source>
          ,
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>591</fpage>
          -
          <lpage>636</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Bowers</source>
          ,
          <year>2001</year>
          ] John Bowers. Predication. In Mark Baltin and Chris Collins (eds.),
          <source>The Handbook of Contemporary Syntactic Theory. Blackwell</source>
          , Oxford, UK:
          <fpage>299</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Buitelaar, Cimiano and Magnini</source>
          , 2005]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          , Philipp Cimiano and
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <article-title>Ontology learning from text</article-title>
          . IOS Press, Amsterdam, The Netherlands,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Drouin 2003]
          <article-title>Patrick Drouin. Term extraction using non-technical corpora as a point of leverage</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>99</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Gelbukh et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          , Grigori Sidorov, Eduardo Lavin and Liliana Chanona.
          <article-title>Automatic Term Extraction using log-likelihood based comparison with general reference corpus</article-title>
          . In Christina Hopfe, Yacine Rezgui, Elisabeth Métais, Alun Preece and Haijiang Li (eds.),
          <source>Natural Language Processing and Information Systems. LNCS</source>
          , pages
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          , Springer, Berlin,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Hearst, 1992]
          <string-name>
            <given-names>Marti</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          . In
          <source>Proceedings of the Fourteenth International Conference on Computational Linguistics</source>
          , pages
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          , Nantes, France, ACL Publications,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Hearst et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Marti</given-names>
            <surname>Hearst</surname>
          </string-name>
          , Anna Divoli, Harendra Guturu, Alex Ksikes, Preslav Nakov,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jerry</given-names>
            <surname>Ye</surname>
          </string-name>
          .
          <article-title>BioText search engine: beyond abstract search</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>23</volume>
          (
          <issue>16</issue>
          ):
          <fpage>2196</fpage>
          -
          <lpage>2197</lpage>
          , August
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Kageura and Umino, 1996]
          <string-name>
            <given-names>Kio</given-names>
            <surname>Kageura</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bin</given-names>
            <surname>Umino</surname>
          </string-name>
          .
          <article-title>Methods of automatic term recognition: A review</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>259</fpage>
          -
          <lpage>289</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Kit and Liu, 2008]
          <string-name>
            <given-names>Chunyu</given-names>
            <surname>Kit</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xiaoyue</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Measuring mono-word termhood by rank difference via corpus comparison</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ):
          <fpage>204</fpage>
          -
          <lpage>229</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Malaisé, Zweigenbaum and Bachimont, 2005]
          <string-name>
            <given-names>Véronique</given-names>
            <surname>Malaisé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Bachimont</surname>
          </string-name>
          .
          <article-title>Mining defining contexts to help structuring differential ontologies</article-title>
          ,
          <source>Terminology</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>21</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Manning and Schütze, 1999]
          <string-name>
            <given-names>Chris</given-names>
            <surname>Manning</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <source>Foundations of Statistical Natural Language Processing</source>
          . MIT Press, Cambridge, Massachusetts,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Navigli and Velardi, 2004]
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paola</given-names>
            <surname>Velardi</surname>
          </string-name>
          .
          <article-title>Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>151</fpage>
          -
          <lpage>179</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Rosch and Lloyd, 1978]
          <string-name>
            <given-names>Eleanor</given-names>
            <surname>Rosch</surname>
          </string-name>
          and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Lloyd</surname>
          </string-name>
          .
          <source>Cognition and categorization</source>
          , Erlbaum, Hillsdale, New Jersey,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Rothstein, 1983]
          <string-name>
            <given-names>Susan</given-names>
            <surname>Rothstein</surname>
          </string-name>
          .
          <article-title>The Syntactic Forms of Predication</article-title>
          . Ph.D. thesis, MIT, Cambridge, Massachusetts,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Schmid, 1994]
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <article-title>Probabilistic Part-of-Speech Tagging Using Decision Trees</article-title>
          . In
          <source>Proceedings of the International Conference on New Methods in Language Processing</source>
          , Manchester, UK,
          <year>1994</year>
          . Web site: www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Sierra et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Gerardo</given-names>
            <surname>Sierra</surname>
          </string-name>
          , Rodrigo Alarcón, César Aguilar and
          <string-name>
            <given-names>Carme</given-names>
            <surname>Bach</surname>
          </string-name>
          .
          <article-title>Definitional verbal patterns for semantic relation extraction</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>74</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Smith et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>Barry</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Werner</given-names>
            <surname>Ceusters</surname>
          </string-name>
          , Bert Klagges, Jacob Köhler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan L Rector and
          <string-name>
            <given-names>Cornelius</given-names>
            <surname>Rosse</surname>
          </string-name>
          .
          <article-title>Relations in biomedical ontologies</article-title>
          .
          <source>Genome Biology</source>
          ,
          <volume>6</volume>
          (
          <issue>5</issue>
          ): R46,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Velardi, Faralli and Navigli, 2013]
          <string-name>
            <given-names>Paola</given-names>
            <surname>Velardi</surname>
          </string-name>
          , Stefano Faralli and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>665</fpage>
          -
          <lpage>707</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Vivaldi and Rodríguez, 2007]
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          .
          <article-title>Evaluation of terms and term extraction systems: A practical approach</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ):
          <fpage>225</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Wilks, Slator and Guthrie, 1995]
          <string-name>
            <given-names>Yorick</given-names>
            <surname>Wilks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Brian M.</given-names>
            <surname>Slator</surname>
          </string-name>
          and
          <string-name>
            <given-names>Louise M.</given-names>
            <surname>Guthrie</surname>
          </string-name>
          .
          <source>Electric Words</source>
          , MIT Press, Cambridge, Massachusetts,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>