<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IBI-UPF at BARR-2017: learning to identify abbreviations in biomedical literature. System description</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Ronzano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura I. Furlong</string-name>
          <email>laura.furlongg@upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics (GRIB) Hospital del Mar Medical Research Institute (IMIM) Universidad Pompeu Fabra Barcelona</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>255</fpage>
      <lpage>263</lpage>
      <abstract>
        <p>This paper presents the participation of the IBI-UPF team in the Biomedical Abbreviation Recognition and Resolution (BARR) track, organized in the context of the Evaluation of Human Language Technologies for Iberian Languages 2017 (IBEREVAL). The purpose of the track was to automatically identify abbreviation-definition pairs in the abstracts of biomedical articles in Spanish. By releasing a sample corpus and two collections of training documents, the organizers provided a total of 1,150 abstracts of biomedical articles, the majority of them in Spanish, manually annotated with abbreviations and the corresponding definitions. We tackled the task with an approach articulated in two sequential phases. In the first phase, relying on a set of shallow linguistic features extracted from the text of biomedical abstracts, we trained two token classifiers to spot sequences of one or more tokens that represent, respectively, abbreviations and definitions. A third classifier is then trained to distinguish abbreviations that are candidate short forms of a definition expressed in the same abstract sentence from other types of abbreviations. In the second phase, relations between the previously spotted abbreviations and definitions are identified by means of a set of heuristics based on structural and linguistic traits of the text of each abstract. We evaluate the first phase of our approach on the set of manually annotated Spanish biomedical abstracts provided by the organizers of the BARR track.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nowadays, automated approaches to mine biomedical texts are becoming key tools to
enable researchers, as well as any other interested actor, to effectively access and
take advantage of the huge and rapidly growing number of articles available online [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
PubMed<sup>1</sup>, the main search engine for life science and biomedical papers, currently
includes more than 27 million articles and is growing at a rate of about 7% of new
publications every year [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>1 https://www.ncbi.nlm.nih.gov/pubmed/</p>
      <p>
        Abbreviations, acronyms and symbols are extensively used in biomedical texts: their
identification and correct interpretation are essential to automatically analyze this kind
of document. Several approaches have been proposed during the last decades to extract
abbreviation-definition pairs from biomedical texts [
        <xref ref-type="bibr" rid="ref17 ref9">9, 17</xref>
        ]. Some of them are based on a
mix of pattern-matching and heuristic rules, sometimes complemented by corpus
statistics [
        <xref ref-type="bibr" rid="ref14 ref2 ref20 ref22 ref23 ref7">2, 7, 14, 20, 22, 23</xref>
        ], while others propose hybrid systems that rely on supervised
learning approaches trained on manually annotated corpora [
        <xref ref-type="bibr" rid="ref12 ref13 ref21 ref3">3, 12, 13,
21</xref>
        ]. During the last decade, in the biomedical domain, several efforts have also
targeted clinical notes, besides scientific papers, for the automated extraction and
interpretation of abbreviations [
        <xref ref-type="bibr" rid="ref11 ref19 ref4">4, 11, 19</xref>
        ].
      </p>
      <p>
        The Biomedical Abbreviation Recognition and Resolution (BARR) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] track has
been organized in the context of the Evaluation of Human Language Technologies for
Iberian Languages (IBEREVAL 2017) in order to promote the investigation of new
approaches to identify abbreviations together with their definitions in Spanish
biomedical documents. In this paper we describe our participation (IBI-UPF team) in the BARR
track. In Section 2 we provide more details on the BARR task by
introducing some core aspects of the BARR corpus of biomedical abstracts, manually annotated
with abbreviations. In Section 3 we describe the set of Natural Language
Processing tools and resources we exploited to support the automated identification of
abbreviation-definition pairs in biomedical abstracts. Section 4 explains our approach
to the BARR task. In Section 5 we provide a preliminary evaluation of our
automated abbreviation identification system on the training set of manually
annotated abstracts provided by the BARR organizers. To conclude, in Section 6 we
summarize the key points of our BARR participation, outlining future avenues of research to
improve our approach.
      </p>
    </sec>
    <sec id="sec-2">
      <title>BARR track: task and dataset</title>
      <p>The information extraction task proposed to the participants of the BARR track consists
in the identification of abbreviations (or Short Forms, SFs) that occur in sentences of
Spanish biomedical abstracts and their association with the corresponding definitions,
referred to as Long Forms (LFs). An example of ⟨SF, LF⟩ pair is ⟨TAC, Tomografía Axial
Computarizada⟩. Besides proposing approaches to mine the broad variety of possible
SFs that can be used to refer to a specific LF, BARR participants were also required
to deal with the detection of nested ⟨SF, LF⟩ pairs: in these pairs two or more SFs share
portions of the corresponding LFs, or the LF associated with a SF is not a
consecutive sequence of words. The expression dolor oncológico (DO) y no oncológico
(DNO) includes two nested ⟨SF, LF⟩ pairs: ⟨DO, dolor oncológico⟩ and ⟨DNO, dolor no
oncológico⟩.</p>
      <p>
        In order to train automated approaches for the detection of ⟨SF, LF⟩ pairs (both
simple and nested ones), BARR organizers released a sample corpus and two training
corpora, providing in total 1,150 manually annotated abstracts of biomedical articles:
about 90% of these documents are Spanish texts. The abbreviation
extraction approaches proposed in the context of the BARR task are evaluated by
computing the precision, recall and F1-score of each approach with respect to a test
corpus that includes 600 Spanish biomedical abstracts: the extraction of entities (SFs
and LFs) and the extraction of their relations are considered as two separate tasks. More details
concerning the corpus of biomedical papers released in the context of the BARR track, together
with the description of how these documents have been manually annotated, can be
found in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
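      <p>To make the evaluation measures concrete, the following Python sketch computes precision, recall and F1-score over sets of extracted items. It is only an illustration: the function name precision_recall_f1 is hypothetical, and exact matching of whole items is an assumption, since the official scores were computed by the track organizers and their matching criteria are not detailed here.</p>
      <preformat>
```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted.intersection(gold))
    # Guard against empty collections to avoid a division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one of two predicted pairs matches one of three gold pairs.
gold_pairs = {
    ("TAC", "tomografía axial computarizada"),
    ("DO", "dolor oncológico"),
    ("DNO", "dolor no oncológico"),
}
predicted_pairs = {
    ("TAC", "tomografía axial computarizada"),
    ("DO", "dolor"),
}
p, r, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
```
      </preformat>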
    </sec>
    <sec id="sec-3">
      <title>Tools and resources</title>
      <p>
        To identify SFs, LFs and their associations, we exploited a mix of machine learning
and heuristic approaches, both based on the characterization of the textual contents of
biomedical abstracts through a set of shallow linguistic and corpus-based features. We
computed these features by processing Spanish abstracts by means of the IXA Pipes
NLP tools [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: we performed sentence splitting, tokenization, Part of Speech tagging
and constituency parsing. To process Spanish documents, IXA Pipes relies on NLP
models trained on the Spanish texts of the AnCora Corpus<sup>2</sup>. Besides linguistic analyses, we
determined the frequency of usage of the words of each abstract by relying on a word-frequency
dictionary built from a 2016 dump of the Spanish Wikipedia. We exploited the GATE
Framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to integrate the text mining tools just mentioned into a single pipeline.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Method</title>
      <p>Our abbreviation identification approach is composed of two sequential steps: the entity
spotting phase and the relation extraction phase. The first phase relies on machine learning
approaches to identify and characterize both SFs and LFs. The second phase exploits a
set of heuristics to refine the entities previously identified and extract relations
between SFs and LFs. Among the heuristics implemented in the second
phase, we included a set of rules built to automatically characterize simple cases of nested
⟨SF, LF⟩ pairs. In this section we provide a detailed description of the two phases of
our abbreviation identification approach.</p>
      <sec id="sec-4-1">
        <title>Phase 1: entity spotting</title>
        <p>The first phase of our approach aims at: (i) extracting abbreviations and LFs; (ii)
selecting, among the spotted abbreviations, the ones that are SFs and thus occur in the same
sentence as the corresponding LF.</p>
        <p>All these information extraction tasks have been performed by training distinct
token-based classifiers. In these classifiers each token is characterized by means of the
following types of features, which we exploited to model both the token under
consideration and the ones included in a context window of size [-2, 2]:
– Part of Speech;
– number of characters, including punctuation;
– percentage of uppercase, numeric and punctuation characters;
– whether the first / last character is uppercase;
– whether the last character is a punctuation mark;
– number of repetitions of the token in the abstract;
– match of the token with one of the entries of the Dictionary of Medical
Abbreviations SEDOM<sup>3</sup>;
– frequency of the token in the Spanish Wikipedia.</p>
        <p>2 http://clic.ub.edu/corpus/ancora</p>
        <p>Each of the feature types listed above generates five feature values for
each token: one describing the token under analysis and four characterizing, respectively,
the two previous tokens and the two following tokens in the same sentence. In future work
we plan to explore the influence of different window sizes on the performance
of our token-based classifiers, considering windows that are both symmetric and
asymmetric with respect to the token to classify. We computed token features scoped
to each sentence, thus setting as missing the feature values of the context tokens that
cannot be determined because they fall outside sentence boundaries. We selected our set
of features in order to describe traits of tokens and their context that we considered
relevant to the identification and characterization of abbreviations and LFs. For instance,
a high percentage of uppercase letters is typical of many abbreviations.</p>
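        <p>The feature layout described above can be illustrated with the following Python sketch, which computes a few of the shallow token features and replicates them over a [-2, 2] window scoped to a single sentence. The names token_shape_features and window_features and the feature keys are hypothetical, only a subset of the listed features is covered, tokens are assumed to be non-empty strings, and None stands for a missing value; it is a sketch under these assumptions rather than the code of our system.</p>
        <preformat>
```python
import string

FEATURE_NAMES = (
    "length", "pct_upper", "pct_digit", "pct_punct",
    "first_upper", "last_upper", "last_punct",
)


def token_shape_features(token):
    """Compute a few shallow features for one non-empty token."""
    n = len(token)
    return {
        "length": n,
        "pct_upper": sum(c.isupper() for c in token) / n,
        "pct_digit": sum(c.isdigit() for c in token) / n,
        "pct_punct": sum(c in string.punctuation for c in token) / n,
        "first_upper": token[0].isupper(),
        "last_upper": token[-1].isupper(),
        "last_punct": token[-1] in string.punctuation,
    }


def window_features(sentence_tokens, index, size=2):
    """Replicate token features over offsets -size..+size within one sentence.

    Context positions outside the sentence boundaries get None (missing).
    """
    features = {}
    for offset in range(-size, size + 1):
        position = index + offset
        inside = position >= 0 and len(sentence_tokens) > position
        shape = token_shape_features(sentence_tokens[position]) if inside else None
        for name in FEATURE_NAMES:
            # Keys look like "length@-2", "length@+0", "length@+2".
            features[f"{name}@{offset:+d}"] = shape[name] if inside else None
    return features


# Features for the token "TAC" in a toy five-token sentence.
feats = window_features(["La", "TAC", "(", "tomografía", ")"], 1)
```
        </preformat>
        <p>In this toy sentence the token two positions before "TAC" does not exist, so all the features at offset -2 are missing, which mirrors the sentence-scoped computation described above.</p>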
        <p>By relying on the previous set of features we built three Random Forest classifiers,
respectively trained to determine:
– Abbreviation Token Classifier: whether or not a token represents an abbreviation;
– Long Form Token Classifier: whether a token is at the Beginning, Inside or Outside of a
LF;
– Abbreviation Type Classifier: whether a token classified as an abbreviation by the
Abbreviation Token Classifier is a SF or represents another kind of abbreviation (e.g.
an abbreviation for which the Long Form is not provided in the same sentence).</p>
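        <p>Since the Long Form Token Classifier labels individual tokens as Beginning, Inside or Outside, its output has to be grouped into token spans before LFs can be used as entities. The Python sketch below shows one common way to do this; the function name bio_to_spans, the single-letter labels and the choice to tolerate an Inside label without a preceding Beginning label are assumptions made for illustration.</p>
        <preformat>
```python
def bio_to_spans(labels):
    """Group token-level B/I/O labels into (start, end) token spans, end exclusive."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B":
            # A new Beginning closes any span that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "I":
            # Assumption: an Inside label with no open span starts a new one.
            if start is None:
                start = i
        else:
            # Any other label (Outside) closes the open span, if any.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Toy label sequence with two Long Form spans.
spans = bio_to_spans(["O", "B", "I", "I", "O", "B"])
```
        </preformat>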
        <p>In the approach we submitted to the BARR track, after selecting the best subset of
features for each task, we trained each classifier over the whole
set of tokens of the manually annotated Spanish biomedical abstracts provided by the
BARR track organizers. Section 5 includes an initial evaluation of the performance of
our classifiers over the BARR manually annotated Spanish abstracts.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Phase 2: relation extraction</title>
        <p>Once SFs and LFs have been identified, in this phase we implemented the following set
of heuristics to determine whether the sentence where a SF occurs includes the
related LF:
– Long Form sanitizing heuristics:
(A.1) delete all LFs in which every token is shorter than three characters
or that do not include a noun token;
(A.2) remove the initial token from the text span of the LFs that start with an article;
– SF - LF relation identification heuristics:
(B.1) collect for each SF all the candidate LFs, including the LFs identified by
the classifier and the noun phrases occurring in the same sentence, not overlapping
the SF, at most three characters away from the SF and spanning a number of
characters greater than the number of characters of the SF. If the SF is between
parentheses, we consider only the preceding candidate LFs;
(B.2.1) if there is only one candidate LF: if the candidate LF has been identified by
the Long Form Classifier, create a SF - LF relation. Otherwise, if it is a noun phrase,
apply the SF-LF scoring function (described below) and create a SF - LF relation
if the score is greater than 0;
(B.2.2) if there is more than one candidate LF: score each candidate LF by means
of the SF-LF scoring function and choose the one with the highest score, provided it is greater than
0. If more than one candidate LF has the highest score, give
precedence to the one that has been identified by the Long Form Classifier, if any,
otherwise choose one of the candidate LFs randomly.</p>
        <p>3 http://www.sedom.es/diccionario/</p>
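        <p>The selection logic of steps (B.2.1) and (B.2.2) can be sketched in Python as follows. The Candidate class, the select_long_form function and the toy_score placeholder are hypothetical names introduced for illustration; toy_score is not the SF-LF scoring function of our system, and picking the first classifier-spotted candidate when several are tied is an additional assumption.</p>
        <preformat>
```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    from_classifier: bool  # True if spotted by the Long Form Token Classifier


def toy_score(short_form, long_form):
    """Placeholder score: 1.0 if the word initials spell the SF, else 0.0."""
    initials = "".join(word[0] for word in long_form.split()).upper()
    return 1.0 if initials == short_form.upper() else 0.0


def select_long_form(short_form, candidates, score, rng=None):
    """Pick the LF for a SF following steps B.2.1 and B.2.2; None if no relation."""
    rng = rng or random.Random()
    if not candidates:
        return None
    if len(candidates) == 1:
        only = candidates[0]
        # B.2.1: a classifier-spotted LF is accepted directly,
        # while a noun phrase must obtain a score greater than 0.
        if only.from_classifier or score(short_form, only.text) > 0:
            return only
        return None
    # B.2.2: keep the top-scoring candidates, provided the top score is above 0.
    scored = [(score(short_form, c.text), c) for c in candidates]
    best = max(s for s, _ in scored)
    if not best > 0:
        return None
    top = [c for s, c in scored if s == best]
    # Ties: prefer a classifier-spotted LF, otherwise choose randomly.
    preferred = [c for c in top if c.from_classifier]
    if preferred:
        return preferred[0]
    return rng.choice(top)


candidates = [
    Candidate("tomografía axial computarizada", False),
    Candidate("la tomografía", True),
]
chosen = select_long_form("TAC", candidates, toy_score)
```
        </preformat>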
        <p>
          As mentioned in the previous procedure, we defined a SF-LF scoring function that,
given a pair of SF and candidate LF, returns a double value that is equal to 0 if the LF
is not recognized as related to the SF. Otherwise the function returns a number greater
than 0: the greater this value, the more likely we consider the candidate LF to be
a definition of the SF. A value equal to 1 indicates a perfect match between the SF and the
candidate LF. The return values of the SF-LF scoring function have been defined by
relying on the precision estimates of the SF / LF matching strategies defined by [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>We extended the SF - LF relation extraction procedure just described by means of a
set of refinement steps so as to properly deal with special cases, including:
– groups of SFs, like fibrosis intersticial y atrofia tubular [FI y AT];
– if no LF has been found, starting from the considered SF we try to build the LF by
matching word-initials backwards;
– if no LF has been found and the SF matches one of the abbreviations of the
Dictionary of Medical Abbreviations SEDOM, we search for the corresponding LF
retrieved from the same Dictionary in the set of candidate LFs previously described.
This approach covers borderline cases like ⟨CO2, Dióxido de carbono⟩, in which it
would otherwise have been impossible to determine the SF - LF relation.</p>
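        <p>As an illustration of the second refinement step, the Python sketch below tries to build a LF by matching the letters of a SF against word initials, scanning backwards over the tokens that precede the SF. The names strip_accents and build_long_form_backwards, the accent-insensitive comparison and the max_skipped parameter (how many non-matching words, such as prepositions, may be skipped per letter) are assumptions introduced for this example and do not describe the exact rules of our system.</p>
        <preformat>
```python
import unicodedata


def strip_accents(text):
    """Remove diacritics so that accented and plain letters compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_long_form_backwards(short_form, preceding_tokens, max_skipped=1):
    """Match the SF letters, last to first, against word initials scanning backwards.

    Returns the matched tokens as a list, or None when some letter cannot be matched.
    Tokens are assumed to be non-empty strings.
    """
    letters = strip_accents(short_form).upper()
    if not letters:
        return None
    position = len(preceding_tokens) - 1
    start = end = None
    for letter in reversed(letters):
        skipped = 0
        while True:
            if position == -1:
                return None  # ran out of tokens before matching every letter
            initial = strip_accents(preceding_tokens[position][0]).upper()
            if initial == letter:
                break
            skipped += 1
            if skipped > max_skipped:
                return None
            position -= 1
        if end is None:
            end = position + 1  # the LF ends at the word matching the last letter
        start = position
        position -= 1
    return preceding_tokens[start:end]


tokens = "la tomografía axial computarizada".split()
long_form = build_long_form_backwards("TAC", tokens)
```
        </preformat>
        <p>A SF such as CO2 cannot be resolved this way, because the digit has no matching word initial; this is the kind of case that the dictionary lookup described above is meant to cover.</p>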
        <p>We also defined a basic set of heuristics to spot cases of nested relations between
SFs and LFs. We flag the possible presence of nested relations if, after a candidate
LF, two or more SFs are present before the end of the sentence or before the occurrence of the
following candidate LF. If this situation occurs, we exploit a set of rules based on string
matching and POS tags so as to identify the NESTED entities and the SF - NESTED
relations. In particular, for each SF marked as a nested candidate, we search backwards
for non-consecutive words matching the initials of the same SF and including at least
one noun token.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluations and runs</title>
      <p>We evaluated the performance of the three Random Forest classifiers described in
Section 4.1 by means of a 10-fold cross-validation over the 237,603 tokens of the manually
annotated BARR abstracts (Table 1).</p>
      <p>From Table 1 we can notice that the identification and characterization of
abbreviations obtain satisfactory performance. As far as the identification of LFs is concerned, the
Random Forest classifier obtains a low F1-score. This drawback of the first processing
phase of our system (Section 4.1), probably related to the need to define better token-level
features for LF identification, is mitigated by the second phase (Section 4.2), in
which the LFs spotted by the Long Form Token Classifier are sanitized and
complemented by the LF candidates retrieved by considering noun phrases.</p>
      <p>We submitted to the BARR track three runs for the entity extraction task and three
runs for the relation extraction task (referred to as run v1, v3 and v4 in both tasks). In
each run we incrementally improved the coverage and complexity of the set of heuristics
exploited with respect to the previous one:
– run v1: initial version of our BARR abbreviation-definition extraction system,
including our implementation of the three token-based classifiers of the entity
spotting phase (see Section 4.1) and an initial implementation of the relation extraction
rules (see Section 4.2);
– run v3: with respect to run v1, we improved the set of relation extraction rules
by including heuristics to handle the three special cases of SF - LF relation listed
at the end of Section 4.2 (groups of SFs, matching word-initials, LF retrieval from
the Dictionary of Medical Abbreviations SEDOM). Besides improving the
performance of relation extraction, these modifications allowed our system to further
refine the set of entities spotted by the three token-based classifiers of the entity
spotting phase (see Section 4.1);
– run v4: with respect to run v3, our final run (v4) adds the basic set of heuristics
tailored to spot cases of nested relations between SFs and LFs, described in
the last part of Section 4.2.</p>
      <p>
        In Table 2 and Table 3 we provide the results of the evaluation of our BARR runs, as
computed by means of the Markyt Web tool [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In particular, Table 2 shows the results
of the entity and relation extraction tasks for each one of our three runs, against the
training set of BARR abstracts. We can notice that each new run improves the
abbreviation-definition extraction performance.
      </p>
      <p>A consistent evaluation of our abbreviation identification approach against the BARR
test set has not been possible due to a bug that affected our system: our text analysis
system relied on version 8.4 of GATE (General Architecture for Text
Engineering), which did not process the text inside &lt;![CDATA[ ... ]]&gt; sections<sup>4</sup>. As a
consequence, we were not able to correctly extract abbreviations from a number of
abstracts, since their text was included in &lt;![CDATA[ ... ]]&gt; sections inside the GATE
XML documents we used to store the results of intermediate analysis steps. This bug
was identified and fixed in a new version of GATE<sup>5</sup>
(version 8.4.1), released in June 2017. We became aware of this bug only after the BARR evaluation period was
over, while analyzing the results of our approach over the BARR test set: as a consequence,
at the time of writing, we cannot provide a bug-free evaluation of our abbreviation
identification approach against the BARR test set. Table 3 shows the results of our best run
(v4) with respect to the BARR test set. We can notice that, with respect to the
performance against the training set (Table 2), the performance of our approach on the test
set is considerably lower, probably also due to the bug previously described. Once
the BARR test set is publicly released, we plan to consistently evaluate our approach
against test data and analyze its performance in detail.</p>
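      <table-wrap>
        <table>
          <thead>
            <tr><th>BARR task / run</th><th>Precision</th><th>Recall</th><th>F1-score</th></tr>
          </thead>
          <tbody>
            <tr><td>Entity extraction (run v4)</td><td>0.910</td><td>0.909</td><td>0.909</td></tr>
            <tr><td>Entity extraction (run v3)</td><td>0.912</td><td>0.899</td><td>0.901</td></tr>
            <tr><td>Entity extraction (run v1)</td><td>0.894</td><td>0.891</td><td>0.893</td></tr>
          </tbody>
        </table>
      </table-wrap>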
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper we described our participation in the BARR track of IBEREVAL 2017
by introducing our approach to automatically identify abbreviations together with their
definitions in Spanish biomedical texts. After a brief introduction of the BARR task,
we presented the two main information extraction phases of our system. The first one
identifies and characterizes abbreviations and candidate Long Forms by means of a set
of token-based classifiers. The second phase exploits a collection of heuristics to refine
the results of the first phase and identify relations between abbreviations and Long
Forms occurring in the same sentence.</p>
      <p>As avenues for future research, we would like to improve our system by extending and
specializing the set of token-level features exploited to automatically extract
abbreviations and Long Forms. Moreover, we would like to perform more data-driven validation
and refinement cycles of our relation extraction heuristics. We also plan to evaluate the
bug-free version of our approach on the BARR test set, once this data is publicly
released.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bermudez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
          </string-name>
          , G.:
          <article-title>Ixa pipeline: Efficient and ready to use multilingual nlp tools</article-title>
          .
          <source>In: LREC</source>
          . vol.
          <year>2014</year>
          , pp.
          <fpage>3823</fpage>
          -
          <lpage>3828</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takagi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Alice: an algorithm to extract abbreviations from medline</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>12</volume>
          (
          <issue>5</issue>
          ),
          <fpage>576</fpage>
          -
          <lpage>586</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          , Schütze, H.,
          <string-name>
            <surname>Altman</surname>
          </string-name>
          , R.B.:
          <article-title>Creating an online dictionary of abbreviations from medline</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>9</volume>
          (
          <issue>6</issue>
          ),
          <fpage>612</fpage>
          -
          <lpage>620</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chondrogiannis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karanastasis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andronikou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varvarigou</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Building a repository for inferring the meaning of abbreviations used in clinical studies</article-title>
          .
          <source>J. Comput</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <fpage>76</fpage>
          -
          <lpage>88</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Text processing with gate</article-title>
          . Gateway Press CA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tahsin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodale</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greene</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greene</surname>
            ,
            <given-names>C.S.:</given-names>
          </string-name>
          <article-title>Recent advances and emerging applications in text and data mining for biomedical discovery</article-title>
          .
          <source>Briefings in bioinformatics 17(1)</source>
          ,
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.S.:</given-names>
          </string-name>
          <article-title>A simple algorithm for identifying abbreviation definitions in biomedical text (</article-title>
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Intxaurrondo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Pérez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Rodríguez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Martin</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santamaría</surname>
            , J., de la Peña,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valencia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourenço</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The biomedical abbreviation recognition and resolution (barr) track: benchmarking, evaluation and importance of abbreviation recognition systems applied to spanish biomedical abstracts (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Islamaj Doğan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comeau</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeganova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilbur</surname>
          </string-name>
          , W.J.:
          <article-title>Finding abbreviations in biomedical literature: three bioc-compatible modules and four bioc-formatted corpora</article-title>
          .
          <source>Database</source>
          <year>2014</year>
          , bau044 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Intxaurrondo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Martin</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de la Peña</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Pérez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Rodríguez</surname>
          </string-name>
          , G.,
          <string-name>
            <surname>Santamaría</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourenço</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valencia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Resources for the extraction of abbreviations and terms in Spanish from medical abstracts: the BARR corpus, lexical resources and document collection</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathews</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Exploiting task-oriented resources to learn word embeddings for clinical abbreviation expansion</article-title>
          .
          <source>In: Proceedings of the 2015 Workshop on Biomedical Natural Language Processing</source>
          . pp.
          <fpage>92</fpage>
          -
          <lpage>97</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Movshovitz-Attias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          :
          <article-title>Alignment-HMM-based extraction of abbreviations from biomedical text</article-title>
          .
          <source>In: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing</source>
          . pp.
          <fpage>47</fpage>
          -
          <lpage>55</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nadeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          :
          <article-title>A supervised learning approach to acronym identification</article-title>
          .
          <source>In: Conference of the Canadian Society for Computational Studies of Intelligence</source>
          . pp.
          <fpage>319</fpage>
          -
          <lpage>329</lpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Byrd</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Hybrid text mining for finding abbreviations and their definitions</article-title>
          .
          <source>In: Proceedings of the 2001 conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>126</fpage>
          -
          <lpage>133</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Pérez-Pérez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Rodríguez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazquez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oyarzabal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fdez-Riverola</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valencia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourenço</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge</article-title>
          .
          <source>Database</source>
          <volume>2016</volume>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sohn</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comeau</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          :
          <article-title>Abbreviation definition identification based on automatic precision estimates</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>402</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Torii</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A comparison study on algorithms of detecting long forms for short forms in biomedical text</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>8</volume>
          (
          <issue>9</issue>
          ),
          <fpage>S5</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Vardakas</surname>
            ,
            <given-names>K.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsopanakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poulopoulou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falagas</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>An analysis of factors contributing to PubMed's growth</article-title>
          .
          <source>Journal of Informetrics</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ),
          <fpage>592</fpage>
          -
          <lpage>617</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Vo</surname>
            ,
            <given-names>T.N.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>T.B.</given-names>
          </string-name>
          :
          <article-title>Abbreviation identification in clinical notes with levelwise feature engineering and supervised learning</article-title>
          .
          <source>In: Pacific Rim Knowledge Acquisition Workshop</source>
          . pp.
          <fpage>3</fpage>
          -
          <lpage>17</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wren</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garner</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          , et al.:
          <article-title>Heuristics for identification of acronym-definition patterns within text: towards an automated construction of comprehensive acronym-definition dictionaries</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>41</volume>
          (
          <issue>5</issue>
          ),
          <fpage>426</fpage>
          -
          <lpage>434</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Using SVM to extract acronyms from text</article-title>
          .
          <source>Soft Computing-A Fusion of Foundations, Methodologies and Applications</source>
          <volume>11</volume>
          (
          <issue>4</issue>
          ),
          <fpage>369</fpage>
          -
          <lpage>373</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bono</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takagi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Allie: a database and a search service of abbreviations and long forms</article-title>
          .
          <source>Database</source>
          <volume>2011</volume>
          , bar013 (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torvik</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smalheiser</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          :
          <article-title>ADAM: another database of abbreviations in MEDLINE</article-title>
          .
          <source>Bioinformatics</source>
          <volume>22</volume>
          (
          <issue>22</issue>
          ),
          <fpage>2813</fpage>
          -
          <lpage>2818</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>