=Paper=
{{Paper
|id=Vol-2150/BARR2_paper5
|storemode=property
|title=Vicomtech at BARR2: Detecting Biomedical Abbreviations with ML Methods and Dictionary-based Heuristics
|pdfUrl=https://ceur-ws.org/Vol-2150/BARR2_paper5.pdf
|volume=Vol-2150
|authors=Montse Cuadros,Naiara Pérez,Iker Montoya,Aitor García Pablos
|dblpUrl=https://dblp.org/rec/conf/sepln/CuadrosPMP18
}}
==Vicomtech at BARR2: Detecting Biomedical Abbreviations with ML Methods and Dictionary-based Heuristics==
Montse Cuadros¹, Naiara Pérez¹, Iker Montoya², and Aitor García Pablos¹

¹ Vicomtech, Paseo Mikeletegi 57, Donostia-San Sebastian, Spain
{mcuadros,nperez,agarciap}@vicomtech.org
² I+D Lanik S.A, Donostia-San Sebastian, Spain
iker92montu@gmail.com
Abstract. This paper presents the system developed by Vicomtech to participate in the Second Biomedical Abbreviation Recognition and Resolution (BARR2) track. For this purpose, we have used simple machine learning approaches on annotated electronic health records and the datasets provided in the track. The machine learning approaches have been tested individually and in combination with heuristics based on a dictionary of biomedical abbreviations adapted for the task.

Keywords: biomedical NLP · abbreviations · machine learning · dictionary-based approaches
1 Introduction
This paper describes Vicomtech's participation in the Second Biomedical Abbreviation Recognition and Resolution (BARR2) track of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), in both sub-tracks 1 and 2. Sub-track 1 consists of detecting only explicit occurrences of abbreviation-definition pairs. For sub-track 2, a resolution of short forms must be provided regardless of whether their definitions are mentioned within the actual document. Both sub-tracks focus on clinical free text in Spanish.
This paper is organized as follows: Section 2 presents the two tasks in more
detail; Section 3 presents our approaches to the two problems; Section 4 shows
our results; finally, Section 5 contains our concluding remarks.
2 Biomedical Abbreviation Recognition and Resolution, 2nd Edition (BARR2)
The Second Biomedical Abbreviation Recognition and Resolution track [1] is organized in two sub-tasks, sub-track 1 and sub-track 2. Both tasks require recognizing abbreviations and acronyms in Spanish clinical texts, and providing the correct definition for each recognized element. The difference between the
tasks is the number of abbreviations each sub-track asks for and where the
definitions should originate from.
Sub-track 1 requires detecting all the abbreviations for which the definitions are given explicitly in the document. Both the short form (i.e., the abbreviation or acronym) and the long form (i.e., the definition or description) must be reported. For example, for the following piece of text:
"... se aplicó radiofrecuencia (RF) sobre la vía accesoria auriculo-ventricular (AV) de conducción bidireccional. Se interrumpe la taquicardia y la preexcitación, finalizando el procedimiento. Quedó con bloqueo de rama derecha (BRD) ..."
the answer should report the three short forms "RF", "AV", and "BRD", along with their explicit long forms "radiofrecuencia", "auriculo-ventricular", and "bloqueo de rama derecha", respectively:
S1888-75462014000200009-1 SHORT_FORM 1524 1526 RF SHORT-LONG
LONG_FORM 1507 1522 radiofrecuencia
S1888-75462014000200009-1 SHORT_FORM 1573 1575 AV SHORT-LONG
LONG_FORM 1551 1571 auriculo-ventricular
S1888-75462014000200009-1 SHORT_FORM 1720 1723 BRD SHORT-LONG
LONG_FORM 1695 1718 bloqueo de rama derecha
Sub-track 2 requires detecting all the abbreviations within the document, and providing a resolution regardless of whether their definitions appear explicitly in the text. The following text excerpt contains two such short forms, "RMN" and "MTT":

Se solicitó una RMN de pie izquierdo, que reveló una fractura de estrés en el 2º MTT con callo perióstico...

The system developed for this sub-track should be able to find these two elements and give their long forms, "resonancia magnética nuclear" and "metatarso", respectively:
S1889-836X2015000200005-2 878 881 RMN resonancia magnética nuclear resonancia magnética nuclear
S1889-836X2015000200005-2 943 946 MTT metatarso metatarso
The organization [2] has provided a sample set, a training set and a development set of the sizes shown in Table 1. The test set provided for evaluating the approaches was about 10 times bigger than the other sets, containing 2,879 clinical tests, even though the submitted runs were eventually evaluated against a set of the same size as the training set.
3 Methodology
This work is a continuation of [5], where several experiments were performed for
detecting and disambiguating abbreviations in electronic health records (EHR).
                Sample set  Training set  Development set  Testing set
Clinical tests          15           318              146          220
Sub-track 1             10           287              178          239
Sub-track 2             89         4,261            1,878        3,414

Table 1. Number of documents in the different sets
In that work, a small corpus of 149 EHRs was compiled and manually annotated with 2,389 abbreviations and acronyms. These EHRs were provided by a local hospital and belong to different clinical specialties. From the annotated short forms, two clinicians manually disambiguated two sets, one containing the 15 most ambiguous forms and the other the 30 most ambiguous forms. Finally, a dictionary of short- and long-form pairs was crafted based on [3] and the annotated corpora. The present work relies on the EHR corpora and the hand-crafted dictionary, in addition to the datasets provided by the organization of the track.
The following sections describe the approaches taken to the problems of abbreviation recognition (in both BARR2 sub-tracks 1 and 2) and of abbreviation resolution in sub-track 1 (i.e., finding the explicit long form) and sub-track 2. For the purpose of the BARR2 track, most of the effort has been put into the problem of recognition.
3.1 Abbreviation recognition
For each sub-track, we have trained several classifiers and devised two additional methods based on regular expressions and the hand-crafted dictionary, in order to improve the recall of the machine learning approaches.
Machine Learning approach. Several machine learning classifiers have been trained with Weka [4] (default settings), using the EHR dataset described above and the BARR2 Training sets (BARR2 TS) for both sub-track 1 and sub-track 2. The same inexpensive features as in [5] have been used for learning the models:
– Uppercase: whether the token is all uppercase
– Digit: whether the token contains digits
– Strange ending: whether the token has a strange ending, i.e., an ending that does not match the typical endings of non-abbreviation tokens
– Length: token length
– Uppercase count: number of uppercase characters in the token
– Lowercase count: number of lowercase characters in the token
– Vowel ratio: number of vowels in the token divided by its length
– Punctuation ratio: number of punctuation characters in the token divided by its length
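As an illustration, such token-level features could be computed as follows. This is a minimal sketch: the exact feature definitions used in [5] are not reproduced in this paper, and the list of "normal" endings below is a hypothetical stand-in for the actual strange-ending test.

```python
import string

VOWELS = set("aeiouáéíóúAEIOUÁÉÍÓÚ")
# Hypothetical list of common Spanish word endings; the actual definition
# of "strange ending" used in [5] is not specified in this paper.
COMMON_ENDINGS = ("a", "e", "o", "s", "n", "r", "l", "d")

def token_features(token: str) -> dict:
    """Compute the cheap surface features listed above for one token."""
    length = len(token)
    return {
        "uppercase": token.isupper(),
        "digit": any(ch.isdigit() for ch in token),
        "strange_ending": not token.lower().endswith(COMMON_ENDINGS),
        "length": length,
        "uppercase_count": sum(ch.isupper() for ch in token),
        "lowercase_count": sum(ch.islower() for ch in token),
        "vowel_ratio": sum(ch in VOWELS for ch in token) / length,
        "punctuation_ratio": sum(ch in string.punctuation for ch in token) / length,
    }
```

A feature vector like this can be exported per token (e.g. to ARFF) and fed directly to the Weka classifiers with their default settings.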
Table 2 shows the performance of the trained classifiers on the BARR2 Development set in terms of F1-measure. The first column refers to the models trained on the EHRs only; the second column refers to the models trained on the BARR2 TS only; finally, the last column shows the results of learning the classifiers on both datasets. Overall, combining both datasets yields worse results than using the BARR2 TS alone, except with the Random Forest (RF) algorithm, which is the best classifier obtained, followed by J48. Not surprisingly, the models trained on EHRs only perform the worst.
             EHRs   BARR2 TS  Combined
J48          81.71     89.63     88.52
KNN1         78.97     89.69     88.42
Naive Bayes  56.44     60.05     58.49
REPtree      82.14     89.12     87.82
SMO           0.00     62.65     54.51
RF           79.45     91.14     91.34

Table 2. F1-measure of abbreviation recognition on the Development sets of sub-tracks 1 and 2, trained on 3 different corpora
Taking these results into account, the classifiers selected for the BARR2
competition have been J48 trained with BARR2 TS only and RF trained with
the combined datasets.
Pattern-based approach (Pat). This approach consists of a set of regular expressions aiming to retrieve the abbreviations and acronyms that the ML approach does not cover. Basically, it retrieves all the strings of upper- and lowercase characters that contain at least one uppercase character and appear inside brackets. Hence, this approach makes sense mainly in sub-track 1. Additionally, some tests have been carried out to also retrieve short forms containing digits, but the results worsened.
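For illustration, a single regular expression in the spirit of this approach (our actual pattern set is not listed in this paper; this sketch only covers the bracketed, uppercase-containing case described above):

```python
import re

# Hypothetical pattern: a parenthesized run of letters containing at least
# one uppercase character, e.g. "(RF)" or "(BRD)".
BRACKETED_ABBREV = re.compile(r"\(([A-Za-z]*[A-Z][A-Za-z]*)\)")

text = ("... se aplicó radiofrecuencia (RF) sobre la vía accesoria "
        "auriculo-ventricular (AV) ... Quedó con bloqueo de rama derecha (BRD) ...")

matches = BRACKETED_ABBREV.findall(text)
print(matches)  # ['RF', 'AV', 'BRD']
```

In sub-track 1 these pattern hits complement the ML predictions, which is why the combined runs in Table 3 gain some recall at the expense of precision.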
Dictionary-based approach (Regex). This approach is based on the dictionary introduced above and on a set of rules hand-crafted after studying the abbreviations in several sets of EHRs and in the literature. For this work, the dictionary developed in [5] has been refined taking into account the BARR2 Training and Development set examples. The final version of the dictionary contains 3,447 unique biomedical short-/long-form pairs.
3.2 Abbreviation resolution for sub-track 1
Regarding sub-track 1, the system uses one or a combination of the Machine Learning approach, the Pattern-based approach and the Dictionary-based approach to
detect abbreviation candidates. Once the candidates are found and after checking that they are surrounded by brackets, a window of up to 8 tokens before the abbreviation is considered as the possible definition. This possible definition is first checked against our dictionary and, if present, it is selected. Otherwise, a set of heuristics is applied in order to determine whether the preceding text is the definition. The heuristics are based on: 1) whether the capital letters of the definition match the letters of the abbreviation in the same order or backwards; 2) the size of the definition relative to the size of the abbreviation; 3) a priority over definition sizes (3-grams > 2-grams > 4-grams > 5-grams ...). The heuristics are applied in this order, and once one is triggered the remaining ones are skipped. Finally, if a definition is found, both the abbreviation and the definition are selected and their offsets in the original clinical text are calculated.
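A simplified sketch of the first heuristic, matching the abbreviation's letters against the initial letters of the candidate words, forwards or backwards. The handling of function words ("de", "la", ...) is our assumption here and is not detailed in the paper.

```python
def initials_match(abbrev: str, candidate: str) -> bool:
    """Check whether the abbreviation's letters match the initial letters
    of the candidate definition's content words, in order or reversed.
    Skipping short function words is an assumption, not the paper's rule."""
    stopwords = {"de", "la", "el", "del", "los", "las", "y"}
    initials = [w[0].lower() for w in candidate.split() if w.lower() not in stopwords]
    letters = [ch.lower() for ch in abbrev]
    return letters == initials or letters == initials[::-1]

print(initials_match("BRD", "bloqueo de rama derecha"))  # True
print(initials_match("RF", "radiofrecuencia"))           # False (single word)
```

A real implementation would also handle hyphenated long forms such as "auriculo-ventricular" for "AV", which this sketch deliberately leaves out.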
3.3 Abbreviation resolution for sub-track 2
Regarding sub-track 2, the system uses one or a combination of the Machine Learning approach and the Dictionary-based approach to detect abbreviation candidates. For each candidate, a definition is selected from our dictionary. Finally, the offsets at which the abbreviation is found in the clinical text are provided.
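This resolution step can be sketched as a plain dictionary lookup. The entries below are examples taken from this paper, not the full 3,447-pair resource, and the offset handling only reports the first occurrence of each candidate.

```python
# Toy excerpt of the hand-crafted short-/long-form dictionary.
ABBREV_DICT = {
    "RMN": "resonancia magnética nuclear",
    "MTT": "metatarso",
    "RF": "radiofrecuencia",
}

def resolve(text: str, candidates: list[str]) -> list[tuple[int, int, str, str]]:
    """For each detected candidate, look up its long form and report the
    character offsets of its first occurrence in the clinical text."""
    results = []
    for short in candidates:
        long_form = ABBREV_DICT.get(short)
        start = text.find(short)
        if long_form is not None and start != -1:
            results.append((start, start + len(short), short, long_form))
    return results

text = "Se solicitó una RMN de pie izquierdo ..."
print(resolve(text, ["RMN"]))
# [(16, 19, 'RMN', 'resonancia magnética nuclear')]
```

As noted in the concluding remarks, the quality of this step depends entirely on the precision and coverage of the dictionary.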
4 Experiments and Evaluation
Vicomtech has submitted a total of 4 systems to sub-track 1 and 4 systems to sub-track 2. The systems rely on either one of the approaches described above or on their combinations. We tested them first with the Sample set, and then refined them using the BARR2 Training and Development sets. Pat and Regex individually yielded low recall scores, so we have used them only in combination with the J48 or RF classifiers.
Tables 3 and 4 show the performance of the systems submitted to sub-track 1 and sub-track 2, respectively. In both tables, Training, Development and Test results are presented. Regarding sub-track 1, adding Pat to the classifier seems to improve recall a little, but precision worsens accordingly. Regex seems to have hardly any effect. As for sub-track 2, the J48 classifier yields slightly better precision and slightly worse recall than RF; in both cases, Regex improves recall by 1-3 points but worsens precision by more.

Overall, there are no big differences between the submitted systems, and there is a clear drop in recall on the Test dataset for all of them. The results seem competitive, but official results of the other participants in the track had not been published at the time of writing, so no remarks can be made on the matter.
                   Training              Development            Test
Recognition method   P     R     F1       P     R     F1       P     R     F1
J48                94.09 83.56 88.51    96.75 84.18 90.03    88.12 74.79 80.91
J48 + Pat          90.94 87.76 89.32    93.45 88.70 91.01    87.38 75.63 81.08
J48 + Regex        94.09 83.56 88.51    96.75 84.18 90.03    88.56 74.79 81.71
J48 + Regex + Pat  90.61 87.76 89.16    93.45 88.70 91.01    88.29 76.05 81.71

Table 3. Results of sub-track 1 on the BARR2 Training, Development and Test sets

                   Training              Development            Test
Recognition method   P     R     F1       P     R     F1       P     R     F1
J48                91.92 82.81 86.81    90.28 78.83 84.17    87.57 70.20 77.93
RF                 90.46 84.20 87.22    89.71 80.06 84.61    86.41 70.44 77.61
J48 + Regex        85.58 85.82 85.70    84.75 82.63 83.67    81.58 73.36 77.25
RF + Regex         85.71 85.63 85.68    85.79 83.60 84.68    81.72 72.89 77.05

Table 4. Results of sub-track 2 on the BARR2 Training, Development and Test sets

5 Concluding Remarks

In this paper we present the results of applying different machine learning approaches combined with heuristics based on pattern matching and on regexes over abbreviation dictionaries. Seen from the perspective of the presented results, both tasks yield similar precision, recall and F1-measure. However, the tasks are quite different, being two distinct problems that only partially share the detection of abbreviations. Sub-track 1 aims at detecting definitions expressed explicitly in the text, whereas sub-track 2 relies on finding them in a dictionary. The dictionary has to be precise, and sometimes fails due to variations in the language of the abbreviation or to spelling mistakes.
Additionally, there were some cases that we did not contemplate, because the task description did not make them explicit, such as:

S1889-836X2015000100003-1 SHORT_FORM 398 402 P1NP SHORT-LONG LONG_FORM
404 452 propéptido amino-terminal del procolágeno tipo 1

related to:

...resultado en los niveles del P1NP (propéptido amino-terminal del
procolágeno tipo 1)...

which, in our initial understanding, was not the goal of sub-track 1, since we expected the definition to precede the abbreviation rather than the other way round.
Overall, we present a robust method for detecting abbreviations in two different scenarios, showing similar results in both.
6 Acknowledgments
This work has been supported by Vicomtech and the Spanish Ministry of Econ-
omy and Competitiveness (MINECO/FEDER, UE) under the project TUNER
(TIN2015-65308-C5-1-R).
References
1. Intxaurrondo, A., Marimon, M., Gonzalez-Agirre, A., Lopez-Martin, J.A., Rodriguez Betanco, H., Santamaría, J., Villegas, M., Krallinger, M.: Finding mentions of abbreviations and their definitions in Spanish Clinical Cases: the BARR2 shared task evaluation results. In: SEPLN 2018 (2018)
2. Intxaurrondo, A., de la Torre, J.C., Rodriguez Betanco, H., Marimon, M., Lopez-Martin, J.A., Gonzalez-Agirre, A., Santamaría, J., Villegas, M., Krallinger, M.: Resources, guidelines and annotations for the recognition, definition resolution and concept normalization of Spanish clinical abbreviations: the BARR2 corpus. In: SEPLN 2018 (2018)
3. Laguna, J.Y.: Diccionario de siglas médicas y otras abreviaturas, epónimos y términos médicos relacionados con la codificación de las altas hospitalarias (2003)
4. Markov, Z., Russell, I.: An introduction to the Weka data mining system. ACM SIGCSE Bulletin 38(3), 367–368 (2006)
5. Montoya, I.: Análisis, normalización, enriquecimiento y codificación de historia clínica electrónica (HCE). Master's thesis, Konputazio Ingeniaritza eta Sistema Adimentsuak Unibertsitate Masterra, Euskal Herriko Unibertsitatea (UPV/EHU) (2017)