=Paper=
{{Paper
|id=Vol-2150/overview-diann-task
|storemode=property
|title=Overview of the DIANN Task: Disability Annotation Task
|pdfUrl=https://ceur-ws.org/Vol-2150/overview-diann-task.pdf
|volume=Vol-2150
|authors=Hermenegildo Fabregat,Juan Martínez-Romo,Lourdes Araujo
|dblpUrl=https://dblp.org/rec/conf/sepln/FabregatMA18
}}
==Overview of the DIANN Task: Disability Annotation Task==
Hermenegildo Fabregat¹ [0000-0001-9820-2150], Juan Martínez-Romo¹,² [0000-0002-6905-7051], and Lourdes Araujo¹,² [0000-0002-7657-4794]

¹ Universidad Nacional de Educación a Distancia (UNED), Department of Computer Science, Juan del Rosal 16, Madrid 28040, Spain. http://nlp.uned.es/
² IMIENS: Instituto Mixto de Investigación, Escuela Nacional de Sanidad, Monforte de Lemos 5, Madrid 28019, Spain

Abstract. The DIANN task consists of the detection of disabilities in English and Spanish texts, as well as the detection of negated disabilities. The organizers have proposed a task with different elements concerning both the language and the entities to be detected (disabilities, negation and acronyms). Two evaluation criteria have also been used: exact and partial matching. All these options have generated a large number of results and different rankings. This overview summarizes the participation of eight teams, all of them with results for both English and Spanish, totaling 37 runs (18 for English and 19 for Spanish).

Keywords: Biomedical entity recognition · Biomedical corpus · Disability recognition · Negation detection

1 Introduction

Natural language processing techniques can be very useful in the biomedical domain because of the large amount of unstructured information it generates. Many topics have been addressed due to their great impact, for example the search for entities such as diseases, drugs and genes in medical texts. A particular type of entity that has not been specifically considered is disabilities. There are some tools for the annotation of medical concepts, especially in English, such as MetaMap [3], and others that can be adapted for the annotation of some medical concepts in Spanish, such as Freeling-Med [19]. However, none of them treats disabilities as a distinctive concept. According to the World Health Organization [18], disability is an umbrella term covering impairments, limitations of activities and restrictions on participation. The automatic processing of documents related to disabilities is an interesting research area if we take into account that the World Health Organization estimates that about 15% of the population suffers from some kind of disability. Detecting disabilities is a challenge that involves difficulties such as the free style in which they are written: they can be mentioned using specific words, such as "blindness", but also through descriptions such as "visual impairment". Disabilities can also be mentioned in the presence of negation words, as in "...had no expressive language". Given the relevance of this problem, as well as its difficulty, the goal of the DIANN task is to automate the mining of research articles that mention disabilities in a multilingual scenario.

The remainder of this paper is organized as follows. Section 2 presents the task. Section 3 describes the datasets we released for training and test and the evaluation criteria. Section 4 summarizes the approaches proposed by the participants. Section 5 presents and discusses the results. Finally, conclusions are presented in Section 6.

2 Task Description

DIANN is a named entity recognition task which focuses on disability identification in biomedical research texts.
As far as we know, the recognition of disabilities has not been addressed previously. So far, systems oriented to the detection of named entities in biomedical texts have not treated the concept of disability as an isolated entity, categorizing it in most cases as a disease or symptom, without a clear distinction between them. This task aims to deal specifically with this kind of entity. We have compiled a collection of documents in English and Spanish, annotated manually by three people. Due to the ambiguity of the disability concept, the support of expert medical staff has been necessary during the annotation process. This corpus has been used to evaluate the performance of several named entity recognition systems in two languages, Spanish and English. In addition to disabilities, negation has been annotated when it affects one or more disabilities; the remaining negations in the corpus have not been annotated. The corpus was divided into two parts, one for training and the other for test. To contextualize the problem, in addition to the training corpus, we provided a list of categories for the different disabilities identified in both Spanish and English.

According to the schedule of the task, participants had one month to develop their systems after the publication of the training corpus. Then we released the test set without annotations, and participants had fifteen days to send their results to the task organizers. Each team could submit up to three different runs per language. This document presents the evaluation of the different submissions in three categories (disability recognition, negated disability recognition and joint recognition) under two evaluation criteria, partial matching and exact matching.

3 Data and Evaluation

In this section we discuss the origin and characteristics of the dataset used in this task, the format in which it has been distributed, and the criteria used to evaluate the participating systems.

3.1 Data

The dataset was collected between 2017 and 2018. The DIANN corpus consists of a collection of 500 abstracts from Elsevier journal papers related to the biomedical domain. The document search was restricted to documents with the abstract available in both English and Spanish and containing at least one disability in both languages.

Table 1: Number of articles (Docs), sentences (Sents) and tokens (Toks) in each dataset.

             Docs   Sents   Toks
  English
   Training   400   4782   70919
   Test       100   1309   18406
  Spanish
   Training   400   4639   78381
   Test       100   1284   20567

Table 2: Number of disabilities (Dis), negations (Neg) and negated disabilities (Neg-Dis) in each dataset.

             Dis    Neg   Neg-Dis
  English
   Training   1413   40     42
   Test        243   23     24
  Spanish
   Training   1326   40     41
   Test        229   22     23

The DIANN corpus was divided into two disjoint parts: a training set (80%) and a test set (20%). Tables 1 and 2 summarize, for both languages, the size of the training and test sets and the data contained in them.

3.2 Format and Distribution

The dataset is structured in directories. Each folder corresponds to a specific language and contains the documents, named with the associated PubMed identifier. Each document follows an XML annotation format. Disabilities are annotated with the <dis> tag:

  Fragile-X syndrome is an inherited form of <dis>mental retardation</dis> with a connective tissue component involving mitral valve prolapse.

The negation trigger and its scope are annotated with the <neg> and <scp> tags:

  In the patients <neg>without</neg> <scp><dis>dementia</dis></scp>, significant differences were obtained in terms of functional and cognitive status (Barthel index of 52.3438 and Pfeiffer test with an average score of 1.48 ± 3.2 (P<.001)).

The corpus is available at the following URL: https://github.com/gildofabregat/DIANN-IBEREVAL-2018/tree/master/DIANN_CORPUS
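To make the format concrete, the annotated spans can be recovered with a few lines of code. The following is a minimal sketch, assuming the <dis>, <neg> and <scp> tags shown above; it uses simple regular expressions instead of a full XML parser, and the function name is only illustrative, not part of the official distribution.

```python
import re

# Minimal sketch: recover annotated spans from a DIANN-style sentence,
# assuming the <dis>, <neg> and <scp> tags shown above. A production
# system would use a real XML parser and handle nesting more robustly.
DIS = re.compile(r"<dis>(.*?)</dis>", re.S)
NEG = re.compile(r"<neg>(.*?)</neg>", re.S)
SCP = re.compile(r"<scp>(.*?)</scp>", re.S)

def extract_annotations(text):
    """Return disabilities, negation triggers and negated disabilities."""
    scopes = SCP.findall(text)
    return {
        "disabilities": DIS.findall(text),
        "triggers": NEG.findall(text),
        # a disability counts as negated when its <dis> span lies inside <scp>
        "negated": [d for s in scopes for d in DIS.findall(s)],
    }

sentence = ("In the patients <neg>without</neg> <scp><dis>dementia</dis></scp>, "
            "significant differences were obtained.")
print(extract_annotations(sentence))
# {'disabilities': ['dementia'], 'triggers': ['without'], 'negated': ['dementia']}
```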
3.3 Evaluation

In addition to exact matching, and due to the freedom with which a disability can be expressed, we have used a second evaluation criterion to compare the different systems. This second criterion, called partial matching, is based on the concept of core-term match introduced in [9]. To apply it, we manually generated a file with the core of each annotation in the corpus (annotation → annotation core), for example:

  irreversible visual loss → visual loss
  moderate to severe dementia → dementia
  severe mental disorder → mental disorder

For each evaluation criterion, performance is measured with the F_{β=1} score:

  F_\beta = \frac{(\beta^2 + 1) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}}   (1)

where precision is the percentage of named entities found by the system that are correct or partially correct, and recall is the percentage of named entities present in the corpus that are found or partially found by the system.
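To make the two criteria concrete, the sketch below scores predicted spans against gold spans under exact matching and under a simplified core-term (partial) variant. The CORE mapping stands in for the manually generated core file, and all names are hypothetical; the official scorer is not distributed in this form.

```python
# Hedged sketch of the two evaluation criteria (exact vs. partial matching).
# CORE is a hypothetical stand-in for the manually built core-term file.
CORE = {"irreversible visual loss": "visual loss",
        "moderate to severe dementia": "dementia"}

def f_beta(precision, recall, beta=1.0):
    if precision + recall == 0:
        return 0.0
    return (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)

def evaluate(predicted, gold, partial=False):
    """Precision, recall and F1 over sets of annotation strings."""
    norm = (lambda s: CORE.get(s, s)) if partial else (lambda s: s)
    pred, ref = {norm(p) for p in predicted}, {norm(g) for g in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall, f_beta(precision, recall)

gold = ["irreversible visual loss"]
pred = ["visual loss"]
print(evaluate(pred, gold))                # exact:   (0.0, 0.0, 0.0)
print(evaluate(pred, gold, partial=True))  # partial: (1.0, 1.0, 1.0)
```

Under exact matching the prediction above counts as an error, while partial matching credits it because both spans reduce to the same core term.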
4 Overview of the Submitted Approaches

A total of eight teams participated, adding up to 18 runs for English and 19 runs for Spanish. Although each document was available in English and Spanish, none of the participating teams exploited bilingualism. This may be because the abstracts in the two languages are not parallel: they are written by the authors (they are not automatic translations) and sometimes contain different numbers of sentences and different numbers of disabilities. In this section we explain the different approaches tested by each of the teams.

– The SINAI 18 [12] team proposed a different approach for each language. For English, they used MetaMap and NegEx [6] to annotate concepts and to analyze negation; for Spanish, they used their own UMLS-based entity recognition system and, for negation, a method based on bags of words. This second approach makes use of NLTK [4] to standardize the text and CoreNLP [13] to perform syntactic analysis. Finally, in both approaches, after the recognition of the concepts, they apply a filtering process based on semantic information of the identified concepts and compute the similarity of each filtered UMLS concept with the term "disability" using word2vec [15].

– The IxaMed [10] team presented a pipeline combining multiple systems. First, they use a neural architecture for disability detection consisting of a Bidirectional Long Short-Term Memory network (Bi-LSTM) with a Conditional Random Field (CRF) on top. For English, this system uses Brown clusters [5] and word embeddings extracted from the MIMIC-III corpus [11]. For Spanish, they computed the word embeddings from electronic health records and did not include any Brown clusters. After disability detection, they used a rule-based system for the detection of triggers associated with disabilities, making use of a generic list of negation triggers. In addition, using a similar rule-based approach, they designed a third module for the detection of disability-related abbreviations. Finally, taking into account the aforementioned processes, they designed a neural network-based system for the identification of the negation scope.

– The IXA 18 02 [2] team presented one run for each language, both using the same entity recognition system. Known as ixa-pipe-nerc [1], this system aims to recognize named entities while avoiding any linguistically motivated features, relying instead on typographic and morphological features of the text. Ixa-pipe-nerc uses the implementation of the Perceptron algorithm contained in the Apache OpenNLP project, incorporating several features based on language representations, such as Brown clusters (taking the 4th, 8th, 12th and 20th node in the path), Clark clusters [7] and word2vec clusters.

– The work presented by the GPLSIUA [17] team consists of their own general-purpose machine learning system, called CARMEN [16], for disability annotation, and a dictionary-based approach to negation detection. The annotation of disabilities is divided into two modules: the first generates candidate expressions based on the extraction of noun phrases, and the second, based on CARMEN, determines which of the candidate expressions can be considered a disability. CARMEN is a machine learning system that makes use of Random Forests and is trained with syntactic and distributional features.

– The UPC 018 3 [14] team presented two semi-supervised approaches for the named entity recognition task. The first one is a conditional random field model trained with syntactic features. The second one is a recurrent neural network combining a Bi-LSTM network and a CRF layer. As they explain, they used a procedure to reduce possible over-fitting based on the addition of new unlabeled abstracts. Finally, to process the negation and its scope, they used a CRF-based system called ABNER [20].

– The system presented by the UPC 018 2 [21] team makes use of a CRF to annotate named entities. The system is trained with syntactic and some semantic features; for this purpose, they also used a list of terms appearing in the annotations of the training corpus, encoding as an attribute whether each term to be analyzed is included in this list. Finally, they used a NegEx-based system for negation detection, keeping only the annotations where the negation trigger is less than 4 words away from the possible negated disability.
– The UC3M 018 1 [22] team submitted a run for each language based on the same architecture. The models use a two-phase architecture with two Bidirectional LSTM layers to capture context information and a CRF to capture the correlations between labels (a minimal sketch of this kind of tagger is shown after this list). Finally, they addressed entity detection and negation detection jointly, treating the task as a sequence-to-sequence (seq2seq) multi-proposal classification problem.

– Finally, the LSI UNED [8] team presented an unsupervised approach to disability annotation that involves generating variants and using lists of disabilities and body functions. The system extracts the noun phrases and creates their possible variants. To find the best candidate, and taking into account the lists mentioned above, the set of variants is filtered according to metrics such as centrality and variation. For the detection of negation in Spanish and the detection of abbreviations in both languages, the system uses post-processing based on regular expressions; for negation detection in English, it uses NegEx.
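Several of the supervised systems above (IxaMed, UPC 018 3, UC3M 018 1) share the same backbone: token embeddings fed to a bidirectional LSTM, with a CRF scoring the output tag sequence. The sketch below shows only the Bi-LSTM encoder with a per-token classifier, omitting the CRF layer; the dimensions and the BIO tag set are made up for illustration, and this is not any team's actual implementation.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal Bi-LSTM sequence tagger. The participants add a CRF on top
    and initialize the embeddings with pretrained vectors (e.g., MIMIC-III)."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_tags):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, seq_len, num_tags) tag scores
        embedded = self.embedding(token_ids)
        encoded, _ = self.lstm(embedded)
        return self.classifier(encoded)

# Toy usage with a 3-tag BIO scheme over disabilities: O, B-DIS, I-DIS.
model = BiLSTMTagger(vocab_size=5000, emb_dim=100, hidden_dim=128, num_tags=3)
scores = model(torch.randint(0, 5000, (2, 12)))
print(scores.shape)  # torch.Size([2, 12, 3])
```

The CRF layer used by the participants replaces the independent per-token decision with a score over whole tag sequences, which enforces valid BIO transitions such as never predicting I-DIS immediately after O.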
5 Results and Discussion

In this section we discuss the results for both languages based on three categories (the results in each of the following tables are sorted according to the F_β obtained): detection of disabilities, detection of only negated disabilities, and detection of both negated and non-negated disabilities.

1. Disability recognition. These results correspond to the evaluation of the annotations of the participants without taking negation into account. All annotated disabilities included in the dataset are evaluated, regardless of whether negations have been correctly annotated or not.

2. Negated disability recognition. These results correspond to the annotation of negated disabilities; only disabilities affected by negation are taken into account. In addition to the correctness of the disability annotation, the correctness of both the negation trigger annotation and the negation scope are taken into account.

3. Global results. These results correspond to the joint evaluation of the annotations relating to negated disabilities and those relating to non-negated disabilities.

Table 3 (exact matching) and Table 4 (partial matching) show the results obtained by the participants in the disability recognition task for Spanish (a) and English (b). As can be seen, the IxaMed, UC3M 1 and UPC 3 teams obtained the best results for the detection of disabilities in Spanish under both the partial and the exact evaluation. These systems, based on supervised approaches (or semi-supervised in the case of UPC 3), have in common the use of CRFs: UPC 3 R1 and R2 are based only on CRFs, while the rest use CRFs in the top layer of the proposed architecture (UPC 3 R3, IxaMed R1, and UC3M 1 R1 and R2). In the English scenario, the participants did not present significant modifications with respect to the approaches proposed for Spanish; most of the variations are changes of the required resources, in both supervised and unsupervised approaches. Unsupervised approaches such as the one presented by LSI UNED obtain notable improvements when processing documents in English, especially if we take into account the results of the partial evaluation.

The UPC 3 and IXA 2 teams presented interesting solutions regarding possible system over-fitting. The UPC 3 team carried out a regularization process based on the incorporation of unannotated documents. Considering that the division into training and test sets was generated trying to avoid over-adjustment of the systems due to overlap between both sets, the consideration of an iterative learning scheme and the inclusion of new unannotated documents in the learning phase is a practice of great interest that seems to have provided good results. The IXA 2 team, with a Perceptron-based model, proposed the use of a set of shallow features to avoid errors that might occur when processing the dataset with automatic text processing tools. This is of great interest if we consider that the biomedical domain uses a specific terminology (disease names, abbreviations, drug names, ...) which may not be covered by these automatic processing tools and which may generate an accumulation of errors during the training phase.

Regarding the annotation of acronyms, only the IxaMed, UPC 2 and LSI UNED teams presented specific solutions. Both IxaMed and LSI UNED implemented solutions derived from the premise that an acronym is first presented at a maximum distance of X words from a disability; this way of dealing with acronym annotation depends on the accuracy of capturing the different disabilities. The UPC 2 team, on the other hand, used a boolean attribute indicating whether a term or expression is part of a list of acronyms. In summary, all systems contribute significant features to the task of entity detection. Regarding the results for the disability recognition task, the systems performed better in general for English than for Spanish. In some cases, the difference between the results of the partial evaluation and the exact evaluation is very clear; however, in most cases the ranking of the systems is preserved.

With regard to the processing of negation, the approaches presented, as in the previous category, have been very diverse. While some systems deal with negation using the same system used for entity annotation (neural networks: IXA 2; BiLSTM+CRF: UC3M 1; CRF: UPC 3), others have used tools such as NegEx (English: SINAI, UPC 2 and LSI UNED; Spanish: UPC 2), rule-based systems (trigger detection for English and Spanish: IxaMed; scope recognition for Spanish: LSI and GPLSI; scope recognition for English: GPLSI), and lexicons, bags of words and the like (trigger detection for English: GPLSI; trigger detection for Spanish: GPLSI, SINAI, LSI UNED). In most cases, the results obtained by the different systems show a strong relationship with the results of the disability detection task, with the GPLSIUA and SINAI teams standing out in Spanish. Although the systems obtained very satisfactory results (Tables 5 and 6), IxaMed, UPC 3 (R3, R1 and R2), IXA 2 (R1, R2 and R3) and UPC 2 stand out.
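Much of the negation handling described above reduces to a proximity heuristic: a disability is considered negated when a trigger occurs within a small token window, as in UPC 2's 4-word filter described in Section 4. Below is a minimal sketch of that idea; the trigger list is a small hypothetical sample, and real NegEx-style systems also model scope direction and termination terms.

```python
# Hedged sketch of a trigger-window negation filter (cf. UPC 2's rule that
# the trigger must be fewer than 4 words away from the candidate disability).
# The trigger list is a small hypothetical sample, not any team's lexicon.
TRIGGERS = {"no", "not", "without", "sin"}

def is_negated(tokens, dis_start, dis_end, window=4):
    """True if a negation trigger occurs within `window` tokens of the
    disability span tokens[dis_start:dis_end]."""
    left = max(0, dis_start - window)
    right = min(len(tokens), dis_end + window)
    context = tokens[left:dis_start] + tokens[dis_end:right]
    return any(tok.lower() in TRIGGERS for tok in context)

tokens = "the patient had no expressive language impairment".split()
# disability span: tokens[4:7] == ['expressive', 'language', 'impairment']
print(is_negated(tokens, 4, 7))  # True: 'no' is one token to the left
```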
Due to the size of the corpus and the criteria used to consider a negation, few cases of negation are included in the DIANN corpus, making it difficult to assess the significance of negation detection in this task.

Finally, Tables 7 and 8 show the results obtained by jointly evaluating both the detection of disabilities and the recognition of negation. As can be seen, these tables summarize the performance shown by the different systems. Due to the small number of negations, the results are strongly influenced by those obtained in the detection of disabilities.

Table 3: Disability recognition - (a) Spanish, (b) English - Exact matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       0.757  0.817  0.786
  UC3M 1 R2       0.818  0.646  0.722
  UC3M 1 R1       0.801  0.651  0.718
  UPC 3 R2        0.807  0.603  0.69
  UPC 3 R1        0.814  0.594  0.687
  UC3M 1 R3       0.801  0.563  0.662
  IXA 2 R1        0.65   0.642  0.646
  IXA 2 R3        0.636  0.655  0.645
  UPC 3 R3        0.67   0.603  0.634
  IXA 2 R2        0.641  0.616  0.628
  UPC 2 R1        0.732  0.502  0.596
  SINAI 1 R3      0.459  0.345  0.394
  LSI UNED R3     0.41   0.249  0.31
  LSI UNED R2     0.396  0.249  0.306
  LSI UNED R1     0.393  0.249  0.305
  GPLSIUA 1 R1    0.813  0.17   0.282
  GPLSIUA 1 R2    0.796  0.17   0.281
  SINAI 1 R2      0.181  0.415  0.252
  SINAI 1 R1      0.022  0.485  0.042

(b) English
  System          P      R      F
  IxaMed R1       0.786  0.86   0.821
  UC3M 1 R1       0.778  0.72   0.748
  UC3M 1 R2       0.759  0.663  0.708
  UC3M 1 R3       0.775  0.65   0.707
  UPC 3 R1        0.799  0.605  0.689
  UPC 3 R2        0.795  0.605  0.687
  UPC 2 R1        0.756  0.56   0.643
  UPC 3 R3        0.655  0.617  0.636
  LSI UNED R3     0.671  0.597  0.632
  LSI UNED R2     0.639  0.597  0.617
  LSI UNED R1     0.633  0.597  0.614
  IXA 2 R1        0.701  0.531  0.604
  IXA 2 R2        0.706  0.494  0.581
  SINAI 1 R3      0.625  0.37   0.465
  GPLSIUA 1 R2    0.884  0.251  0.391
  GPLSIUA 1 R1    0.881  0.243  0.381
  SINAI 1 R2      0.222  0.428  0.293
  SINAI 1 R1      0.016  0.593  0.032

Table 4: Disability recognition - (a) Spanish, (b) English - Partial matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       0.822  0.886  0.853
  UC3M 1 R1       0.882  0.716  0.79
  UC3M 1 R2       0.878  0.694  0.776
  UPC 3 R2        0.889  0.664  0.76
  UPC 3 R1        0.898  0.655  0.758
  IXA 2 R3        0.712  0.734  0.723
  UC3M 1 R3       0.876  0.616  0.723
  IXA 2 R1        0.721  0.712  0.716
  UPC 3 R3        0.743  0.668  0.703
  IXA 2 R2        0.705  0.677  0.69
  UPC 2 R1        0.828  0.568  0.674
  LSI UNED R2     0.847  0.533  0.654
  LSI UNED R1     0.841  0.533  0.652
  LSI UNED R3     0.842  0.511  0.636
  SINAI 1 R3      0.512  0.384  0.439
  GPLSIUA 1 R2    0.959  0.205  0.338
  GPLSIUA 1 R1    0.958  0.201  0.332
  SINAI 1 R2      0.204  0.467  0.284
  SINAI 1 R1      0.026  0.568  0.05

(b) English
  System          P      R      F
  IxaMed R1       0.842  0.922  0.88
  LSI UNED R3     0.856  0.761  0.806
  UC3M 1 R1       0.822  0.761  0.791
  LSI UNED R2     0.815  0.761  0.787
  LSI UNED R1     0.808  0.761  0.784
  UC3M 1 R2      0.835  0.728  0.778
  UC3M 1 R3      0.828  0.695  0.756
  UPC 3 R1        0.875  0.663  0.754
  UPC 3 R2        0.865  0.658  0.748
  UPC 3 R3        0.742  0.7    0.72
  UPC 2 R1        0.822  0.609  0.7
  IXA 2 R1        0.761  0.576  0.656
  IXA 2 R2        0.788  0.551  0.649
  SINAI 1 R3      0.688  0.407  0.512
  GPLSIUA 1 R1    0.94   0.259  0.406
  GPLSIUA 1 R2    0.913  0.259  0.404
  SINAI 1 R2      0.252  0.486  0.332
  SINAI 1 R1      0.019  0.704  0.038
Table 5: Negated disability recognition - (a) Spanish, (b) English - Exact matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       0.889  0.727  0.8
  IXA 2 R1        1      0.545  0.706
  IXA 2 R2        0.929  0.591  0.722
  IXA 2 R3        0.923  0.545  0.686
  UPC 2 R1        0.737  0.636  0.683
  UPC 3 R3        0.688  0.5    0.579
  UPC 3 R1        0.647  0.5    0.564
  UPC 3 R2        0.647  0.5    0.564
  SINAI 1 R3      0.667  0.091  0.16
  SINAI 1 R2      0.333  0.045  0.08
  GPLSIUA 1 R1    0      0      0
  GPLSIUA 1 R2    0      0      0
  LSI UNED R1     0      0      0
  LSI UNED R2     0      0      0
  LSI UNED R3     0      0      0
  SINAI 1 R1      0      0      0
  UC3M 1 R1       0      0      0
  UC3M 1 R2       0      0      0
  UC3M 1 R3       0      0      0

(b) English
  System          P      R      F
  UPC 3 R1        0.773  0.739  0.756
  UPC 3 R2        0.773  0.739  0.756
  UPC 3 R3        0.696  0.696  0.696
  GPLSIUA 1 R1    0.647  0.478  0.55
  UPC 2 R1        0.647  0.478  0.55
  GPLSIUA 1 R2    0.611  0.478  0.537
  IXA 2 R1        0.667  0.435  0.526
  IXA 2 R2        0.75   0.391  0.514
  SINAI 1 R3      0.526  0.435  0.476
  IxaMed R1       0.476  0.435  0.455
  SINAI 1 R2      0.306  0.478  0.373
  SINAI 1 R1      0.25   0.391  0.305
  LSI UNED R2     0.188  0.13   0.154
  LSI UNED R3     0.188  0.13   0.154
  LSI UNED R1     0.176  0.13   0.15
  UC3M 1 R1       0      0      0
  UC3M 1 R2       0      0      0
  UC3M 1 R3       0      0      0

Table 6: Negated disability recognition - (a) Spanish, (b) English - Partial matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       1      0.818  0.9
  UPC 3 R3        1      0.727  0.842
  UPC 2 R1        0.895  0.773  0.829
  UPC 3 R1        0.941  0.727  0.821
  UPC 3 R2        0.941  0.727  0.821
  UC3M 1 R3       1      0.682  0.811
  IXA 2 R3        1      0.591  0.743
  IXA 2 R2        0.929  0.591  0.722
  IXA 2 R1        1      0.545  0.706
  UC3M 1 R2       0.909  0.455  0.606
  SINAI 1 R3      1      0.136  0.24
  UC3M 1 R1       1      0.136  0.24
  LSI UNED R1     0.75   0.136  0.231
  LSI UNED R2     0.75   0.136  0.231
  LSI UNED R3     0.75   0.136  0.231
  SINAI 1 R2      0.667  0.091  0.16
  GPLSIUA 1 R1    0.5    0.091  0.154
  GPLSIUA 1 R2    0.4    0.091  0.148
  SINAI 1 R1      0.125  0.045  0.067

(b) English
  System          P      R      F
  IxaMed R1       1      0.913  0.955
  UPC 3 R1        0.955  0.913  0.933
  UPC 3 R2        0.955  0.913  0.933
  UPC 3 R3        0.913  0.913  0.913
  SINAI 1 R3      1      0.826  0.905
  GPLSIUA 1 R1    0.941  0.696  0.8
  UPC 2 R1        0.941  0.696  0.8
  IXA 2 R1        1      0.652  0.789
  GPLSIUA 1 R2    0.889  0.696  0.78
  UC3M 1 R3       1      0.609  0.757
  LSI UNED R2     0.875  0.609  0.718
  LSI UNED R3     0.875  0.609  0.718
  LSI UNED R1     0.824  0.609  0.7
  IXA 2 R2        1      0.522  0.686
  SINAI 1 R1      0.556  0.87   0.678
  SINAI 1 R2      0.556  0.87   0.678
  UC3M 1 R2       0.875  0.304  0.452
  UC3M 1 R1       1      0.043  0.083
Table 7: Negated and non-negated disability recognition - (a) Spanish, (b) English - Exact matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       0.746  0.795  0.77
  UC3M 1 R1       0.769  0.568  0.653
  UPC 3 R2        0.772  0.563  0.652
  UPC 3 R1        0.779  0.555  0.648
  UC3M 1 R2       0.749  0.559  0.64
  IXA 2 R1        0.644  0.616  0.629
  IXA 2 R3        0.626  0.629  0.627
  UC3M 1 R3       0.731  0.546  0.625
  IXA 2 R2        0.633  0.594  0.613
  UPC 3 R3        0.64   0.559  0.597
  UPC 2 R1        0.71   0.48   0.573
  SINAI 1 R3      0.411  0.284  0.336
  LSI UNED R3     0.424  0.245  0.31
  LSI UNED R2     0.409  0.245  0.306
  LSI UNED R1     0.406  0.245  0.305
  SINAI 1 R2      0.157  0.349  0.217
  GPLSIUA 1 R1    0.692  0.118  0.201
  GPLSIUA 1 R2    0.659  0.118  0.2
  SINAI 1 R1      0.018  0.402  0.035

(b) English
  System          P      R      F
  IxaMed R1       0.746  0.811  0.777
  UC3M 1 R1       0.749  0.626  0.682
  UPC 3 R1        0.772  0.584  0.665
  UPC 3 R2        0.768  0.584  0.664
  UC3M 1 R3       0.712  0.609  0.656
  UC3M 1 R2       0.706  0.572  0.632
  LSI UNED R3     0.657  0.568  0.609
  UPC 3 R3        0.626  0.593  0.609
  UPC 2 R1        0.724  0.519  0.604
  LSI UNED R2     0.624  0.568  0.595
  LSI UNED R1     0.616  0.568  0.591
  IXA 2 R1        0.672  0.49   0.567
  IXA 2 R2        0.685  0.457  0.548
  SINAI 1 R3      0.573  0.337  0.425
  GPLSIUA 1 R2    0.806  0.239  0.368
  GPLSIUA 1 R1    0.812  0.23   0.359
  SINAI 1 R2      0.199  0.395  0.264
  SINAI 1 R1      0.015  0.543  0.029

Table 8: Negated and non-negated disability recognition - (a) Spanish, (b) English - Partial matching. Precision (P), Recall (R) and F-measure (F).

(a) Spanish
  System          P      R      F
  IxaMed R1       0.82   0.873  0.846
  UC3M 1 R3       0.889  0.664  0.76
  UPC 3 R2        0.88   0.642  0.742
  UC3M 1 R2       0.865  0.646  0.74
  UPC 3 R1        0.89   0.633  0.74
  UC3M 1 R1       0.864  0.638  0.734
  IXA 2 R3        0.7    0.703  0.702
  IXA 2 R1        0.708  0.677  0.692
  UPC 3 R3        0.735  0.642  0.685
  IXA 2 R2        0.693  0.651  0.671
  UPC 2 R1        0.819  0.555  0.661
  LSI UNED R2     0.803  0.48   0.601
  LSI UNED R1     0.797  0.48   0.599
  LSI UNED R3     0.803  0.463  0.587
  SINAI 1 R3      0.468  0.323  0.382
  GPLSIUA 1 R2    0.878  0.157  0.267
  GPLSIUA 1 R1    0.897  0.153  0.261
  SINAI 1 R2      0.18   0.402  0.249
  SINAI 1 R1      0.022  0.48   0.042

(b) English
  System          P      R      F
  IxaMed R1       0.841  0.914  0.876
  LSI UNED R3     0.843  0.728  0.781
  UC3M 1 R3       0.832  0.712  0.767
  LSI UNED R2     0.801  0.728  0.763
  LSI UNED R1     0.79   0.728  0.758
  UPC 3 R1        0.87   0.658  0.749
  UPC 3 R2        0.859  0.654  0.743
  UC3M 1 R2       0.817  0.663  0.732
  UC3M 1 R1       0.803  0.671  0.731
  UPC 3 R3        0.735  0.695  0.715
  UPC 2 R1        0.822  0.588  0.686
  IXA 2 R1        0.757  0.551  0.638
  IXA 2 R2        0.784  0.523  0.627
  SINAI 1 R3      0.685  0.403  0.508
  GPLSIUA 1 R1    0.942  0.267  0.417
  GPLSIUA 1 R2    0.903  0.267  0.413
  SINAI 1 R2      0.242  0.481  0.322
  SINAI 1 R1      0.019  0.691  0.037

6 Conclusions

In this edition of IberEval a new task on the identification of disabilities in biomedical research papers has been proposed. In spite of being the first edition of the task, we consider it a success in terms of participation, with a total of 8 teams. The corpus made available to the participants is a very interesting resource, since it is a dataset of 1000 annotated abstracts, 500 in Spanish and 500 in English, all of them extracted from journal articles related to the biomedical area and each of them referring to at least one disability, either in its extended form or as an abbreviation. In addition to disabilities, the corpus contains annotations of negation when it affects at least one disability.
The participants used different approaches and resources that provided the task with different perspectives, all of them very interesting. In summary, for each of the languages the participants did not change their models significantly; in most cases they made use of alternative resources adapted to the language in question. In the case of Spanish, the systems with the best results were supervised or semi-supervised systems based on neural network models using a Bidirectional LSTM and a CRF. In the case of English, the use of neural networks was also predominant among the best systems, although some unsupervised systems obtained a performance equal to or higher than theirs. Regarding negation, many participants adapted well-known systems such as NegEx or ABNER, although some participants implemented their own negation detection systems, based on rules or treating the problem as a classification problem.

In conclusion, the organizers have proposed a task with different elements concerning both the language and the entities to be detected (disabilities, negation and acronyms). Two evaluation criteria have also been used: exact and partial matching. All these options have generated a large number of results and different rankings, highlighting the differences between the participating systems according to the aspect taken into account.

7 Acknowledgments

This work has been partially financed by the projects EXTRECM (TIN2013-46616-C2-2-R), MAMTRA-MED (TIN2016-77820-C3-2-R) and EXTRAE (IMIENS 2017).

References

1. Agerri, R., Bermudez, J., Rigau, G.: IXA pipeline: Efficient and ready to use multilingual NLP tools. In: LREC. vol. 2014, pp. 3823-3828 (2014)
2. Agerri, R., Rigau, G.: Simple language independent sequence labelling for the annotation of disabilities in medical texts. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
3. Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA Symposium. p. 17. American Medical Informatics Association (2001)
4. Bird, S., Loper, E.: NLTK: the Natural Language Toolkit. In: Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions. p. 31. Association for Computational Linguistics (2004)
5. Brown, P.F., Desouza, P.V., Mercer, R.L., Pietra, V.J.D., Lai, J.C.: Class-based n-gram models of natural language. Computational Linguistics 18(4), 467-479 (1992)
6. Chapman, W.W., Bridewell, W., Hanbury, P., Cooper, G.F., Buchanan, B.G.: A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics 34(5), 301-310 (2001)
7. Clark, A.: Combining distributional and morphological information for part of speech induction. In: Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics - Volume 1. pp. 59-66. EACL '03, Association for Computational Linguistics, Stroudsburg, PA, USA (2003). https://doi.org/10.3115/1067807.1067817
8. Fabregat, H., Martinez-Romo, J., Araujo, L.: UNED at DIANN 2018: Unsupervised system for automatic disabilities labeling in medical scientific documents. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
9. Fukuda, K.i., Tsunoda, T., Tamura, A., Takagi, T., et al.: Toward information extraction: identifying protein names from biological papers. In: Pac Symp Biocomput. vol. 707, pp. 707-718 (1998)
10. Goenaga, I., Atutxa, A., Gojenola, K., Casillas, A., Díaz de Ilarraza, A., Ezeiza, N., Oronoz, M., Pérez, A., Pérez de Viñaspre, O.: A hybrid approach for automatic disability annotation. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
11. Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3, 160035 (2016)
12. López-Úbeda, P., Díaz-Galiano, M.C., Martín-Valdivia, M.T., Jiménez-Zafra, S.: SINAI at DIANN - IberEval 2018: Annotating disabilities in multi-language systems with UMLS. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
13. Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 55-60 (2014)
14. Medina, S., Turmo, J., Loharja, H., Padró, L.: Semi-supervised learning for disabilities detection on English and Spanish biomedical text. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111-3119 (2013)
16. Moreno, I., Romá-Ferri, M., Moreda, P.: CARMEN: Sistema de entity typing basado en perfiles [CARMEN: Entity typing system based on profiles]. In: Congreso Informática para Tod@s, IPT 2018 (2018)
17. Moreno, I., Romá-Ferri, M., Moreda, P.: GPLSIUA team at the DIANN 2018 task. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
18. World Health Organization: World report on disability. World Health Organization (2011)
19. Oronoz, M., Casillas, A., Gojenola, K., Perez, A.: Automatic annotation of medical records in Spanish with disease, drug and substance names. In: Iberoamerican Congress on Pattern Recognition. pp. 536-543. Springer (2013)
20. Settles, B.: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 21(14), 3191-3192 (2005)
21. Vecino, P.A., Padró, L.: Basic CRF approach to DIANN 2018 shared task. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)
22. Zavala, R.M.R., Martinez, P., Segura-Bedmar, I.: A hybrid Bi-LSTM-CRF model to disabilities named entity recognition.
In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) (2018)