=Paper=
{{Paper
|id=Vol-1650/smbm16Grundke
|storemode=property
|title=TextAI: Enhancing TextAE with Intelligent Annotation Support
|pdfUrl=https://ceur-ws.org/Vol-1650/smbm16Grundke.pdf
|volume=Vol-1650
|authors=Maximilian Grundke,Johannes Jasper,Mariya Perchyk,Jan Philipp Sachse,Ralf Krestel,Mariana Neves
|dblpUrl=https://dblp.org/rec/conf/smbm/GrundkeJPSKN16
}}
==TextAI: Enhancing TextAE with Intelligent Annotation Support==
<pdf width="1500px">https://ceur-ws.org/Vol-1650/smbm16Grundke.pdf</pdf>
<pre>
           TextAI: Enhancing TextAE with Intelligent Annotation Support

                        Maximilian Grundke, Johannes Jasper, Mariya Perchyk,
                           Jan Philipp Sachse, Ralf Krestel, Mariana Neves
                                        Hasso Plattner Institute
                                          Potsdam, Germany
                                     mariana.neves@hpi.de


Abstract

We present TextAI, an extension to the annotation tool TextAE that adds
support for named-entity recognition and automated relation extraction
based on machine learning techniques. Our learning approach is
domain-independent and increases the quality of the detected relations
with each added training document. We further aim at accelerating and
facilitating the manual curation process for natural language documents
by supporting simultaneous annotation by multiple users.

1   Introduction

Faced with rapidly growing numbers of publicly available natural language
documents, it is becoming increasingly difficult to extract the underlying
knowledge in a structured manner. Thus, annotation of documents for the
purpose of extracting this information is an important task in many
research domains today. Creating these annotations is mostly done
manually, even though it is very time-consuming work and requires deep
understanding and domain knowledge (Hirschman et al., 2012).

TextAI [1] is a tool developed to support annotators as a first step on
the way to minimizing the effort of extracting information from written
texts. A wide range of annotation editors already exists (Neves and Leser,
2012). Thus, we chose not to implement an additional standalone editor,
but to build our system around the TextAE [2] tool, which is an existing
open source editor. We extended TextAE by providing additional features to
manage documents of multiple users and to predict entities and relations,
which can then be adopted by users into their set of annotations. The
system learns from its users and improves prediction of relations over
time based on previously annotated documents. We use the biomedical domain
as the use case for our system and as the basis for evaluation. The
developed approach, however, can be used across different domains that
require manual annotation. With its automated annotation suggestions and
its multi-user support, TextAI introduces capabilities that are
conceptually different from other annotation tools.

[1] https://github.com/LearningToNote
[2] http://textae.pubannotation.org/

2   Related Work

Automatic Annotation Suggestions. Various studies on prediction of
annotations confirm that automated recommendations increase the speed and
improve the quality of annotations. Lingren et al. (2014) determined that
automatic annotation suggestions for named entities result in 13.83% to
21.5% time saving without reducing the inter-annotator agreement (IAA) or
qualitative annotator performance. Additionally, Fort and Sagot (2010) and
South et al. (2014) showed a significant gain in quality. Hernandez et al.
(2014) reported an improvement of non-expert annotator performance using
automated named-entity recognition. The WebAnno annotation tool (Yimam et
al., 2014) includes automatic suggestions for three generic structures:
spans, relations, and chains. It integrates an external machine learning
tool, which requires users to configure the features themselves. While
this allows for domain-specific optimization, it excludes non-expert users
from using this functionality. By allowing definition of custom annotation
labels, WebAnno increases its flexibility, but eliminates labels as a
suitable machine learning feature.

Annotation Editors. TextAE is a browser-based annotation tool which comes
without a server backend, but supports importing documents and annotations
based on a simple JSON format. Because it is implemented using HTML and
JavaScript, TextAE's functionality can be easily expanded. Furthermore, it
supports both a wide range of keyboard shortcuts and usability
improvements that aim to increase the performance of its users. The brat
rapid annotation tool (Stenetorp et al., 2012) features a client-server
architecture, supports annotation of documents by multiple users
simultaneously and allows comparison of different sets of annotations for
one document. We chose to adapt several of these features and enhance
TextAE to support them, while leaving it a simple and easy-to-use editor,
as feature-rich tools, such as brat, quickly become complicated to use
because they require complex configuration.

3   System Architecture

Annotating documents with entities and relations traditionally involves
manually highlighting entities in the document and marking relations
between them. Our system employs machine learning techniques to automate
these steps by using the workflow depicted in Figure 1. The user interacts
with a central front-end component, which is used to manage annotation
tasks, documents and users, as well as to import documents from multiple
sources. Users also interact with the annotation editor for editing and
reviewing documents and their annotations. In the back end, we use an
in-memory database (IMDB), which provides document storage, text analysis
features and integration with machine learning algorithms. The middleware
layer mediates between the different interfaces and the IMDB and deals
with all logic concerning user and document management.

Figure 1: Components of the TextAI system. (Diagram labels: Local
Documents, Import Sources, Frontend, Annotation Editor, Middleware,
Database Server, Docs, text mining, machine learning.)

3.1    Front end

The user interacts with a central front-end component, which allows
importing documents from multiple sources, such as the local file system,
e.g., files in the BioC file format (Comeau et al., 2013), or by querying
PubMed [3], based on plain text. After importing documents into TextAI,
these can be loaded from the database (cf. Section 3.3, Back end). Besides
document storage, the IMDB further offers basic text mining and analysis
features that we use and expand. Domain independence is achieved through
the introduction of another level of hierarchy called "tasks", which can
be used to organize documents under one semantic group. This has the
technical implication that each task has its own machine learning models,
hence annotations within one task have no impact on predictions in another
one.

[3] http://www.ncbi.nlm.nih.gov/pubmed

3.2    Annotation Editor

Users can review and edit documents and their annotations through the
annotation editor. The latter also lets users trigger named-entity
recognition (NER), relation prediction, and other methods aimed at
improving annotation quality and speed. We extended TextAE, which is a
powerful standalone annotation tool based on JSON-formatted input that
allows loading text and creating annotations and relations. It allows
adding multiple annotations to each position in the document and
displaying different layers of annotations through color coding. Instead
of displaying different kinds of annotations, such as POS tags and
domain-specific information, e.g., medical terms, we use different colors
for annotations made by different users on the same document. Because of
the HTML span-tag-based implementation for annotation rendering, TextAE
displays overlapping annotations in a stacked way instead of inline.

Users can create custom labels for their annotations in TextAI. However,
we ask the users to map their custom labels to one of the UMLS semantic
types [4], in order to improve the ability of our system to learn based on
annotations made by users and to normalize the annotations made by
different users and tasks. For instance, when annotating the DDI corpus
(Herrero-Zazo et al., 2013), users could create four labels, one for each
drug type included in this corpus, but they are asked to map every type to
a UMLS semantic type, for instance, "T200 - Clinical Drug" or "T121 -
Pharmacologic Substance". Currently, we only allow mapping a label to one
single UMLS semantic type.

Annotating large corpora is a task that can involve multiple experts,
which requires user management to be available in a collaborative
annotation system. A problem that occurs when multiple people do the same
work is the creation of annotations with similar, but not identical
meaning. Therefore, our system nudges users towards better annotations by
asking them to map their labels to the UMLS semantic types. Further,
during annotation, users can choose to hide their annotations from both
other users and the machine learning part of our system, while still being
able to use all of its features.

Users can correct the predictions made by the system by selectively adding
them to their own annotation sets and editing them if necessary. At any
point in the process, users can also manually add, remove, and alter
entities and relations in their own set.

[4] https://metamap.nlm.nih.gov/SemanticTypesAndGroups.shtml

Figure 2: Screenshot of the TextAI annotation editor showing annotations
from different users (1), a drug-drug-interaction relation (2), the
user-defined label (3) and the prediction functionality (4).

3.3     Back end

We use Rserve [5], which provides an interface to the statistical
computing language R with its extensive text mining and machine learning
capabilities. Further, our system relies on the text analysis
functionalities of an IMDB for entity and relation predictions. Besides
the documents, task-specific metadata, such as trained models,
domain-specific stopword lists and NER dictionaries, are stored in the
database, allowing fast access to important domain-dependent information.

[5] https://rforge.net/Rserve/

3.4    Information Extraction

The automated suggestion of annotations involves two major steps, NER and
relation prediction. For NER we apply a simple dictionary-based approach
using the UMLS dictionary and part-of-speech (POS) tagging of the
documents.

We train two support vector machines [6] to detect relations between two
entities: one to determine whether a relation exists between two entities
and, if it does, a second one to classify the type of relation. The
relations are represented by n-dimensional feature vectors. We get
descriptive characteristics of the relation of two entities by combining
lexical and syntactical features. In the pre-processing step, the document
text is split into sentences and every word is tokenized, lemmatized and
POS-tagged. Similar to the approach described in (Bui et al., 2014), we
divide the sentence into three groups by adding the prefix "b" to each
token appearing before the first entity, the prefix "i" to the words in
between the two entities and the prefix "a" to the words after the second
entity. While we remove stop words and ignore entities in the context, we
do not filter out punctuation, i.e. comma, colon and semicolon, since
their appearance between two entities can be a strong indicator that there
is no relation between these entities. We only consider three tokens on
either side of each entity in order to emphasize the near context around
the annotations. The annotated types of both entities are considered as a
feature as well. We also include the distance, i.e. the number of words
and the number of characters between two entities, as a feature. We avoid
using trigger words as proposed by (Bui et al., 2014), since this would
contradict our domain-independence objective.

[6] e1071 (https://cran.r-project.org/package=e1071), an R interface to
    LIBSVM with its default RBF kernel
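
The feature construction described above can be sketched as follows. This
is a minimal illustration in Python, not the authors' R implementation:
the function name relation_features, the toy stop word list, the order of
stop word removal and windowing, and the approximation of the character
distance by the space-joined tokens are all assumptions made for this
example. Skipping other entities in the context is omitted for brevity.

```python
from collections import Counter

# Toy stop word list (assumed; the paper stores task-specific lists).
STOP_WORDS = {"the", "a", "an", "of", "with", "and", "may", "be", "to"}


def relation_features(tokens, e1, e2, type1, type2,
                      stop_words=STOP_WORDS, window=3):
    """Build a sparse feature dict for one candidate entity pair.

    tokens: lemmatized tokens of one sentence.
    e1, e2: (start, end) token spans, end exclusive, with e1 before e2.
    type1, type2: annotated entity types (e.g. UMLS semantic types).
    """
    def keep(span_tokens):
        # Drop stop words but keep punctuation such as "," ":" ";".
        return [t.lower() for t in span_tokens if t.lower() not in stop_words]

    before = keep(tokens[:e1[0]])[-window:]   # context left of entity 1
    between_all = keep(tokens[e1[1]:e2[0]])
    if len(between_all) <= 2 * window:
        between = between_all
    else:
        # `window` tokens right of entity 1 and `window` left of entity 2.
        between = between_all[:window] + between_all[-window:]
    after = keep(tokens[e2[1]:])[:window]     # context right of entity 2

    feats = Counter()
    for prefix, group in (("b", before), ("i", between), ("a", after)):
        for tok in group:
            feats[f"{prefix}_{tok}"] += 1

    # Entity types and distances are added as further features.
    feats[f"type1={type1}"] = 1
    feats[f"type2={type2}"] = 1
    raw_between = tokens[e1[1]:e2[0]]
    feats["dist_words"] = len(raw_between)
    feats["dist_chars"] = len(" ".join(raw_between))  # approximation
    return dict(feats)


tokens = ["Aspirin", "may", "increase", "the", "effect", "of", "warfarin",
          ",", "so", "monitor", "closely", "."]
features = relation_features(tokens, (0, 1), (6, 7), "drug", "drug")
print(sorted(features))
```

Dictionaries of this kind would still have to be mapped to fixed-length
numeric vectors before being passed to an SVM; that step is left out here.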

Every newly annotated document is used to retrain the models for relation
extraction, so the system learns over time and improves its performance.
Further, using the UMLS predefined set of types also improves the learning
capabilities of our system, as entities can now be chosen from a finite
set and the entity type becomes a stronger feature for relation
prediction. Since training these models is only possible and reasonable
given a certain amount of information, TextAI needs a set of pre-annotated
documents, obtained either by importing or by manually annotating a few
documents of the corpus in advance.

3.5    Middleware

As both database and front-end components have highly independent schemata
and interfaces, transforming data between them is a key role of the
middleware. The middleware implements a RESTful interface representing
users, tasks, documents, user-documents and their content to pass on
information about data objects to the browser-based front end.
Additionally, user management and access rights management are handled
here. An integrated user model ensures that every action taken is properly
authenticated and authorized. Our annotation editor can load and export
data provided in JSON format, thus the middleware is also responsible for
transforming information between the different representations in our
system. In addition, the middleware provides an interface to import and
export documents. Internally, we use the BioC format (Comeau et al.,
2013), as it is general enough to be used across different annotation
domains.

4     Experiments

We focused on the medical domain and used the DDI corpus (Herrero-Zazo et
al., 2013) to evaluate the performance of our NER and relation extraction
procedures.

NER. In our evaluation, we accept not only exact matches between the
gold-standard and the predicted entities, but also overlapping entity
label markers. Our average F-1 score was 77.3% with a recall of 85.62% and
a precision of 72.53% for the DDI-Drugbank and DDI-Medline data sets.

Relation Extraction. In total, the DDI corpus names five different
relation types: non-relation, mechanism, effect, advise and a general
interaction. We achieved a precision of 72.95% on the test set of the DDI
corpus, averaged over 10 iterations. Other researchers who performed
relation prediction on the DDI corpus achieve comparable performance
results of 60.9% and 62.99% macro averaged F-score (Thomas et al., 2013a).

Training Set Size. When users annotate new relations, the system's
performance increases since the SVM models are retrained with new user
input. We measure the impact of this input on the efficiency of our
classifier with the following setup. Starting with only one annotated
document, we train both SVM classifiers on the given data and evaluate on
20 randomly selected documents. We then select 10 more documents for
training and again test on 20 test documents. This cycle repeats until 500
documents are included in the training set. Figure 3 illustrates the
F-score for each of the DDI interaction types averaged over 10 runs with
varying training documents.

Figure 3: Influence of training set size on prediction performance.

In comparison to our performance for other relations, the "interaction"
type obtained lower results and these have oscillated over our
experiments. This relation type is under-represented in the DDI corpus as
it constitutes only 6% of the annotated DDI relations. As discussed in
previous work (Thomas et al., 2013b), this resulted in lower performance
of the systems for this relation type on the test set. We believe that the
oscillation in the results occurs for those test sets which contained more
or fewer instances of the "interaction" relation type that could not be
correctly detected by our system.

5   Conclusions and Future Work

We have presented a prototype that extends the annotation editor TextAE
with multi-user functionality and annotation prediction. This was achieved
by creating a concept of per-user annotation sets and tasks, as well as a
NER framework and relation prediction algorithm. Our system provides users
with functionality for annotation prediction without interfering with
their day-to-day annotation work.

As future work, we plan on conducting a user study on annotation speed and
quality. Further, we also want to explore NER algorithms based on machine
learning and on the labels which are normalized to the UMLS semantic
types, and not only on the current dictionary-based approach. Finally,
semi-supervised learning approaches, such as active learning to leverage
user feedback, could improve NER and relation extraction even further.

References

Quoc-Chinh Bui, Peter M.A. Sloot, Erik M. van Mulligen, and Jan A. Kors.
  2014. A novel feature-based approach to extract drug-drug interactions
  from biomedical text. Bioinformatics.

Donald C. Comeau, Rezarta Islamaj Doğan, Paolo Ciccarese, Kevin Bretonnel
  Cohen, Martin Krallinger, Florian Leitner, Zhiyong Lu, Yifan Peng, Fabio
  Rinaldi, Manabu Torii, Alfonso Valencia, Karin Verspoor, Thomas C.
  Wiegers, Cathy H. Wu, and W. John Wilbur. 2013. BioC: a minimalist
  approach to interoperability for biomedical text processing. Database,
  2013.

Karën Fort and Benoît Sagot. 2010. Influence of pre-annotation on
  POS-tagged corpus development. In Proceedings of the Fourth Linguistic
  Annotation Workshop, LAW IV '10, pages 56–63, Stroudsburg, PA, USA.
  Association for Computational Linguistics.

Andres M Hernandez, Harry S Hochheiser, John R Horn, Rebecca S Crowley,
  and Richard D Boyce. 2014. Testing pre-annotation to help non-experts
  identify drug-drug interactions mentioned in drug product labeling. In
  Second AAAI Conference on Human Computation and Crowdsourcing.

María Herrero-Zazo, Isabel Segura-Bedmar, Paloma Martínez, and Thierry
  Declerck. 2013. The DDI corpus: An annotated corpus with pharmacological
  substances and drug-drug interactions. Journal of Biomedical Informatics
  (JBI), 45(5):914–920, 10.

Lynette Hirschman, Gully A. P. C Burns, Martin Krallinger, Cecilia Arighi,
  K. Bretonnel Cohen, Alfonso Valencia, Cathy H. Wu, Andrew
  Chatr-Aryamontri, Karen G. Dowell, Eva Huala, Anália Lourenço, Robert
  Nash, Anne-Lise Veuthey, Thomas Wiegers, and Andrew G. Winter. 2012.
  Text mining for the biocuration workflow. Database, 2012.

Todd Lingren, Louise Deleger, Katalin Molnar, Haijun Zhai, Jareen
  Meinzen-Derr, Megan Kaiser, Laura Stoutenborough, Qi Li, and Imre Solti.
  2014. Evaluating the impact of pre-annotation on annotation speed and
  potential bias: natural language processing gold standard development
  for clinical named entity recognition in clinical trial announcements.
  Journal of the American Medical Informatics Association, 21(3):406–413.

Mariana Neves and Ulf Leser. 2012. A survey on annotation tools for the
  biomedical literature. Briefings in Bioinformatics, page bbs084.

Brett R. South, Danielle Mowery, Ying Suo, Jianwei Leng, Óscar Ferrández,
  Stephane M. Meystre, and Wendy W. Chapman. 2014. Evaluating the effects
  of machine pre-annotation and an interactive annotation interface on
  manual de-identification of clinical text. Journal of Biomedical
  Informatics, 50:162–172. Special Issue on Informatics Methods in Medical
  Privacy.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia
  Ananiadou, and Jun'ichi Tsujii. 2012. brat: a web-based tool for
  NLP-assisted text annotation. In Proceedings of the Demonstrations at
  the 13th Conference of the European Chapter of the Association for
  Computational Linguistics, pages 102–107, Avignon, France, April.
  Association for Computational Linguistics.

Philippe Thomas, Mariana Neves, Tim Rocktäschel, and Ulf Leser. 2013a.
  WBI-DDI: Drug-drug interaction extraction using majority voting. In
  Second Joint Conference on Lexical and Computational Semantics (*SEM),
  Volume 2: Proceedings of the Seventh International Workshop on Semantic
  Evaluation (SemEval 2013), pages 628–635, Atlanta, Georgia, USA, June.
  Association for Computational Linguistics.

Philippe Thomas, Mariana Neves, Tim Rocktäschel, and Ulf Leser. 2013b.
  WBI-DDI: Drug-drug interaction extraction using majority voting. In
  Second Joint Conference on Lexical and Computational Semantics (*SEM),
  volume 2, pages 628–635.

Seid Muhie Yimam, Chris Biemann, Richard Eckart de Castilho, and Iryna
  Gurevych. 2014. Automatic annotation suggestions and custom annotation
  layers in WebAnno. In Proceedings of 52nd Annual Meeting of the
  Association for Computational Linguistics: System Demonstrations, pages
  91–96. Association for Computational Linguistics.

</pre>