<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Find Problems before They Find You with AnnotatorPro's Monitoring Functionalities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammed R. H. Qwaider</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anne-Lyse Minard</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Speranza</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
        </contrib>
        <aff>Fondazione Bruno Kessler, Trento, Italy</aff>
        <email>{qwaider,minard,manspera,magnini}@fbk.eu</email>
      </contrib-group>
      <abstract>
        <p>We present a tool for the annotation of linguistic data. ANNOTATORPRO offers both complete monitoring functionalities (e.g. inter-annotator agreement and agreement with respect to a gold standard) and highly flexible task design (e.g. token and document level annotation, adjudication and reconciliation procedures). We tested ANNOTATORPRO in several industrial annotation scenarios, coupled with Active Learning techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Driven by the popularity of machine learning
approaches, recent years have seen an
increasing need to produce human-annotated data for
a large number of linguistic tasks (e.g. named
entity recognition, semantic role labeling, sentiment
analysis, word sense disambiguation, and
discourse relations, to mention just a few). Datasets
(development, training and test data) are being
developed for different languages and different
domains, for both research and industrial purposes.</p>
      <p>A relevant consequence of this is the
increasing demand for annotated datasets, both in terms
of quantity and quality. This in turn calls for tools
with a rich apparatus of functionalities (e.g.
annotation, visualization, monitoring and reporting),
able to support and monitor a large variety of
annotators (i.e. from linguists to mechanical
turkers), flexible enough to serve a large spectrum
of annotation scenarios (e.g. crowdsourcing and
paid professional annotators), and open to the
integration of NLP tools (e.g. for automatic
pre-annotation and for instance selection based on
Active Learning).</p>
      <p>
        Although there is a large supply of annotation
tools, such as brat
        <xref ref-type="bibr" rid="ref9">(Stenetorp et al., 2012)</xref>
        , GATE
        <xref ref-type="bibr" rid="ref2">(Cunningham et al., 2011)</xref>
        , CAT
        <xref ref-type="bibr" rid="ref1">(Bartalesi Lenzi
et al., 2012)</xref>
        , and WebAnno
        <xref ref-type="bibr" rid="ref11">(Yimam et al., 2013)</xref>
        ,
and several functions are included in common
crowdsourcing platforms (e.g. CrowdFlower<sup>1</sup>),
we believe that none of the available tools possesses
the full range of functionalities needed for real and
intensive industrial use. As an example, none of the
aforementioned tools allows one to implement
adjudication rules (i.e. under what conditions an item
annotated by more than one annotator is assigned
to a certain category) or to visualize items with
disagreement among annotators.
      </p>
      <p>This paper introduces ANNOTATORPRO, a new
annotation tool which was mainly conceived to
fulfill the above-mentioned needs. We highlight
two main aspects of the tool: (i) a high level of
flexibility to design the annotation task, including
the possibility to define adjudication and
reconciliation procedures; (ii) the rich set of functionalities
allowing for constant monitoring of the quality of
the data being annotated.</p>
      <p>The paper is organized as follows. In Section 2
we compare ANNOTATORPRO with some
state-of-the-art annotation tools. Section 3 provides a
general description of the tool. Sections 4 and 5 focus
on the task design and on the monitoring
functionalities, while Section 6 provides a brief overview
of the tool’s application and future extensions.</p>
      <p><sup>1</sup> https://www.crowdflower.com</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Many annotation tools are available to the
community. However, some of them are limited by
license, e.g. CAT
        <xref ref-type="bibr" rid="ref1">(Bartalesi Lenzi et al., 2012)</xref>
        and
GATE
        <xref ref-type="bibr" rid="ref2">(Cunningham et al., 2011)</xref>
        are available for
research use only, while some others have open
licenses, e.g. brat
        <xref ref-type="bibr" rid="ref9">(Stenetorp et al., 2012)</xref>
        , but offer
limited features.
      </p>
      <p>The brat rapid annotation tool (brat) is an
open license annotation tool that supports
different annotation levels, in particular annotation at
the token level and annotation of relations between
marked tokens. It supports multiple annotators,
in the sense that many annotators can collaborate
on annotating the same corpus, but needs an
in-house installation. Despite all these advantages,
brat supports neither annotation monitoring
nor annotator/task reports.</p>
      <p>Other tools (e.g. CAT) provide advanced
functionalities to perform annotation at different
levels (e.g. token and relation level) through a
user-friendly interface, although they do not support
annotation monitoring.</p>
      <p>CrowdFlower is an outsourcing annotation
service that provides a platform for annotation
(focusing on annotation at the document level)
employing non-expert contributors. It uses gold
standard tests to evaluate the annotators and
supports automatic adjudication features, but no
inter-annotator agreement metrics are available. In
addition, an important issue that could limit the use
of outsourcing is that the data are not stored in-house,
in particular when sensitive data covered by
privacy regulations are concerned.</p>
      <p>GATE is a powerful tool that implements most
of the features needed to facilitate annotation
production in all its phases (e.g. task creation,
annotator assignment, annotation monitoring and
multi-layer annotation of the same corpus). However,
neither visualization of disagreement nor
automatic adjudication is available.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Overall Description</title>
      <p>
        ANNOTATORPRO is a web-based annotation tool
built on top of the open source tool MT-EQUAL
(Machine Translation Error Quality Alignment),
a toolkit for the manual assessment of Machine
Translation output that implements three different
tasks in an integrated environment: annotation of
translation errors, translation quality rating (e.g.
adequacy and fluency, relative ranking of
alternative translations), and word alignment
        <xref ref-type="bibr" rid="ref3">(Girardi et
al., 2014)</xref>
        .
      </p>
      <p>ANNOTATORPRO inherits from MT-EQUAL
the capability of scaling to big data on an
optimized platform that is able to save annotations in
real time. It also makes use of MT-EQUAL’s
multi-user, user-friendly web-based interface.</p>
      <p>ANNOTATORPRO performs simple tokenization based on
spaces, punctuation, and other
language-dependent rules, but the user can also
directly upload files that are already tokenized.</p>
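      <p>As a rough illustration of tokenization based on spaces and punctuation, the following Python sketch splits a string into word and punctuation tokens. The regular expression and the function name <monospace>simple_tokenize</monospace> are assumptions made for this example; they do not reproduce ANNOTATORPRO’s actual rules, and language-dependent rules are left out.</p>
      <preformat>
```python
import re

# Hypothetical pattern: a run of word characters, or any single
# character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text on whitespace and detach punctuation marks."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(simple_tokenize("Trento, Italy: a new tool."))
```
</preformat>
      <p>Under this pattern a hyphenated word is split into three tokens, which is one of the cases where language-dependent rules would be needed.</p>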
      <p>We designed new functionalities to fulfill the
requirements of high quality corpus annotation
performed by multiple annotators.
ANNOTATORPRO’s main novel features are:</p>
      <p>The interface includes different options to
design the annotation task (Section 4.1), which
are set by the project manager.</p>
      <p>The tool enables annotation at two levels
(Section 4.2): annotation at the token level
(e.g. part-of-speech tagging and named entity
recognition) and annotation at the document
level (e.g. sentiment analysis).</p>
      <p>ANNOTATORPRO’s interface offers
functionalities for annotation monitoring (Section
5), which include inter-annotator agreement
(IAA) monitoring and quality monitoring.</p>
      <p>ANNOTATORPRO has been implemented in
PHP and JavaScript, and uses MySQL to manage
its database. It takes as input several UTF-8
encoded formats: TXT (raw text), IOB2<sup>2</sup> and TSV
(tab-separated values). It also accepts ZIP archives
containing the source files.</p>
      <p>As regards data storage, document
annotations are saved in a MySQL database in real time
(i.e. while the data are being annotated). The annotated
data can be exported in the following formats:
IOB2 and TSV.</p>
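      <p>For readers unfamiliar with IOB2, the sketch below groups a tagged token sequence into labeled chunks: a B- tag opens a chunk, an I- tag with the same label continues it, and O closes it. The function name <monospace>iob2_to_chunks</monospace> and the handling of ill-formed sequences are assumptions of this example, not part of the tool.</p>
      <preformat>
```python
def iob2_to_chunks(tokens, tags):
    """Group IOB2-tagged tokens into (label, text) chunks."""
    chunks = []
    current_label, current_tokens = None, []

    def close_chunk():
        # Store the open chunk, if any.
        if current_tokens:
            chunks.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_chunk()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk
            # (assumption: such a token is treated like "O").
            close_chunk()
            current_label, current_tokens = None, []
    close_chunk()
    return chunks


if __name__ == "__main__":
    tokens = ["Anna", "Rossi", "lives", "in", "Trento"]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    print(iob2_to_chunks(tokens, tags))
```
</preformat>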
    </sec>
    <sec id="sec-4">
      <title>4 Annotation Task Design</title>
      <p>ANNOTATORPRO distinguishes two types of
users, i.e. managers and annotators. Managers
take care of designing the annotation task at hand;
in particular, they define (i) the annotation
procedure, which depends on the number of annotators,
their level of expertise (for example, non-expert
annotators might not be allowed to see/modify
each other’s work) and the use that the dataset is
intended for (e.g. evaluation, training, etc.), and
(ii) the annotator’s task, which includes selecting
the most appropriate annotation level and
creating the annotation categories/labels (Figure 1). As
opposed to managers, annotators are basic users,
who only have access to a limited number of
(annotation) functionalities (Figure 2).</p>
      <p><sup>2</sup> The IOB2 tagging format is a common format for text
chunking. B- is used to tag the beginning of a chunk, I- to tag
tokens inside the chunk and O to indicate tokens not
belonging to a chunk.</p>
      <sec id="sec-4-1">
        <title>4.1 Annotation Procedure</title>
        <p>One of the main tasks of the manager is to define
the annotation procedure, which consists mainly
of:</p>
        <p>Defining the number of annotators (one or
more) who can collaborate on annotating the
same corpus.</p>
        <p>In case of multiple annotators, defining
the type of collaboration among them, i.e.
whether each item is to be annotated by only one
of them or by several (document level only).</p>
        <p>Defining the automatic adjudication rules
in the case where multiple annotations of
the same data are collected (document level
only). The two basic options are:
          <list list-type="bullet">
            <list-item><p>considering an annotation as solved if
the majority of annotators agreed on a
certain annotation;</p></list-item>
            <list-item><p>considering an annotation as solved if a
minimum number of concordant
annotations is reached.</p></list-item>
          </list>
        </p>
        <p>Deciding whether to make the metadata of
the documents (e.g. document id, document
title) visible to the annotators during the
annotation phase.</p>
        <p>Deciding whether to allow for a revision
phase after the annotation has been
concluded, i.e. give the annotators the possibility
to modify their annotations, for example after
a reconciliation step has taken place. By
default, document metadata will be visible
during the revision phase to facilitate the work.</p>
        <p>Deciding the modality for the selection of data
to be presented to the annotators:
          <list list-type="bullet">
            <list-item><p>propose to the annotator preselected
ordered documents (default option);</p></list-item>
            <list-item><p>randomly select documents from a large
dataset;</p></list-item>
            <list-item><p>select documents from a large dataset
through an Active Learning process.<sup>3</sup></p></list-item>
          </list>
        </p>
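        <p>Two of the design choices listed in this section, the automatic adjudication rules and the modality for selecting data, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not ANNOTATORPRO’s implementation: all function names, the dictionary layout of a document, the mode names, and the decision to leave tied items unsolved are hypothetical.</p>
        <preformat>
```python
import random
from collections import Counter


def _top_label(labels):
    """Return (label, votes) for the most frequent label.

    Returns None for an empty list or when the two most frequent
    labels are tied (assumption: a tie leaves the item unsolved).
    """
    ranked = Counter(labels).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0]


def adjudicate_majority(labels):
    """Solve an item if a strict majority of annotators agree."""
    top = _top_label(labels)
    if top is not None and top[1] * 2 > len(labels):
        return top[0]
    return None


def adjudicate_min_concordant(labels, min_concordant):
    """Solve an item once a label has at least min_concordant votes."""
    top = _top_label(labels)
    if top is not None and top[1] >= min_concordant:
        return top[0]
    return None


def select_next(documents, mode="ordered", rng=random):
    """Pick the next document to present to an annotator.

    documents: list of dicts with an "id" key and, for the
    confidence-based modes, a "confidence" key (hypothetical layout).
    """
    if not documents:
        return None
    if mode == "ordered":
        return documents[0]
    if mode == "random":
        return rng.choice(documents)
    if mode == "lowest_confidence":
        return min(documents, key=lambda doc: doc["confidence"])
    if mode == "highest_confidence":
        return max(documents, key=lambda doc: doc["confidence"])
    raise ValueError(f"unknown selection mode: {mode}")


if __name__ == "__main__":
    print(adjudicate_majority(["POS", "POS", "NEG"]))
    print(adjudicate_min_concordant(["POS", "POS", "NEG", "NEU"], 2))
    docs = [{"id": "d1", "confidence": 0.9}, {"id": "d2", "confidence": 0.2}]
    print(select_next(docs, mode="lowest_confidence")["id"])
```
</preformat>
        <p>With four annotators and the labels POS, POS, NEG and NEU, the majority rule leaves the item unsolved (two votes out of four is not a strict majority), whereas a minimum of two concordant annotations solves it as POS.</p>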
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Annotator’s Task</title>
        <p>ANNOTATORPRO supports two different
annotation levels, i.e. one where annotation is performed
at the document level and one where
smaller units, typically tokens, are annotated. It
is the manager’s task to select the most
appropriate annotation level for the task at hand; for
example, named entity recognition needs data annotated
at the token level, whereas for sentiment analysis
a corpus is generally annotated at the document
level.</p>
        <p>Finally, the task manager defines the set of
categories or the set of labels to be used by the
annotator to classify the documents (in
the case of document level annotation) or to mark
portions of text (in the case of token level annotation), respectively.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Annotation Monitoring</title>
      <p>In ANNOTATORPRO we have implemented several
monitoring functionalities aimed at guaranteeing
high quality annotation, as described below.</p>
      <sec id="sec-5-1">
        <title>5.1 Progress Monitoring</title>
        <p>In the manager interface, two tabs display
information about the annotations already performed.
The Annotation tab presents the progress of the
annotation task, i.e. the annotations done by each
annotator. This information is updated in real time, which
means that the manager can follow the progress
of the work underway. Moreover, the manager can
view the annotations of each user in read-only
mode.</p>
        <p>The Overall stats panel displays a table which
summarizes the overall statistics about the
annotation. The following information is given: total
number of annotated documents; number of
non-annotated documents; number of partially
annotated documents (i.e. documents not yet annotated
by the required number of annotators); number of
completely annotated documents (i.e. documents
annotated by the required number of annotators,
independently of whether annotators did or did not
reach an agreement).</p>
        <p><sup>3</sup> The Active Learning process is not provided in the
distribution of ANNOTATORPRO, but the tool can select the data
to be annotated if they are associated with a confidence value
(in this case the tool can either select those with the highest
score or those with the lowest score).</p>
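        <p>The counts shown in the Overall stats panel can be derived from the number of annotators who have worked on each document. The sketch below is illustrative only: the function name, the input layout, and the choice to count partially annotated documents in the total of annotated documents are assumptions, since the text does not specify them.</p>
        <preformat>
```python
def overall_stats(annotation_counts, required_annotators):
    """Summarize annotation progress.

    annotation_counts: dict mapping a document id to the number of
    annotators who have annotated that document (hypothetical layout).
    """
    stats = {
        "non_annotated": 0,
        "partially_annotated": 0,
        "completely_annotated": 0,
    }
    for count in annotation_counts.values():
        if count == 0:
            stats["non_annotated"] += 1
        elif count >= required_annotators:
            # Complete regardless of whether the annotators agreed.
            stats["completely_annotated"] += 1
        else:
            stats["partially_annotated"] += 1
    # Assumption: "annotated" covers both partial and complete documents.
    stats["total_annotated"] = (
        stats["partially_annotated"] + stats["completely_annotated"]
    )
    return stats


if __name__ == "__main__":
    counts = {"d1": 0, "d2": 1, "d3": 3, "d4": 3, "d5": 2}
    print(overall_stats(counts, required_annotators=3))
```
</preformat>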
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Inter-Annotator Agreement Monitoring</title>
        <p>IAA monitoring, which measures the level of
agreement between the annotators at regular
intervals, is activated every time two or more
annotators annotate the same data.</p>
        <p>
          IAA is computed in terms of the Dice
coefficient
          <xref ref-type="bibr" rid="ref4">(Lin, 1998)</xref>
          and Cohen’s Kappa
          <xref ref-type="bibr" rid="ref10">(Viera
and Garrett, 2005)</xref>
          ; the latter represents the
agreement as a continuous value from -1 to 1, where -1
means total disagreement and 1 means total
agreement.
        </p>
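        <p>Both measures can be sketched in a few lines of Python. The Dice coefficient is computed here as 2|A ∩ B| / (|A| + |B|) over the sets of units marked by two annotators, and Cohen’s Kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names and the representation of annotations (sets of units for Dice, aligned label lists for Kappa) are assumptions of this sketch; the text does not specify which units ANNOTATORPRO compares.</p>
        <preformat>
```python
from collections import Counter


def dice_coefficient(items_a, items_b):
    """Dice coefficient between two collections of annotated units."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 1.0  # assumption: two empty annotations agree fully
    return 2 * len(a.intersection(b)) / (len(a) + len(b))


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labeling the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    ann_a = ["POS", "POS", "NEG", "NEG"]
    ann_b = ["POS", "NEG", "NEG", "NEG"]
    print(cohen_kappa(ann_a, ann_b))
    print(dice_coefficient({(0, 2, "PER"), (5, 6, "LOC")}, {(0, 2, "PER")}))
```
</preformat>
        <p>In the first call the annotators agree on three items out of four (p_o = 0.75) and chance agreement is p_e = 0.5, which gives a Kappa of 0.5.</p>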
        <p>The project manager has access to different
types of information to constantly monitor the
level of agreement between annotators, focusing
both on a single annotator and overall:
          <list list-type="bullet">
            <list-item><p>the level of agreement each annotator obtains
with every other annotator and the average of
the IAA values obtained by each annotator;</p></list-item>
            <list-item><p>the overall average IAA.</p></list-item>
          </list>
        </p>
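        <p>The three views listed above (pairwise agreement, per-annotator average and overall average) can be derived from any pairwise agreement measure. In the sketch below, plain percentage agreement stands in for the measure so that the example is self-contained; the function names and the input layout are hypothetical.</p>
        <preformat>
```python
from itertools import combinations


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(labels_a, labels_b)) / len(labels_a)


def agreement_report(annotations, metric=percent_agreement):
    """Compute pairwise, per-annotator and overall average agreement.

    annotations: dict mapping an annotator name to the list of labels
    given to the same items in the same order (hypothetical layout).
    """
    pairwise = {
        (a, b): metric(annotations[a], annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }
    per_annotator = {}
    for name in annotations:
        scores = [score for pair, score in pairwise.items() if name in pair]
        per_annotator[name] = sum(scores) / len(scores) if scores else None
    overall = sum(pairwise.values()) / len(pairwise) if pairwise else None
    return pairwise, per_annotator, overall


if __name__ == "__main__":
    data = {
        "ann1": ["A", "B", "A", "A"],
        "ann2": ["A", "B", "B", "A"],
        "ann3": ["A", "A", "B", "A"],
    }
    pairwise, per_annotator, overall = agreement_report(data)
    print(pairwise)
    print(per_annotator)
    print(overall)
```
</preformat>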
        <p>ANNOTATORPRO also provides a visualization
of the annotations made by each annotator for
each document, where a different color is used to
present each tag from the tagset (see Figure 3).
This enables the manager to have quick and easy
access to the cases of disagreement and, if needed,
to give feedback to the annotators.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Quality Monitoring</title>
        <p>Quality monitoring makes use of a gold standard
dataset previously annotated by an expert. Each
annotator is asked to provide an annotation for
those samples. The annotators do not know whether they
are annotating a gold standard sample or not, which
ensures an unbiased evaluation. This enables the
project manager to assess the quality of the
annotations of each annotator by comparing them
against a dataset considered correct. The same
quantitative information and visualization as those
for IAA monitoring (see Section 5.2) are available.</p>
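        <p>A minimal sketch of the comparison against the gold standard is given below: only the items that belong to the gold standard and that the annotator has already labeled are scored. The function name, the dictionary layout and the use of simple accuracy as the score are assumptions of this example; the tool reports the same measures as for IAA monitoring.</p>
        <preformat>
```python
def gold_agreement(annotator_labels, gold_labels):
    """Fraction of gold standard items on which the annotator matches the expert.

    Both arguments map an item id to a label (hypothetical layout).
    Items outside the gold standard are ignored; None is returned when
    the annotator has not yet labeled any gold standard item.
    """
    scored = [item for item in gold_labels if item in annotator_labels]
    if not scored:
        return None
    matches = sum(annotator_labels[item] == gold_labels[item] for item in scored)
    return matches / len(scored)


if __name__ == "__main__":
    gold = {"d1": "POS", "d3": "NEG"}
    annotator = {"d1": "POS", "d2": "NEU", "d3": "POS"}
    print(gold_agreement(annotator, gold))
```
</preformat>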
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Applications and Further Extensions</title>
      <p>
        We used ANNOTATORPRO for multiple projects,
on different tasks, including named entity
recognition
        <xref ref-type="bibr" rid="ref7">(Minard et al., 2016a)</xref>
        , event detection
        <xref ref-type="bibr" rid="ref8">(Minard et al., 2016b)</xref>
        and sentiment analysis. The
tool has been successfully used both in
situations with a few experienced annotators and
with more than 20 non-expert annotators (i.e. high
school students) working in parallel.
ANNOTATORPRO has been fully integrated within an
Active Learning platform
        <xref ref-type="bibr" rid="ref5 ref6">(Magnini et al., 2016)</xref>
        and
successfully employed in two industrial projects,
resulting in high quality data.
      </p>
      <p>As for our next steps, we are working to
extend ANNOTATORPRO to include relations among
annotated entities, such as the relation between a
verb and its argument/s in semantic role labeling.</p>
      <p>ANNOTATORPRO is distributed as open source
software under the terms of the Apache License 2.0<sup>4</sup>
from the web page: http://hlt-nlp.fbk.eu/technologies/annotatorpro.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by the
EuclipRes project, under the program Bando
Innovazione 2016 of the autonomous Province of
Bolzano.</p>
      <p><sup>4</sup> https://www.apache.org/licenses/LICENSE-2.0</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Valentina</given-names>
            <surname>Bartalesi Lenzi</surname>
          </string-name>
          , Giovanni Moretti, and
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>CAT: the CELCT annotation tool</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012</source>
          , pages
          <fpage>333</fpage>
          -
          <lpage>338</lpage>
          , Istanbul, Turkey, May 23-25,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Hamish</given-names>
            <surname>Cunningham</surname>
          </string-name>
          , Diana Maynard, Kalina Bontcheva, Valentin Tablan, Niraj Aswani, Ian Roberts, Genevieve Gorrell, Adam Funk, Angus Roberts, Danica Damljanovic, Thomas Heitz,
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Greenwood</surname>
          </string-name>
          , Horacio Saggion, Johann Petrak,
          <string-name>
            <given-names>Yaoyong</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wim</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Text Processing with GATE (Version 6)</article-title>
          . University of Sheffield Department of Computer Science.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Christian</given-names>
            <surname>Girardi</surname>
          </string-name>
          , Luisa Bentivogli, Mohammad Amin Farajian, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>MT-EQuAl: A toolkit for human assessment of machine translation output</article-title>
          .
          <source>In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>120</fpage>
          -
          <lpage>123</lpage>
          , Dublin, Ireland, August 23-29,
          <year>2014</year>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Dekang</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>An information-theoretic definition of similarity</article-title>
          .
          <source>In Proceedings of the Fifteenth International Conference on Machine Learning, ICML '98</source>
          , pages
          <fpage>296</fpage>
          -
          <lpage>304</lpage>
          , Madison, Wisconsin, USA. Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anne-Lyse</given-names>
            <surname>Minard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohammed R. H.</given-names>
            <surname>Qwaider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Speranza</surname>
          </string-name>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>TextPro-AL: An active learning platform for flexible and efficient production of training data for NLP tasks</article-title>
          .
          <source>In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>131</fpage>
          -
          <lpage>135</lpage>
          , Osaka, Japan, December.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Anne-Lyse</given-names>
            <surname>Minard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohammed R. H.</given-names>
            <surname>Qwaider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <year>2016a</year>
          .
          <article-title>FBK-NLP at NEEL-IT: Active learning for domain adaptation</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          , volume
          <volume>1749</volume>
          , Napoli, Italy, December 5-7,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Anne-Lyse</given-names>
            <surname>Minard</surname>
          </string-name>
          , Manuela Speranza, Bernardo Magnini, and
          <string-name>
            <given-names>Mohammed R. H.</given-names>
            <surname>Qwaider</surname>
          </string-name>
          .
          <year>2016b</year>
          .
          <article-title>Semantic interpretation of events in live soccer commentaries</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          , Napoli, Italy, December 5-7,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Pontus</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          , Sampo Pyysalo, Goran Topić,
          <string-name>
            <given-names>Tomoko</given-names>
            <surname>Ohta</surname>
          </string-name>
          , Sophia Ananiadou, and
          <string-name>
            <given-names>Jun'ichi</given-names>
            <surname>Tsujii</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Brat: A web-based tool for NLP-assisted text annotation</article-title>
          .
          <source>In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL '12</source>
          , pages
          <fpage>102</fpage>
          -
          <lpage>107</lpage>
          , Avignon, France. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Anthony J.</given-names>
            <surname>Viera</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joanne M.</given-names>
            <surname>Garrett</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Understanding interobserver agreement: The kappa statistic</article-title>
          .
          <source>Family Medicine</source>
          ,
          <volume>37</volume>
          (
          <issue>5</issue>
          ):
          <fpage>360</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Seid Muhie</given-names>
            <surname>Yimam</surname>
          </string-name>
          , Iryna Gurevych, Richard Eckart de Castilho, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Biemann</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>WebAnno: A flexible, web-based and visually supported system for distributed annotations</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          , Sofia, Bulgaria, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>