<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cooperative Annotation Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Zerbinati</string-name>
          <email>alberto.zerbinati@studenti.unipd.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Loreggia</string-name>
          <email>andrea.loreggia@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Contissa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Galli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Lagioia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Sartor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIRSFID Alma AI - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>European University Institute</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Brescia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Padova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we present SenTag 2.0, a web-based application for the semantic annotation of textual documents. SenTag provides an easy-to-use interface through which multiple users can annotate a corpus of documents and produce XML files that can be validated against a schema. SenTag 2.0 aims to assist the user during the tagging process and to reduce errors in the output documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal Analytics</kwd>
        <kwd>Annotation Web-Interface</kwd>
        <kwd>XML schema</kwd>
      </kwd-group>
      <conference>
        <conf-name>International Semantic Web Conference (ISWC) 2022: Posters, Demos, and Industry Tracks</conf-name>
        <conf-acronym>ISWC-Posters-Demos-Industry 2022</conf-acronym>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In some areas of NLP, data is collected by building large corpora of annotated documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
To annotate the documents, researchers employ groups of domain experts, who are tasked
with identifying and describing the relevant components in the text. The resulting corpus
of annotated documents is then used to train artificial intelligence models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To identify
and describe entities, arguments, and other information in the documents, domain experts
usually adopt a predefined structured language [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], such as Extensible Markup Language
(XML). This allows the original document to be enriched with a semantic annotation that
identifies important information included in the text [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Here, we present SenTag 2.0, a
web-based application that provides a user-friendly interface to facilitate the annotation
process. It enables annotators to interact with documents without dealing directly with the
underlying markup language. Besides the easy-to-use tagging interface, the main contribution of
the present version, which differentiates SenTag 2.0 from other solutions, is the integration
of a separate, graph-based visualization of the annotation, through which users can interact
with a tagged document and specify relationships among arguments or other entities. We
also introduce the possibility of importing XML files into the application in order to visualize
them and continue the annotation process in SenTag 2.0. The full code is publicly available at
https://github.com/AlbertoZerbinati/sentag.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        We report a non-exhaustive overview of recent works. SLATE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a lightweight annotation
tool that supports annotation at different scales (spans of characters, tokens, and lines, or a
document) and of different types (free text, labels, and links). YEDDA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a desktop application
that supports annotation of character spans with up to 8 categories using a GUI. It is designed to
be lightweight, with no external dependencies, though it only supports Python 2. INCEpTION
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a semantic annotation platform developed under the framework of the INCEpTION project.
The annotation process provides labels on each annotation, indicating the specific elements,
attributes, etc., to make their recognition easier. WebAnno is a generic web-based annotation
tool for distributed teams [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. HUMAN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a web-based annotation tool that covers several
annotation tasks on both textual and image data, and adopts an internal deterministic state
machine that enables researchers to chain different annotation tasks in an interdependent
manner. EntiTies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is another web application that allows users to annotate a text from
scratch, to process it automatically, or to process it automatically and then correct it.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. SenTag</title>
      <p>
        Although domain-agnostic, SenTag 2.0 is inspired by the efforts made in the field of legal
analytics within the ADELE Project (Analytics for DEcision of LEgal cases)<sup>1</sup>. The tool was used
and tested by a team of legal experts to annotate judicial decisions, with excellent feedback
in terms of design, usability and functionality. SenTag 2.0 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has been developed in Python
3.9 as a Django 3.2 app, employing Vue 3 for the graphical interface. The application is based
on NER Annotator, whose tagging component it adopts and extends. We developed the
application to meet the following requirements: (i) security and user management; (ii)
multiple annotators and agreement score; (iii) intuitive interface for text annotation; (iv) graph
implementation for entities and relations. The main goal of the SenTag project is to support
domain experts in building manually annotated datasets of documents. Thanks to SenTag, domain
experts can annotate documents without any knowledge of the underlying markup language.
A JavaScript module implements the annotation on the client side, avoiding unnecessary
transfers of information to the server: the information is sent to the server only when a user
saves the changes. All these modules interact with each other and store information in a SQLite
database. The internal representation of information differs from the way it is presented
to users, according to the standard Model-View-Controller (MVC) design pattern [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In the
following, we shall describe how we implement the requirements listed above.
      </p>
      <p>Security and User Management. Because SenTag is a web-based application, documents must
be protected from unauthorized access. It is also important to assign each authenticated user the
correct level of rights and a specific role in the process. There are three groups,
arranged hierarchically: users in a group can perform all tasks allowed to the
lower levels. The groups are: (i) Admin: the group with the highest level of rights.
Admin users can create, delete and update all users, assign any rights, and assign
users to a specific group; (ii) Editor: the group with middle-level rights. Editor users
are in charge of (a) uploading texts and XML schemas, (b) grouping documents into tasks, (c)
associating tasks with schemas, (d) assigning annotators to the tasks they should perform, (e) checking
inter-annotator agreement, and (f) downloading annotated documents; (iii) Annotator:
the group with the lowest level of rights. Annotator users can only annotate the documents
assigned to them, using the set of tags specified in the schema associated with the task. Each
annotator can validate their work against the associated XML schema in order to check for
annotation errors or flaws. The annotation is enriched with graphs that annotators can use
to establish relationships among entities and/or arguments identified in the document.
<fn><label>1</label><p>ADELE (Analytics for DEcision of LEgal cases) is funded by the European Union’s Justice Programme (Grant
Agreement n. 101007420); information is available at https://site.unibo.it/adele/en/project.</p></fn>
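      </p>
      <p>The hierarchy of groups described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in SenTag: the names Role, MINIMUM_ROLE and can_perform, as well as the action labels, are hypothetical and merely mirror the tasks listed above. The sketch shows how an ordered role level lets each group inherit the rights of the lower ones.</p>
      <preformat><![CDATA[
```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical role levels; a higher value means more rights."""

    ANNOTATOR = 1
    EDITOR = 2
    ADMIN = 3


# Hypothetical mapping from an action to the lowest role allowed to perform it.
MINIMUM_ROLE = {
    "annotate_document": Role.ANNOTATOR,
    "upload_schema": Role.EDITOR,
    "assign_task": Role.EDITOR,
    "download_annotations": Role.EDITOR,
    "manage_users": Role.ADMIN,
}


def can_perform(role: Role, action: str) -> bool:
    """Return True if `role` is at or above the level the action requires."""
    return role >= MINIMUM_ROLE[action]
```
]]></preformat>
      <p>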
Multiple Annotators and Agreement Score. Documents are assigned to one or more
tasks to be carried out by annotators. A task thus corresponds to a set of documents,
a set of users assigned to that task, and a schema. Admins and editors can check the annotation
status of a task through the web interface, which shows statistics about the quality
and the validity of the annotation performed on each document. In particular, it reports the
Krippendorff’s alpha [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] score for each document. This metric describes the level
of agreement within a group of annotators working on the same document: the score quantifies
how similarly the annotators tagged the same text. Moreover, for each annotator, the interface
reports whether they have completed the annotation of a document. It also reports whether
each annotated document passed the validity check against the XML schema. Editors are entitled
to download the resulting XML files at different levels of granularity. In
particular, they can download: (i) the XMLs resulting from the annotation of a single annotator
on a single document; (ii) the XMLs resulting from the annotation of all the annotators on a
single document; (iii) the XMLs resulting from the annotation of a single annotator on all the
documents; (iv) all the XMLs resulting from the annotation of all annotators on all documents.
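      </p>
      <p>To make the agreement score more concrete, the following is a minimal sketch of Krippendorff’s alpha for nominal labels, computed as one minus the ratio of observed to expected disagreement. It rests on assumptions that the text above does not settle: we assume that each token is a unit, that every annotator assigns at most one label per token, and that a missing label is represented by None. The function name krippendorff_alpha_nominal is hypothetical, and the way SenTag itself derives units from the annotations may differ.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one sequence per unit (assumed here: one per token), with
    one label per annotator; None marks a missing label.
    """
    # Coincidence matrix: every ordered pair of labels given to the same unit
    # by two different annotators contributes 1 / (m - 1), where m is the
    # number of labels present in that unit.
    coincidences = Counter()
    for labels in units:
        present = [label for label in labels if label is not None]
        m = len(present)
        if m < 2:
            continue  # fewer than two labels: the unit cannot be paired
        for first, second in permutations(present, 2):
            coincidences[(first, second)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable labels.
    totals = Counter()
    for (first, _second), weight in coincidences.items():
        totals[first] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit labelled by two annotators")

    # Observed and expected disagreement, both scaled by n (which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # convention: a single label in use leaves no room to disagree
    return 1.0 - observed / expected
```
]]></preformat>
      <p>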
Text Annotation. Annotation tags are meant to characterise important information in the
text. To this end, Extensible Markup Language (XML) schemas are employed to define
tags for the relevant entities that can be encountered in the documents, together with their hierarchy. Each
entity is identified by a tag that surrounds the corresponding text. Moreover,
each tag can be enriched with a set of attributes that describe possible relationships among
entities or augment the semantics of the tag. Thus, the XML schema specifying tags and
attributes for a certain document corpus is designed and defined a priori and associated with
each document (via the task). The schema specifies which tags and which attributes each user can
insert into the document. It can also be used to validate the output of the annotation phase.
Annotator users are in charge of the annotation process. After being authenticated, each annotator can
see the set of tasks assigned to them and, after selecting a task, all the
documents that they should annotate.
      </p>
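      <p>As an illustration of how a tag surrounds the corresponding text, the sketch below wraps character spans of a plain document in XML elements using Python’s standard library. It is a simplified stand-in rather than SenTag’s implementation: the functions annotate and unknown_tags, the span format, and the tag names are hypothetical, and spans are assumed not to overlap. A full validation against an XML schema would require a schema-aware library; here, a check against a set of allowed tag names takes its place.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def annotate(text, spans, root_tag="document"):
    """Wrap non-overlapping (start, end, tag, attributes) spans in XML elements."""
    root = ET.Element(root_tag)
    cursor = 0   # position in `text` up to which content has been emitted
    last = None  # most recent child element; plain text after it goes in its tail
    for start, end, tag, attrs in sorted(spans, key=lambda span: span[0]):
        if start < cursor or end <= start or end > len(text):
            raise ValueError(f"overlapping or invalid span: {(start, end, tag)}")
        plain = text[cursor:start]
        if last is None:
            root.text = (root.text or "") + plain
        else:
            last.tail = (last.tail or "") + plain
        last = ET.SubElement(root, tag, attrs)
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]
    if last is None:
        root.text = (root.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return root


def unknown_tags(root, allowed):
    """Return the tags in the tree that are not in the allowed set."""
    return {element.tag for element in root.iter() if element.tag not in allowed}
```
]]></preformat>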
      <p>Graphs of Arguments and Relationships. During the tagging phase, it is possible to
visualise some of the entities identified in the text as nodes in a graph. Presently, the platform
allows two different graphs to be built (the graph of arguments and the graph of relationships),
but this feature can be easily extended to other types of graphs. A node in either graph appears
as soon as an annotator tags a part of the text with a specific tag. The XML schema is used
to specify which tags represent arguments and which ones represent entities. This is done by
adding the attribute   or the attribute     in the schema. Graph visualisation
is useful during the tagging process, as it allows annotators to establish relationships between entities in an
intuitive way.</p>
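      <p>The way nodes and edges can be derived from a tagged document may be sketched as follows. This is an assumption-laden illustration, not the platform’s code: the tag names argument and entity, the id attribute, and the LINK_ATTRIBUTE used to express relationships are hypothetical placeholders and do not correspond to the names used by SenTag’s schemas. The function build_graph collects one node per element whose tag belongs to the chosen set, and one edge per identifier listed in the linking attribute.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

ARGUMENT_TAGS = {"argument"}  # hypothetical tags feeding the graph of arguments
ENTITY_TAGS = {"entity"}      # hypothetical tags feeding the graph of relationships
LINK_ATTRIBUTE = "target"     # hypothetical: space-separated ids of related nodes


def build_graph(root, node_tags):
    """Return (nodes, edges) for the elements whose tag is in `node_tags`."""
    # First pass: one node per tagged element, keyed by its id.
    nodes = {}
    for element in root.iter():
        if element.tag in node_tags and "id" in element.attrib:
            nodes[element.attrib["id"]] = "".join(element.itertext()).strip()

    # Second pass: one directed edge per known id in the linking attribute.
    edges = []
    for element in root.iter():
        if element.tag in node_tags and "id" in element.attrib:
            for target in element.attrib.get(LINK_ATTRIBUTE, "").split():
                if target in nodes:
                    edges.append((element.attrib["id"], target))
    return nodes, edges
```
]]></preformat>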
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been supported by the H2020 ERC Project “CompuLaw” (G.A. 833647); the ADELE
(Analytics for DEcision of LEgal Cases) project under the European Union’s Justice programme
(G.A. 101007420), and by the LAILA (Legal Analytics for Italian Law) project under the Italian
Ministry of Education and Research’s PRIN programme (G.A. 2017NCPZ22).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Poudyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Savelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ieven</surname>
          </string-name>
          , et al.,
          <article-title>ECHR: Legal corpus for argument mining</article-title>
          ,
          <source>in: Proceedings of the 7th Workshop on Argument Mining</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lippi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torroni</surname>
          </string-name>
          ,
          <article-title>Argumentation mining: State of the art and emerging trends</article-title>
          ,
          <source>ACM Transactions on Internet Technology (TOIT) 16</source>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lippi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pałka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Contissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lagioia</surname>
          </string-name>
          , H.-W. Micklitz, G. Sartor,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torroni</surname>
          </string-name>
          ,
          <article-title>Claudette: an automated detector of potentially unfair clauses in online terms of service</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          <volume>27</volume>
          (
          <year>2019</year>
          )
          <fpage>117</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Kummerfeld</surname>
          </string-name>
          ,
          <article-title>Slate: A super-lightweight annotation tool for experts</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Yedda: A lightweight collaborative text span annotation tool</article-title>
          ,
          <source>in: Proceedings of ACL 2018, System Demonstrations</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Klie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bugert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boullosa</surname>
          </string-name>
          , et al.,
          <article-title>The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation</article-title>
          ,
          <source>in: Proc. of the 27th International Conference on Computational Linguistics: System Demonstrations</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. E.</given-names>
            <surname>De Castilho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mujdricza-Maydt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , et al.,
          <article-title>A web-based tool for the integrated annotation of semantic and syntactic structures</article-title>
          ,
          <source>in: Proc. of the Workshop on LT4DH</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ruiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>D'Sa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reiners</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alexandersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klakow</surname>
          </string-name>
          ,
          <article-title>Human: Hierarchical universal modular annotator</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Feild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Amello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lombardo</surname>
          </string-name>
          ,
          <article-title>Entities: An interface for annotating ties between entities in text</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>442</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Loreggia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zerbinati</surname>
          </string-name>
          ,
          <article-title>Sentag: a web-based tool for semantic annotation of textual documents</article-title>
          , in: To appear
          <source>in the Proceedings of the 36th AAAI Conference on Artificial Intelligence</source>
          <year>2022</year>
          , AAAI Press,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Leff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Rayfield</surname>
          </string-name>
          ,
          <article-title>Web-application development using the model/view/controller design pattern</article-title>
          ,
          <source>Proceedings Fifth IEEE International Enterprise Distributed Object Computing Conference</source>
          (
          <year>2001</year>
          )
          <fpage>118</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          ,
          <article-title>Agreement and information in the reliability of coding</article-title>
          ,
          <source>Communication Methods and Measures</source>
          <volume>5</volume>
          (
          <year>2011</year>
          )
          <fpage>93</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>