=Paper= {{Paper |id=Vol-3004/paper14 |storemode=property |title=Bureau for Rapid Annotation Tool: Collaboration can do More over Variety-oriented Annotations |pdfUrl=https://ceur-ws.org/Vol-3004/paper14.pdf |volume=Vol-3004 |authors=Zheng Wang,Shuo Xu |dblpUrl=https://dblp.org/rec/conf/jcdl/WangX21 }} ==Bureau for Rapid Annotation Tool: Collaboration can do More over Variety-oriented Annotations== https://ceur-ws.org/Vol-3004/paper14.pdf
                  EEKE 2021 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents


Bureau for Rapid Annotation Tool: Collaboration can do More over Variety-oriented Annotations

Zheng Wang
wangz@istic.ac.cn
Institute of Scientific and Technical Information of China
Haidian District, Beijing, P. R. China

Shuo Xu∗
xushuo@bjut.edu.cn
Beijing University of Technology
Chaoyang District, Beijing, P. R. China
∗ Corresponding author
Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
A high-quality manually annotated corpus is crucial for many text mining and information extraction tasks. Several workbenches have been developed in the literature to facilitate collaborative annotation. However, given the growing volumes of un-annotated documents, these variety-oriented annotation workbenches have many shortcomings in terms of teamwork, quality control and time effort. To this end, we develop a novel workbench such that collaboration can do more over variety-oriented annotation. Our workbench is named Bureau for Rapid Annotation Tool (Brat for short). Its main functionalities include an enhanced semantic constraint system, Vim-like shortcut keys, an annotation filter and a graph-visualizing annotation browser. To date, over 500,000 mentions have been annotated with our Brat workbench.

1 INTRODUCTION
A high-quality manually annotated corpus is crucial for many text mining and information extraction tasks [1, 3, 4, 6, 7, 14, 15]. Several workbenches have been developed in the literature to facilitate collaborative annotation [8, 11, 13, 16]. However, given the growing volumes of un-annotated documents, these variety-oriented workbenches still have many shortcomings in terms of teamwork, quality control and time effort. Take the sentence “Depending on the model, a Tesla costs somewhere between 1 and 3.33 BTC” [2] as an example. A practical issue we face is whether or not to assign the “Price” type to the mention “between 1 and 3.33 BTC”. This actually depends on a consensus acknowledging Bitcoin as actual money [5, 17].

Reaching this consensus is extremely time-consuming and relies heavily on the two types of annotation collaboration in Table 1: grounded collaboration and trusted collaboration. By grounded collaboration, we mean that the resulting annotations are restricted by sound pre-arrangements. For example, U-Compare only supports named entity annotations in the UIMA-type system [10], which can avoid many conflicts. An alternative [13] takes the form of semantic constraints. In more detail, a given relationship should take parameters with specific entity types. As for trusted collaboration, YEDDA [16] recognized common gestures from BRAT [13] and embedded many functionalities including teamwork, multi-annotator analysis and pairwise annotators comparison. Then, on the basis of the various annotations, the inter-project agreement can be calculated. Another strategy of trusted collaboration, the user-independent workspace, was utilized in TeamTat [8].

This paper combines these two types of annotation collaboration to structure the various mentions annotated by each annotator, and develops a workbench named Bureau for Rapid Annotation Tool (Brat for short). Its main functionalities include an enhanced semantic constraint system, Vim-like shortcut keys, an annotation filter and a graph-visualizing annotation browser. To date, over 500,000 mentions have been annotated with our Brat workbench.

Table 1: Two types of annotation collaborations used in previous workbenches.

Grounded                               Trusted
UIMA-type system [10]                  Teamwork [8, 16]
Annotation semantic constraint [13]    Personal workspace TeamTat [16]
                                       Multi-annotator analysis [8]
                                       Pairwise annotators comparison [8]

2 FUNCTIONALITIES

2.1 Enhanced Semantic Constraint System
It is well known that not all parameters are valid for a specific relationship. To limit invalid annotation results in an annotation project, its manager can customize the schema at any time. Once the schema is modified, all affected annotated mentions are adjusted correspondingly. A readable name is usually assigned to each type of entity and relation. In addition, a list of rules is attached to express the constraint conditions between the parameters of each type of relation. In this way, the manager's understanding of entities and relations can be conveyed to all annotators.

2.2 Vim-like Shortcut Key
According to our observation, the conventional annotating operations (marking, selecting and confirming [13]) are time-consuming when a proper candidate has to be chosen from more than 5 entity types or relationships. To speed up the annotation procedure, our workbench embeds many Vim-like shortcut keys [12]. With them, one can annotate an entity smoothly in the following steps (cf. Figure 2): 1) move the cursor and select a span of text with the shortcut keys in Figure 1, 2) acknowledge one command from the recommended candidates with TAB and ENTER, 3) type leading characters and confirm the entity type. Similar operations can be followed for relation mention annotation. It is worth noting that the key feature of this functionality is code auto-completion. This is based on the enhanced semantic constraint system and polymorphic type inference [9].

Figure 1: Vim-like Shortcut Keys Mapping

Figure 2: The novel annotation procedure powered by Vim-like Shortcut Keys

2.3 Configurable Annotation Filter
It is unknown in advance how many mentions should be annotated in a single document, especially a very long one. To correct wrong annotations in time and to reduce conflicts among multiple documents, a feasible solution is to display only the mentions whose types are of interest in the current workspace. We therefore provide a configurable annotation filter, in which entity types and relationships can be toggled on or off.

2.4 Graph-visualizing Browser
In real-world scenarios, it is not trivial to reach an agreement when multiple annotators are involved and an entity or relation is mentioned in multiple documents at the same time. To inspect the underlying disagreements, our workbench can load and index all texts, mentions and their types, and then visualize them in a graph browser, as illustrated in Figure 3.

Figure 3: The visualization of the entity “Disk” and its relevant information in the TFH-2020 corpus [4] via our browser

3 CONCLUSION
Many projects have used our Brat workbench to annotate entities and/or relations of interest, and to inspect potential conflicts over variety-oriented annotations. To date, over 500,000 mentions have been annotated with our Brat workbench. In the near future, the Vim-like shortcut keys will be strengthened further, and machine learning methods will be incorporated to accelerate conflict inspection.

ACKNOWLEDGMENTS
This work is partially supported by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA16040504), the National Key Research & Development Program of China (Grant No. 2019YFA0707202), and the National Natural Science Foundation of China (Grant No. 71704169 and 72074014). We also thank Professor Yiming Jing and Rui Zheng for their assistance in understanding collaboration in the field of psychology.

REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. Lecture Notes in Computer Science 4825 LNCS (2007), 722–735.
[2] Jeff Benson. 2021. Here’s How Much a Fully Loaded Tesla Model S Will Cost You in Bitcoin. https://decrypt.co/57071/heres-how-much-a-fully-loaded-tesla-model-s-will-cost-you-in-bitcoin [Online; accessed 16-March-2021].
[3] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 1247–1250.
[4] Liang Chen, Shuo Xu, Lijun Zhu, Jing Zhang, Xiao-ping Lei, and Guancan Yang. 2020. A Deep Learning based Method for Extracting Semantic Information from Patent Documents. Scientometrics 125, 1 (2020), 289–312.
[5] Vanessa Dirwai. 2021. Should Christians Trade Bitcoin And Other Cryptocurrencies? https://preciousearnings.medium.com/should-christians-trade-bitcoin-and-other-cryptocurrencies-8 [Online; accessed 22-August-2021].
[6] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: a Web-scale Approach to Probabilistic Knowledge Fusion. In Proceedings of the 20th ACM SIGKDD International Conference. 601–610.
[7] Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open Information Extraction from the Web. Commun. ACM 51, 12 (2008), 68.
[8] Rezarta Islamaj, Dongseop Kwon, Sun Kim, and Zhiyong Lu. 2020. TeamTat: A Collaborative Text Annotation Tool. CoRR abs/2004.11894 (2020).
[9] Steven L. Jenkins and Gary T. Leavens. 1995. Polymorphic Type Inference in Scheme. Computer Science Technical Reports 75 (1995).
[10] Yoshinobu Kano, William A. Baumgartner Jr., Luke McCrohon, Sophia Ananiadou, K. Bretonnel Cohen, Lawrence Hunter, and Jun’ichi Tsujii. 2009. U-Compare: Share and Compare Text Mining Tools with UIMA. Bioinform. 25, 15 (2009), 1997–1998.
[11] Mariana L. Neves and Ulf Leser. 2014. A Survey on Annotation Tools for the Biomedical Literature. Briefings Bioinform 15, 2 (2014), 327–340.
[12] Kim Schulz. 2007. Hacking Vim: A Cookbook to Get the Most Out of The Latest Vim Editor. Packt Publishing Ltd.
[13] Pontus Stenetorp, Sampo Pyysalo, Goran Topic, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. 2012. BRAT: A Web-based Tool for NLP-Assisted Text Annotation. In Conference of the 13th European Chapter of the Association for Computational Linguistics, Walter Daelemans, Mirella Lapata, and Lluís Màrquez (Eds.). 102–107.
[14] Zheng Wang, Shuo Xu, and Lijun Zhu. 2018. Semantic Relation Extraction Aware of N-Gram Features from Unstructured Biomedical Text. Journal of Biomedical Informatics 86 (2018), 59–70. https://doi.org/10.1016/j.jbi.2018.08.011
[15] Shuo Xu, Xin An, Lijun Zhu, Yunliang Zhang, and Haodong Zhang. 2015. A CRF-based System for Recognizing Chemical Entity Mentions (CEMs) in Biomedical Literature. Journal of Cheminformatics 7, Suppl 1 (2015), S11. https://doi.org/10.1186/1758-2946-7-S1-S11
[16] Jie Yang, Yue Zhang, Linwei Li, and Xingxuan Li. 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. In Proceedings of the 56th Annual Meeting Association for Computational Linguistics, Fei Liu and Thamar Solorio (Eds.). 31–36.
[17] David Yermack. 2015. Chapter 2 - Is Bitcoin a Real Currency? An Economic Appraisal. In Handbook of Digital Currency, David Lee Kuo Chuen (Ed.). Academic Press, San Diego, 31–43.
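
As a rough illustration of the enhanced semantic constraint system (Section 2.1) and of the constraint-aware auto-completion behind the shortcut keys (Section 2.2), the following Python sketch stores one rule per relation type and uses the same rules both to validate a relation and to narrow completion candidates. It is not the implementation of our workbench: the class RelationRule, the functions is_valid and complete, and every type name in SCHEMA are hypothetical, and polymorphic type inference [9] is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationRule:
    """One hypothetical schema rule: a relation type plus the entity
    types that its two parameters (head and tail) are allowed to take."""
    relation: str
    head_types: frozenset
    tail_types: frozenset


# Illustrative schema only; these type names are assumptions and do not
# come from any real annotation project.
SCHEMA = [
    RelationRule("costs", frozenset({"Product"}), frozenset({"Price"})),
    RelationRule("part_of", frozenset({"Component"}),
                 frozenset({"Product", "Component"})),
    RelationRule("produces", frozenset({"Company"}), frozenset({"Product"})),
]


def is_valid(relation, head_type, tail_type, schema=SCHEMA):
    """Return True if some rule allows `relation` between the two types."""
    return any(
        rule.relation == relation
        and head_type in rule.head_types
        and tail_type in rule.tail_types
        for rule in schema
    )


def complete(prefix, head_type, tail_type, schema=SCHEMA):
    """Suggest relation types that start with the typed leading characters
    and that the schema allows for the selected pair of entity types."""
    return sorted(
        {
            rule.relation
            for rule in schema
            if rule.relation.startswith(prefix)
            and head_type in rule.head_types
            and tail_type in rule.tail_types
        }
    )
```

Under these assumptions, typing the leading character "p" after selecting a "Company" mention and a "Product" mention would leave "produces" as the only candidate, because "part_of" is ruled out by its parameter types. Keeping the rules as data is one plausible way to let a manager change the schema without touching the annotation logic.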




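
The configurable annotation filter of Section 2.3 can be pictured with a small Python sketch in which a set of enabled types decides which mentions are displayed. The names Mention and AnnotationFilter, the fields of a mention and the example types are assumptions made for illustration; the actual workbench may represent mentions and filters differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A hypothetical annotated mention: a typed text span in one document."""
    doc_id: str
    start: int
    end: int
    text: str
    type: str


class AnnotationFilter:
    """Tracks which types are toggled on and shows only those mentions."""

    def __init__(self, all_types):
        self.all_types = set(all_types)
        self.enabled = set(all_types)  # every type is visible by default

    def toggle(self, type_name):
        """Flip the visibility of one type; unknown types are rejected."""
        if type_name not in self.all_types:
            raise ValueError(f"unknown type: {type_name}")
        self.enabled ^= {type_name}

    def visible(self, mentions):
        """Return the mentions whose type is currently toggled on."""
        return [m for m in mentions if m.type in self.enabled]
```

With such a filter, an annotator reviewing a long document could toggle off every type except the one under discussion and check only those mentions.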
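
To make the indexing step of Section 2.4 more concrete, the sketch below groups annotations by the surface text of a mention and records, for each assigned type, which annotator used it in which document. This can be read as a bipartite graph between mention texts and types. It is only an assumed data layout: the functions build_index and conflicts, the tuple format and the example annotators and types are hypothetical, and the graph rendering itself is left out.

```python
from collections import defaultdict


def build_index(annotations):
    """Index (annotator, doc_id, mention_text, type) tuples.

    Returns a mapping: lower-cased mention text -> type ->
    set of (annotator, doc_id) pairs that assigned this type.
    """
    index = defaultdict(lambda: defaultdict(set))
    for annotator, doc_id, text, type_name in annotations:
        index[text.lower()][type_name].add((annotator, doc_id))
    return index


def conflicts(index):
    """Return the mention texts that received more than one type."""
    return {
        text: sorted(types)
        for text, types in index.items()
        if len(types) > 1
    }
```

In this sketch, a mention text that appears in conflicts is a candidate for discussion among annotators, while the stored (annotator, doc_id) pairs indicate where each competing type was assigned.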