=Paper=
{{Paper
|id=Vol-3004/paper14
|storemode=property
|title=Bureau for Rapid Annotation Tool: Collaboration can do More over Variety-oriented Annotations
|pdfUrl=https://ceur-ws.org/Vol-3004/paper14.pdf
|volume=Vol-3004
|authors=Zheng Wang,Shuo Xu
|dblpUrl=https://dblp.org/rec/conf/jcdl/WangX21
}}
==Bureau for Rapid Annotation Tool: Collaboration can do More over Variety-oriented Annotations==
EEKE 2021 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents

Zheng Wang (wangz@istic.ac.cn), Institute of Scientific and Technical Information of China, Haidian District, Beijing, P. R. China

Shuo Xu∗ (xushuo@bjut.edu.cn), Beijing University of Technology, Chaoyang District, Beijing, P. R. China

∗ Corresponding author

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===ABSTRACT===
A high-quality manually annotated corpus is crucial for many text mining and information extraction tasks. Several workbenches have been developed in the literature to facilitate collaborative annotation. However, given the growing volume of un-annotated documents, these variety-oriented annotation workbenches have many shortcomings in terms of teamwork, quality control and time effort. To address this, we develop a novel workbench with which collaboration can do more over variety-oriented annotation. Our workbench is named Bureau for Rapid Annotation Tool (Brat for short). Its main functionalities include an enhanced semantic constraint system, Vim-like shortcut keys, an annotation filter and a graph-visualizing annotation browser. So far, over 500,000 mentions have been annotated with our Brat workbench.

===1 INTRODUCTION===
A high-quality manually annotated corpus is crucial for many text mining and information extraction tasks [1, 3, 4, 6, 7, 14, 15]. Several workbenches have been developed in the literature to facilitate collaborative annotation [8, 11, 13, 16]. However, given the growing volume of un-annotated documents, these variety-oriented workbenches still have many shortcomings in terms of teamwork, quality control and time effort. Take, for example, the sentence "Depending on the model, a Tesla costs somewhere between 1 and 3.33 BTC" [2]. A practical issue we face is whether or not to assign the "Price" type to the mention "between 1 and 3.33 BTC". The answer depends on a consensus that Bitcoin is actual money [5, 17].

Reaching such a consensus is extremely time-consuming and relies heavily on the two types of annotation collaboration listed in Table 1: grounded collaboration and trusted collaboration. By grounded collaboration, we mean that annotators are restricted by sound pre-arrangements. For example, U-Compare only supports named entity annotations in the UIMA-type system [10], which avoids many conflicts. An alternative [13] takes the form of semantic constraints: a given relationship may only take parameters of specific entity types. As for trusted collaboration, YEDDA [16] adopted common gestures from BRAT [13] and embedded many functionalities, including teamwork, multi-annotator analysis and pairwise annotator comparison. On the basis of the various annotations, the inter-project agreement can then be calculated. Another strategy of trusted collaboration, a user-independent workspace, was utilized in TeamTat [8].

{| class="wikitable"
|+ Table 1: Two types of annotation collaborations used in previous workbenches.
! Grounded !! Trusted
|-
| UIMA-type system [10] || Teamwork [8, 16]
|-
| Annotation semantic constraint [13] || Personal workspace, TeamTat [16]
|-
| || Multi-annotator analysis [8]
|-
| || Pairwise annotators comparison [8]
|}

This paper combines these two types of annotation collaboration to structure the various mentions annotated by each annotator, and develops a workbench named Bureau for Rapid Annotation Tool (Brat for short). Its main functionalities include an enhanced semantic constraint system, Vim-like shortcut keys, an annotation filter and a graph-visualizing annotation browser. So far, over 500,000 mentions have been annotated with our Brat workbench.

===2 FUNCTIONALITIES===

====2.1 Enhanced Semantic Constraint System====
It is well known that not all parameters are valid for a specific relationship. To limit invalid annotations in an annotation project, the project manager can customize the schema at any time. Once the schema is modified, all affected annotated mentions are adjusted correspondingly. A readable name is usually assigned to each type of entity and relation. In addition, a list of rules is attached to each relation type to express the constraint conditions between its parameters. In this way, the manager's understanding of entities and relations can be conveyed to all annotators.

====2.2 Vim-like Shortcut Keys====
According to our observation, the conventional annotation operations (marking, selecting and confirming [13]) are time-consuming when a proper candidate must be chosen from more than five entity types or relationships. To speed up the annotation procedure, our workbench embeds many Vim-like shortcut keys [12]. With these keys, an entity can be annotated smoothly in three steps (cf. Figure 2): (1) move the cursor and select a span of text using the keys mapped in Figure 1; (2) pick one command from the recommended candidates with TAB and ENTER; (3) type the leading characters and confirm the entity type. Relation mentions are annotated with similar operations. It is worth noting that the key feature of this functionality is code auto-completion, which is based on the enhanced semantic constraint system and polymorphic type inference [9].

Figure 1: Vim-like Shortcut Keys Mapping

Figure 2: The novel annotation procedure powered by Vim-like Shortcut Keys

====2.3 Configurable Annotation Filter====
It is unknown in advance how many mentions should be annotated in a single document, especially a very long one. To correct wrong annotations in time and reduce conflicts among multiple documents, a feasible solution is to display only the mentions of the types of interest in the current workspace. We therefore provide a configurable annotation filter in which entity types and relationships can be toggled on or off.

====2.4 Graph-visualizing Browser====
In real-world scenarios, it is not trivial to reach an agreement when multiple annotators are involved and an entity or relation is mentioned in multiple documents simultaneously. To inspect the underlying disagreements, our workbench can load and index all texts, mentions and their types, and then visualize them in a graph browser, as illustrated in Figure 3.

Figure 3: The visualization of the entity "Disk" and its relevant information in the TFH-2020 corpus [4] via our browser

===3 CONCLUSION===
Many projects have used our Brat workbench to annotate entities and/or relations of interest and to inspect potential conflicts over variety-oriented annotations. So far, over 500,000 mentions have been annotated with our Brat workbench. In the near future, the Vim-like shortcut keys will be further strengthened, and machine learning methods will be incorporated to accelerate conflict inspection.

===ACKNOWLEDGMENTS===
This work is supported partially by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA16040504), National Key Research & Development Program of China (Grant No. 2019YFA0707202), and National Natural Science Foundation of China (Grant No. 71704169 and 72074014). We also thank Professor Yiming Jing and Rui Zheng for their assistance in understanding collaboration in the field of psychology.

===REFERENCES===
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. Lecture Notes in Computer Science 4825 LNCS (2007), 722–735.

[2] Jeff Benson. 2021. Here's How Much a Fully Loaded Tesla Model S Will Cost You in Bitcoin. https://decrypt.co/57071/heres-how-much-a-fully-loaded-tesla-model-s-will-cost-you-in-bitcoin [Online; accessed 16-March-2021].

[3] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 1247–1250.

[4] Liang Chen, Shuo Xu, Lijun Zhu, Jing Zhang, Xiao-ping Lei, and Guancan Yang. 2020. A Deep Learning based Method for Extracting Semantic Information from Patent Documents. Scientometrics 125, 1 (2020), 289–312.

[5] Vanessa Dirwai. 2021. Should Christians Trade Bitcoin And Other Cryptocurrencies? https://preciousearnings.medium.com/should-christians-trade-bitcoin-and-other-cryptocurrencies-8 [Online; accessed 22-August-2021].

[6] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion. In Proceedings of the 20th ACM SIGKDD International Conference. 601–610.

[7] Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open Information Extraction from the Web. Commun. ACM 51, 12 (2008), 68.

[8] Rezarta Islamaj, Dongseop Kwon, Sun Kim, and Zhiyong Lu. 2020. TeamTat: A Collaborative Text Annotation Tool. CoRR abs/2004.11894 (2020).

[9] Steven L. Jenkins and Gary T. Leavens. 1995. Polymorphic Type Inference in Scheme. Computer Science Technical Reports 75 (1995).

[10] Yoshinobu Kano, William A. Baumgartner Jr., Luke McCrohon, Sophia Ananiadou, K. Bretonnel Cohen, Lawrence Hunter, and Jun'ichi Tsujii. 2009. U-Compare: Share and Compare Text Mining Tools with UIMA. Bioinform. 25, 15 (2009), 1997–1998.

[11] Mariana L. Neves and Ulf Leser. 2014. A Survey on Annotation Tools for the Biomedical Literature. Briefings Bioinform 15, 2 (2014), 327–340.

[12] Kim Schulz. 2007. Hacking Vim: A Cookbook to Get the Most Out of The Latest Vim Editor. Packt Publishing Ltd.

[13] Pontus Stenetorp, Sampo Pyysalo, Goran Topic, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. BRAT: A Web-based Tool for NLP-Assisted Text Annotation. In Conference of the 13th European Chapter of the Association for Computational Linguistics, Walter Daelemans, Mirella Lapata, and Lluís Màrquez (Eds.). 102–107.

[14] Zheng Wang, Shuo Xu, and Lijun Zhu. 2018. Semantic Relation Extraction Aware of N-Gram Features from Unstructured Biomedical Text. Journal of Biomedical Informatics 86 (2018), 59–70. https://doi.org/10.1016/j.jbi.2018.08.011

[15] Shuo Xu, Xin An, Lijun Zhu, Yunliang Zhang, and Haodong Zhang. 2015. A CRF-based System for Recognizing Chemical Entity Mentions (CEMs) in Biomedical Literature. Journal of Cheminformatics 7, Suppl 1 (2015), S11. https://doi.org/10.1186/1758-2946-7-S1-S11

[16] Jie Yang, Yue Zhang, Linwei Li, and Xingxuan Li. 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. In Proceedings of the 56th Annual Meeting Association for Computational Linguistics, Fei Liu and Thamar Solorio (Eds.). 31–36.

[17] David Yermack. 2015. Chapter 2 - Is Bitcoin a Real Currency? An Economic Appraisal. In Handbook of Digital Currency, David Lee Kuo Chuen (Ed.). Academic Press, San Diego, 31–43.
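The paper does not describe Brat's internal data model, so the following is only a minimal Python sketch, built on our own assumptions, of how the ideas in Sections 2.1 to 2.4 could fit together. It covers argument-type constraints that both reject invalid relations and narrow the candidates offered during prefix completion, a type-toggle filter, and a surface-form index that flags mentions given different types (as in the "Price" example of the introduction). All names (Schema, Mention, HasPrice, find_type_conflicts, and so on) are hypothetical, not Brat's actual configuration format or API. Neither the polymorphic type inference [9] nor the graph rendering is reproduced.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical schema: entity types plus, for each relation type, the
    entity types allowed in its head and tail argument slots."""

    entity_types: set[str] = field(default_factory=set)
    relation_args: dict[str, tuple[set[str], set[str]]] = field(default_factory=dict)

    def is_valid(self, relation: str, head_type: str, tail_type: str) -> bool:
        # A relation is valid only if both argument types satisfy its constraint.
        if relation not in self.relation_args:
            return False
        heads, tails = self.relation_args[relation]
        return head_type in heads and tail_type in tails

    def complete_entity(self, prefix: str) -> list[str]:
        # Case-insensitive prefix completion over entity type names.
        p = prefix.lower()
        return sorted(t for t in self.entity_types if t.lower().startswith(p))

    def complete_relation(self, head_type: str, tail_type: str, prefix: str = "") -> list[str]:
        # Offer only relations whose argument constraints accept the two
        # selected mentions, so invalid candidates never reach the annotator.
        p = prefix.lower()
        return sorted(
            r
            for r in self.relation_args
            if r.lower().startswith(p) and self.is_valid(r, head_type, tail_type)
        )


@dataclass(frozen=True)
class Mention:
    doc_id: str     # document in which the mention occurs
    surface: str    # mention text
    type: str       # entity type assigned by the annotator
    annotator: str  # who made the annotation


def filter_mentions(mentions: list[Mention], visible: set[str]) -> list[Mention]:
    # Annotation filter: keep only mentions whose type is toggled on.
    return [m for m in mentions if m.type in visible]


def find_type_conflicts(mentions: list[Mention]) -> dict[str, set[str]]:
    # Index each surface form to the set of types it received across all
    # documents and annotators; more than one type signals a disagreement.
    index: dict[str, set[str]] = defaultdict(set)
    for m in mentions:
        index[m.surface].add(m.type)
    return {s: types for s, types in index.items() if len(types) > 1}


if __name__ == "__main__":
    schema = Schema(
        entity_types={"Product", "Price", "Currency", "Quantity"},
        relation_args={
            "HasPrice": ({"Product"}, {"Price"}),
            "PricedIn": ({"Price"}, {"Currency"}),
        },
    )
    print(schema.complete_entity("pr"))                     # ['Price', 'Product']
    print(schema.complete_relation("Product", "Price"))     # ['HasPrice']
    print(schema.is_valid("HasPrice", "Price", "Product"))  # False

    mentions = [
        Mention("d1", "Tesla", "Product", "a1"),
        Mention("d1", "between 1 and 3.33 BTC", "Price", "a1"),
        Mention("d2", "between 1 and 3.33 BTC", "Quantity", "a2"),
    ]
    print(len(filter_mentions(mentions, {"Product"})))      # 1
    for surface, types in find_type_conflicts(mentions).items():
        print(surface, sorted(types))  # between 1 and 3.33 BTC ['Price', 'Quantity']
```

Keeping all constraints in one schema object means that a schema edit by the project manager changes both validation and the candidates offered to every annotator at once, which mirrors the intent of Section 2.1. How Brat actually propagates schema changes to existing mentions is not specified here.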