=Paper=
{{Paper
|id=Vol-2936/paper-60
|storemode=property
|title=HUKB at ChEMU 2021 Task 2: Anaphora Resolution
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-60.pdf
|volume=Vol-2936
|authors=Kojiro Machi,Masaharu Yoshioka
|dblpUrl=https://dblp.org/rec/conf/clef/MachiY21
}}
==HUKB at ChEMU 2021 Task 2: Anaphora Resolution==
Kojiro Machi 1, Masaharu Yoshioka 1,2,3

1 Graduate School of Information Science and Technology, Hokkaido University, N14 W9, Kita-ku, Sapporo-shi, Hokkaido, Japan
2 Faculty of Information Science and Technology, Hokkaido University
3 Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University

machi@eis.hokudai.ac.jp (K. Machi); yoshioka@ist.hokudai.ac.jp (M. Yoshioka)
https://www-kb.ist.hokudai.ac.jp/yoshioka/ (M. Yoshioka)
ORCID: 0000-0002-2096-1218 (M. Yoshioka)

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania

Abstract

This paper describes our system for ChEMU 2021 Task 2, anaphora resolution for extracting chemical reactions from patents. We divide the task into two subtasks: (1) span detection of mentions that include antecedents and anaphora; and (2) mention classification and relation detection. For the first subtask, we use a deep learning-based approach. For the second subtask, we use a rule-based approach built on features related to chemical reactions produced by ChemicalTagger, a state-of-the-art text-mining tool for chemistry. Our system obtained an exact-match F-score of 0.6907 and a relaxed-match F-score of 0.7459.

Keywords: Anaphora resolution, Chemical patents, ChemicalTagger

1. Introduction

Chemical patents are a useful source for extracting new chemical discoveries because chemical reactions are usually disclosed in patents [1]. Several research efforts have aimed to extract chemical reaction information from these documents [2, 3].

The ChEMU (Cheminformatics Elsevier Melbourne University) lab aims to develop information extraction techniques for chemical patents and provides the ChEMU 2021 anaphora resolution task (ChEMU-Ref task), which targets five types of anaphoric relationships: COREFERENCE, TRANSFORMED, REACTION_ASSOCIATED, WORK_UP, and CONTAINED [4].

Recently, deep learning-based approaches have shown high performance in text mining [5], and such approaches have also been applied to coreference resolution [6]. However, it can be difficult for a system that uses only a deep learning-based approach to solve the ChEMU-Ref task, because the task requires the whole context of a document to detect mentions and relationships. For example, the target compound of a reaction is often written both in headings and at the end of procedures. In addition, even if the same words and actions are used in multiple sentences, the labels of mentions can differ depending on the objective.

Another approach for extracting chemical information from the chemical literature is a rule-based one. ChemicalTagger [7] is a state-of-the-art system that can recognize chemical named entities and the procedures of a chemical reaction process using predefined rules. In the ChEMU 2020 task, Lowe and Mayfield [8] used this system as a core component and achieved the second-best team result in the event extraction task. We therefore assume that the information extracted by ChemicalTagger can serve as a clue for identifying relationships in the ChEMU-Ref task.

In this paper, we develop a pipeline system for the task that uses both deep learning-based and rule-based methods.
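The overall two-stage structure of the system can be sketched as follows. Every function below is a hypothetical placeholder for a component detailed in Section 3, not the authors' actual code.

```python
# Hedged overview of the pipeline described in this paper; all names are
# hypothetical placeholders for the components detailed in Section 3.
from typing import Dict, List, Tuple

def detect_candidate_mentions(text: str) -> List[Tuple[int, int]]:
    """Step 1 (Section 3.1): BioBERT tags candidate mention spans (NER)."""
    raise NotImplementedError  # see the fine-tuning sketch in Section 3.1

def detect_relations(text: str, mentions: List[Tuple[int, int]]) -> List[Dict]:
    """Step 2 (Section 3.2): regex patterns and ChemicalTagger features
    classify mentions into antecedents/anaphora and assign relations."""
    raise NotImplementedError  # see the rule sketches in Section 3.2

def run_pipeline(text: str) -> List[Dict]:
    mentions = detect_candidate_mentions(text)
    return detect_relations(text, mentions)  # plus the Section 3.3 closure
```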
We divided the task into two subtasks: (1) span detection of mentions that include antecedents and anaphora; and (2) mention classification and relation detection. For the first step, the system solves a named entity recognition (NER) problem using BioBERT [9], one of the state-of-the-art deep learning-based methods for NER. For the second step, the system solves a relation detection problem using regex rules and features from ChemicalTagger, a semantic natural language processing tool for the chemical literature.

2. Related Works

2.1. ChemicalTagger

ChemicalTagger is a text-mining tool for the chemical literature. ChemicalTagger annotates terms and phrases related to a chemical synthesis procedure using a chemical entity tagger (OSCAR4 [10]), a regex tagger, and a part-of-speech tagger; the tagged phrases are then parsed by ANTLR [11]. Figure 1 shows examples of parse trees generated by ChemicalTagger. Both ActionPhrases in the figure are identified by verbs defined in dictionaries, such as "heated" and "diluted". Moreover, chemicals, parameters, and apparatuses are annotated with corresponding tags. As shown in the right-hand tree, a chemical entity can also be given a role (such as solvent) using grammatical information.

Figure 1: Examples of parse trees generated by ChemicalTagger

2.2. BioBERT

BioBERT [9] is a domain-specific language model pretrained on biomedical documents. BioBERT achieves state-of-the-art scores on several chemical NER datasets. In the original implementation of fine-tuning for the chemical NER task, the given tokens are decomposed into subwords, and the label of each token is predicted by BioBERT at the token's first subword.

3. Method

We developed a pipeline system for the task that uses both deep learning-based and rule-based methods. Although the features obtained from ChemicalTagger are useful, its predefined rules are insufficient for mention detection because patent document descriptions vary widely; in addition, crafting syntactic patterns for mention detection is expensive. Therefore, for the first step, we used BioBERT, one of the state-of-the-art deep learning-based NER systems, for the span detection of candidates that include antecedents and anaphora. For the second step, we constructed rules for mention classification and relation detection using features generated by ChemicalTagger and regex patterns. The following guidelines were used in constructing the rules:

• Construction and evaluation of rules: the rules are developed on the training set and evaluated on the development set.
• Simple rules: because of the wide variety of anaphoric expressions, complex rules tailored to each expression would be necessary for the best accuracy. However, it was difficult to construct a large number of such rules simply to increase recall. Therefore, we constructed simple rules that cover the common varieties to increase the F-score on the development set.

3.1. Candidate Mention Detection

In this stage, we cast candidate mention detection as an NER problem. The detected candidates are used to identify the antecedents and anaphora of the five relation types mentioned above. We used BioBERT-Base v1.1 (https://github.com/dmis-lab/biobert) as the NER model. For tokenization, we used the entities parsed by ChemicalTagger as tokens. The hyperparameters were as follows: max sequence length = 384 (enough to cover all sequences in the training and development sets), batch size = 32, learning rate = 1e-5, and number of epochs = 50.
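As a concrete illustration, the following minimal sketch reproduces this fine-tuning setup. The paper used the original dmis-lab BioBERT implementation; re-expressing it with the Hugging Face transformers port of the same checkpoint is our assumption, and the label set anticipates the extended BIO scheme introduced in Table 1 below.

```python
# Minimal sketch (not the authors' code): fine-tuning BioBERT for candidate
# mention detection as token classification, assuming the Hugging Face port.
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Extended BIO label set of Table 1: DH marks the head part and DI the
# inside part of a discontinuous mention.
LABELS = ["O", "B", "I", "B-DH", "I-DH", "B-DI", "I-DI"]

CHECKPOINT = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABELS))

def encode(tokens, token_labels):
    """Tokens are ChemicalTagger-parsed entities; only the first subword of
    each token keeps its label, the rest are masked out with -100."""
    enc = tokenizer(tokens, is_split_into_words=True,
                    truncation=True, max_length=384)
    labels, previous = [], None
    for word_id in enc.word_ids():
        if word_id is None or word_id == previous:
            labels.append(-100)  # special token or trailing subword
        else:
            labels.append(LABELS.index(token_labels[word_id]))
        previous = word_id
    enc["labels"] = labels
    return enc

args = TrainingArguments(output_dir="ner", per_device_train_batch_size=32,
                         learning_rate=1e-5, num_train_epochs=50)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # data omitted
# trainer.train()
```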
The ChEMU-Ref corpus contains discontinuous mentions and overlapped mentions. One method that takes discontinuous mentions into account is an extended BIO format that uses DB and DI labels for discontinuous mentions (the BIOHD format) [12]. We used a similar approach, as shown in Table 1. Here, B and I represent the beginning and inside of a continuous mention, respectively, and DH and DI represent the head and inside of a discontinuous mention, respectively. Because overlapped mentions can often be detected using patterns and features obtained from ChemicalTagger, they are not covered in this stage, and the longest entity is used for training when multiple entities overlap.

Table 1: Example of tokenization labels
Method      dimethyl  formamide  (  DMF  )  (     50    mL    )
BIOHD [12]  DB        DI         O  B    O  DI    DI    DI    DI
Ours        B-DH      I-DH       O  B    O  B-DI  I-DI  I-DI  I-DI

3.2. Relation Detection

In this stage, the candidates extracted in the previous stage are classified into antecedents and anaphora (of the five mention types), and their relations are detected. Relations are extracted by a rule-based system that uses regex patterns and rules over features obtained from ChemicalTagger.

3.2.1. Section Detection

A chemical patent description usually consists of three parts: heading, synthesis, and work-up (Figure 2). For the visualization of relations, the brat annotation tool [13] is used in this paper.

Figure 2: Example of section separation

Identification of these parts is important because it is useful for relation detection. As one example of its advantage, the heading part does not contain relations other than COREFERENCE. As another example, even when a word is used in the same kind of action phrase in different parts, and thus has different relations, the correct relation can be detected by checking which part the word belongs to. Therefore, we separated the heading and synthesis sections by finding the first sentence that contains an action phrase other than a Synthesize phrase, and we separated the synthesis and work-up sections by finding an action phrase that starts the work-up, based on previous research [2] (Table 2).

Table 2: Action phrases at the start of the work-up section [2]
Work-up: Concentrate, Degass, Dry, Extract, Filter, Partition, Precipitate, Purify, Recover, Remove, Wash, Quench
Other:   Add, ApparatusAction, Cool, Dissolve, Heat, Stir, Synthesize, Wait, Yield

3.2.2. Workflow of Relation Detection

Figure 3 shows an overview of the procedure for relation detection.

Figure 3: Overview of the procedure for relation detection

Because COREFERENCE has different characteristics from the other relations (for example, it can often be found with a simple regex pattern such as "antecedent (anaphor)"), our system detects simple COREFERENCE relations in a first step and then detects the five relation types, including the remaining COREFERENCE relations, in a second step.

In the first step, our system detects simple COREFERENCE relations that can be found using regex patterns and specific rules. The regex patterns used in this step are shown in Table 3; the anaphor is bracketed in each example.

Table 3: Regex patterns for COREFERENCE detection between antecedent A and anaphor B; dcA represents a discontinuous A
Pattern            Example
A ([^()]*B[^()]*)  3-Ethynyl-4-hydrocybenzaldehyde ([32])
A.{0,20} as B      the title compound (0.166 g, 87%) as [a white solid]
A: {0,1}B          Intermediate1: [5-Bromo-7-chloroindolin-2-one]
A\nB\n             Intermediate 11\n[2-(tert-bytyl)-5-methoxyisonicotinaldehyde]\n
dcA B dcA          dimethyl formamide [(DMF)] (50mL)

Because the anaphor found in a discontinuous mention (the fifth regex pattern) requires only the chemical substance and the amount from its antecedent, the relevant phrase of the antecedent is extracted by splitting the original noun phrase on the prepositions "in" and "of" and the conjunction "and"; the anaphor is then used as the antecedent for the next anaphor. Next, to find the COREFERENCE of a target compound, which is usually written again at the end of the work-up together with its yield, we assume the last mention candidate in the heading to be the target compound and create a COREFERENCE relation to the candidate written with its yield in the work-up.
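To make this first step concrete, here is a minimal sketch of the regex-based detection, using simplified Python stand-ins for three of the Table 3 patterns. The concrete pattern strings, and the way A and B are instantiated from candidate mention spans, are our assumptions rather than the system's exact rules.

```python
# Minimal sketch (assumed, simplified) of first-step COREFERENCE detection
# with stand-ins for three of the regex patterns in Table 3.
import re

COREF_PATTERNS = [
    # "A (B)": an anaphor in parentheses right after its antecedent,
    # e.g. "3-Ethynyl-4-hydrocybenzaldehyde (32)"
    re.compile(r"(?P<A>\S+) \((?P<B>[^()]+)\)"),
    # "A ... as B", e.g. "the title compound (0.166 g, 87%) as a white solid"
    re.compile(r"(?P<A>the title compound \([^()]*\)).{0,20} as (?P<B>[a-z ]+)"),
    # "A: B", e.g. "Intermediate1: 5-Bromo-7-chloroindolin-2-one"
    re.compile(r"(?P<A>Intermediate ?\d+): {0,1}(?P<B>\S+)"),
]

def find_simple_coreferences(text):
    """Yield (antecedent, anaphor) pairs matched by the regex rules."""
    for pattern in COREF_PATTERNS:
        for match in pattern.finditer(text):
            yield match.group("A"), match.group("B")

for a, b in find_simple_coreferences(
        "Intermediate1: 5-Bromo-7-chloroindolin-2-one"):
    print(f"COREFERENCE anaphor {b!r} -> antecedent {a!r}")
```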
Then, for solvent detection, which involves a specific string and relation, the system checks whether the string of a candidate anaphor contains "solvent" or "volatile". When the anaphor contains such a keyword, the system searches for antecedents among the solvents found by ChemicalTagger.

In the second step, the system detects the remaining relations using rules over the features generated by ChemicalTagger. Our approach starts by classifying the mention candidates detected by BioBERT into antecedents and anaphora using the labels added by ChemicalTagger. The system classifies mentions that carry a chemical-entity (OSCAR-CM) label as antecedent candidates, except when the terms occur in a Yield phrase, because such a chemical entity can be an anaphor when it is written at the end of the work-up as the target compound. Other mentions (usually noun phrases) are classified as anaphor candidates.

After the candidates are identified, the system processes them in order of appearance. When an anaphor is found, the system sets relationships between the anaphor and the antecedent candidates that appear in phrases before the anaphor and that are not yet connected to another anaphor. The type of relationship is determined by the procedure shown in Figure 3. First, if the anaphor is an apparatus, it is assigned a CONTAINED relation, because most words in this category contain chemical entities. Next, the system checks whether the anaphor belongs to the work-up section, i.e., whether the WORK_UP flag is True; if so, the anaphor is assigned a WORK_UP relation. The WORK_UP flag is set to True when an action phrase belongs to the work-up phrases (Table 2) and is reset to False when a Yield phrase with a parameter is found, because such a phrase usually appears at the end of the work-up section. Next, the system counts the antecedent candidates. If there are multiple antecedents, the anaphor is assigned a REACTION_ASSOCIATED relation, because the remaining labels, TRANSFORMED and COREFERENCE (beyond those already extracted by the regex patterns and ChemicalTagger rules), basically take one antecedent. Finally, the system checks for the existence of an action phrase between the antecedent and the previous phrase. If one exists, the anaphor is assigned a TRANSFORMED relation; otherwise, it is assigned a COREFERENCE relation, because the antecedent is assumed to be unchanged.

3.3. Post-processing

When terms A and B are COREFERENCE and terms B and C are COREFERENCE, A and C should also be COREFERENCE. To handle this transitivity, we used a post-processing tool distributed by the organizers (https://raw.githubusercontent.com/yuan-li/chemu2021/master/apply-transitive-closure.py) that creates the missing links, such as the one between A and C in the example above.
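A condensed sketch of this decision cascade and of the post-processing follows. The feature flags on the anaphor are hypothetical stand-ins for the ChemicalTagger-derived features the system consults, and the closure function only illustrates the idea behind the organizers' script.

```python
# Condensed sketch of the Figure 3 cascade and the Section 3.3 closure; the
# anaphor feature flags are hypothetical stand-ins, not the actual system.
from itertools import combinations

WORK_UP_STARTERS = {"Concentrate", "Degass", "Dry", "Extract", "Filter",
                    "Partition", "Precipitate", "Purify", "Recover",
                    "Remove", "Wash", "Quench"}  # Table 2

def classify_relation(anaphor, antecedents):
    """Assign one relation type to an anaphor, mirroring Figure 3."""
    if anaphor.get("is_apparatus"):          # apparatus terms hold chemicals
        return "CONTAINED"
    if anaphor.get("in_work_up"):            # flag driven by WORK_UP_STARTERS
        return "WORK_UP"
    if len(antecedents) > 1:                 # several inputs: reaction mixture
        return "REACTION_ASSOCIATED"
    if anaphor.get("action_phrase_before"):  # an intervening action phrase
        return "TRANSFORMED"                 # implies the entity was changed
    return "COREFERENCE"                     # otherwise assumed unchanged

def transitive_closure(coref_pairs):
    """If (A, B) and (B, C) are COREFERENCE, also link (A, C)."""
    groups = {}                              # mention -> coreference cluster
    for a, b in coref_pairs:
        merged = groups.get(a, {a}) | groups.get(b, {b})
        for mention in merged:
            groups[mention] = merged
    closed = set()
    for cluster in groups.values():
        closed.update(tuple(sorted(p)) for p in combinations(cluster, 2))
    return closed

print(transitive_closure({("A", "B"), ("B", "C")}))
# links A-B and B-C plus the inferred A-C (set order may vary)
```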
4. Main Results

For the evaluation phase, the ChEMU-Ref task employs precision, recall, and F-score (the harmonic mean of precision and recall) computed by the BRATEval tool (https://bitbucket.org/nicta_biomed/brateval/src/master/). Table 4 shows the evaluation results of our system prediction on the test set. The evaluation was conducted by the task organizers; our system's output was submitted after the evaluation phase. Our system obtained an exact-match F-score of 0.6907 and a relaxed-match F-score of 0.7459.

Table 4: Results of relation detection on the test set
                     Exact                        Relaxed
Relation             Precision  Recall  F-score   Precision  Recall  F-score
COREFERENCE          0.6956     0.5319  0.6028    0.7868     0.6016  0.6819
CONTAINED            0.7214     0.6824  0.7014    0.7929     0.7500  0.7708
REACTION_ASSOCIATED  0.6680     0.6803  0.6741    0.7224     0.7357  0.7290
TRANSFORMED          0.6611     0.7169  0.6879    0.6611     0.7169  0.6879
WORK_UP              0.7467     0.7403  0.7435    0.7929     0.7861  0.7895
Overall              0.7132     0.6696  0.6907    0.7702     0.7231  0.7459

5. Error Analysis

In this section, we analyze our system on the development set. Because our system employs a pipeline approach, we examine candidate mention detection and relation detection separately.

5.1. Candidate Mention Detection

Table 5 shows the evaluation results of candidate detection by our NER system on the development set, both when overlapped candidate mentions are excluded and when all entities are considered. Our system obtained an exact-match F-score of 0.9885 and a relaxed-match F-score of 0.9967 when excluding overlapped entities, and an exact-match F-score of 0.9820 and a relaxed-match F-score of 0.9902 when considering all entities.

Table 5: Candidate detection results on the development set
                   Exact                        Relaxed
Entities           Precision  Recall  F-score   Precision  Recall  F-score
Except overlapped  0.9865     0.9906  0.9885    0.9947     0.9987  0.9967
All                0.9865     0.9776  0.9820    0.9947     0.9857  0.9902

In this stage, the following types of errors were found:

• Boundary detection errors caused by ChemicalTagger tokenization
• Lack of context

There are several cases in which ChemicalTagger failed to identify appropriate term and sentence boundaries in the tokenization process. ")(hereinafter" in document 0735 is an example of a term boundary error: ChemicalTagger extracted this string as one token (annotated as a preposition), but the first character ")" should be separated from "(hereinafter" to identify the appropriate boundary of a chemical entity. "... to give the compound I-1256 (332mg, yield 52%)" in document 0429 is an example of a sentence boundary error: this phrase was split into two sentences because "I-1256" was annotated with a tag that is used at the end of a sentence.

There are also chemical entities that are not included in any relation. Because chemical entities in heading sections are sometimes written as one sentence, our system cannot consider whether candidates in other sentences have relations.

5.2. Relation Detection

Table 6 shows the evaluation results of our system prediction on the development set. Our system obtained an F-score of 0.7569 in relation detection and an F-score of 0.8349 in mention detection for exact match. Figure 4 shows the confusion matrices of relation detection; the abbreviations are CR (COREFERENCE), CT (CONTAINED), WU (WORK_UP), RA (REACTION_ASSOCIATED), and TR (TRANSFORMED).
In this stage, the following types of errors were found:

• Boundary detection errors between the synthesis and work-up sections
• Mentions involved in multiple relations
• Other issues caused by a lack of rules

Table 6: Results for the development set
                     Mention (anaphor)            Relation
Relation             Precision  Recall  F-score   Precision  Recall  F-score
COREFERENCE          0.9372     0.7834  0.8535    0.8117     0.5897  0.6831
CONTAINED            0.9118     0.9118  0.9118    0.8209     0.7971  0.8088
REACTION_ASSOCIATED  0.7942     0.7383  0.7652    0.7753     0.7408  0.7577
TRANSFORMED          0.7179     0.7850  0.7500    0.7179     0.7850  0.7500
WORK_UP              0.8241     0.8994  0.8601    0.8073     0.7789  0.7928
All                  0.8475     0.8227  0.8349    0.7972     0.7206  0.7569

Figure 4: Relation detection confusion matrices

5.2.1. Boundary Detection between the Synthesis and Work-up Sections

Boundary detection between the synthesis and work-up sections is important for our system, because an anaphor is classified as WORK_UP when it falls in the work-up section delimited by the action phrases of Table 2. Typical false positives for WORK_UP are caused by work-up action phrases that appear in the synthesis section. Figure 5 shows an example of this boundary detection error: because the "Degassed" phrase is used to detect the first action phrase of the work-up, our system places the start of the work-up at the first sentence. As a result, "the greyish suspension" in the second sentence was annotated as WORK_UP instead of REACTION_ASSOCIATED.

In contrast, false negatives for WORK_UP involve boundaries whose detection requires knowledge of chemistry. This type of boundary detection error is a failure to recognize the work-up process, as shown in Figure 6. Adding chemical entities is common practice in the reaction process, so our system assumes this operation belongs to the synthesis process. However, it is also common to add material that is not directly related to the reaction during the work-up process, and chemical knowledge is necessary to determine that such an operation belongs to the work-up. As a result, "the phases" was annotated as REACTION_ASSOCIATED instead of WORK_UP.

Figure 5: Work-up phrase in the synthesis section

Figure 6: Boundary of work-up and overlapped mentions

5.2.2. Multiple Mentions Across Relations

Our system cannot detect a mention that has multiple labels (except for COREFERENCE), because every anaphor is given exactly one label other than COREFERENCE, as shown in Figure 3. For example, "the phases" in Figure 6 is labeled as REACTION_ASSOCIATED and cannot also be labeled as WORK_UP.

5.2.3. Other Issues

Other issues we found were as follows.

• As shown in Table 6 and Figure 4, the recall of COREFERENCE relations was comparatively lower than that of the other relations because of an insufficient number of rules. In addition, to improve the recall of COREFERENCE, we used particular types of phrases that often signal COREFERENCE: all solvents (chemical entities) found by ChemicalTagger are used as candidates to identify COREFERENCE for anaphora containing the string "solvent". With this rule, the system generated 47 false positives out of the 105 results that have no gold relation.

• Because our relation extraction system relies on the classification of mention candidates (into antecedents and anaphora) to set the relations, a single misclassified mention candidate causes multiple errors in relation detection. As shown in the example in Figure 7, the misclassification of the term "celite" as an anaphor created two spurious work-up relations to the preceding antecedents and removed two work-up relations from the last mention.
• Chemical patents occasionally refer to the procedures of other chemical reactions. Our system does not work well in such cases because the documents omit the detailed procedures.

Figure 7: Negative effects of one entity misclassification

6. Conclusion

We proposed a hybrid system that uses a state-of-the-art chemical text-mining tool for features and a deep learning-based approach to bridge the entity gap between the tool and the ChEMU-Ref task. In the mention candidate detection step, issues related to boundary detection and a lack of context were found. To improve boundary detection, we need to use another tokenization tool that splits a sequence into smaller tokens (e.g., https://github.com/spyysalo/standoff2conll) and to construct rules that consider the whole context of a document. In the mention relation detection step, several issues were found to be caused by gaps between the tags set by ChemicalTagger and those of the ChEMU-Ref task, and by the rules we constructed being insufficient in quantity and clarity. To improve our system, we need to analyze these gaps and refine the rules; for example, it is important to add a tag for mention classification when ChemicalTagger misses one, such as for "celite" in Figure 7. In addition, we need to reconsider the method for detecting the work-up section, for example with labels for the start of a work-up section and more complex rules. Another direction is using a deep learning-based approach for classifying mentions or for end-to-end relation detection itself.

Acknowledgment

We would like to thank the task organizers for providing the dataset. This work was partially supported by JSPS KAKENHI Grant Number 19K22888.

References

[1] M. Bregonje, Patents: A unique source for scientific technical information in chemistry related industry?, World Patent Information 27 (2005) 309–315. doi:10.1016/j.wpi.2005.05.003.
[2] D. M. Lowe, Extraction of chemical structures and reactions from the literature, Ph.D. thesis, University of Cambridge, 2012.
[3] S. H. M. Mehr, M. Craven, A. I. Leonov, G. Keenan, L. Cronin, A universal system for digitization and automatic execution of the chemical synthesis literature, Science 370 (2020) 101–108. URL: https://science.sciencemag.org/content/370/6512/101. doi:10.1126/science.abc2986.
[4] B. Fang, C. Druckenbrodt, S. A. Akhondi, J. He, T. Baldwin, K. Verspoor, ChEMU-Ref: A corpus for modeling anaphora resolution in the chemical domain, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 1362–1375. URL: https://www.aclweb.org/anthology/2021.eacl-main.116.
[5] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[6] K. Lee, L. He, M. Lewis, L. Zettlemoyer, End-to-end neural coreference resolution, arXiv preprint arXiv:1707.07045 (2017).
[7] L. Hawizy, D. M. Jessop, N. Adams, P. Murray-Rust, ChemicalTagger: A tool for semantic text-mining in chemistry, Journal of Cheminformatics 3 (2011) 1–13.
[8] D. M. Lowe, J. Mayfield, Extraction of reactions from patents using grammars, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 – Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22–25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_221.pdf.
[9] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics (2019). doi:10.1093/bioinformatics/btz682.
[10] D. M. Jessop, S. E. Adams, E. L. Willighagen, L. Hawizy, P. Murray-Rust, OSCAR4: a flexible architecture for chemical text-mining, Journal of Cheminformatics 3 (2011) 1–12.
[11] T. Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages, Pragmatic Bookshelf, 2007.
[12] B. Tang, Q. Chen, X. Wang, Y. Wu, Y. Zhang, M. Jiang, J. Wang, H. Xu, Recognizing disjoint clinical concepts in clinical text using machine learning-based methods, in: AMIA Annual Symposium Proceedings, volume 2015, American Medical Informatics Association, 2015, p. 1184.
[13] P. Stenetorp, G. Topić, S. Pyysalo, T. Ohta, J.-D. Kim, J. Tsujii, BioNLP Shared Task 2011: Supporting resources, in: Proceedings of the BioNLP Shared Task 2011 Workshop, Association for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 112–120. URL: http://www.aclweb.org/anthology/W11-1816.