HUKB at ChEMU 2022 Task 1: Expression-Level Information Extraction Kojiro Machi1 , Masaharu Yoshioka1,2,3 1 Graduate School of Information Science and Technology, Hokkaido University, N14 W9, Kita-ku, Sapporo-shi, Hokkaido, Japan 2 Faculty of Information Science and Technology, Hokkaido University 3 Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University Abstract This paper describes our results for the three tasks at ChEMU 2022: Task 1a (named entity recognition), Task 1b (event extraction), and Task 1c (anaphora resolution). We adopted a hybrid approach using deep learning models and a small set of post-processing rules for these tasks. For Tasks 1b and 1c, we adopted a pipeline approach for relation extraction, which combined mention detection with relation classification. In addition, we proposed post-processing methods for Task 1c that considered the results of Task 1a. Our system obtained an exact match F-score of 0.9412 and a relaxed match F-score of 0.9572 for Task 1a, an exact match F-score of 0.8865 and a relaxed match F-score of 0.9027 for Task 1b, and an exact match F-score of 0.7232 and an F-score of 0.8053 for Task 1c for each test set (private score). Although our approaches tried to consider the document-level context and relationships between the tasks, limitations remained. Keywords Information extraction, Chemical patents, Named entity recognition, Event extraction, Anaphora resolu- tion 1. Introduction The automated extraction of the chemical-reaction information in patents plays an important role in collecting chemical-reaction information in reaction databases for use by synthetic chemists. Chemical patents contain important information about new chemical discoveries because any new chemical compounds are usually published via patents [1]. With the number of patents increasing rapidly, manually collecting the information written in patents not only takes time and cost but also requires expertise in the subject matter of the patents. Since 2020, the Cheminformatics Elsevier Melbourne University (ChEMU) laboratory has identified several tasks related to information extraction from chemical patents, including expression-level information extraction [2, 3] and document-level information [4, 5]. For 2022, the ChEMU laboratory is providing five tasks for ChEMU 2022 [6]. In recent years, deep learning has been recognized as a promising approach to informa- tion extraction from chemical literature. For example, pre-trained language models such as CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy $ machi@eis.hokudai.ac.jp (K. Machi); yoshioka@ist.hokudai.ac.jp (M. Yoshioka) € https://www-kb.ist.hokudai.ac.jp/yoshioka/ (M. Yoshioka)  0000-0002-2096-1218 (M. Yoshioka) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) BioBERT [7] and ChemBERT [8] have shown high performance in information-extraction tasks. Moreover, the best systems in previous ChEMU tasks all employed deep-learning-based approaches, together with a small amount of rule-based post-processing [9, 10]. In addition, both of these systems used a pipeline approach for relation extraction tasks [11, 12], such as the event extraction and anaphora resolution tasks. This paper describes our results for the three tasks at ChEMU 2022: Task 1a (named entity recognition, NER), Task 1b (event extraction, EE) and Task 1c (anaphora resolution, AR). We employed hybrid approaches that used deep learning models and a small set of post-processing rules. In addition, we propose post-processing methods for Task 1c that considered the results of Task 1a. 2. Task Description Prediction systems for the three tasks must solve two general tasks: the identification of the spans of entities and the labels (mention detection), together with the identification of relations between the spans (relation classification). Task 1a, (i.e., NER), is a mention detection task that identifies one of the 10 entity types of labels. The set of labels contains these compounds (STARTING_MATERIAL, REAGENT_CATALYST, REACTION_PRODUCT, SOLVENT, OTHER_COMPOUND), conditions (TIME, TEMPERATURE), yields (YIELD_PERCENT, YIELD_OTHER), and a relation label (EXAMPLE_LABEL). Task 1b, (i.e., EE), is a task that involves both mention detection and relation extraction. The mention detection task identifies events that have a relationship with entities in Task 1a and identifies relations between the events and the entities. Almost all events involve a label that is either REACTION_STEP or WORK_UP and some events involve both labels. The relation extraction task identifies relations between the events and compounds, which are annotated with ARG1, and relations between the events and conditions or yield are annotated with ARGM. Task 1c, (i.e., AR), involves both mention detection and relation extraction tasks. The mention detection task identifies an antecedent as ENTITY and the anaphor as a label that represents their relationship. The relation extraction task identifies relations between antecedents and anaphors as a coreference relation (COREFERENCE) and four bridging relations (TRANSFORMED, RE- ACTION_ASSOCIATED, WORK_UP, and CONTAINED). 3. Methods We developed systems for mention detection and relation classification tasks that used Chem- BERT [8], a pre-trained language model for chemistry-related documents. This approach was similar to those of the best systems adopted in previous tasks [11, 12]. Figure 1 shows our pipeline method. First, we split a snippet into sentences by using ChemDataExtractor [13]. Then, ChemBERT predicted the labels for mentions/relations. post-processing methods were adopted for mention detection in Task 1a and relation detection in Task 1c, with the aim of addressing the document-level context. We fine-tuned ChemBERT for Task 1a. In addition, we fine-tuned multiple ChemBERTs for Tasks 1b and 1c because these tasks are more complex than Task 1a. Table 1 shows the set of Optional sentence sentence prediction document-level segmentation (ChemBERT) postprocessing post- sentence snippet snippet processed ... snippet sentence Figure 1: Overview of our pipeline method target labels for the three tasks involving mention detection and relation classification. Table 1 Set of target labels for the three tasks involving mention detection and relation classification Mention detection Relation classification Task 1a Named entity - Named entity (Task 1a) ARG1 Task 1b Event ARGM Candidate for coreference Coreference Task 1c Candidate for bridging Bridging 3.1. Mention Detection For Task 1a, we trained a ChemBERT model and constructed a set of post-processing rules. First, ChemBERT was fine-tuned on the training set, excluding snippets that had overlap- ping mentions. A snippet was split into sentences by using ChemDataExtractor [13] and the sentences were split by a simple regex rule that used a particular tool1 by default. Then, IOB2 labels were assigned to the tokens. We used a linear classifier to predict the label of each tokens and the input for it was the output of the first sub-token [14]. Second, two post- processing methods were applied. Because compounds in a heading section, which contains the product of the snippet (REACTION_PRODUCT) and/or the final product of the multistep reaction (OTHER_COMPOUND), cannot be distinguished without context, a post-processing method is required. The rule adopted is that if EXAMPLE_LABEL exists in a snippet, then the REACTION_PRODUCT that appears before the last EXAMPLE_LABEL is annotated with OTHER_COMPOUND. Then, if OTHER_COMPOUND appears in a heading and the same string appears as a REACTION_PRODUCT after the last source compound (STARTING_MATERIAL, REAGENT_CATALYST, or SOLVENT), then the entity is labeled as a REACTION_PRODUCT. For Task 1b, we used the method in Task 1a for named entity recognition and trained a ChemBERT model for the detection of events in the same manner as for Task 1a. For Task 1c, we trained one ChemBERT model for coreference and a second for bridging relations. The reason why we split the relations is because their mentions are partly different from each other. Therefore, we aimed to suppress false-positive relations caused by false-positive 1 https://github.com/spyysalo/standoff2conll mentions from the other ones. In these models, single-label mention detection was performed and the relation labels were given in the relation classification step. Sentences were tokenized in the same manner as for Task 1a, with B, I, O and D labels from BIOHD [15] being used for tagging because Task 1c contains discontinuous mentions. When overlapping mentions were tokenized, we used the longer entities and discarded the shorter ones. Because the number of mentions in the training set for coreference was smaller than for other datasets used for mention detection, we augmented the number of positive examples in the training set by reusing sentences that contained one or more mentions five times. 3.2. Relation Classification For Task 1b, we trained a one ChemBERT model for ARG1 relations and a second for ARGM relations. The input to the relation classification was a sentence with the candidate pair for a relation between an event enclosed by [E1] and [/E1] tokens and a target enclosed by [E2] and [/E2] tokens. The output was a binary classification result indicating whether the pair has a relation or not. All candidate pairs in a sentence were classified and positive relations were annotated by the system. For example, if two events and three targets are included in a sentence, the number of candidate pairs would be six. This approach is similar to the Melax Tech system [11], which performed best in the ChEMU 2020 task [9]. This approach was also discussed in a general framework [16]. In the training stage, we used not only gold-standard entities but also predicted events that were generated by five systems trained on 80% of the training set, similarly to five-fold cross-validation. For Task 1c, we trained one ChemBERT model for coreference and a second for bridging relations. The input to the relation classification was a pair of sentences representing a candidate pair for a relation between an anaphor enclosed by [E1] and [/E1] tokens and an antecedent enclosed by [E2] and [/E2]. The reason for using a pair of sentences, different from Task 1b, was that relations in Task 1c were often across sentences. If a mention was discontinuous, the first block of the mention was enclosed. The output was the label of the relation or a NO_RELATION label. We applied two post-processing methods for Task 1c because relations that involve more than two sentences were included in the snippets. First, for coreference relations, when RE- ACTION_PRODUCT appeared multiple times in a snippet and the sentence-level distance of the mentions was more than two, which means it cannot be found by ChemBERT, we assign a COREFERENCE relation. Second, for bridging relations, when a candidate for an antecedent did not have any anaphors, we searched for antecedent candidates for the anaphor by finding words that started with “the” and were an anaphor for another antecedent. The candidate for the anaphor that was closest to the antecedent was then selected as the anaphor. If the an- tecedent contained STARTING_MATERIAL, REAGENT_CATALYST or SOLVENT, the relation was annotated with REACTION_ASSOCIATED. Otherwise, the relation was annotated with a label that was the same as the already annotated anaphor after the antecedent. In addition to the above methods, we used a post-processing tool distributed by the task organizer2 , which generates a coreference between A and C when coreferences between A and B and between B and C already exist. 2 https://raw.githubusercontent.com/yuan-li/chemu2021/master/apply-transitive-closure.py 3.3. Experimental Settings We used ChemBERT v3.0 [8] for the mention detection and relation classification models. ChemBERT was implemented by using AllenNLP [17] and HuggingFace Transformers [18]. We used the AdamW optimizer [19] and cross entropy loss for optimization. The models were trained on the training set for the task and evaluated on the development set and both public and private test sets. Hyperparameter values were set as follows: max sequence length=384 (covering all sequences contained in the training and development sets), batch size=16, learning rate=1e-5, and patience=7. Because the relation classification in Task 1c accepts a pair of sentences as its input, a maximum sequence length of 512 was used for this task. The validation metric for early stopping of mention detection in the development set was the F-score. For relation classification, it was the validation loss. The performances of the various systems were evaluated with respect to both exact and relaxed matching for precision, recall, and F-score. 4. Main Results We submitted a system with post-processing for Task 1a before the deadline and a system without post-processing after the deadline. Table 2 shows our results for Task 1a on the private set. Our system obtained an exact match F-score of 0.9412 and a relaxed match F-score of 0.9572. Table 3 shows our results for the private set in detail. Table 2 Results for Task 1a on the private set. Here, P represents precision, R represents recall, and F represents F-score Exact Relaxed Relation P R F P R F ChemBERT (late) 0.9327 0.9349 0.9338 0.9481 0.9503 0.9492 ChemBERT + PP 0.9401 0.9422 0.9412 0.9561 0.9583 0.9572 We submitted one system for Task 1b before the deadline. We also submitted a corrected version after the deadline, having found an error related to the sequence length of the input to the system. Table 4 shows our results for Task 1b on the private set. The corrected system obtained an exact match F-score of 0.8865 and a relaxed match F-score of 0.9027. Table 5 shows our results for the private set in detail. We submitted three systems for Task 1c before the deadline. We also submitted a corrected version after the deadline, having found an error related to the sequence length of the input to the system. Table 6 shows our results for Task 1c on the private set. The corrected system obtained an exact match F-score of 0.7232 and a relaxed match F-score of 0.8053. Table 7 shows our results on the private set in detail. Table 3 Detailed results for Task 1a on the private set, as predicted by the post-processing version of the system Exact Relaxed Entity P R F P R F EXAMPLE_LABEL 0.9714 0.9913 0.9812 0.9714 0.9913 0.9812 OTHER_COMPOUND 0.9498 0.9486 0.9492 0.9637 0.9625 0.9631 REACTION_PRODUCT 0.9112 0.9034 0.9073 0.9433 0.9352 0.9392 REAGENT_CATALYST 0.8529 0.9050 0.8782 0.8721 0.9253 0.8979 SOLVENT 0.9284 0.9666 0.9471 0.9284 0.9666 0.9471 STARTING_MATERIAL 0.8997 0.8594 0.8791 0.9353 0.8934 0.9138 TEMPERATURE 0.9802 0.9770 0.9786 0.9901 0.9869 0.9885 TIME 0.9673 0.9741 0.9707 0.9883 0.9953 0.9918 YIELD_OTHER 0.9853 0.9711 0.9782 0.9902 0.9759 0.9830 YIELD_PERCENT 0.9750 0.9943 0.9846 0.9778 0.9972 0.9874 All 0.9401 0.9422 0.9412 0.9561 0.9583 0.9572 Table 4 Results for Task 1b on the private set Exact Relaxed Relation P R F P R F ChemBERT 0.9058 0.8685 0.8868 0.9222 0.8842 0.9028 ChemBERT Corrected (late) 0.9054 0.8684 0.8865 0.9220 0.8842 0.9027 5. Discussion Selecting “the best model” of the various models in training was difficult. Table 8 shows the F-scores on the development and private sets. We used models that showed the best F-score on the development set for mention detection and the best validation loss for relation classification; therefore, it is not surprising that F-scores on the development sets were better than those for the private sets. In particular, the result for Task 1c (greater than 0.09) represented a large gap. An explanation for this larger gap could be that the predictions by the models were unstable because Task 1c was more difficult than the other tasks. Therefore, we must reconsider training methods when seeking a better model. In Task 1b, all relations whose recalls were zero had fewer than five gold-standard examples (Table 5). It is quite difficult for our machine learning framework to identify such relations with a small amount of examples. Although our system showed good results for Tasks 1a and 1b, we found errors caused by a lack of document-level information. Figure 2 shows a confusion matrix for Task 1a. Because the role of a compound depends on a reaction, it is difficult to identify the la- bel of compounds without document-level information. Examples included errors among STARTING_MATERIAL, REAGENT_CATALYST, and SOLVENT and errors between REAC- TION_PRODUCT and OTHER_COMPOUND. In addition, errors between a compound for a reaction (REAGENT_CATALYST, SOLVENT, STARTING_MATERIAL) and one for a work-up Table 5 Detailed results for Task 1b on the private set, as predicted by the corrected system. * represents relations that had fewer than five gold-standard examples Exact Relaxed Relation P R F P R F ARG1|REACTION_STEP|OTHER_COMPOUND 0.4756 0.5342 0.5032 0.5000 0.5616 0.5290 ARG1|REACTION_STEP|REACTION_PRODUCT 0.8731 0.8561 0.8645 0.9179 0.9000 0.9089 ARG1|REACTION_STEP|REAGENT_CATALYST 0.8399 0.8744 0.8568 0.8590 0.8904 0.8744 ARG1|REACTION_STEP|SOLVENT 0.8929 0.8883 0.8906 0.8596 0.8950 0.8770 ARG1|REACTION_STEP|STARTING_MATERIAL 0.8832 0.8043 0.8419 0.9036 0.8228 0.8613 ARG1|WORKUP|OTHER_COMPOUND 0.9455 0.9052 0.9249 0.9627 0.9217 0.9418 *ARG1|WORKUP|REACTION_PRODUCT 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 *ARG1|WORKUP|REAGENT_CATALYST 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 *ARG1|WORKUP|SOLVENT 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 *ARG1|WORKUP|STARTING_MATERIAL 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ARGM|REACTION_STEP|TEMPERATURE 0.9262 0.8750 0.8999 0.9328 0.8811 0.9062 ARGM|REACTION_STEP|TIME 0.8976 0.9024 0.9000 0.9213 0.9261 0.9237 ARGM|REACTION_STEP|YIELD_OTHER 0.9845 0.9315 0.9573 0.9871 0.9340 0.9598 ARGM|REACTION_STEP|YIELD_PERCENT 0.9671 0.9229 0.9444 0.9701 0.9257 0.9474 ARGM|WORKUP|TEMPERATURE 0.9063 0.6541 0.7598 0.9271 0.6692 0.7773 ARGM|WORKUP|TIME 0.7895 0.4054 0.5357 0.7895 0.4054 0.5357 *ARGM|WORKUP|YIELD_OTHER 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 *ARGM|WORKUP|YIELD_PERCENT 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 All 0.9054 0.8684 0.8865 0.9220 0.8842 0.9027 Table 6 Results for Task 1c on the private set. 𝑃 𝑃𝐵𝑅 represents post-processing for bridging relations and 𝑃 𝑃𝐶𝑅 represents post-processing for coreference relations. Exact Relaxed Relation P R F P R F ChemBERT 0.7393 0.6616 0.6983 0.8222 0.7358 0.7766 ChemBERT + 𝑃 𝑃𝐵𝑅 0.7290 0.6838 0.7057 0.8107 0.7604 0.7848 ChemBERT + 𝑃 𝑃𝐵𝑅 + 𝑃 𝑃𝐶𝑅 0.6876 0.7307 0.7085 0.7660 0.8140 0.7893 ChemBERT + 𝑃 𝑃𝐵𝑅 + 𝑃 𝑃𝐶𝑅 Corrected (late) 0.7144 0.7322 0.7232 0.7955 0.8153 0.8053 (OTHER_COMPOUND) were found because distinguishing between them from just one sen- tence was sometimes difficult. Errors between REACTION_STEP and WORK_UP for Task 1b were also found to be caused by the same difficulty. However, adopting a post-processing method for Task 1a mitigated the errors between REACTION_PRODUCT and OTHER_COMPOUND (Table 2). The post-processing methods employed in Task 1c were also useful in mitigating these errors (Table 6). However, we should note that using these methods adversely affected the precision. For example, our post-processing method for coreference generated false relations when false positive REACTION_PRODUCT existed. Therefore, we must be careful when using post-processing methods. Table 7 Results for Task 1c on the private set, as predicted by the corrected system Exact Relaxed Relation P R F P R F COREFERENCE 0.4896 0.4882 0.4889 0.5975 0.5958 0.5967 CONTAINED 0.5054 0.6267 0.5595 0.7312 0.9067 0.8095 REACTION_ASSOCIATED 0.7368 0.7974 0.7659 0.8094 0.8760 0.8414 TRANSFORMED 0.7310 0.7576 0.7440 0.7368 0.7636 0.7500 WORK_UP 0.8219 0.8230 0.8224 0.8940 0.8952 0.8946 All 0.7144 0.7322 0.7232 0.7955 0.8153 0.8053 Table 8 Comparison of F-scores between development and private sets Exact Relaxed Task Development Private Development Private Task 1a 0.9548 0.9412 0.9677 0.9535 Task 1b 0.9179 0.8865 0.9294 0.9027 Task 1c 0.8168 0.7232 0.8773 0.8053 With the aim of improving our systems, we tried to consider the relationships between the tasks in some preliminary experiments. For example, we tried to construct post-processing rules for named entity mentions in Task 1a by on the results for Task 1b. However, these rules did not improve the results because it was difficult to determine which prediction (named entity or event) was correct. In addition, we tried to use the relationships not only in the forward direction (i.e., Task 1a to Task 1c), but also in the backward direction (Task 1c to Task 1a). However, improving the performance in the backward direction was also difficult because the performance of the later task was lower than in the earlier task. Despite these difficulties, the post-processing methods for Task 1c that considered the results for Task 1a did improve our system’s performance. Therefore, we must conduct a more detailed analysis of the relationships between the tasks if we are to improve our systems via this approach. Table 9 shows the results for mention detection in Task 1c on the development set. First, the detection of coreference mentions was difficult compared to identifying bridging relations. The reasons were that coreference mentions had only a small number of mentions in the training data and sometimes required inter-sentence information to extract antecedents. The significance of the data augmentation was not clear. Therefore, we must reconsider the ratios used for positive mentions. Several errors were caused by failures in sentence splitting. For example, START- ING_MATERIAL “Ex. 18A” caused a split into two sentences by the ChemDataExtractor sentence splitter because of the “.” in “Ex. 18A”. A solution to this problem would involve applying a set of rules involving dependency parsing and trigger words. Figure 2: Normalized confusion matrix for Task 1a. Values less than 0.001 are not shown Table 9 Results for mention detection in Task 1c on the development set Exact Relaxed Relation P R F P R F Coreference 0.8099 0.8085 0.8092 0.8746 0.8730 0.8738 Coreference with data augmentation 0.7982 0.8316 0.8145 0.8567 0.8926 0.8743 Bridging 0.9090 0.9415 0.9250 0.9542 0.9884 0.9710 6. Conclusion This paper has reported our results for the three tasks at ChEMU 2022. We proposed hybrid methods that used ChemBERT and a small set of post-processing rules for these tasks. We employed a pipeline approach for Tasks 1b and 1c that combined mention detection and relation classification. Because we used only one or two sentences as the input to ChemBERT, this lack of document-level information suppressed the performance of the system. Although we confirmed that adopting a set of post-processing rules was effective in considering document- level information, we also confirmed that the set of rules we used was insufficient. In addition, although we tried to use relationships between the tasks to improve performance, it was difficult to construct rules that did achieve improvements. Therefore, we must conduct more detailed analyses about the relationships between the tasks. Acknowledgments We would like to thank the ChEMU team for providing the datasets. This work was partially supported by JSPS KAKENHI Grant Number 21K19814. References [1] M. Bregonje, Patents: A unique source for scientific technical information in chem- istry related industry?, World Patent Information 27 (2005) 309–315. URL: https:// www.sciencedirect.com/science/article/pii/S0172219005000736. doi:https://doi.org/ 10.1016/j.wpi.2005.05.003. [2] J. He, D. Q. Nguyen, S. A. Akhondi, C. Druckenbrodt, C. Thorne, R. Hoessel, Z. Afzal, Z. Zhai, B. Fang, H. Yoshikawa, A. Albahem, L. Cavedon, T. Cohn, T. Baldwin, K. Verspoor, ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents, Frontiers in Research Metrics and Analytics 6 (2021). URL: https://www. frontiersin.org/article/10.3389/frma.2021.654438. doi:10.3389/frma.2021.654438. [3] B. Fang, C. Druckenbrodt, S. A. Akhondi, J. He, T. Baldwin, K. Verspoor, ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 1362–1375. URL: https://www.aclweb.org/anthology/2021.eacl-main.116. [4] H. Yoshikawa, S. Akhondi, C. Thorne, C. Druckenbrodt, R. Hoessel, Z. Zhai, J. He, T. Bald- win, K. Verspoor, Chemical Reaction Reference Resolution in Patents (2021). [5] Z. Zhai, C. Druckenbrodt, C. Thorne, S. A. Akhondi, D. Q. Nguyen, T. Cohn, K. Verspoor, ChemTables: a dataset for semantic classification on tables in chemical patents, Journal of Cheminformatics 13 (2021) 1–20. [6] Y. Li, B. Fang, J. He, H. Yoshikawa, S. Akhondi, C. Druckenbrodt, C. Thorne, Z. Zhai, Z. Afzal, T. Cohn, T. Baldwin, K. Verspoor, The ChEMU 2022 Evaluation Campaign: Information Extraction in Chemical Patents, in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, Springer International Publishing, Cham, 2022, pp. 400–407. [7] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics (2019). doi:10.1093/bioinformatics/btz682. [8] J. Guo, A. S. Ibanez-Lopez, H. Gao, V. Quach, C. W. Coley, K. F. Jensen, R. Barzilay, Automated Chemical Reaction Extraction from Scientific Liter- ature, Journal of Chemical Information and Modeling 62 (2022) 2035–2045. URL: https://doi.org/10.1021/acs.jcim.1c00284. doi:10.1021/acs.jcim.1c00284. arXiv:https://doi.org/10.1021/acs.jcim.1c00284, pMID: 34115937. [9] J. He, D. Quoc Nguyen, S. A. Akhondi, C. Druckenbrodt, C. Thorne, R. Hoessel, Z. Afzal, Z. Zhai, B. Fang, H. Yoshikawa, et al., An extended overview of the CLEF 2020 ChEMU lab: information extraction of chemical reactions from patents, in: Proceedings of CLEF (Conference and Labs of the Evaluation Forum) 2020 Working Notes, 2020. [10] Y. Li, B. Fang, J. He, H. Yoshikawa, S. A. Akhondi, C. Druckenbrodt, C. Thorne, Z. Afzal, Z. Zhai, T. Baldwin, et al., Extended overview of ChEMU 2021: reaction reference resolution and anaphora resolution in chemical patents, CLEF (Working Notes) (2021). [11] J. Zhang, Y. Zhang, Melaxtech: a report for clef 2020–ChEMU task of chemical reaction extraction from patent, Work Notes CLEF. Published online.[Google Scholar] (2020). [12] R. Dutt, S. Khosla, C. P. Rosé, A pipelined approach to Anaphora Resolution in Chemical Patents., in: CLEF (Working Notes), 2021, pp. 710–719. [13] M. C. Swain, J. M. Cole, ChemDataExtractor: A Toolkit for Automated Extraction of Chem- ical Information from the Scientific Literature, Journal of Chemical Information and Model- ing 56 (2016) 1894–1904. URL: https://doi.org/10.1021/acs.jcim.6b00207. doi:10.1021/acs. jcim.6b00207. arXiv:https://doi.org/10.1021/acs.jcim.6b00207, pMID: 27669338. [14] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://www.aclweb.org/ anthology/N19-1423. doi:10.18653/v1/N19-1423. [15] B. Tang, Q. Chen, X. Wang, Y. Wu, Y. Zhang, M. Jiang, J. Wang, H. Xu, Recognizing disjoint clinical concepts in clinical text using machine learning-based methods, in: AMIA annual symposium proceedings, volume 2015, American Medical Informatics Association, 2015, p. 1184. [16] Z. Zhong, D. Chen, A Frustratingly Easy Approach for Entity and Relation Extraction, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 50–61. URL: https://aclanthology.org/2021.naacl-main.5. doi:10.18653/v1/2021.naacl-main.5. [17] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. Peters, M. Schmitz, L. S. Zettlemoyer, AllenNLP: A Deep Semantic Natural Language Processing Platform, 2017. arXiv:arXiv:1803.07640. [18] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-Art Natural Lan- guage Processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6. [19] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7.