NLPatVCU: CLEF 2022 ChEMU Shared Task System Description

Darshini Mahendran, Christina Tang and Bridget T. McInnes
Virginia Commonwealth University, 601 West Main Street, Richmond VA 23220, USA

Abstract
This paper describes our team's participation in Tracks 1a & 1b of the Cheminformatics Elsevier Melbourne University (ChEMU) 2022 Challenge, which focuses on information extraction from chemical patents. We discuss our systems: MedaCy, a Python-based supervised multi-class entity recognition system for Named Entity Recognition (NER), and RelEx, a Python-based relation extraction system for Event Extraction (EE). Our best model for Task 1a obtained an overall exact precision, recall, and F1 score of 0.73, 0.81, and 0.77, respectively, and relaxed precision, recall, and F1 score of 0.83, 0.92, and 0.87, respectively. Our best model for Task 1b obtained an overall exact precision, recall, and F1 score of 0.82, 0.68, and 0.75, respectively, and a relaxed precision, recall, and F1 score of 0.88, 0.73, and 0.79, respectively.

Keywords
Named Entity Recognition, Event Extraction, Information Extraction

1. Introduction
Chemical patents contain information about chemicals and their reactions. This information can be used to discover new chemicals and synthetic pathways [1, 2]. However, manually combing this literature is time-consuming; therefore, there is an increased need for tools that automatically extract chemicals and their reactions. The process, referred to as chemical reaction detection [3], consists of two main components: the first is to identify the different components of the reaction, and the second is to identify the relations between those components.

The CLEF 2022 Cheminformatics Elsevier Melbourne University (ChEMU) Task 1a aims to create systems that perform Named Entity Recognition (NER) over chemical patents as the first step in chemical reaction detection. Specifically, the goal of this task is to automatically identify chemical compounds based on the role they play in a reaction, as well as other relevant information such as yield and temperature. The CLEF 2022 ChEMU Task 1b aims to create systems that perform Event Extraction (EE) over the entities to identify the individual steps in the reaction.

In this paper, we describe our participation in the CLEF 2022 ChEMU Task 1a and 1b Challenge. For this challenge, we used our Python framework MedaCy to automatically identify the reaction components, and RelEx's GCN-BERT to automatically link the trigger words with the experimental parameters to provide the sequence of steps within the reaction. MedaCy [4] contains a number of supervised multi-label sequence classification algorithms for NER. RelEx's GCN-BERT [5] utilizes Graph Convolutional Neural Networks (GCNs). Our best model for Task 1a obtained an overall exact precision, recall, and F1 score of 0.73, 0.81, and 0.77, respectively, and relaxed precision, recall, and F1 score of 0.83, 0.92, and 0.87, respectively.
Our best model for Task 1b obtained an overall exact precision, recall, and F1 score of 0.82, 0.68, and 0.75, respectively, and a relaxed precision, recall, and F1 score of 0.88, 0.73, and 0.79, respectively.

2. Data
The ChEMU 2022 data corpus [7] includes chemical entities and events that describe the sequence of steps leading a chemical reaction to an end product. It contains 1500 chemical snippets sampled from 180 English patent documents from the European Patent Office and the United States Patent and Trademark Office [8]. Each snippet contains a detailed description of chemical reactions.

Entities in this dataset are divided into four categories [6]: (1) chemical compounds that are involved in a chemical reaction; (2) conditions under which a chemical reaction is carried out; (3) yields obtained for the final chemical product; and (4) example labels that are associated with reaction specifications. The four categories are further divided into a total of ten entity types. The compound category defines five roles a chemical compound can play within a chemical reaction; the conditions and yield categories each include two entity types.

A chemical reaction step involves an action and one or more chemical compounds on which the action takes effect [6]. The action is also linked to the conditions under which it is carried out and the resultant yields. Relations form between actions (trigger words) and all arguments involved in the reaction steps, such as chemical compounds, conditions, and yields. The ARG1 event label corresponds to relations between a trigger word and chemical compound entities. The ARGM event label corresponds to relations between a trigger word and temperature, time, or yield entities. Table 1 shows the definitions of the entity types, trigger words, and relation types.

Table 1: Definitions of entity types, trigger words, and relation types of the ChEMU 2022 dataset [6]

Entity types:
- REACTION_PRODUCT (R.P.): A product is a substance that is formed during a chemical reaction.
- STARTING_MATERIAL (S.M.): A substance that is consumed in the course of a chemical reaction, providing atoms to products, is considered a starting material.
- REAGENT_CATALYST (R.C.): A reagent is a compound added to a system to cause or help with a chemical reaction. Compounds such as catalysts, bases to remove protons, or acids to add protons must also be annotated with this tag.
- SOLVENT (S): A solvent is a chemical entity that dissolves a solute, resulting in a solution.
- OTHER_COMPOUND (O.C.): Other chemical compounds that are not the products, starting materials, reagents, catalysts, or solvents.
- TIME: The reaction time of the reaction.
- TEMPERATURE (Temp): The temperature of the reaction.
- YIELD_PERCENT (Y.P.): Yields given in percent values.
- YIELD_OTHER (Y.O.): Yields provided in units other than %.

Trigger words:
- WORKUP: A manipulation required to isolate and purify the product of a chemical reaction.
- REACTION_STEP: An event that converts starting materials into a product.

Relation types:
- ARG1: The relation between an event trigger word and a chemical compound.
- ARGM: The relation between an event trigger word and a temperature, time, or yield entity.
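To make the scheme concrete, the snippet below shows how a single reaction step could be represented under these definitions. The sentence, character offsets, and dictionary layout are our own invention for illustration and are not drawn from the corpus.

```python
# Hypothetical example of the annotation scheme described above. The
# sentence, offsets, and field names are invented for illustration and
# do not come from the ChEMU corpus.
example = {
    "text": "The mixture was stirred at 80 C for 2 hours.",
    "entities": [
        {"id": "T1", "type": "TEMPERATURE", "span": (27, 31)},  # "80 C"
        {"id": "T2", "type": "TIME", "span": (36, 43)},         # "2 hours"
    ],
    "triggers": [
        {"id": "T3", "type": "REACTION_STEP", "span": (16, 23)},  # "stirred"
    ],
    # ARGM links the trigger to its temperature and time arguments; a
    # chemical compound argument would instead be linked with ARG1.
    "relations": [
        {"type": "ARGM", "trigger": "T3", "arg": "T1"},
        {"type": "ARGM", "trigger": "T3", "arg": "T2"},
    ],
}

# Sanity check: the spans recover the annotated surface strings.
for ann in example["entities"] + example["triggers"]:
    start, end = ann["span"]
    print(ann["type"], "->", example["text"][start:end])
```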
Table 2: Number of entity type and trigger word instances in the training and development data, and their event relations

                                               Training set               Development set
Event  Entity              Instances   REACTION_STEP   WORKUP     REACTION_STEP   WORKUP
-      EXAMPLE_LABEL             886               -        -                 -        -
ARG1   REACTION_PRODUCT         2052            1101       11               719        7
ARG1   STARTING_MATERIAL        1754            1747        4              1122        1
ARG1   REAGENT_CATALYST         1281            1272        -               789        -
ARG1   SOLVENT                  1140            1134        4               667        3
ARG1   OTHER_COMPOUND           4640             161     4097               105     2661
ARGM   YIELD_PERCENT             955             937        1               688        -
ARGM   YIELD_OTHER              1061            1043        2               602        -
ARGM   TIME                     1059             839       81               569       46
ARGM   TEMPERATURE              1515             813      242               473      140

Triggers: REACTION_STEP 3815, WORKUP 3053

3. Methods
This section describes the underlying methodology of our system for Tasks 1a and 1b.

3.1. Task 1a: Named Entity Recognition
To identify the experimental parameters and triggers from the data, we use our MedaCy NER package, which was previously trained for chemical reactions [9]. In this section, we describe the two NER algorithms we evaluated for this challenge.

BiLSTM + CRF: Long Short-Term Memory (LSTM) units [10] are a form of Recurrent Neural Network (RNN). LSTMs take as their input not just the current input example but also what they have seen in the past, allowing them to connect previous observations over arbitrarily long distances. They incorporate the functionality to identify what information should and should not be passed on to the next LSTM cell, so that only relevant information is carried forward. Bidirectional LSTMs (BiLSTMs) process the data in both directions with two separate hidden layers, which are then fed forward into the same output layer, allowing the system to exploit context in both directions. In this work, we feed the output of the BiLSTM through a linear-chain Conditional Random Field (CRF) to calculate the final class probability for each token in the sequence. We use word embeddings derived from word2vec [11] in combination with character embeddings [12] as input to our BiLSTM+CRF model. The word2vec embeddings are derived from a neural network that learns a representation of a word-word co-occurrence matrix. At a high level, it is a neural network that learns a series of weights (a hidden layer within the neural network) that either maximizes the probability of a word given its surrounding context, referred to as the Continuous Bag Of Words (CBOW) approach, or maximizes the probability of the context given the word, referred to as the Skip-gram approach. The character embeddings are learned using a BiLSTM and concatenated onto the word2vec embeddings. Figure 1 provides a high-level overview of the architecture.

Figure 1: BiLSTM+CRF architecture for NER.
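The following is a minimal sketch of such a tagger, assuming PyTorch together with the third-party pytorch-crf package for the CRF layer; the hyperparameters are illustrative and the character-level BiLSTM is omitted for brevity, so this is a sketch of the idea rather than MedaCy's actual implementation.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRFTagger(nn.Module):
    """Word embeddings -> BiLSTM -> per-token emission scores -> CRF."""

    def __init__(self, vocab_size, num_tags, embed_dim=200, hidden_dim=256,
                 pretrained=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        if pretrained is not None:
            # e.g. word2vec vectors trained over patents (ChemPatent)
            self.embed.weight.data.copy_(pretrained)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        self.emission = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        emissions = self.emission(self.bilstm(self.embed(token_ids))[0])
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence
            return -self.crf(emissions, tags, mask=mask)
        # inference: Viterbi decoding of the best tag sequence per sentence
        return self.crf.decode(emissions, mask=mask)
```

At training time, the negative log-likelihood returned by the CRF is minimized (in our experiments, with stochastic gradient descent); at inference time, decode performs Viterbi decoding over the emission scores.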
BERT: Bidirectional Encoder Representations from Transformers (BERT) is a contextualized language model trained over a large corpus on the tasks of masked language modeling and next-sentence prediction. Devlin et al. [13] showed that this pre-trained model can be fine-tuned for other NLP tasks, including NER, by adding a simple classification layer. Our system consists of an alternate WordPiece labeling component, BioBERT [14] with a linear classification layer, and a CRF output layer. The BERT tokenization splits tokens into “WordPieces”, creating a complication when doing token-level classification like NER. As recommended by Devlin et al. [13], we classified the first WordPiece of each token by masking the rest and applying an “X” label. BioBERT [14] is a BERT model that was pre-trained over PubMed abstracts and full-text articles from PubMed Central (https://huggingface.co/monologg/biobert_v1.1_pubmed). Lastly, a CRF assigns a class probability to each subword in the sequence to incorporate the interdependence between labels into the model. Figure 2 provides a high-level overview of the architecture.

Figure 2: BERT architecture for NER.

3.2. Task 1b: Event Extraction
In this section, we describe our EE system, which utilizes a GCN in combination with the BERT encoder.

GCN-BERT: To identify the chemical arguments between the trigger words and the entities, we use RelEx's GCN-BERT, a Python-based relation extraction framework developed to identify relations between two entities. BERT utilizes positional information to capture the local contextual information within a sentence, whereas a GCN captures global context information by performing convolution operations on neighboring nodes in a graph. Here, we combine BERT with a GCN to better represent both the local contextual information and the global association information between words.

We treat the EE task as a binary classification task, building a separate model for each trigger word-entity type to determine whether a relation exists between them: (1) positive class - there is a relation between the trigger word and the entity; and (2) negative class - there is no relation between the trigger word and the entity (no-relation). To determine the relation between two entities, we first locate the sentence where the trigger word-entity pair occurs. A sentence can contain multiple such trigger word-entity pairs; therefore, we need to represent the targeted pair in a way that distinguishes it from the others. Here, we replace all non-targeted trigger word-entity pairs in the input sentence with 'X', leaving only the targeted trigger word-entity pair intact.

BERT captures the contextual information within a sentence or document locally; however, it fails to capture global information. On the other hand, a GCN captures the global information between the nodes but may fail to capture local information. Therefore, we propose a novel architecture that combines BERT with a GCN to benefit from both local and global information, allowing them to influence each other and jointly build a final representation for classification.

First, we extract the sentence where the trigger word-entity pair is located. Then, we use the BERT tokenizer for word tokenization. BERT uses a hybrid of word-level and character-level tokenization to handle Out-Of-Vocabulary (OOV) words. After sentence tokenization, we build a vocabulary map, mapping the unique tokens to integers. Second, we generate a vocabulary graph G = (V, E), where we denote the word nodes in the graph by the mapped integers and weight the edge between two word nodes using Point-wise Mutual Information (PMI). PMI quantifies how likely we are to see two words co-occur, given their individual probabilities, relative to the case where the two words are uncorrelated. We calculate PMI as shown in Equation 1, using the individual probabilities P(x) and P(y) of the words x and y and their co-occurrence probability P(x, y), each estimated from the corresponding observation counts. A positive PMI value indicates a high semantic correlation between words, whereas a negative PMI value indicates little or no semantic correlation.

PMI(x, y) = log( P(x, y) / (P(x) P(y)) )    (1)
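As an illustration of Equation 1, the sketch below computes PMI edge weights from sliding-window co-occurrence counts. The window-based counting scheme and the choice to keep only positive-PMI edges are assumptions we make for the example rather than details specified above.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(tokenized_sentences, window=10):
    """Weight word-word edges of the vocabulary graph by PMI (Equation 1)."""
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for tokens in tokenized_sentences:
        # slide a fixed-size window over each sentence and count, per
        # window, the distinct words and word pairs that co-occur in it
        for i in range(max(1, len(tokens) - window + 1)):
            win = sorted(set(tokens[i:i + window]))
            n_windows += 1
            word_count.update(win)
            pair_count.update(combinations(win, 2))
    edges = {}
    for (x, y), n_xy in pair_count.items():
        # PMI(x, y) = log( P(x, y) / (P(x) * P(y)) )
        pmi = math.log((n_xy / n_windows) /
                       ((word_count[x] / n_windows) *
                        (word_count[y] / n_windows)))
        if pmi > 0:  # keep only positively correlated pairs as edges
            edges[(x, y)] = pmi
    return edges
```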
Next, we pass the graph through a two-layer GCN, which performs multiple levels of convolution to capture the global information between nodes that are not directly connected, and generates the graph embeddings. Third, we combine the mapped word indices with the generated graph embeddings. BERT is a transformer that applies multi-head self-attention. The BERT architecture initially takes token, segment, and position embeddings of the input text and converts the input token embeddings into a vector representation. At this point, we concatenate our graph embedding vector with that vector representation. BERT then applies its bidirectional training, taking the previous and next tokens into account, and produces a representation for the input sequence. Finally, this embedding representation is fed into a fully connected layer for classification. Figure 3 shows the structure of our RelEx GCN-BERT approach.

Figure 3: Structure of the RelEx GCN-BERT approach.

3.3. Experimental Details
Word Embeddings: We utilized ChemPatent embeddings [8], trained over a collection of 84,076 full patent documents (1B tokens), in our methods.

BERT: We used BioBERT [14] as the base encoder for the transformer-based models. The transformer encoder used to further refine the BERT embeddings was implemented in PyTorch [15].

MedaCy: We used a PyTorch [15] implementation of the BiLSTM+CRF model. The models were trained for 40 epochs and optimized using stochastic gradient descent. A window size of three generated the best results. The source code is available in the MedaCy public repository (https://github.com/NLPatVCU/medaCy).

RelEx: We used PyTorch [15] for the implementation of GCN-BERT. We experimented with different sliding window sizes, filter sizes, and loss functions for fine-tuning, and used PyTorch-Transformers (https://pytorch.org/hub/huggingface_pytorch-transformers/) by the HuggingFace team to build the BERT model. The source code is available in the RelEx-GCN public repository (https://github.com/NLPatVCU/RelEx-GCN).

3.4. Evaluation
For both Tasks 1a and 1b, we report precision, recall, and F1 scores. Precision is the ratio of correctly predicted mentions to the total set of predicted mentions for a specific entity; recall is the ratio of correctly predicted mentions to the actual number of mentions; and F1 is the harmonic mean of precision and recall. For both tasks, we report exact and relaxed results. Under exact evaluation, two annotations are considered equal only if they have the same tag and exactly matching spans. Under relaxed evaluation, two annotations are considered equal if they share the same tag and their spans overlap.
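The two matching criteria can be stated directly in code; the sketch below is an illustration of these definitions, with the annotation representation assumed for the example.

```python
# Sketch of the exact and relaxed matching criteria described above.
# The Annotation representation is an assumption for illustration.
from collections import namedtuple

Annotation = namedtuple("Annotation", ["tag", "start", "end"])

def exact_match(gold, pred):
    # same tag and exactly matching character offsets
    return (gold.tag == pred.tag and
            (gold.start, gold.end) == (pred.start, pred.end))

def relaxed_match(gold, pred):
    # same tag and overlapping spans
    return (gold.tag == pred.tag and
            gold.start < pred.end and pred.start < gold.end)

gold = Annotation("STARTING_MATERIAL", 10, 35)
pred = Annotation("STARTING_MATERIAL", 10, 30)  # partial span
print(exact_match(gold, pred), relaxed_match(gold, pred))  # False True
```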
4. Results and Discussion
In this section, we discuss the results of Tasks 1a and 1b for both the development and test sets.

4.1. Task 1a: Named Entity Recognition
Table 3 shows the exact precision, recall, and F1 results over the development set. The results show that our BiLSTM+CRF consistently obtained higher precision, recall, and F1 scores than the BERT implementation. The results also show that the BiLSTM+CRF obtained higher than 90% precision and recall for all entities except STARTING_MATERIAL.

Table 3: The exact Precision (P), Recall (R), and F1 results for the development set using BiLSTM+CRF and BERT

                            BiLSTM+CRF               BERT
Entity                    P     R     F1       P     R     F1
EXAMPLE_LABEL          0.98  0.99  0.99    0.94  0.97  0.95
OTHER_COMPOUND         0.99  0.98  0.99    0.94  0.95  0.94
REACTION_PRODUCT       0.97  0.92  0.95    0.96  0.66  0.78
REAGENT_CATALYST       0.99  0.99  0.99    0.81  0.90  0.85
SOLVENT                0.99  0.99  0.99    0.90  0.89  0.89
STARTING_MATERIAL      0.99  0.65  0.78    0.93  0.59  0.72
TEMPERATURE            0.99  1.00  0.99    0.99  0.98  0.99
TIME                   0.99  1.00  0.99    1.00  0.99  0.99
YIELD_OTHER            0.96  0.99  0.98    0.91  0.97  0.94
YIELD_PERCENT          0.99  0.93  0.96    1.00  0.97  0.99
System                 0.99  0.93  0.96    0.94  0.85  0.89

Tables 4 and 5 show the results over the test data for the BiLSTM+CRF and BERT, respectively. Similarly, the BiLSTM+CRF obtained higher precision, recall, and F1 scores than BERT. We believe this is because the embedding representations for the BiLSTM+CRF were trained on patents, while our BERT implementation was trained over PubMed journal articles.

Table 4: Precision (P), Recall (R), and F1 results for the test set using BiLSTM+CRF trained over the training data with ChemPatent embeddings

                              Exact                 Relaxed
Entity                    P     R     F1       P     R     F1
EXAMPLE_LABEL          0.94  0.96  0.96    0.95  0.97  0.96
OTHER_COMPOUND         0.86  0.84  0.85    0.92  0.90  0.91
REACTION_PRODUCT       0.39  0.49  0.43    0.69  0.87  0.77
REAGENT_CATALYST       0.84  0.82  0.83    0.88  0.86  0.87
SOLVENT                0.90  0.93  0.91    0.92  0.94  0.93
STARTING_MATERIAL      0.44  0.72  0.55    0.53  0.88  0.66
TEMPERATURE            0.94  0.96  0.95    0.97  0.99  0.98
TIME                   0.84  0.86  0.85    0.98  0.99  0.99
YIELD_OTHER            0.78  0.76  0.77    0.94  0.91  0.93
YIELD_PERCENT          0.93  0.98  0.95    0.95  0.99  0.97
System                 0.73  0.81  0.77    0.83  0.92  0.87

Table 5: Precision (P), Recall (R), and F1 results for the test set using BERT

                              Exact                 Relaxed
Entity                    P     R     F1       P     R     F1
EXAMPLE_LABEL          0.93  0.95  0.94    0.94  0.95  0.95
OTHER_COMPOUND         0.86  0.80  0.83    0.93  0.86  0.90
REACTION_PRODUCT       0.42  0.64  0.51    0.59  0.91  0.72
REAGENT_CATALYST       0.69  0.64  0.66    0.79  0.73  0.76
SOLVENT                0.83  0.89  0.86    0.86  0.92  0.88
STARTING_MATERIAL      0.34  0.59  0.43    0.49  0.85  0.62
TEMPERATURE            0.95  0.97  0.96    0.98  0.99  0.98
TIME                   0.84  0.85  0.85    0.98  1.00  0.99
YIELD_OTHER            0.80  0.76  0.78    0.95  0.90  0.92
YIELD_PERCENT          0.92  0.99  0.96    0.93  1.00  0.96
System                 0.70  0.79  0.74    0.79  0.90  0.84

Table 6 shows the true positive (tp), false positive (fp), and false negative (fn) counts for each of the entities using the BiLSTM+CRF and BERT systems. The results show that most of the false negative and false positive errors involve the chemical entities (OTHER_COMPOUND, STARTING_MATERIAL, REACTION_PRODUCT). These entities have a high degree of ambiguity within their mentions, as a chemical can be the starting material in one experiment and the reaction product in another.
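As a quick sketch, the per-system scores can be recomputed from the counts in Table 6 using the definitions from Section 3.4:

```python
# Computing precision, recall, and F1 from tp/fp/fn counts, following
# the definitions in Section 3.4.
def prf1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g., the System row for BERT in Table 6 yields scores close to the
# exact system-level results reported in Table 5 (0.70, 0.79, 0.74)
p, r, f1 = prf1(tp=4796, fp=2105, fn=1299)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```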
Table 6: Error analysis for both the BiLSTM+CRF and BERT systems under exact match evaluation over the test data

                          BiLSTM+CRF               BERT
Entity                  tp     fp     fn      tp     fp     fn
EXAMPLE_LABEL          329     20     14     325     23     18
OTHER_COMPOUND        1384    227    270    1319    217    335
REACTION_PRODUCT       400    626    418     526    725    292
REAGENT_CATALYST       364     67     78     282    125    160
SOLVENT                360     38     29     347     69     42
STARTING_MATERIAL      469    593    178     380    738    267
TEMPERATURE            586     37     23     588     30     21
TIME                   364     68     61     363     69     62
YIELD_OTHER            317     87     98     316     80     99
YIELD_PERCENT          347     25      6     350     29      3
System                4796   2015   1299    4796   2105   1299

4.2. Task 1b: Event Extraction
Table 7 shows the precision, recall, and F1 scores obtained over the development set for EE. Here we used RelEx's GCN-BERT trained over the ChemPatent embeddings, with the trigger words identified using MedaCy's BiLSTM+CRF trained over the ChemPatent embeddings. GCN-BERT obtained an overall precision of 0.85, recall of 0.94, and F1 score of 0.89 on the development data. The system obtained higher F1 scores for the REACTION_STEP classes than for the WORKUP classes, mainly because the REACTION_STEP classes have more training instances than most WORKUP classes. We can also see that the performance of each trigger word-entity pair is proportional to the number of instances in the training set. For example, classes with a higher number of instances, such as REACTION_STEP-STARTING_MATERIAL, REACTION_STEP-REAGENT_CATALYST, and REACTION_STEP-YIELD_OTHER, achieved higher F1 scores. In contrast, classes with a comparatively moderate number of instances, such as REACTION_STEP-OTHER_COMPOUND and WORKUP-TEMPERATURE, achieved moderate F1 scores. Trigger word-entity pairs with very few instances, such as WORKUP-SOLVENT and WORKUP-STARTING_MATERIAL, obtained an F1 score of zero.

Table 8 shows both the exact and relaxed precision, recall, and F1 scores obtained over the test set for EE, using the same GCN-BERT configuration with trigger words identified by MedaCy's BiLSTM+CRF trained over the ChemPatent embeddings. GCN-BERT obtained an overall exact precision of 0.82, recall of 0.68, and F1 score of 0.75, and a relaxed precision of 0.88, recall of 0.73, and F1 score of 0.79, on the test data. The results show trends similar to those on the development set: the REACTION_STEP classes again performed better than the WORKUP classes. Most classes performed better under relaxed evaluation, because the NER model may not always find the complete span of an entity when performing inference.

Table 9 shows a detailed error analysis of the GCN-BERT system over the test set under exact match evaluation. Here, we report the number of true positives (tp), false positives (fp), and false negatives (fn), along with "fpm" and "fnm", two metrics that count the false positives and false negatives whose corresponding entities are missing. These counts make it possible to see where the system's predictions break down per class. We can see that class imbalance played a role in the missed annotations of the events.
More importantly, for the chemical entities we see high numbers of both false negatives and false positives; the false positives indicate that the system over-generates relations, linking trigger words to chemicals inaccurately.

Table 7: Precision (P), Recall (R), and F1 results for the development set using the GCN-BERT system with trigger words identified using MedaCy's BiLSTM+CRF trained with ChemPatent embeddings

Argument  Trigger        Entity              # Train     P     R     F1
ARG1      REACTION_STEP  OTHER_COMPOUND          161  0.73  0.48  0.58
                         REACTION_PRODUCT       1101  0.97  0.98  0.98
                         REAGENT_CATALYST       1272  0.88  0.95  0.91
                         SOLVENT                1134  0.86  0.94  0.90
                         STARTING_MATERIAL      1747  0.87  0.93  0.90
          WORKUP         OTHER_COMPOUND         4097  0.89  0.97  0.93
                         REACTION_PRODUCT         11  0.00  0.00  0.00
                         SOLVENT                   4  0.00  0.00  0.00
                         STARTING_MATERIAL         4  0.00  0.00  0.00
ARGM      REACTION_STEP  TEMPERATURE             813  0.53  0.91  0.67
                         TIME                    839  0.75  0.92  0.83
                         YIELD_OTHER            1043  0.95  0.99  0.97
                         YIELD_PERCENT           937  0.95  0.99  0.97
          WORKUP         TEMPERATURE             242  0.75  0.73  0.74
                         TIME                     81  0.00  0.00  0.00
System                                                0.85  0.94  0.89

Table 8: Exact and relaxed Precision (P), Recall (R), and F1 results for the test set using the GCN-BERT system with trigger words identified using MedaCy's BiLSTM+CRF trained with ChemPatent embeddings

                                                     Exact               Relaxed
Argument  Trigger        Entity                  P     R     F1      P     R     F1
ARG1      REACTION_STEP  OTHER_COMPOUND       0.52  0.53  0.53   0.52  0.53  0.53
                         REACTION_PRODUCT     0.77  0.54  0.63   0.91  0.65  0.76
                         REAGENT_CATALYST     0.84  0.68  0.75   0.86  0.70  0.77
                         SOLVENT              0.82  0.74  0.78   0.84  0.76  0.79
                         STARTING_MATERIAL    0.71  0.59  0.65   0.79  0.66  0.72
          WORKUP         OTHER_COMPOUND       0.88  0.77  0.82   0.93  0.82  0.87
                         REACTION_PRODUCT     0.00  0.00  0.00   0.00  0.00  0.00
                         SOLVENT              0.00  0.00  0.00   0.00  0.00  0.00
                         STARTING_MATERIAL    0.00  0.00  0.00   0.00  0.00  0.00
ARGM      REACTION_STEP  TEMPERATURE          0.84  0.54  0.66   0.86  0.55  0.67
                         TIME                 0.74  0.69  0.72   0.87  0.81  0.84
                         YIELD_OTHER          0.92  0.72  0.81   0.95  0.74  0.83
                         YIELD_PERCENT        0.97  0.90  0.93   0.97  0.91  0.94
          WORKUP         TEMPERATURE          0.70  0.47  0.56   0.70  0.47  0.56
                         TIME                 0.45  0.35  0.39   0.45  0.35  0.39
System                                        0.82  0.68  0.75   0.88  0.73  0.79

Table 9: Error analysis for the GCN-BERT system with trigger words identified using MedaCy's BiLSTM+CRF trained with ChemPatent embeddings

Argument  Trigger        Entity                tp    fp    fn   fpm   fnm
ARG1      REACTION_STEP  OTHER_COMPOUND        39    36    34     9    21
                         REACTION_PRODUCT     222    68   188    67   178
                         REAGENT_CATALYST     298    55   140    32    89
                         SOLVENT              292    64   102    32    49
                         STARTING_MATERIAL    386   161   263   120   196
          WORKUP         OTHER_COMPOUND      1124   154   332   117   246
                         REACTION_PRODUCT       0     0     2     0     1
                         REAGENT_CATALYST       0     0     3     0     3
                         SOLVENT                0     0     1     0     1
                         STARTING_MATERIAL      0     0     1     0     1
ARGM      REACTION_STEP  TEMPERATURE          262    50   226    29    45
                         TIME                 262    91   117    67    89
                         YIELD_OTHER          294    26   115    26   104
                         YIELD_PERCENT        316    11    34    10    16
          WORKUP         TEMPERATURE           62    26    71     3    36
                         TIME                  13    16    24     7    15
                         YIELD_OTHER            0     0     2     0     1
                         YIELD_PERCENT          0     0     1     0     0
System                                       3570   758  1656   519  1091

5. Conclusion
In this paper, we described our participation in the CLEF 2022 ChEMU Task 1a and 1b Challenge. For Task 1a, we evaluated two NER models to extract chemical reaction components from patents: (1) a BiLSTM+CRF model over ChemPatent embeddings, and (2) a BERT transformer model using BioBERT. Our results show that the BiLSTM+CRF outperformed the BERT model. We believe this is because the embedding representations for the BiLSTM+CRF were trained on patents, while our BERT implementation was trained over PubMed journal articles. In the future, further training the BioBERT-based model over patents, or utilizing ChemicalBERT rather than BioBERT, may increase the BERT scores. For Task 1b, we combined BERT with a GCN to integrate the local contextual and the global information between words.
We replaced the non-targeted trigger word-entity pairs in the input sentence with 'X' to distinguish the targeted trigger word-entity pair from the others. The results showed that the system performed reasonably well for REACTION_STEP trigger words but less well for WORKUP, partly due to class imbalance. However, we also noted that it over-generated relations when linking trigger words to their chemical arguments, which requires further investigation. In the future, we plan to expand our system to perform multi-class classification, to benchmark it against different datasets, and to explore different representations that more efficiently encode a trigger word-entity pair within a sentence.

Acknowledgments
We want to thank Jorge Vargas for the initial development of the NER figures. This work was funded by the National Science Foundation (NSF) under Grant No. CMMI 1651957.

References
[1] W. Bort, I. I. Baskin, P. Sidorov, G. Marcou, D. Horvath, T. Madzhidov, A. Varnek, T. Gimadiev, R. Nugmanov, A. Mukanov, Discovery of novel chemical reactions by deep generative recurrent neural network (2020).
[2] K. Wang, L. Wang, Q. Yuan, S. Luo, J. Yao, S. Yuan, C. Zheng, J. Brandt, Construction of a generic reaction knowledge base by reaction data mining, Journal of Molecular Graphics and Modelling 19 (2001) 427–433.
[3] H. Yoshikawa, D. Q. Nguyen, Z. Zhai, C. Druckenbrodt, C. Thorne, S. A. Akhondi, T. Baldwin, K. Verspoor, Detecting chemical reactions in patents (2019).
[4] S. Farnsworth, G. Gurdin, J. Vargas, A. Mulyar, N. Lewinski, B. T. McInnes, Extracting experimental parameter entities from scientific articles, Journal of Biomedical Informatics (2021) 103970.
[5] D. Mahendran, C. Tang, B. McInnes, Graph convolutional networks for chemical relation extraction, Proceedings of the Semantics-enabled Biomedical Literature Analytics (SeBiLAn) (2022).
[6] J. He, D. Q. Nguyen, S. A. Akhondi, C. Druckenbrodt, C. Thorne, R. Hoessel, Z. Afzal, Z. Zhai, B. Fang, H. Yoshikawa, et al., An extended overview of the CLEF 2020 ChEMU lab, in: Conference and Labs of the Evaluation Forum (CLEF), 22-25 September 2020, 2020.
[7] J. He, D. Q. Nguyen, S. A. Akhondi, C. Druckenbrodt, C. Thorne, R. Hoessel, Z. Afzal, Z. Zhai, B. Fang, H. Yoshikawa, et al., ChEMU 2020: Natural language processing methods are effective for information extraction from chemical patents, Frontiers in Research Metrics and Analytics 6 (2021) 12.
[8] D. Q. Nguyen, Z. Zhai, H. Yoshikawa, B. Fang, C. Druckenbrodt, C. Thorne, R. Hoessel, S. A. Akhondi, T. Cohn, T. Baldwin, et al., ChEMU: Named entity recognition and event extraction of chemical reactions from patents, in: European Conference on Information Retrieval, Springer, 2020, pp. 572–579.
[9] D. Mahendran, G. Gurdin, N. Lewinski, C. Tang, B. T. McInnes, Identifying chemical reactions and their associated attributes in patents, Frontiers in Research Metrics and Analytics (2021) 42.
[10] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv preprint arXiv:1508.01991 (2015).
[11] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[12] M. Gridach, Character-level neural network for biomedical named entity recognition, Journal of Biomedical Informatics 70 (2017) 85–91.
[13] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[14] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, CoRR abs/1901.08746 (2019). URL: http://arxiv.org/abs/1901.08746. arXiv:1901.08746.
[15] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035.