EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
EEKE 2020 @ JCDL '20, August 1–5, 2020, Virtual Event, China

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Jennifer D'Souza, TIB Leibniz Information Centre for Science and Technology, Hannover, Germany, jennifer.dsouza@tib.eu
Sören Auer, TIB Leibniz Information Centre for Science and Technology & L3S Research Center, Hannover, Germany, soeren.auer@tib.eu

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions.
The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development. Our pilot annotated dataset of 50 NLP-ML scholarly articles according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761.

CCS CONCEPTS
• General and reference → Computing standards, RFCs and guidelines; • Information systems → Document structure; Ontologies; Data encoding and canonicalization.

KEYWORDS
dataset, annotation guidelines, semantic publishing, digital libraries, scholarly knowledge graphs, open science graphs

1 INTRODUCTION
As the rate of research publications increases [51], there is a growing need within digital libraries to equip researchers with alternative knowledge representations, other than the traditional document-based format, for keeping pace with the rapid research progress [3]. In this regard, several efforts exist or are currently underway for semantifying scholarly articles for their improved machine interpretability and ease in comprehension [19, 24, 38, 49]. These models equip experts with a tool for semantifying their scholarly publications, ranging from strictly-ontologized methodologies [19, 49] to less-strict, flexible description schemes [24, 37], wherein the latter aim toward the bottom-up, data-driven discovery of an ontology. Consequently, knowledge graphs [1, 4] are being advocated as a promising alternative to the document-based format for representing scholarly knowledge, for the enhanced content ingestion enabled via their fine-grained machine interpretability.

The automated semantic extraction from scholarly publications using text mining has seen early initiatives based on sentences as the basic unit of analysis. To this end, ontologies and vocabularies were created [14, 39, 46, 47], corpora were annotated [20, 32], and machine learning methods were applied [31]. Recently, scientific IE has targeted search technology; thus newer corpora have been annotated at the phrasal unit of information with three or six types of scientific concepts in up to ten disciplines [5, 16, 22, 33], facilitating machine learning system development [2, 8, 10, 34]. In general, a phrase-focused annotation scheme more directly influences the building of a scholarly knowledge graph, since phrases constitute knowledge graph statements. Nonetheless, sentence-level annotations are just as poignant, offering knowledge graph modelers better context from which the phrases are obtained for improved knowledge graph curation.
Beyond these, many recent data collection and annotation efforts [26–28, 36] are steering new directions in text mining research on scholarly publications. These initiatives are focused on the shallow semantic structuring of the instructional content in lab protocols or descriptions of chemical synthesis reactions. This has entailed generating annotated datasets via structuring recipes to facilitate their automatic content mining for machine-actionable information that is otherwise presented in ad-hoc ways within scholarly documentation. Such datasets inadvertently facilitate the development of machine readers. In the past, similar text mining research was conducted as the unsupervised mining of Schemas (also called scripts, templates, or frames), i.e. as a generalization of recurring event knowledge (involving a sequence of three to ten events) with various participants [40], primarily over newswire articles [7, 11–13, 42–44]. Such schemas were potent at generalizing over similar but distinct narratives (they can be seen as knowledge units), with the goal of revealing their underlying common elements. However, little insight was garnered on their practical task relevance. This has changed with the recent surface semantic structuring initiatives over instructional content. It has led to the realization of a seemingly new practicable direction that taps into the structuring of text and the structured information aggregation under Scripts-based knowledge themes.

Since scientific literature is growing at a rapid rate and researchers today are faced with this publications deluge [25], it is increasingly tedious, if not practically impossible, to keep up with the progress even within one's own narrow discipline. The Open Research Knowledge Graph (ORKG) [4] is posited as a solution to the problem of keeping track of research progress minus the cognitive overload that reading dozens of full papers imposes. It aims to build a comprehensive knowledge graph that publishes the research contributions of scholarly publications per paper, where the contributions are interconnected via the graph even across papers. At https://www.orkg.org/ one can view the contribution knowledge graph of a single paper as a summary over its key contribution properties and values, or compare the contribution knowledge graphs over common properties across several papers in a tabulated survey. Practical examples of the latter are accessible online at https://www.orkg.org/orkg/featured-comparisons. This practically addresses the knowledge ingestion problem for researchers. How? With the ORKG comparisons feature, researchers are no longer faced with the daunting cognitive ingestion obstacle of manually scouring through dozens of papers of unstructured content in their field. Where this process traditionally would take several days or months, using the ORKG contributions comparison tabulated view, the task is reduced to just a few minutes. Assuming the individual paper contributions are structured in the ORKG, researchers can then simply deconstruct the graph, tap into the aspects they are interested in, and enhance it for their purposes. Further, they can select multiple such paper graphs and, with the click of a button, generate their tabulated comparison. For additional details on systems and methods beyond just the contribution highlights, they can still choose to read the original articles, but this time around equipped with a better selective understanding of which articles they should read in depth. Of course, scholarly article abstracts are intended for this purpose, but they are not machine interpretable; in other words, they cannot be comparatively organized. Further, the unstructured abstracts representation still treats research as data silos; thus, with this model, research endeavors, in general, continue to be susceptible to redundancy [23], lacking a meaningful way of connecting structured and unstructured information.

1.1 Our Contribution
In this paper, we propose a surface semantically structured dataset of 50 scholarly articles for their research contributions in the field of natural language processing focused on machine learning applications (the NLP-ML domain), across five different information extraction tasks, to be integrable within the ORKG. To this end, we (1) identify sentences in scholarly articles that reflect research contributions; (2) create structured (subject, predicate, object) annotations from these sentences by identifying mentions of the contribution candidate term phrases and their relations; and (3) group collections of such triples, that arise from either consecutive or non-consecutive sentences, under one of ten core information units that capture an aspect of the contribution of NLP-ML scholarly articles. These core information units are conceptually posited as thematic scripts [40]. The resulting model formalized from the pilot annotation exercise we call the NLPContributions scheme.
It has the following characteristics: (1) via a contribution-centered model, it makes realistic the otherwise forbidding task of semantically structuring full-text scholarly articles; our task only needs a surface structuring of the highlights of the approach, which often can be found in the Title, the Abstract, one or two paragraphs in the Introduction, and in the Results section; (2) it offers guidance for a structuring methodology, albeit still encompassing subjective decisions to a certain degree, but overall presenting a uniform model for identifying and structuring contributions; note that without a model, such structuring decisions may not end up being comparable across users and their modeled papers (see Figure 6); (3) the dataset is annotated in JSON format since it preserves relation hierarchies; (4) the annotated data we produce can be practically leveraged within frameworks such as the ORKG that support structured scholarly content-based knowledge ingestion. With the integration of our semantically structured scholarly contributions data in the ORKG, we aim to address the tedious and time-consuming scholarly knowledge ingestion problem via its contributions comparison feature. And further, by using the graph-based model, we also address the problem of scholarly information produced as data silos, as the ORKG connects the structured information across papers.

2 BACKGROUND AND RELATED WORK
Sentence-based Annotations of Scholarly Publications. Early initiatives in semantically structuring scholarly publications focused on sentences as the basic unit of analysis. In these sentence-based annotation schemes, all annotation methodologies [20, 32, 47, 48] have had very specific aims for scientific knowledge capture. A seminal work in this direction is the CoreSC (Core Scientific Concepts) sentence-based annotation scheme [32]. This scheme aimed to model, in finer granularity, i.e. at the sentence level, concepts that are necessary for the description of a scientific investigation, while traditional approaches employ section names serving as coarse-grained paragraph-level annotations. Such semantified scientific knowledge capture was apt at highlighting selected sentences within computer-based readers. In this application context, mere sectional information organization for papers was considered as missing the finer rhetorical semantic classifications. E.g., in a Results section, the author may also provide some sentences of background information, which in a sentence-wise semantic labeling are called Background and not Results.
Another sentence-based scheme is the Argument Zoning (AZ) scheme [48]. This scheme aimed at modeling the rhetorics around knowledge claims between the current work and cited work. It used semantic classes such as "Own_Method," "Own_Result," "Other," "Previous_Own," "Aim," etc., each elaborating on the rhetorical path to various knowledge claims. This latter scheme was apt for citation summaries, sentiment analysis and the extraction of information pertaining to knowledge claims. In general, such complementary aims for the sentence-based semantification of scholarly publications can be fused to generate more comprehensive summaries.

Phrase-based Annotations of Scholarly Publications. The trend towards scientific terminology mining methods in NLP steered the release of phrase-based annotated datasets in various domains. An early dataset in this line of work was the ACL RD-TEC corpus [22], which identified seven conceptual classes for terms in the full-text of scholarly publications in Computational Linguistics, viz. Technology and Method; Tool and Library; Language Resource; Language Resource Product; Models; Measures and Measurements; and Other. Similar to terminology mining is the task of scientific keyphrase extraction. Extracting keyphrases is an important task in publishing platforms as they help recommend articles to readers, highlight missing citations to authors, identify potential reviewers for submissions, and analyse research trends over time. Scientific keyphrases, in particular of type Processes, Tasks and Materials, were the focus of the SemEval17 corpus annotations [5]. The dataset comprised annotations of full-text articles in Computer Science, Material Sciences, and Physics. Following suit was the SciERC corpus [33] of annotated abstracts from the Artificial Intelligence domain. It included annotations for six concepts, viz. Task, Method, Metric, Material, Other-Scientific Term, and Generic. Finally, in the realm of corpora having phrase-based annotations was the recently introduced STEM-ECR corpus [16], notable for its multidisciplinarity including the Science, Technology, Engineering, and Medicine domains. It was annotated with four generic concept types, viz. Process, Method, Material, and Data, that mapped across all domains, and further with terms grounded in the real world via Wikipedia/Wiktionary links.
Next, we discuss related works that semantically model instructional scientific content. In these works, the overarching scientific knowledge capture theme is the end-to-end semantification of an experimental process.

Shallow Semantic Structural Annotations of Instructional Content in Scholarly Publications. Increasingly, text mining initiatives are seeking out recipes or formulaic semantic patterns to automatically mine machine-actionable information from scholarly articles [26–28, 36].
In [27], they annotate wet lab protocols, covering a large spectrum of experimental biology, including neurology, epigenetics, metabolomics, cancer and stem cell biology, with actions corresponding to lab procedures and their attributes including materials, instruments and devices used to perform specific actions. Thereby the protocols then constituted a prespecified machine-readable format as opposed to the ad-hoc documentation norm. Kulkarni et al. [27] even release a large human-annotated corpus of semantified wet lab protocols to facilitate machine learning of such shallow semantic parsing over natural language instructions. Within scholarly articles, such instructions are typically published in the Materials and Method section in the Biology and Chemistry fields.
Along similar lines, inorganic materials synthesis reactions and procedures continue to reside as natural language descriptions in the text of journal articles. There is a growing impetus in such fields to find ways to systematically reduce the time and effort required to synthesize novel materials, which presently remains one of the grand challenges in the field. In [26, 36], to facilitate machine learning models for automatic extraction of materials syntheses from text, they present datasets of synthesis procedures annotated with semantic structure by domain experts in Materials Science. The types of information captured include synthesis operations (i.e. predicates), and the materials, conditions, apparatus and other entities participating in each synthesis step.
The NLPContributions annotation methodology proposed in this paper draws on each of the earlier categorizations of related work. First, the full-text of scholarly articles, including the Title and the Abstract, is annotated at a sentence-wise granularity, with the aim that the annotated sentences be only those restricted to the contributions of the investigation. We selectively consider the full-text of the article by focusing only on specific sections such as the Abstract, Introduction, and Results sections. Sometimes we also model the contribution highlights from the Approach/System description in case the Introduction does not contain such pertinent information on the proposed model. We skip the Background, Related Work, and Conclusion sections altogether. These sentences are then grouped under one of ten main information units, viz. ResearchProblem, Objective, Approach, Tasks, ExperimentalSetup (alternatively named Hyperparameters), Experiments, Baselines, Results, AblationAnalysis, and Code; each of these units is defined in detail in the next section. Second, from the grouped contribution-centered sentences, we perform phrase-based annotations for (subject, predicate, object) triples to model in a knowledge graph. And third, the resulting dataset has an overarching knowledge capture objective: capturing the contribution of the scholarly article and, in particular, facilitating the training of machine readers for this purpose, along the lines of the machine-interpretable wet-lab protocols.

3 THE NLPCONTRIBUTIONS MODEL
3.1 Goals
The development of the NLPContributions annotation model was backed by four primary goals:
(1) We aim to produce a semantic representation based on existing work, that can be well motivated as an annotation scheme for the application domain of NLP-ML scholarly articles, and is specifically aimed at the knowledge capture of the contributions in scholarly articles;
(2) The annotated scholarly contributions based on NLPContributions should be integrable in the Open Research Knowledge Graph (ORKG)¹, the state-of-the-art content-based knowledge capturing platform of scholarly articles' contributions.
(3) The NLPContributions model should be useful to produce data for the development of machine learning models in the form of machine readers [18] of scholarly contributions. Such trained models can serve to automatically extract such structured information for downstream applications, either in completely automated or semi-automated workflows as recommenders.²
(4) The NLPContributions model should be amenable to feedback via a consensus approval or content annotation change suggestions from a large group of authors toward their scholarly article contribution descriptions (an experiment that is beyond the scope of the present work and planned as following work).

¹ https://www.orkg.org/orkg/
² In future work, we will expand our current pilot annotated dataset of 50 articles with at least 400 additional similarly annotated articles to facilitate machine learning.

Figure 1: Fine-grained modeling illustration from a single sentence for part of an Approach proposed in [9].

The NLPContributions annotation model is designed for building a knowledge graph. It is not ontologized; therefore, we assume a bottom-up, data-driven design toward ontology discovery as more annotated contributions data is available.
Nonetheless, we do propose a core skeleton model for organizing the information at the top-level KG nodes. This involves a root node called Contribution, following which, at the first level of the knowledge graph, are ten nodes representing core information units under which the scholarly contributions data is organized.

3.2 The Ten Core Information Units
In this section, we describe the ten information units in our model.

ResearchProblem. Determines the research challenge addressed by a contribution, using the predicate hasResearchProblem. By definition, it is the focus of the research investigation, in other words, the issue for which the solution must be obtained.
The task entails identifying only the research problem addressed in the paper and not research problems in general. For instance, in the paper about the BioBERT word embeddings [30], the research problem is just the 'domain-customization of BERT' and not 'biomedical text mining,' since the latter is a secondary objective.
The ResearchProblem is typically found in an article's Title, Abstract and first few paragraphs of the Introduction. The task involves annotating one or more sentences and precisely the research problem phrase boundaries in the sentences.
The subsequent seven information objects are connected to Contribution via the generic predicate has.
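To make the top-level skeleton concrete, the minimal JSON sketch below shows how a Contribution root could connect a research problem (here, the BioBERT example above) and two of the information units defined in the remainder of this section via the generic has predicate. It is an illustrative re-creation in the spirit of the released annotations, not an excerpt from the dataset; the empty objects simply stand for content that is filled in per unit.

```json
{
  "Contribution": {
    "has research problem": "domain-customization of BERT",
    "has": {
      "Approach": {},
      "Results": {}
    }
  }
}
```

In the released dataset the same skeleton recurs per paper, with only the units that apply to the given article present.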
Approach. Depending on the paper's content, this unit is referred to as the Model or Method or Architecture or System or Application. Essentially, this is the contribution of the paper as the solution proposed for the research problem.
The annotations are made only for the high-level overview of the approach without going into system details. Therefore, the equations associated with the model and all the system architecture figures are not part of the annotations. While annotating the earlier ResearchProblem did not involve semantic annotation granularity beyond one level, annotating the Approach can. Sometimes the annotations (one- or multi-layered) are created using the elements within a single sentence itself (see Figure 1); at other times, if they are multi-layered semantic annotations, they are formed by bridging two or more sentences based on their coreference relations. For the annotation element content itself, while, in general, the subject, predicate, and object phrases are obtained directly from the sentence text, at times the predicate phrases have to be introduced as generic terms such as "has" or "on" or "has description", wherein the latter predicate is used for including, as objects, longer text fragments within a finer annotation granularity to describe the top-level node. The actual type of approach is restricted to the sub-types stated at the beginning of this paragraph and is decided based on the reference to the solution used by the authors or the solution description section name itself. If the reference to the solution or its section name is specific to the paper, such as 'Joint model,' then we rename it to just 'Model.' In general, any alternate namings of the solution, other than those mentioned earlier, including "idea", are normalized to "Model." Finally, as machine learning solutions, they are often given names, e.g., the model BioBERT [30], in which case we introduce the predicate 'called,' as in (Method, called, BioBERT).
The Approach is found in the article's Introduction section in the context of cue phrases such as "we take the approach," "we propose the model," "our system architecture," or "the method proposed in this paper." However, there are exceptions when the Introduction does not present an overview of the system, in which case we analyze the first few lines within the main system description content in the article. Also, if the paper refers to its system by "method" or "application," this is normalized to the Approach information unit; "System" or "Architecture" is normalized to the Model information unit.

Objective. This is the defined function for the machine learning algorithm to optimize over. In some cases, the Approach objective is a complex function. In such cases, it is isolated as a separate information object connected directly to the Contribution.

ExperimentalSetup. Has the alternate name Hyperparameters. It includes details about the platform, including both hardware (e.g., GPU) and software (e.g., the Tensorflow library), for implementing the machine learning solution; and of variables that determine the network structure (e.g., number of hidden units) and how the network is trained (e.g., learning rate), for tuning the software to the task objective.
Recent machine learning models are all neural based, and such models have several associated variables such as hidden units, model regularization parameters, learning rate, word embedding dimensions, etc. Thus, to offer users a glance at the contributed system, this aspect is included in NLPContributions. We only model the experimental setup that is expressed in a few sentences or that is concisely tabulated. There are cases when the experimental setup is not modeled at all within NLPContributions, e.g., for the complex "machine translation" models that involve many parameters. Thus, whether the experimental setup should be modeled or not may appear as a subjective decision; however, over the course of several annotated articles it becomes apparent, especially when the annotator begins to recognize the simple sentences that describe the experimental setup.
The ExperimentalSetup unit is found in the sections called Experiment, Experimental Setup, Implementation, Hyperparameters, or Training.

Figure 2: Illustration of modeling of Result (from [53]) w.r.t. a precedence of its elements as [dataset -> task -> metric -> score].

Results. Are the main findings or outcomes reported in the article for the ResearchProblem. Each Result unit involves some of the following elements: {dataset, metric, task, performance score}. Regardless of how the sentence(s) are written involving these elements, we assume the following precedence order: [dataset -> task -> metric -> score] or [task -> dataset -> metric -> score], as far as it can be applied without significantly changing the information in the sentence. Consider this illustrated in Figure 2. In the figure, the JSON is arranged starting at the dataset, followed by the task, then the metric, and finally the actual reported result.
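As a minimal sketch of this precedence, the nesting below re-creates the arrangement that Figure 2 depicts for the S-LSTM result from [53], which also serves as the running example in Section 4.1; the exact key layout is illustrative, and the released annotations may differ in minor details.

```json
{
  "CoNLL test set": {
    "For": {
      "NER": {
        "F1-score": "91.57%"
      }
    }
  }
}
```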
While this information unit is named per those stated in the earlier paragraph, if in a paper the section name is non-generic, e.g., "Main results" or "End-to-end results," it is normalized to the default name "Results."
The Results unit is found in the Results, Experiments, or Tasks sections. While the results are often highlighted in the Introduction, unlike for the Approach unit, in this case we annotate the dedicated, detailed section on Results, because results constitute a primary aspect of the contribution. Next we discuss the Tasks information unit, and note that Results can include Tasks and vice versa, as we describe next.

Tasks. The Approach or Model, particularly in multi-task settings, is tested on more than one task, in which case we list all the experimental tasks. The experimental tasks are often synonymous with the experimental datasets, since it is common in NLP for tasks to be defined over datasets. Where lists of Tasks are concerned, the Tasks can include one or more of ExperimentalSetup, Hyperparameters, and Results as sub information units.

Experiments. Are an encompassing information unit that includes one or more of the earlier discussed units. It can include a combination of ExperimentalSetup and Results, or a combination of lists of Tasks and their Results, or a combination of Approach, ExperimentalSetup and Results.
Recently, more and more multitask systems are being developed; consider the BERT model [15] as an example. Therefore, modeling ExperimentalSetup with Results or Tasks with Results is necessary in such systems, since the experimental setup often changes per task, producing a different set of results. Hence, this information unit encompassing two or more sub information units is relevant.

AblationAnalysis. Is a form of Results that describes the performance of components in systems.
Unlike Results, AblationAnalysis is not performed in all papers. Further, in papers that have them, we only model these results if they are expressed in a few sentences, similar to our modeling decision for Hyperparameters.
The AblationAnalysis information unit is found in the sections that have Ablation in their title. Otherwise, it can also be found in the written text without having a dedicated section for it. For instance, in the paper "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures" [35] there is no section title with Ablation, but this information is extracted from the text via cue phrases that indicate ablation results are being discussed.

Baselines. Are those listed systems that a proposed approach is compared against.
The Baselines information unit is found in sections that have Baseline in their title. Otherwise, it can also be found in sections that are not directly titled Baseline but require annotator judgement to infer that baseline systems are being discussed. For instance, in the paper "Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers" [50], the baselines are discussed in the subsection 'Methods.' Or in the paper "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer" [41], the baselines are discussed in a section called "Previous State-of-the-Art."

Code. Is a link to the software on Github or on other similar open source platforms, or even on the authors' website.

Of these ten information units, only three are mandatory: ResearchProblem, Approach, and Results; the other seven may or may not be present depending on the content of the article.
3.3 Contribution Sequences within Information Units
Except for ResearchProblem, each of the remaining nine information units encapsulates different aspects of the contributions of scholarly investigations in the NLP-ML domain, with the ResearchProblem offering the primary contribution context. Within these different aspects, there are what we call Contribution Sequences.
Here, with the help of an example depicted in Figure 3, we illustrate the notion of contribution sequences. In this example, we model contribution sequences in the context of the ExperimentalSetup information unit. In the figure, this information unit has two contribution sequences: the first connected by the predicate 'used' to the object 'BERTBase model,' and the second, also connected by the predicate 'used', to the object 'NVIDIA V100 (32GB) GPUs.' The 'BERTBase model' contribution sequence includes a second level of detail expressed via two different predicates, 'pre-trained for' and 'pre-trained on.' As a model of scientific knowledge, the triple with the entities connected by the first predicate, i.e. (BERTBase model, pre-trained for, 1M steps), reflects that the 'BERTBase model' was pretrained for 1 million steps. The second predicate produces two triples: (BERTBase model, pre-trained on, English Wikipedia) and (BERTBase model, pre-trained on, BooksCorpus). In each case, the scientific knowledge captured by these two triples is that BERTBase was pretrained on {Wikipedia, BooksCorpus}. Note that in the JSON data structure, the predicate connects the two objects as an array.
Next, the second contribution sequence, hinged at 'NVIDIA V100 (32GB) GPUs' as the subject, has two levels of granularity. Consider the following triples: (NVIDIA V100 (32GB) GPUs, used, ten) and (ten, for, pre-training). Note that in this nesting pattern, except for 'NVIDIA V100 (32 GB) GPUs,' the predicates {used, for} and remaining entities {ten, pre-training} are nested according to their order of appearance in the written text. Therefore, in conclusion, an information unit can have several contribution sequences, and the contribution sequences need not be identically modeled. For instance, our second contribution sequence is modeled in a fine-grained manner, i.e. in multiple levels. And when fine-grained modeling is employed, it is relatively straightforward to spot in the sentence(s) being modeled.

Figure 3: Illustration of the modeling of Contribution Sequences in the Experimental Setup Information Unit (from [30]). Created using https://jsoneditoronline.org
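The sketch below re-creates the two contribution sequences just described as JSON, in the style of Figure 3; it is reconstructed from the triples given in the text rather than copied from the figure, so the exact structure of the released annotation may differ slightly.

```json
{
  "ExperimentalSetup": {
    "used": [
      {
        "BERTBase model": {
          "pre-trained for": "1M steps",
          "pre-trained on": ["English Wikipedia", "BooksCorpus"]
        }
      },
      {
        "NVIDIA V100 (32GB) GPUs": {
          "used": {
            "ten": {
              "for": "pre-training"
            }
          }
        }
      }
    ]
  }
}
```

Note how the 'pre-trained on' predicate connects its two objects as an array, as mentioned above, while the GPU sequence nests its elements in their order of appearance in the source sentence.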
4 THE PILOT ANNOTATION TASK
The pilot annotation task was performed by a postdoctoral researcher with a background in natural language processing. The NLPContributions model or scheme just described was developed over the course of the pilot task. At a high level, the annotations were performed in three main steps. They are presented next, after which we describe the annotation guidelines.

4.1 Pilot Task Steps
(a) Contribution-Focused Sentence Annotations. In this stage, sentences from scholarly articles were selected as candidate contribution sentences under each of the aforementioned mandatory three information units (viz., ResearchProblem, Approach, and Results) and, if applicable to the article, for one or more of the remaining seven information units as well.
To identify the contribution sentences in the article, the full-text of the article is searched. However, as discussed at the end of Section 2, the Background, Related Work, and Conclusions sections are entirely omitted from the search. Further, the section discussing the Approach or the System is only referred to when the Introduction section does not offer sufficient highlights of this information unit. In addition, except for tabulated hyperparameters, we do not consider other tables for annotation within the NLPContributions model.
To better clarify the pilot task process, in this subsection we use Figure 2 as the running example. From the example, at this stage, the sentence "For NER (Table 7), S-LSTM gives an F1-score of 91.57% on the CoNLL test set, which is significantly better compared with BiLSTMs." is selected as one of the contribution sentence candidates as part of the Results information unit. This sentence is selected from a Results subsection in [53], but is just one among three others.
(b) Chunking Phrase Spans for Subject, Predicate, Object Entities. Then, for the selected sentences, we annotate their scientific knowledge entities. The entities are annotated by annotators having an implicit understanding of whether they take the subject, predicate, or object roles in a per-triple context. As a note, by our annotation scheme, predicates are not mandatorily verbs and can be nouns as well.
Returning to our running example, for the selected sentence, this stage involves annotating the phrases "For," "NER," "F1-score," "91.57%," and "CoNLL test set," with the annotator cognizant of the fact that they will use the [dataset -> task -> metric -> score] scientific entity precedence in the next step.
(c) Creating Contribution Sequences. This involves relating the subjects and objects within triples, where, as illustrated in Section 3.3, the object in one triple can be a subject in another triple if the annotation is performed at a fine-grained level of detail. For the most part, the nesting is done per order of appearance of the entities in the text, except for those involving the scientific entities {dataset, task, metric, score} under the Results information unit.
In the context of our running example, given the earlier annotated scientific entities, in this stage the annotator will form the following two triples: (CoNLL test set, For, NER) and (NER, F1-score, 91.57%) as a single contribution sequence. What is not depicted in Figure 1 are the top-level annotations, including the root node and one of the ten information unit nodes. This is modeled as follows: (Contribution, has, Results), and (Results, has, CoNLL test set).
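Putting steps (a)-(c) together, the running example yields a nested object along the following lines; this sketch extends the Results fragment shown in Section 3.2 with the root node and the "from sentence" evidence key described in the guidelines below, and is an illustrative assembly of the triples above rather than a verbatim excerpt from the released dataset.

```json
{
  "Contribution": {
    "has": {
      "Results": {
        "has": {
          "CoNLL test set": {
            "For": {
              "NER": {
                "F1-score": "91.57%"
              }
            },
            "from sentence": "For NER (Table 7), S-LSTM gives an F1-score of 91.57% on the CoNLL test set, which is significantly better compared with BiLSTMs."
          }
        }
      }
    }
  }
}
```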
4.2 Task Guidelines
In this section, we elicit a set of general guidelines that inform the annotation task.
How are information unit names selected? For information units such as Approach, ExperimentalSetup, and Results that each have a set of candidate names, the applied name is the one selected based on the closest section title or cue phrase.
Which of the ten information units does the sentence belong to? Conversely to the above, if a sentence is first identified as a contribution sentence candidate, it is placed within the information unit category that is identified directly based on the section header for the sentence in the paper, or inferred from cue phrases in the first few sentences of its section.
When is the Approach actually modeled from the dedicated section as opposed to the Introduction? In general, we avoid annotating the Approach or Model sections for their contribution sentences, as they tend to delve deeply into the approach or model details and involve complicated elements such as equations, etc. Instead, we restrict ourselves to the system highlights in the Introduction. However, some articles' Introductions do not offer system highlights, which is when we resort to using the dedicated section for the contribution highlights in this mandatory information unit.
Do we explore details about hardware used as part of the contribution? Yes, if it is explicitly part of the hyperparameters.
Are predicates always verbs? Predicates are not always verbs. They can also be nouns, especially in the hyperparameters section.
Inferring Predicates. In ideal settings, the constraint on the text used for subjects, objects, and predicates in contribution sequences is that they should be found in their corresponding sentence. However, for predicates this is not always possible. Since predicate information may not always be found in the text, it is sometimes annotated additionally based on the annotator's judgment. However, even this open-ended choice remains restricted to a predefined set of candidates. It includes {"has", "on", "by", "for", "has value", "has description", "based on", "called"}.
How are the supporting sentences linked to their corresponding contribution sequence within the overall JSON object? The sentence(s) is stored in a dictionary with a "from sentence" key, which is then attached to either the first element or, if it is a nested triples hierarchy, sometimes even to the second element of a contribution sequence. The dictionary data-type containing the evidence sentence is either put as an array element, or as a nested dictionary element.
Are the nested contribution sequences always obtained from a single sentence? The triples can be nested based on information from one or more sentences in the article. Further, the sentences need not be consecutive in the running text. As mentioned earlier, the evidence sentences are attached to the first element or the second element by the predicate "from sentence." If a contribution sequence is generated from a table, then the table number in the original paper is referenced.
Creating contribution sequences from tabulated hyperparameters. Only for hyperparameters do we model their tabulated version, if given. This is done as follows: 1) for the predicate, we use the name of the parameter; and 2) for the object, the value against the name. Sometimes, however, if there are two-level hierarchical parameters, then the predicate is the first name, the object is the value, and the value is qualified by the parameter name lower in the hierarchy. Qualifying the second name involves introducing the "for" predicate.
How are lists modeled within contribution sequences? The contribution sentence candidates also include sentences with lists. Such sentences are predominantly found for the ExperimentalSetup or Result information units. This is modeled as depicted in Figure 4 for the first two list elements. There, the Model information unit has two contribution sequences, each pertaining to a specific list item in the sentence. Further, the predicate "has description" is introduced for linking text descriptions.
Which JSON structures are used to represent the data? Flexibly, they include dictionaries, or nested dictionaries, or arrays of items, where the items can be strings, dictionaries, nested dictionaries, or arrays themselves.
How are appositives handled? We introduce a new predicate, "name", to handle appositives.

Figure 4: Illustration of the modeling of a sentence with a list of items as part of the Model Information Unit (from [29]). Created using https://jsoneditoronline.org
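As an illustration of the tabulated-hyperparameters convention above (parameter name as predicate, value as object, and a lower-level parameter name attached to the value via "for"), consider the following sketch; the parameter names and values here are invented for illustration only and are not taken from any annotated paper.

```json
{
  "Hyperparameters": {
    "learning rate": "0.001",
    "hidden units": "200",
    "dropout": {
      "0.5": {
        "for": "embedding layer"
      }
    }
  }
}
```

The first two entries follow the flat, one-level case, while the "dropout" entry shows the two-level hierarchical case in which the value "0.5" is qualified by the lower-level parameter name via the "for" predicate.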
5 MATERIALS AND TOOLS
5.1 Paper Selection
A collection of scholarly articles is downloaded based on the ones in the publicly available leaderboard of tasks in artificial intelligence called https://paperswithcode.com/. It predominantly represents papers in the Natural Language Processing and Computer Vision fields. For the purposes of our NLPContributions model, we restrict ourselves just to the NLP papers. From the set, we randomly select 10 papers in each of five different NLP-ML research tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification.

5.2 Data Representation Format and Annotation Tools
JSON was the chosen data format for storing the semantified parts of the scholarly articles' contributions. To avoid syntax errors in creating the JSON objects, the annotations were made via https://jsoneditoronline.org, which imposes valid JSON syntax checks. Finally, in the early stages of the annotation task, some of the annotations were made manually in the ORKG infrastructure https://www.orkg.org/orkg/ to test their practical suitability in a knowledge graph; three such annotated papers are depicted in Figure 6. The links in the figure captions can be visited to explore the annotations at their finer granularity of detail.

5.3 Annotated Dataset Characteristics
Overall, the annotated corpus contains a total of 2631 triples (an average of 52 triples per article). Its data elements comprise 1033 unique subjects, 843 unique predicates, and 2182 unique objects. In Table 1 below, we show the per-task distribution of triples and their elements. Of all tasks, relation classification has the highest number of unique triples (544) and named entity recognition the least (473).

            MT    NER   QA    RC    TC
Subject     259   209   203   228   221
Predicate   243   220   187   201   252
Object      471   434   515   455   459
Total       502   473   497   544   504

Table 1: Per-task (machine translation (MT), named entity recognition (NER), question answering (QA), relation classification (RC), text classification (TC)) triples distribution in terms of unique subject, predicate, object, and overall.

Generally, in the context of triples formation, predicates are often selected from a closed set and hence comprise a smaller group of items. In the NLPContributions model, however, predicates are extracted from the text if present. This leads to a much larger set of predicates that would require the application of predicate normalization functions to find the smaller core semantic set. In Figure 5, to offer some insights to this end, we show the predicates that appear more than 15 times over all the triples. We find that the predicate has appears most frequently, since its function often serves as a filler predicate. A complete list of the predicates is released in our dataset repository online at https://doi.org/10.25835/0019761.

Figure 5: A list of the predicates in our triples dataset that appear more than 15 times. (Most frequent: has, 251; has research problem, 159; on, 119; for, 90; with, 82; by, 54; from sentence, 53.)
6 USE CASE: NLPCONTRIBUTIONS IN ORKG
As a use case of the ORKG infrastructure, instead of presenting just the annotations obtained from NLPContributions, we present a further enriched showcase. Specifically, we model the evolution of the annotation scheme over three different attempts, with the third one arriving at NLPContributions. This is depicted in Figure 6. Our use case is an enriched one for two reasons: 1) it depicts the ORKG infrastructure's flexibility for data-driven ontology discovery that makes allowances for different design decisions; and 2) it also shows how, within flexible infrastructures, the possibilities can be so wide that arriving at a consensus can potentially prove a challenge if it isn't mandated at a critical point in the data accumulation.

Figure 6: Figures 6(a), 6(b), and 6(c) depict the evolution of the annotation scheme over three different research papers. Fig. 6(c) is the resulting selected format NLPContributions that is proposed in this paper. (a) Research paper [45] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R41467/; (b) research paper [21] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R41374; (c) research paper [54] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R44287.

Figure 6(a) depicts the first modeling attempt of an NLP-ML contribution. For predicates, the model restricts itself to using only those found in the text. The limitation of such a model is that, by not normalizing linguistic variations, it very rarely creates comparable models across investigations even if they imply the same thing. Hence, we found that, for comparability, a common predicate vocabulary minimally needs to be in place at the top level of the model. Figure 6(b) is the second attempt at modeling a different NLP-ML contribution. In this attempt, the predicates at the top level are mostly normalized to a generic "has"; however, "has" is connected to various information items again lexically based on the text of the scholarly articles, one or more of which can be grouped under a common category. Via such observations, we systematized the knowledge organization at the top level of the graph by introducing the ten information unit nodes. Figure 6(c) is the resulting NLPContributions annotations model. Within this model, scholarly contributions with one or more of the information units in common, viz. "Ablation study," "Baseline Models," "Model," and "Results," can be uniformly compared.

7 LIMITATIONS
Obtaining disjoint (subject, predicate, object) triples as contribution sequences. It was not possible to extract disjoint triples from all sentences. In many cases, we extract the main predicate and use as object the relevant full sentence or its clausal part. From [30], for instance, under the ExperimentalResults information unit, we model the following: (Contribution, has, Experimental results); (Experimental results, on, all datasets); and (all datasets, achieves, BioBERT achieves higher scores than BERT). Note that in the last triple, "achieves" was used as a predicate and its object "BioBERT achieves higher scores than BERT" is modeled as a clausal sentence part.
Employing coreference relations between scientific entities. In the fine-grained modeling of schemas, scientific entities within triples are sometimes nested across sentences by leveraging their coreference relations. We consider this a limitation toward the automated machine reading task, since coreference resolution itself is often challenging to perform automatically.
Tabulated results are not incorporated within NLPContributions. Unlike tabulated hyperparameters, which have a standard format, tabulated results have significantly varying formats. Thus their automated table parsing is a challenging task in itself. Nonetheless, by considering the textual results, we relegate ourselves to their summarized description, which often suffices for highlighting the contribution.
Can all NLP-ML papers be modeled by NLPContributions? While we can conclude that some papers are easier to model than others (e.g., articles addressing 'relation extraction' vs. 'machine translation,' which are harder), it is possible that all papers can be modelled by at least some, if not all, of the information units of the model we propose.

8 DISCUSSION
From the pilot dataset annotation exercise, we note the following regarding task practicality. Knowledge modeled under some information units is more amenable to systematic structuring than under others. E.g., information units such as ResearchProblem, ExperimentalSetup, Results, and Baselines are readily amenable to systematic template discovery toward their structured modeling within the ORKG, whereas the remaining information units, especially Approach or Model, will require additional normalization steps in the search for their better structuring.

9 CONCLUSIONS AND FUTURE DIRECTIONS
The Open Research Knowledge Graph [3] makes scholarly knowledge about research contributions machine-actionable: i.e. findable, structured, and comparable. Manually building such a knowledge graph is time-consuming and requires the expertise of paper authors and domain experts. In order to efficiently build a scholarly contributions knowledge graph, we will leverage the technology of machine readers [18] to assist the user in annotating scholarly article contributions. But the machine readers will need to be trained for such a task objective. To this end, in this work, we have proposed an annotation scheme for capturing the contributions in natural language processing scholarly articles, in order to create such training datasets for machine readers. In addition, we also provide a set of 50 articles annotated by the NLPContributions scheme as a practical demonstration of the feasibility of the annotation task. However, for the training of machine learning models, in future work we will release a larger dataset annotated by the proposed scheme. To facilitate future research, our pilot dataset is released online at https://doi.org/10.25835/0019761.
Finally, aligned with the initiatives within research communities to build the Internet of FAIR Data and Services (IFDS) [6], the data within the ORKG are compliant [38] with the FAIR data principles [52], thus making them Findable, Accessible, Interoperable and Reusable. Since the dataset we annotate by our proposed scheme is designed to be ORKG-compliant, we adopt the cutting-edge standard of data creation within the research community.
Nevertheless, the NLPContributions model is a surface semantic structuring scheme for the contributions in unstructured text. To realize a full-fledged machine-actionable and inferenceable knowledge graph of scholarly contributions, as future directions, there are a few IE modules that would need to be improved or added. They are: (1) improving the PDF parser to produce less noisy output; (2) incorporating an entity and relation linking and normalization module; (3) merging phrases from the unstructured text with known ontologies (e.g., the MEX vocabulary [17]) to align resources and thus ensure data interoperability and reusability; and (4) extending the model to more scholarly disciplines and domains.
Nonethe- ule; (3) merging phrases from the unstructured text with known less, by considering the textual results, we relegate ourselves to ontologies (e.g., the MEX vocabulary [17]) to align resources and their summarized description, which often serves sufficient for thus ensure data interoperability and reusability; and (4) extending highlighting the contribution. the model to more scholarly disciplines and domains. Can all NLP-ML papers be modeled by NLPContributions? While we can conclude that some papers are easier to model than REFERENCES others (e.g., articles addressing ‘relation extraction’ vs. ‘machine [1] Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Craw- translation’ which are harder), it is possible that all papers can be ford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu modelled by at least some if not all the information units of the Ha, et al. 2018. Construction of the Literature Graph in Semantic Scholar. In NAACL, Volume 3 (Industry Papers). 84–91. model we propose. [2] Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power. 2017. The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction. In SemEval@ACL. [3] Sören Auer. 2018. Towards an Open Research Knowledge Graph. https://doi. 8 DISCUSSION org/10.5281/zenodo.1157185 From the pilot dataset annotation exercise, we note the following [4] Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, Markus Stocker, and Maria Esther Vidal. 2018. Towards a knowledge graph for science. In Proceedings regarding task practically. Knowledge modeled under some infor- of the 8th International Conference on Web Intelligence, Mining and Semantics. mation units are more amenable to systematic structuring than 1–6. 25 EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents NLPContributions: An Annotation Scheme EEKE 2020 @ JCDL ’20, August 1–5, 2020, Virtual Event, China [5] Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, and An- [30] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, drew McCallum. 2017. SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language and Relations from Scientific Publications. In SemEval@ACL. representation model for biomedical text mining. Bioinformatics 36, 4 (2020), [6] Paul Ayris, Jean-Yves Berthou, Rachel Bruce, Stefanie Lindstaedt, Anna Monreale, 1234–1240. Barend Mons, Yasuhiro Murayama, Caj Södergård, Klaus Tochtermann, and [31] Maria Liakata, Shyamasree Saha, Simon Dobnik, Colin Batchelor, and Dietrich Ross Wilkinson. 2016. Realising the European open science cloud. Luxembourg. Rebholz-Schuhmann. 2012. Automatic recognition of conceptualization zones in https://doi.org/10.2777/940154 scientific articles and two life science applications. Bioinformatics 28, 7 (2012), [7] Niranjan Balasubramanian, Stephen Soderland, Oren Etzioni, et al. 2013. Gener- 991–1000. ating coherent event schemas at scale. In EMNLP. 1721–1731. [32] Maria Liakata, Simone Teufel, Advaith Siddharthan, and Colin R. Batchelor. 2010. [8] Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language Corpora for the Conceptualisation and Zoning of Scientific Papers. In LREC. model for scientific text. In EMNLP-IJCNLP. 3606–3611. [33] Yi Luan, Luheng He, Mari Ostendorf, and Hannaneh Hajishirzi. 2018. 
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In EMNLP.
[9] Antoine Bordes, Jason Weston, and Nicolas Usunier. 2014. Open question answering with weakly supervised embedding models. In Joint European conference on machine learning and knowledge discovery in databases. 165–180.
[10] Arthur Brack, Jennifer D’Souza, Anett Hoppe, Sören Auer, and Ralph Ewerth. 2020. Domain-Independent Extraction of Scientific Concepts from Research Articles. In Advances in Information Retrieval. Springer International Publishing, 251–266.
[11] Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In EMNLP. 1797–1807.
[12] Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT. 789–797.
[13] Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. 602–610.
[14] Alexandru Constantin, Silvio Peroni, Steve Pettifer, David Shotton, and Fabio Vitali. 2016. The document components ontology (DoCO). Semantic Web 7, 2 (2016), 167–181.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL, Volume 1 (Long and Short Papers). Minneapolis, Minnesota, 4171–4186.
[16] Jennifer D’Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, and Ralph Ewerth. 2020. The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources. In LREC. Marseille, France, 2192–2203.
[17] Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck, Markus Ackermann, and Jens Lehmann. 2015. MEX vocabulary: a lightweight interchange format for machine learning experiments. In Proceedings of the 11th International Conference on Semantic Systems. 169–176.
[18] Oren Etzioni, Michele Banko, and Michael J Cafarella. 2006. Machine Reading. In AAAI, Vol. 6. 1517–1519.
[19] Said Fathalla, Sahar Vahdati, Sören Auer, and Christoph Lange. 2017. Towards a knowledge graph representing research findings by semantifying survey articles. In TPDL. Springer, 315–327.
[20] Beatríz Fisas, Francesco Ronzano, and Horacio Saggion. 2016. A Multi-Layered Annotated Corpus of Scientific Papers. In LREC.
[21] Zhijiang Guo, Yan Zhang, and Wei Lu. 2019. Attention Guided Graph Convolutional Networks for Relation Extraction. In ACL. 241–251.
[22] Siegfried Handschuh and Behrang QasemiZadeh. 2014. The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In COLING 2014: 4th international workshop on computational terminology.
[23] John PA Ioannidis. 2016. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. The Milbank Quarterly 94, 3 (2016), 485–514.
[24] Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D’Souza, Gábor Kismihók, Markus Stocker, and Sören Auer. 2019. Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge. In KCAP (Marina Del Rey, CA, USA). ACM, New York, NY, USA, 243–246.
[25] Arif E Jinha. 2010. Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publishing 23, 3 (2010), 258–263.
[26] Olga Kononova, Haoyan Huo, Tanjin He, Ziqin Rong, Tiago Botari, Wenhao Sun, Vahe Tshitoyan, and Gerbrand Ceder. 2019. Text-mined dataset of inorganic materials synthesis recipes. Scientific data 6, 1 (2019), 1–11.
[27] Chaitanya Kulkarni, Wei Xu, Alan Ritter, and Raghu Machiraju. 2018. An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols. In NAACL: HLT, Volume 2 (Short Papers). New Orleans, Louisiana, 97–106. https://doi.org/10.18653/v1/N18-2016
[28] Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, and Makoto Miwa. 2020. Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature. In LREC. 1941–1950.
[29] Joohong Lee, Sangwoo Seo, and Yong Suk Choi. 2019. Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing. arXiv preprint arXiv:1901.08163 (2019).
[34] Yi Luan, Mari Ostendorf, and Hannaneh Hajishirzi. 2017. Scientific information extraction with semi-supervised neural tagging. arXiv preprint arXiv:1708.06075 (2017).
[35] Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th ACL (Volume 1: Long Papers). 1105–1116.
[36] Sheshera Mysore, Zachary Jensen, Edward Kim, Kevin Huang, Haw-Shiuan Chang, Emma Strubell, Jeffrey Flanigan, Andrew McCallum, and Elsa Olivetti. 2019. The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures. In Proceedings of the 13th Linguistic Annotation Workshop. 56–64.
[37] Allard Oelen, Mohamad Yaser Jaradeh, Kheir Eddine Farfar, Markus Stocker, and Sören Auer. 2019. Comparing Research Contributions in a Scholarly Knowledge Graph. In K-CAP 2019. 21–26.
[38] Allard Oelen, Mohamad Yaser Jaradeh, Markus Stocker, and Sören Auer. 2020. Generate FAIR Literature Surveys with Scholarly Knowledge Graphs.
[39] Vayianos Pertsas and Panos Constantopoulos. 2017. Scholarly Ontology: modelling scholarly practices. International Journal on Digital Libraries 18, 3 (2017), 173–190.
[40] Roger C Schank and Robert P Abelson. 1977. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. (1977).
[41] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017).
[42] Dan Simonson and Anthony Davis. 2015. Interactions between narrative schemas and document categories. In Proceedings of the First Workshop on Computing News Storylines. 1–10.
[43] Dan Simonson and Anthony Davis. 2016. NASTEA: Investigating narrative schemas through annotated entities. In Proceedings of the 2nd Workshop on Computing News Storylines. 57–66.
[44] Dan Simonson and Anthony Davis. 2018. Narrative Schema Stability in News Text. In Proceedings of the 27th International Conference on Computational Linguistics. 3670–3680.
[45] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the Blanks: Distributional Similarity for Relation Learning. In ACL. 2895–2905.
[46] Larisa N. Soldatova and Ross D. King. 2006. An ontology of scientific experiments. Journal of the Royal Society Interface 3, 11 (2006), 795–803.
[47] Simone Teufel, Jean Carletta, and Marc Moens. 1999. An annotation scheme for discourse-level argumentation in research articles. In Proceedings of the ninth conference on European chapter of ACL. 110–117.
[48] Simone Teufel, Advaith Siddharthan, and Colin Batchelor. 2009. Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. In EMNLP: Volume 3. 1493–1502.
[49] Lars Vogt, Jennifer D’Souza, Markus Stocker, and Sören Auer. 2020. Toward Representing Research Contributions in Scholarly Knowledge Graphs Using Knowledge Graph Cells. In JCDL ’20, August 1–5, 2020, Virtual Event, China.
[50] Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, and Saloni Potdar. 2019. Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers. In Proceedings of the 57th ACL. 1371–1377.
[51] Mark Ware and Michael Mabe. 2015. The STM Report: An overview of scientific and scholarly journal publishing. (03 2015).
[52] Mark D Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data 3 (2016).
[53] Yue Zhang, Qi Liu, and Linfeng Song. 2018. Sentence-State LSTM for Text Representation. In Proceedings of the 56th ACL (Volume 1: Long Papers). 317–327.
[54] Yuhao Zhang, Peng Qi, and Christopher D Manning. 2018. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In EMNLP. 2205–2215.
A TWICE MODELING AGREEMENT
In general, even if the annotations are performed by a single annotator, there will be annotation discrepancies. Compare the same information unit “Experimental Setup” modeled in Figure 7 below versus in Figure 3. Figure 7 shows the first annotation attempt; the second attempt, depicted in Figure 3 in the main paper content, was made on a different day and blind to the first. While neither is incorrect, the second attempt took the least-annotated-information route, possibly due to annotator fatigue; hence, a two-pass methodology is recommended. A minimal sketch of how two such annotation passes could be compared follows the figure caption below.
Figure 7: Illustration of modeling of Contribution Sequences in the Experimental Setup Information Unit (from [30]) in a first annotation attempt. Contrast with the second attempt depicted in Figure 3 in the main paper content.
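To make the recommended two-pass comparison concrete, the following minimal sketch (Python) illustrates how two independently produced models of the same information unit, each expressed as (subject, predicate, object) statements, could be checked for agreement. The triples and the Jaccard-style overlap measure are illustrative assumptions only; they are neither the actual contents of Figures 3 and 7 nor part of the NLPContributions tooling.

def triple_agreement(first_pass, second_pass):
    # Jaccard overlap between two sets of (subject, predicate, object) statements.
    first, second = set(first_pass), set(second_pass)
    union = first | second
    return len(first & second) / len(union) if union else 1.0

# Hypothetical first-pass model of an "Experimental Setup" contribution sequence.
first_pass = [
    ("Contribution", "has", "Experimental setup"),
    ("Experimental setup", "has", "hyperparameters"),
    ("hyperparameters", "optimizer", "Adam"),
    ("hyperparameters", "learning rate", "0.001"),
]

# Hypothetical second-pass model: the "least annotated information" route.
second_pass = [
    ("Contribution", "has", "Experimental setup"),
    ("Experimental setup", "has", "hyperparameters"),
    ("hyperparameters", "optimizer", "Adam"),
]

print(f"Twice-modeling agreement: {triple_agreement(first_pass, second_pass):.2f}")

Under these assumed triples the overlap is 0.75; in practice, such a score could flag information units whose two annotation passes diverge enough to warrant reconciliation.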