EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
EEKE 2020 @ JCDL '20, August 1–5, 2020, Virtual Event, China

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Jennifer D'Souza, TIB Leibniz Information Centre for Science and Technology, Hannover, Germany, jennifer.dsouza@tib.eu
Sören Auer, TIB Leibniz Information Centre for Science and Technology & L3S Research Center, Hannover, Germany, soeren.auer@tib.eu

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions.
The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development. Our pilot annotated dataset of 50 NLP-ML scholarly articles according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761.

CCS CONCEPTS
• General and reference → Computing standards, RFCs and guidelines; • Information systems → Document structure; Ontologies; Data encoding and canonicalization.

KEYWORDS
dataset, annotation guidelines, semantic publishing, digital libraries, scholarly knowledge graphs, open science graphs

1 INTRODUCTION
As the rate of research publications increases [51], there is a growing need within digital libraries to equip researchers with alternative knowledge representations, other than the traditional document-based format, for keeping pace with the rapid research progress [3]. In this regard, several efforts exist or are currently underway for semantifying scholarly articles for their improved machine interpretability and ease in comprehension [19, 24, 38, 49]. These models equip experts with a tool for semantifying their scholarly publications, ranging from strictly-ontologized methodologies [19, 49] to less-strict, flexible description schemes [24, 37], wherein the latter aim toward the bottom-up, data-driven discovery of an ontology. Consequently, knowledge graphs [1, 4] are being advocated as a promising alternative to the document-based format for representing scholarly knowledge, for the enhanced content ingestion enabled via their fine-grained machine interpretability.

The automated semantic extraction from scholarly publications using text mining has seen early initiatives based on sentences as the basic unit of analysis. To this end, ontologies and vocabularies were created [14, 39, 46, 47], corpora were annotated [20, 32], and machine learning methods were applied [31]. Recently, scientific IE has targeted search technology; thus newer corpora have been annotated at the phrasal unit of information with three or six types of scientific concepts in up to ten disciplines [5, 16, 22, 33], facilitating machine learning system development [2, 8, 10, 34]. In general, a phrase-focused annotation scheme more directly influences the building of a scholarly knowledge graph, since phrases constitute knowledge graph statements. Nonetheless, sentence-level annotations are just as poignant, offering knowledge graph modelers better context from which the phrases are obtained for improved knowledge graph curation.
Beyond these, many recent data collection and annotation efforts [26–28, 36] are steering new directions in text mining research on scholarly publications. These initiatives are focused on the shallow semantic structuring of the instructional content in lab protocols or descriptions of chemical synthesis reactions. This has entailed generating annotated datasets via structuring recipes to facilitate their automatic content mining for machine-actionable information that is otherwise presented in ad-hoc ways within scholarly documentation. Such datasets inadvertently facilitate the development of machine readers. In the past, similar text mining research was conducted as the unsupervised mining of Schemas (also called scripts, templates, or frames), i.e. as a generalization of recurring event knowledge (involving a sequence of three to ten events) with various participants [40], primarily over newswire articles [7, 11–13, 42–44]. Such schemas were potent at generalizing over similar but distinct narratives (they can be seen as knowledge units), with the goal of revealing their underlying common elements. However, little insight was garnered on their practical task relevance. This has changed with the recent surface semantic structuring initiatives over instructional content. It has led to the realization of a seemingly new practicable direction that taps into the structuring of text and the structured information aggregation under Scripts-based knowledge themes.

Since scientific literature is growing at a rapid rate and researchers today are faced with this publications deluge [25], it is increasingly tedious, if not practically impossible, to keep up with the progress even within one's own narrow discipline. The Open Research Knowledge Graph (ORKG) [4] is posited as a solution to the problem of keeping track of research progress minus the cognitive overload that reading dozens of full papers imposes. It aims to build a comprehensive knowledge graph that publishes the research contributions of scholarly publications per paper, where the contributions are interconnected via the graph even across papers. At https://www.orkg.org/ one can view the contribution knowledge graph of a single paper as a summary over its key contribution properties and values, or compare the contribution knowledge graphs over common properties across several papers in a tabulated survey. Practical examples of the latter are accessible online at https://www.orkg.org/orkg/featured-comparisons. This practically addresses the knowledge ingestion problem for researchers. How? With the ORKG comparisons feature, researchers are no longer faced with the daunting cognitive ingestion obstacle of manually scouring through dozens of papers of unstructured content in their field. Where this process traditionally would take several days or months, using the ORKG contributions comparison tabulated view, the task is reduced to just a few minutes. Assuming the individual paper contributions are structured in the ORKG, researchers can then simply deconstruct the graph, tap into the aspects they are interested in, and enhance it for their purposes. Further, they can select multiple such paper graphs and, with the click of a button, generate their tabulated comparison. For additional details on systems and methods beyond just the contribution highlights, they can still choose to read the original articles, but this time around equipped with a better selective understanding of which articles they should read in depth. Of course, scholarly article abstracts are intended for this purpose, but they are not machine interpretable; in other words, they cannot be comparatively organized. Further, the unstructured abstracts representation still treats research as data silos; thus, with this model, research endeavors, in general, continue to be susceptible to redundancy [23], lacking a meaningful way of connecting structured and unstructured information.

1.1 Our Contribution
In this paper, we propose a surface semantically structured dataset of 50 scholarly articles for their research contributions in the field of natural language processing focused on machine learning applications (the NLP-ML domain), across five different information extraction tasks, to be integrable within the ORKG. To this end, we (1) identify sentences in scholarly articles that reflect research contributions; (2) create structured (subject, predicate, object) annotations from these sentences by identifying mentions of the contribution candidate term phrases and their relations; and (3) group collections of such triples, that arise from either consecutive or non-consecutive sentences, under one of ten core information units that capture an aspect of the contribution of NLP-ML scholarly articles. These core information units are conceptually posited as thematic scripts [40]. The resulting model formalized from the pilot annotation exercise we call the NLPContributions scheme.
It has the following characteristics: (1) via a contribution-centered model, it makes realistic the otherwise forbidding task of semantically structuring full-text scholarly articles; our task only needs a surface structuring of the highlights of the approach, which often can be found in the Title, the Abstract, one or two paragraphs in the Introduction, and in the Results section; (2) it offers guidance for a structuring methodology, albeit still encompassing subjective decisions to a certain degree, but overall presenting a uniform model for identifying and structuring contributions; note that without a model, such structuring decisions may not end up being comparable across users and their modeled papers (see Figure 6); (3) the dataset is annotated in JSON format since it preserves relation hierarchies; (4) the annotated data we produce can be practically leveraged within frameworks such as the ORKG that support structured scholarly content-based knowledge ingestion. With the integration of our semantically structured scholarly contributions data in the ORKG, we aim to address the tedious and time-consuming scholarly knowledge ingestion problem via its contributions comparison feature. And further, by using the graph-based model, we also address the problem of scholarly information produced as data silos, as the ORKG connects the structured information across papers.

2 BACKGROUND AND RELATED WORK
Sentence-based Annotations of Scholarly Publications. Early initiatives in semantically structuring scholarly publications focused on sentences as the basic unit of analysis. In these sentence-based annotation schemes, all annotation methodologies [20, 32, 47, 48] have had very specific aims for scientific knowledge capture. A seminal work in this direction is the CoreSC (Core Scientific Concepts) sentence-based annotation scheme [32]. This scheme aimed to model, in finer granularity, i.e. at the sentence level, concepts that are necessary for the description of a scientific investigation, while traditional approaches employ section names serving as coarse-grained paragraph-level annotations. Such semantified scientific knowledge capture was apt at highlighting selected sentences within computer-based readers. In this application context, mere sectional information organization for papers was considered as missing the finer rhetorical semantic classifications. E.g., in a Results section, the author may also provide some sentences of background information, which in a sentence-wise semantic labeling are called Background and not Results.
Another sentence-based scheme is the Argument Zoning (AZ) scheme [48]. This scheme aimed at modeling the rhetorics around knowledge claims between the current work and cited work. It used semantic classes such as "Own_Method," "Own_Result," "Other," "Previous_Own," "Aim," etc., each elaborating on the rhetorical path to various knowledge claims. This latter scheme was apt for citation summaries, sentiment analysis and the extraction of information pertaining to knowledge claims. In general, such complementary aims for the sentence-based semantification of scholarly publications can be fused to generate more comprehensive summaries.

Phrase-based Annotations of Scholarly Publications. The trend towards scientific terminology mining methods in NLP steered the release of phrase-based annotated datasets in various domains. An early dataset in this line of work was the ACL RD-TEC corpus [22], which identified seven conceptual classes for terms in the full-text of scholarly publications in Computational Linguistics, viz. Technology and Method; Tool and Library; Language Resource; Language Resource Product; Models; Measures and Measurements; and Other. Similar to terminology mining is the task of scientific keyphrase extraction. Extracting keyphrases is an important task in publishing platforms as they help recommend articles to readers, highlight missing citations to authors, identify potential reviewers for submissions, and analyse research trends over time. Scientific keyphrases, in particular of type Processes, Tasks and Materials, were the focus of the SemEval17 corpus annotations [5]. The dataset comprised annotations of full-text articles in Computer Science, Material Sciences, and Physics. Following suit was the SciERC corpus [33] of annotated abstracts from the Artificial Intelligence domain. It included annotations for six concepts, viz. Task, Method, Metric, Material, Other-Scientific Term, and Generic. Finally, in the realm of corpora having phrase-based annotations was the recently introduced STEM-ECR corpus [16], notable for its multidisciplinarity including the Science, Technology, Engineering, and Medicine domains. It was annotated with four generic concept types, viz. Process, Method, Material, and Data, that mapped across all domains, and further with terms grounded in the real world via Wikipedia/Wiktionary links.
Next, we discuss related works that semantically model instructional scientific content. In these works, the overarching scientific knowledge capture theme is the end-to-end semantification of an experimental process.

Shallow Semantic Structural Annotations of Instructional Content in Scholarly Publications. Increasingly, text mining initiatives are seeking out recipes or formulaic semantic patterns to automatically mine machine-actionable information from scholarly articles [26–28, 36].
In [27], they annotate wet lab protocols, covering a large spectrum of experimental biology, including neurology, epigenetics, metabolomics, cancer and stem cell biology, with actions corresponding to lab procedures and their attributes including materials, instruments and devices used to perform specific actions. Thereby the protocols then constituted a prespecified machine-readable format as opposed to the ad-hoc documentation norm. Kulkarni et al. [27] even release a large human-annotated corpus of semantified wet lab protocols to facilitate machine learning of such shallow semantic parsing over natural language instructions. Within scholarly articles, such instructions are typically published in the Materials and Method section in the Biology and Chemistry fields.
Along similar lines, inorganic materials synthesis reactions and procedures continue to reside as natural language descriptions in the text of journal articles. There is a growing impetus in such fields to find ways to systematically reduce the time and effort required to synthesize novel materials, which presently remains one of the grand challenges in the field. In [26, 36], to facilitate machine learning models for automatic extraction of materials syntheses from text, they present datasets of synthesis procedures annotated with semantic structure by domain experts in Materials Science. The types of information captured include synthesis operations (i.e. predicates), and the materials, conditions, apparatus and other entities participating in each synthesis step.
The NLPContributions annotation methodology proposed in this paper draws on each of the earlier categorizations of related work. First, the full-text of scholarly articles, including the Title and the Abstract, is annotated at a sentence-wise granularity, with the aim that the annotated sentences be only those restricted to the contributions of the investigation. We selectively consider the full-text of the article by focusing only on specific sections such as the Abstract, Introduction, and Results sections. Sometimes we also model the contribution highlights from the Approach/System description in case the Introduction does not contain such pertinent information on the proposed model. We skip the Background, Related Work, and Conclusion sections altogether. These sentences are then grouped under one of ten main information units, viz. ResearchProblem, Objective, Approach, Tasks, ExperimentalSetup (alternatively named Hyperparameters), Experiments, Baselines, Results, AblationAnalysis, and Code; each of these units is defined in detail in the next section. Second, from the grouped contribution-centered sentences, we perform phrase-based annotations for (subject, predicate, object) triples to model in a knowledge graph. And third, the resulting dataset has an overarching knowledge capture objective: capturing the contribution of the scholarly article and, in particular, facilitating the training of machine readers for this purpose, along the lines of the machine-interpretable wet-lab protocols.

3 THE NLPCONTRIBUTIONS MODEL
3.1 Goals
The development of the NLPContributions annotation model was backed by four primary goals:
(1) We aim to produce a semantic representation based on existing work, that can be well motivated as an annotation scheme for the application domain of NLP-ML scholarly articles, and is specifically aimed at the knowledge capture of the contributions in scholarly articles;
(2) The annotated scholarly contributions based on NLPContributions should be integrable in the Open Research Knowledge Graph (ORKG)¹, the state-of-the-art content-based knowledge capturing platform of scholarly articles' contributions.
(3) The NLPContributions model should be useful to produce data for the development of machine learning models in the form of machine readers [18] of scholarly contributions. Such trained models can serve to automatically extract such structured information for downstream applications, either in completely automated or semi-automated workflows as recommenders.²
(4) The NLPContributions model should be amenable to feedback via a consensus approval or content annotation change suggestions from a large group of authors toward their scholarly article contribution descriptions (an experiment that is beyond the scope of the present work and planned as following work).

¹ https://www.orkg.org/orkg/
² In future work, we will expand our current pilot annotated dataset of 50 articles with at least 400 additional similarly annotated articles to facilitate machine learning.

Figure 1: Fine-grained modeling illustration from a single sentence for part of an Approach proposed in [9].

The NLPContributions annotation model is designed for building a knowledge graph. It is not ontologized; therefore, we assume a bottom-up, data-driven design toward ontology discovery as more annotated contributions data is available.
Nonetheless, we do propose a core skeleton model for organizing the information at the top-level KG nodes. This involves a root node called Contribution, following which, at the first level of the knowledge graph, are ten nodes representing core information units under which the scholarly contributions data is organized.

3.2 The Ten Core Information Units
In this section, we describe the ten information units in our model.

ResearchProblem. Determines the research challenge addressed by a contribution, using the predicate hasResearchProblem. By definition, it is the focus of the research investigation, in other words, the issue for which the solution must be obtained.
The task entails identifying only the research problem addressed in the paper and not research problems in general. For instance, in the paper about the BioBERT word embeddings [30], the research problem is just the 'domain-customization of BERT' and not 'biomedical text mining,' since the latter is a secondary objective.
The ResearchProblem is typically found in an article's Title, Abstract and first few paragraphs of the Introduction. The task involves annotating one or more sentences and precisely the research problem phrase boundaries in the sentences.
The subsequent seven information objects are connected to Contribution via the generic predicate has.
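To make the top-level skeleton concrete, the minimal JSON sketch below shows how a Contribution root could connect a research problem (here, the BioBERT example above) and two of the information units defined in the remainder of this section via the generic has predicate. It is an illustrative re-creation in the spirit of the released annotations, not an excerpt from the dataset; the empty objects simply stand for content that is filled in per unit.

```json
{
  "Contribution": {
    "has research problem": "domain-customization of BERT",
    "has": {
      "Approach": {},
      "Results": {}
    }
  }
}
```

In the released dataset the same skeleton recurs per paper, with only the units that apply to the given article present.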
Approach. Depending on the paper's content, this unit is referred to as the Model or Method or Architecture or System or Application. Essentially, this is the contribution of the paper as the solution proposed for the research problem.
The annotations are made only for the high-level overview of the approach without going into system details. Therefore, the equations associated with the model and all the system architecture figures are not part of the annotations. While annotating the earlier ResearchProblem did not involve semantic annotation granularity beyond one level, annotating the Approach can. Sometimes the annotations (one- or multi-layered) are created using the elements within a single sentence itself (see Figure 1); at other times, if they are multi-layered semantic annotations, they are formed by bridging two or more sentences based on their coreference relations. For the annotation element content itself, while, in general, the subject, predicate, and object phrases are obtained directly from the sentence text, at times the predicate phrases have to be introduced as generic terms such as "has" or "on" or "has description", wherein the latter predicate is used for including, as objects, longer text fragments within a finer annotation granularity to describe the top-level node. The actual type of approach is restricted to the sub-types stated at the beginning of this paragraph and is decided based on the reference to the solution used by the authors or the solution description section name itself. If the reference to the solution or its section name is specific to the paper, such as 'Joint model,' then we rename it to just 'Model.' In general, any alternate namings of the solution, other than those mentioned earlier, including "idea", are normalized to "Model." Finally, as machine learning solutions, they are often given names, e.g., the model BioBERT [30], in which case we introduce the predicate 'called,' as in (Method, called, BioBERT).
The Approach is found in the article's Introduction section in the context of cue phrases such as "we take the approach," "we propose the model," "our system architecture," or "the method proposed in this paper." However, there are exceptions when the Introduction does not present an overview of the system, in which case we analyze the first few lines within the main system description content in the article. Also, if the paper refers to its system by "method" or "application," this is normalized to the Approach information unit; "System" or "Architecture" is normalized to the Model information unit.

Objective. This is the defined function for the machine learning algorithm to optimize over. In some cases, the Approach objective is a complex function. In such cases, it is isolated as a separate information object connected directly to the Contribution.

ExperimentalSetup. Has the alternate name Hyperparameters. It includes details about the platform, including both hardware (e.g., GPU) and software (e.g., the Tensorflow library), for implementing the machine learning solution; and of variables that determine the network structure (e.g., number of hidden units) and how the network is trained (e.g., learning rate), for tuning the software to the task objective.
Recent machine learning models are all neural based, and such models have several associated variables such as hidden units, model regularization parameters, learning rate, word embedding dimensions, etc. Thus, to offer users a glance at the contributed system, this aspect is included in NLPContributions. We only model the experimental setup that is expressed in a few sentences or that is concisely tabulated. There are cases when the experimental setup is not modeled at all within NLPContributions, e.g., for the complex "machine translation" models that involve many parameters. Thus, whether the experimental setup should be modeled or not may appear as a subjective decision; however, over the course of several annotated articles it becomes apparent, especially when the annotator begins to recognize the simple sentences that describe the experimental setup.
The ExperimentalSetup unit is found in the sections called Experiment, Experimental Setup, Implementation, Hyperparameters, or Training.

Figure 2: Illustration of modeling of Result (from [53]) w.r.t. a precedence of its elements as [dataset -> task -> metric -> score].

Results. Are the main findings or outcomes reported in the article for the ResearchProblem. Each Result unit involves some of the following elements: {dataset, metric, task, performance score}. Regardless of how the sentence(s) are written involving these elements, we assume the following precedence order: [dataset -> task -> metric -> score] or [task -> dataset -> metric -> score], as far as it can be applied without significantly changing the information in the sentence. Consider this illustrated in Figure 2. In the figure, the JSON is arranged starting at the dataset, followed by the task, then the metric, and finally the actual reported result.
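As a minimal sketch of this precedence, the nesting below re-creates the arrangement that Figure 2 depicts for the S-LSTM result from [53], which also serves as the running example in Section 4.1; the exact key layout is illustrative, and the released annotations may differ in minor details.

```json
{
  "CoNLL test set": {
    "For": {
      "NER": {
        "F1-score": "91.57%"
      }
    }
  }
}
```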
While this information unit is named per those stated in the earlier paragraph, if in a paper the section name is non-generic, e.g., "Main results" or "End-to-end results," it is normalized to the default name "Results."
The Results unit is found in the Results, Experiments, or Tasks sections. While the results are often highlighted in the Introduction, unlike for the Approach unit, in this case we annotate the dedicated, detailed section on Results, because results constitute a primary aspect of the contribution. Next we discuss the Tasks information unit, and note that Results can include Tasks and vice versa, as we describe next.

Tasks. The Approach or Model, particularly in multi-task settings, is tested on more than one task, in which case we list all the experimental tasks. The experimental tasks are often synonymous with the experimental datasets, since it is common in NLP for tasks to be defined over datasets. Where lists of Tasks are concerned, the Tasks can include one or more of ExperimentalSetup, Hyperparameters, and Results as sub information units.

Experiments. Are an encompassing information unit that includes one or more of the earlier discussed units. It can include a combination of ExperimentalSetup and Results, or a combination of lists of Tasks and their Results, or a combination of Approach, ExperimentalSetup and Results.
Recently, more and more multitask systems are being developed; consider the BERT model [15] as an example. Therefore, modeling ExperimentalSetup with Results or Tasks with Results is necessary in such systems, since the experimental setup often changes per task, producing a different set of results. Hence, this information unit encompassing two or more sub information units is relevant.

AblationAnalysis. Is a form of Results that describes the performance of components in systems.
Unlike Results, AblationAnalysis is not performed in all papers. Further, in papers that have them, we only model these results if they are expressed in a few sentences, similar to our modeling decision for Hyperparameters.
The AblationAnalysis information unit is found in the sections that have Ablation in their title. Otherwise, it can also be found in the written text without having a dedicated section for it. For instance, in the paper "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures" [35] there is no section title with Ablation, but this information is extracted from the text via cue phrases that indicate ablation results are being discussed.

Baselines. Are those listed systems that a proposed approach is compared against.
The Baselines information unit is found in sections that have Baseline in their title. Otherwise, it can also be found in sections that are not directly titled Baseline but require annotator judgement to infer that baseline systems are being discussed. For instance, in the paper "Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers" [50], the baselines are discussed in the subsection 'Methods.' Or in the paper "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer" [41], the baselines are discussed in a section called "Previous State-of-the-Art."

Code. Is a link to the software on Github or on other similar open source platforms, or even on the authors' website.

Of these ten information units, only three are mandatory: ResearchProblem, Approach, and Results; the other seven may or may not be present depending on the content of the article.
3.3 Contribution Sequences within Information Units
Except for ResearchProblem, each of the remaining nine information units encapsulates different aspects of the contributions of scholarly investigations in the NLP-ML domain, with the ResearchProblem offering the primary contribution context. Within these different aspects, there are what we call Contribution Sequences.
Here, with the help of an example depicted in Figure 3, we illustrate the notion of contribution sequences. In this example, we model contribution sequences in the context of the ExperimentalSetup information unit. In the figure, this information unit has two contribution sequences: the first connected by the predicate 'used' to the object 'BERTBase model,' and the second, also connected by the predicate 'used', to the object 'NVIDIA V100 (32GB) GPUs.' The 'BERTBase model' contribution sequence includes a second level of detail expressed via two different predicates, 'pre-trained for' and 'pre-trained on.' As a model of scientific knowledge, the triple with the entities connected by the first predicate, i.e. (BERTBase model, pre-trained for, 1M steps), reflects that the 'BERTBase model' was pretrained for 1 million steps. The second predicate produces two triples: (BERTBase model, pre-trained on, English Wikipedia) and (BERTBase model, pre-trained on, BooksCorpus). In each case, the scientific knowledge captured by these two triples is that BERTBase was pretrained on {Wikipedia, BooksCorpus}. Note that in the JSON data structure, the predicate connects the two objects as an array.
Next, the second contribution sequence, hinged at 'NVIDIA V100 (32GB) GPUs' as the subject, has two levels of granularity. Consider the following triples: (NVIDIA V100 (32GB) GPUs, used, ten) and (ten, for, pre-training). Note that in this nesting pattern, except for 'NVIDIA V100 (32 GB) GPUs,' the predicates {used, for} and remaining entities {ten, pre-training} are nested according to their order of appearance in the written text. Therefore, in conclusion, an information unit can have several contribution sequences, and the contribution sequences need not be identically modeled. For instance, our second contribution sequence is modeled in a fine-grained manner, i.e. in multiple levels. And when fine-grained modeling is employed, it is relatively straightforward to spot in the sentence(s) being modeled.

Figure 3: Illustration of the modeling of Contribution Sequences in the Experimental Setup Information Unit (from [30]). Created using https://jsoneditoronline.org
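The sketch below re-creates the two contribution sequences just described as JSON, in the style of Figure 3; it is reconstructed from the triples given in the text rather than copied from the figure, so the exact structure of the released annotation may differ slightly.

```json
{
  "ExperimentalSetup": {
    "used": [
      {
        "BERTBase model": {
          "pre-trained for": "1M steps",
          "pre-trained on": ["English Wikipedia", "BooksCorpus"]
        }
      },
      {
        "NVIDIA V100 (32GB) GPUs": {
          "used": {
            "ten": {
              "for": "pre-training"
            }
          }
        }
      }
    ]
  }
}
```

Note how the 'pre-trained on' predicate connects its two objects as an array, as mentioned above, while the GPU sequence nests its elements in their order of appearance in the source sentence.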
4 THE PILOT ANNOTATION TASK
The pilot annotation task was performed by a postdoctoral researcher with a background in natural language processing. The NLPContributions model or scheme just described was developed over the course of the pilot task. At a high level, the annotations were performed in three main steps. They are presented next, after which we describe the annotation guidelines.

4.1 Pilot Task Steps
(a) Contribution-Focused Sentence Annotations. In this stage, sentences from scholarly articles were selected as candidate contribution sentences under each of the aforementioned mandatory three information units (viz., ResearchProblem, Approach, and Results) and, if applicable to the article, for one or more of the remaining seven information units as well.
To identify the contribution sentences in the article, the full-text of the article is searched. However, as discussed at the end of Section 2, the Background, Related Work, and Conclusions sections are entirely omitted from the search. Further, the section discussing the Approach or the System is only referred to when the Introduction section does not offer sufficient highlights of this information unit. In addition, except for tabulated hyperparameters, we do not consider other tables for annotation within the NLPContributions model.
To better clarify the pilot task process, in this subsection we use Figure 2 as the running example. From the example, at this stage, the sentence "For NER (Table 7), S-LSTM gives an F1-score of 91.57% on the CoNLL test set, which is significantly better compared with BiLSTMs." is selected as one of the contribution sentence candidates as part of the Results information unit. This sentence is selected from a Results subsection in [53], but is just one among three others.
(b) Chunking Phrase Spans for Subject, Predicate, Object Entities. Then, for the selected sentences, we annotate their scientific knowledge entities. The entities are annotated by annotators having an implicit understanding of whether they take the subject, predicate, or object roles in a per-triple context. As a note, by our annotation scheme, predicates are not mandatorily verbs and can be nouns as well.
Returning to our running example, for the selected sentence, this stage involves annotating the phrases "For," "NER," "F1-score," "91.57%," and "CoNLL test set," with the annotator cognizant of the fact that they will use the [dataset -> task -> metric -> score] scientific entity precedence in the next step.
(c) Creating Contribution Sequences. This involves relating the subjects and objects within triples, where, as illustrated in Section 3.3, the object in one triple can be a subject in another triple if the annotation is performed at a fine-grained level of detail. For the most part, the nesting is done per order of appearance of the entities in the text, except for those involving the scientific entities {dataset, task, metric, score} under the Results information unit.
In the context of our running example, given the earlier annotated scientific entities, in this stage the annotator will form the following two triples: (CoNLL test set, For, NER) and (NER, F1-score, 91.57%) as a single contribution sequence. What is not depicted in Figure 1 are the top-level annotations, including the root node and one of the ten information unit nodes. This is modeled as follows: (Contribution, has, Results), and (Results, has, CoNLL test set).
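Putting steps (a)-(c) together, the running example yields a nested object along the following lines; this sketch extends the Results fragment shown in Section 3.2 with the root node and the "from sentence" evidence key described in the guidelines below, and is an illustrative assembly of the triples above rather than a verbatim excerpt from the released dataset.

```json
{
  "Contribution": {
    "has": {
      "Results": {
        "has": {
          "CoNLL test set": {
            "For": {
              "NER": {
                "F1-score": "91.57%"
              }
            },
            "from sentence": "For NER (Table 7), S-LSTM gives an F1-score of 91.57% on the CoNLL test set, which is significantly better compared with BiLSTMs."
          }
        }
      }
    }
  }
}
```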
4.2 Task Guidelines
In this section, we elicit a set of general guidelines that inform the annotation task.
How are information unit names selected? For information units such as Approach, ExperimentalSetup, and Results that each have a set of candidate names, the applied name is the one selected based on the closest section title or cue phrase.
Which of the ten information units does the sentence belong to? Conversely to the above, if a sentence is first identified as a contribution sentence candidate, it is placed within the information unit category that is identified directly based on the section header for the sentence in the paper, or inferred from cue phrases in the first few sentences of its section.
When is the Approach actually modeled from the dedicated section as opposed to the Introduction? In general, we avoid annotating the Approach or Model sections for their contribution sentences, as they tend to delve deeply into the approach or model details and involve complicated elements such as equations, etc. Instead, we restrict ourselves to the system highlights in the Introduction. However, some articles' Introductions do not offer system highlights, which is when we resort to using the dedicated section for the contribution highlights in this mandatory information unit.
Do we explore details about hardware used as part of the contribution? Yes, if it is explicitly part of the hyperparameters.
Are predicates always verbs? Predicates are not always verbs. They can also be nouns, especially in the hyperparameters section.
Inferring Predicates. In ideal settings, the constraint on the text used for subjects, objects, and predicates in contribution sequences is that they should be found in their corresponding sentence. However, for predicates this is not always possible. Since predicate information may not always be found in the text, it is sometimes annotated additionally based on the annotator's judgment. However, even this open-ended choice remains restricted to a predefined set of candidates. It includes {"has", "on", "by", "for", "has value", "has description", "based on", "called"}.
How are the supporting sentences linked to their corresponding contribution sequence within the overall JSON object? The sentence(s) is stored in a dictionary with a "from sentence" key, which is then attached to either the first element or, if it is a nested triples hierarchy, sometimes even to the second element of a contribution sequence. The dictionary data-type containing the evidence sentence is either put as an array element, or as a nested dictionary element.
Are the nested contribution sequences always obtained from a single sentence? The triples can be nested based on information from one or more sentences in the article. Further, the sentences need not be consecutive in the running text. As mentioned earlier, the evidence sentences are attached to the first element or the second element by the predicate "from sentence." If a contribution sequence is generated from a table, then the table number in the original paper is referenced.
Creating contribution sequences from tabulated hyperparameters. Only for hyperparameters do we model their tabulated version, if given. This is done as follows: 1) for the predicate, we use the name of the parameter; and 2) for the object, the value against the name. Sometimes, however, if there are two-level hierarchical parameters, then the predicate is the first name, the object is the value, and the value is qualified by the parameter name lower in the hierarchy. Qualifying the second name involves introducing the "for" predicate.
How are lists modeled within contribution sequences? The contribution sentence candidates also include sentences with lists. Such sentences are predominantly found for the ExperimentalSetup or Result information units. This is modeled as depicted in Figure 4 for the first two list elements. There, the Model information unit has two contribution sequences, each pertaining to a specific list item in the sentence. Further, the predicate "has description" is introduced for linking text descriptions.
Which JSON structures are used to represent the data? Flexibly, they include dictionaries, or nested dictionaries, or arrays of items, where the items can be strings, dictionaries, nested dictionaries, or arrays themselves.
How are appositives handled? We introduce a new predicate, "name", to handle appositives.

Figure 4: Illustration of the modeling of a sentence with a list of items as part of the Model Information Unit (from [29]). Created using https://jsoneditoronline.org
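As an illustration of the tabulated-hyperparameters convention above (parameter name as predicate, value as object, and a lower-level parameter name attached to the value via "for"), consider the following sketch; the parameter names and values here are invented for illustration only and are not taken from any annotated paper.

```json
{
  "Hyperparameters": {
    "learning rate": "0.001",
    "hidden units": "200",
    "dropout": {
      "0.5": {
        "for": "embedding layer"
      }
    }
  }
}
```

The first two entries follow the flat, one-level case, while the "dropout" entry shows the two-level hierarchical case in which the value "0.5" is qualified by the lower-level parameter name via the "for" predicate.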
5 MATERIALS AND TOOLS
5.1 Paper Selection
A collection of scholarly articles is downloaded based on the ones in the publicly available leaderboard of tasks in artificial intelligence called https://paperswithcode.com/. It predominantly represents papers in the Natural Language Processing and Computer Vision fields. For the purposes of our NLPContributions model, we restrict ourselves just to the NLP papers. From the set, we randomly select 10 papers in each of five different NLP-ML research tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification.

5.2 Data Representation Format and Annotation Tools
JSON was the chosen data format for storing the semantified parts of the scholarly articles' contributions. To avoid syntax errors in creating the JSON objects, the annotations were made via https://jsoneditoronline.org, which imposes valid JSON syntax checks. Finally, in the early stages of the annotation task, some of the annotations were made manually in the ORKG infrastructure https://www.orkg.org/orkg/ to test their practical suitability in a knowledge graph; three such annotated papers are depicted in Figure 6. The links in the figure captions can be visited to explore the annotations at their finer granularity of detail.

5.3 Annotated Dataset Characteristics
Overall, the annotated corpus contains a total of 2631 triples (an average of 52 triples per article). Its data elements comprise 1033 unique subjects, 843 unique predicates, and 2182 unique objects. In Table 1 below, we show the per-task distribution of triples and their elements. Of all tasks, relation classification has the highest number of unique triples (544) and named entity recognition the least (473).

            MT    NER   QA    RC    TC
Subject     259   209   203   228   221
Predicate   243   220   187   201   252
Object      471   434   515   455   459
Total       502   473   497   544   504

Table 1: Per-task (machine translation (MT), named entity recognition (NER), question answering (QA), relation classification (RC), text classification (TC)) triples distribution in terms of unique subject, predicate, object, and overall.

Generally, in the context of triples formation, predicates are often selected from a closed set and hence comprise a smaller group of items. In the NLPContributions model, however, predicates are extracted from the text if present. This leads to a much larger set of predicates that would require the application of predicate normalization functions to find the smaller core semantic set. In Figure 5, to offer some insights to this end, we show the predicates that appear more than 15 times over all the triples. We find that the predicate has appears most frequently, since its function often serves as a filler predicate. A complete list of the predicates is released in our dataset repository online at https://doi.org/10.25835/0019761.

Figure 5: A list of the predicates in our triples dataset that appear more than 15 times. (Most frequent: has, 251; has research problem, 159; on, 119; for, 90; with, 82; by, 54; from sentence, 53.)
6 USE CASE: NLPCONTRIBUTIONS IN ORKG
As a use case of the ORKG infrastructure, instead of presenting just the annotations obtained from NLPContributions, we present a further enriched showcase. Specifically, we model the evolution of the annotation scheme over three different attempts, with the third one arriving at NLPContributions. This is depicted in Figure 6. Our use case is an enriched one for two reasons: 1) it depicts the ORKG infrastructure's flexibility for data-driven ontology discovery that makes allowances for different design decisions; and 2) it also shows how, within flexible infrastructures, the possibilities can be so wide that arriving at a consensus can potentially prove a challenge if it isn't mandated at a critical point in the data accumulation.

Figure 6: Figures 6(a), 6(b), and 6(c) depict the evolution of the annotation scheme over three different research papers. Fig. 6(c) is the resulting selected format NLPContributions that is proposed in this paper. (a) Research paper [45] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R41467/; (b) research paper [21] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R41374; (c) research paper [54] top-level snapshot in ORKG, https://www.orkg.org/orkg/paper/R44287.

Figure 6(a) depicts the first modeling attempt of an NLP-ML contribution. For predicates, the model restricts itself to using only those found in the text. The limitation of such a model is that, by not normalizing linguistic variations, it very rarely creates comparable models across investigations even if they imply the same thing. Hence, we found that, for comparability, a common predicate vocabulary minimally needs to be in place at the top level of the model. Figure 6(b) is the second attempt at modeling a different NLP-ML contribution. In this attempt, the predicates at the top level are mostly normalized to a generic "has"; however, "has" is connected to various information items again lexically based on the text of the scholarly articles, one or more of which can be grouped under a common category. Via such observations, we systematized the knowledge organization at the top level of the graph by introducing the ten information unit nodes. Figure 6(c) is the resulting NLPContributions annotations model. Within this model, scholarly contributions with one or more of the information units in common, viz. "Ablation study," "Baseline Models," "Model," and "Results," can be uniformly compared.

7 LIMITATIONS
Obtaining disjoint (subject, predicate, object) triples as contribution sequences. It was not possible to extract disjoint triples from all sentences. In many cases, we extract the main predicate and use as object the relevant full sentence or its clausal part. From [30], for instance, under the ExperimentalResults information unit, we model the following: (Contribution, has, Experimental results); (Experimental results, on, all datasets); and (all datasets, achieves, BioBERT achieves higher scores than BERT). Note that in the last triple, "achieves" was used as a predicate and its object "BioBERT achieves higher scores than BERT" is modeled as a clausal sentence part.
Employing coreference relations between scientific entities. In the fine-grained modeling of schemas, scientific entities within triples are sometimes nested across sentences by leveraging their coreference relations. We consider this a limitation toward the automated machine reading task, since coreference resolution itself is often challenging to perform automatically.
Tabulated results are not incorporated within NLPContributions. Unlike tabulated hyperparameters, which have a standard format, tabulated results have significantly varying formats. Thus their automated table parsing is a challenging task in itself. Nonetheless, by considering the textual results, we relegate ourselves to their summarized description, which often suffices for highlighting the contribution.
Can all NLP-ML papers be modeled by NLPContributions? While we can conclude that some papers are easier to model than others (e.g., articles addressing 'relation extraction' vs. 'machine translation,' which are harder), it is possible that all papers can be modelled by at least some, if not all, of the information units of the model we propose.

8 DISCUSSION
From the pilot dataset annotation exercise, we note the following regarding task practicality. Knowledge modeled under some information units is more amenable to systematic structuring than under others. E.g., information units such as ResearchProblem, ExperimentalSetup, Results, and Baselines are readily amenable to systematic template discovery toward their structured modeling within the ORKG, whereas the remaining information units, especially Approach or Model, will require additional normalization steps in the search for their better structuring.

9 CONCLUSIONS AND FUTURE DIRECTIONS
The Open Research Knowledge Graph [3] makes scholarly knowledge about research contributions machine-actionable: i.e. findable, structured, and comparable. Manually building such a knowledge graph is time-consuming and requires the expertise of paper authors and domain experts. In order to efficiently build a scholarly contributions knowledge graph, we will leverage the technology of machine readers [18] to assist the user in annotating scholarly article contributions. But the machine readers will need to be trained for such a task objective. To this end, in this work, we have proposed an annotation scheme for capturing the contributions in natural language processing scholarly articles, in order to create such training datasets for machine readers. In addition, we also provide a set of 50 articles annotated by the NLPContributions scheme as a practical demonstration of the feasibility of the annotation task. However, for the training of machine learning models, in future work we will release a larger dataset annotated by the proposed scheme. To facilitate future research, our pilot dataset is released online at https://doi.org/10.25835/0019761.
Finally, aligned with the initiatives within research communities to build the Internet of FAIR Data and Services (IFDS) [6], the data within the ORKG are compliant [38] with the FAIR data principles [52], thus making them Findable, Accessible, Interoperable and Reusable. Since the dataset we annotate by our proposed scheme is designed to be ORKG-compliant, we adopt the cutting-edge standard of data creation within the research community.
Nevertheless, the NLPContributions model is a surface semantic structuring scheme for the contributions in unstructured text. To realize a full-fledged machine-actionable and inferenceable knowledge graph of scholarly contributions, as future directions, there are a few IE modules that would need to be improved or added. They are: (1) improving the PDF parser to produce less noisy output; (2) incorporating an entity and relation linking and normalization module; (3) merging phrases from the unstructured text with known ontologies (e.g., the MEX vocabulary [17]) to align resources and thus ensure data interoperability and reusability; and (4) extending the model to more scholarly disciplines and domains.
Nonethe- ule; (3) merging phrases from the unstructured text with known less, by considering the textual results, we relegate ourselves to ontologies (e.g., the MEX vocabulary [17]) to align resources and their summarized description, which often serves sufficient for thus ensure data interoperability and reusability; and (4) extending highlighting the contribution. the model to more scholarly disciplines and domains. Can all NLP-ML papers be modeled by NLPContributions? While we can conclude that some papers are easier to model than REFERENCES others (e.g., articles addressing ‘relation extraction’ vs. ‘machine [1] Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Craw- translation’ which are harder), it is possible that all papers can be ford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu modelled by at least some if not all the information units of the Ha, et al. 2018. Construction of the Literature Graph in Semantic Scholar. In NAACL, Volume 3 (Industry Papers). 84–91. model we propose. [2] Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power. 2017. The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction. In SemEval@ACL. [3] Sören Auer. 2018. Towards an Open Research Knowledge Graph. https://doi. 8 DISCUSSION org/10.5281/zenodo.1157185 From the pilot dataset annotation exercise, we note the following [4] Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, Markus Stocker, and Maria Esther Vidal. 2018. Towards a knowledge graph for science. In Proceedings regarding task practically. Knowledge modeled under some infor- of the 8th International Conference on Web Intelligence, Mining and Semantics. mation units are more amenable to systematic structuring than 1–6. 25 EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents NLPContributions: An Annotation Scheme EEKE 2020 @ JCDL ’20, August 1–5, 2020, Virtual Event, China [5] Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, and An- [30] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, drew McCallum. 2017. SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language and Relations from Scientific Publications. In SemEval@ACL. representation model for biomedical text mining. Bioinformatics 36, 4 (2020), [6] Paul Ayris, Jean-Yves Berthou, Rachel Bruce, Stefanie Lindstaedt, Anna Monreale, 1234–1240. Barend Mons, Yasuhiro Murayama, Caj Södergård, Klaus Tochtermann, and [31] Maria Liakata, Shyamasree Saha, Simon Dobnik, Colin Batchelor, and Dietrich Ross Wilkinson. 2016. Realising the European open science cloud. Luxembourg. Rebholz-Schuhmann. 2012. Automatic recognition of conceptualization zones in https://doi.org/10.2777/940154 scientific articles and two life science applications. Bioinformatics 28, 7 (2012), [7] Niranjan Balasubramanian, Stephen Soderland, Oren Etzioni, et al. 2013. Gener- 991–1000. ating coherent event schemas at scale. In EMNLP. 1721–1731. [32] Maria Liakata, Simone Teufel, Advaith Siddharthan, and Colin R. Batchelor. 2010. [8] Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language Corpora for the Conceptualisation and Zoning of Scientific Papers. In LREC. model for scientific text. In EMNLP-IJCNLP. 3606–3611. [33] Yi Luan, Luheng He, Mari Ostendorf, and Hannaneh Hajishirzi. 2018. 
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In EMNLP.
[9] Antoine Bordes, Jason Weston, and Nicolas Usunier. 2014. Open question answering with weakly supervised embedding models. In Joint European conference on machine learning and knowledge discovery in databases. 165–180.
[10] Arthur Brack, Jennifer D’Souza, Anett Hoppe, Sören Auer, and Ralph Ewerth. 2020. Domain-Independent Extraction of Scientific Concepts from Research Articles. In Advances in Information Retrieval. Springer International Publishing, 251–266.
[11] Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In EMNLP. 1797–1807.
[12] Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT. 789–797.
[13] Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. 602–610.
[14] Alexandru Constantin, Silvio Peroni, Steve Pettifer, David Shotton, and Fabio Vitali. 2016. The document components ontology (DoCO). Semantic Web 7, 2 (2016), 167–181.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL, Volume 1 (Long and Short Papers). Minneapolis, Minnesota, 4171–4186.
[16] Jennifer D’Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, and Ralph Ewerth. 2020. The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources. In LREC. Marseille, France, 2192–2203.
[17] Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck, Markus Ackermann, and Jens Lehmann. 2015. MEX vocabulary: a lightweight interchange format for machine learning experiments. In Proceedings of the 11th International Conference on Semantic Systems. 169–176.
[18] Oren Etzioni, Michele Banko, and Michael J Cafarella. 2006. Machine Reading. In AAAI, Vol. 6. 1517–1519.
[19] Said Fathalla, Sahar Vahdati, Sören Auer, and Christoph Lange. 2017. Towards a knowledge graph representing research findings by semantifying survey articles. In TPDL. Springer, 315–327.
[20] Beatríz Fisas, Francesco Ronzano, and Horacio Saggion. 2016. A Multi-Layered Annotated Corpus of Scientific Papers. In LREC.
[21] Zhijiang Guo, Yan Zhang, and Wei Lu. 2019. Attention Guided Graph Convolutional Networks for Relation Extraction. In ACL. 241–251.
[22] Siegfried Handschuh and Behrang QasemiZadeh. 2014. The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In COLING 2014: 4th international workshop on computational terminology.
[23] John PA Ioannidis. 2016. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. The Milbank Quarterly 94, 3 (2016), 485–514.
[24] Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D’Souza, Gábor Kismihók, Markus Stocker, and Sören Auer. 2019. Open Research Knowledge Graph: Next Generation Infrastructure for Semantic Scholarly Knowledge. In KCAP (Marina Del Rey, CA, USA). ACM, New York, NY, USA, 243–246.
[25] Arif E Jinha. 2010. Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publishing 23, 3 (2010), 258–263.
[26] Olga Kononova, Haoyan Huo, Tanjin He, Ziqin Rong, Tiago Botari, Wenhao Sun, Vahe Tshitoyan, and Gerbrand Ceder. 2019. Text-mined dataset of inorganic materials synthesis recipes. Scientific data 6, 1 (2019), 1–11.
[27] Chaitanya Kulkarni, Wei Xu, Alan Ritter, and Raghu Machiraju. 2018. An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols. In NAACL: HLT, Volume 2 (Short Papers). New Orleans, Louisiana, 97–106. https://doi.org/10.18653/v1/N18-2016
[28] Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, and Makoto Miwa. 2020. Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature. In LREC. 1941–1950.
[29] Joohong Lee, Sangwoo Seo, and Yong Suk Choi. 2019. Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing. arXiv preprint arXiv:1901.08163 (2019).
[34] Yi Luan, Mari Ostendorf, and Hannaneh Hajishirzi. 2017. Scientific information extraction with semi-supervised neural tagging. arXiv preprint arXiv:1708.06075 (2017).
[35] Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th ACL (Volume 1: Long Papers). 1105–1116.
[36] Sheshera Mysore, Zachary Jensen, Edward Kim, Kevin Huang, Haw-Shiuan Chang, Emma Strubell, Jeffrey Flanigan, Andrew McCallum, and Elsa Olivetti. 2019. The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures. In Proceedings of the 13th Linguistic Annotation Workshop. 56–64.
[37] Allard Oelen, Mohamad Yaser Jaradeh, Kheir Eddine Farfar, Markus Stocker, and Sören Auer. 2019. Comparing Research Contributions in a Scholarly Knowledge Graph. In K-CAP 2019. 21–26.
[38] Allard Oelen, Mohamad Yaser Jaradeh, Markus Stocker, and Sören Auer. 2020. Generate FAIR Literature Surveys with Scholarly Knowledge Graphs.
[39] Vayianos Pertsas and Panos Constantopoulos. 2017. Scholarly Ontology: modelling scholarly practices. International Journal on Digital Libraries 18, 3 (2017), 173–190.
[40] Roger C Schank and Robert P Abelson. 1977. Scripts, plans, goals and understanding: An inquiry into human knowledge structures. (1977).
[41] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538 (2017).
[42] Dan Simonson and Anthony Davis. 2015. Interactions between narrative schemas and document categories. In Proceedings of the First Workshop on Computing News Storylines. 1–10.
[43] Dan Simonson and Anthony Davis. 2016. NASTEA: Investigating narrative schemas through annotated entities. In Proceedings of the 2nd Workshop on Computing News Storylines. 57–66.
[44] Dan Simonson and Anthony Davis. 2018. Narrative Schema Stability in News Text. In Proceedings of the 27th International Conference on Computational Linguistics. 3670–3680.
[45] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the Blanks: Distributional Similarity for Relation Learning. In ACL. 2895–2905.
[46] Larisa N. Soldatova and Ross D. King. 2006. An ontology of scientific experiments. Journal of the Royal Society Interface 3, 11 (2006), 795–803.
[47] Simone Teufel, Jean Carletta, and Marc Moens. 1999. An annotation scheme for discourse-level argumentation in research articles. In Proceedings of the ninth conference on European chapter of ACL. 110–117.
[48] Simone Teufel, Advaith Siddharthan, and Colin Batchelor. 2009. Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. In EMNLP: Volume 3. 1493–1502.
[49] Lars Vogt, Jennifer D’Souza, Markus Stocker, and Sören Auer. 2020. Toward Representing Research Contributions in Scholarly Knowledge Graphs Using Knowledge Graph Cells. In JCDL ’20, August 1–5, 2020, Virtual Event, China.
[50] Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, and Saloni Potdar. 2019. Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers. In Proceedings of the 57th ACL. 1371–1377.
[51] Mark Ware and Michael Mabe. 2015. The STM Report: An overview of scientific and scholarly journal publishing. (03 2015).
[52] Mark D Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data 3 (2016).
[53] Yue Zhang, Qi Liu, and Linfeng Song. 2018. Sentence-State LSTM for Text Representation. In Proceedings of the 56th ACL (Volume 1: Long Papers). 317–327.
[54] Yuhao Zhang, Peng Qi, and Christopher D Manning. 2018. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In EMNLP. 2205–2215.
A TWICE MODELING AGREEMENT
In general, even if the annotations are performed by a single annotator, there will be annotation discrepancies. Compare the same information unit “Experimental Setup” modeled in Figure 7 below versus in Figure 3. Figure 7 shows the first annotation attempt; the second attempt, depicted in Figure 3 in the main paper content, was made on a different day and blind to the first. While neither is incorrect, the second attempt took the least-annotated-information route, possibly due to annotator fatigue; hence, a two-pass methodology is recommended. A minimal sketch of how two such annotation passes could be compared follows the figure caption below.
Figure 7: Illustration of modeling of Contribution Sequences in the Experimental Setup Information Unit (from [30]) in a first annotation attempt. Contrast with the second attempt depicted in Figure 3 in the main paper content.
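To make the recommended two-pass comparison concrete, the following minimal sketch (Python) illustrates how two independently produced models of the same information unit, each expressed as (subject, predicate, object) statements, could be checked for agreement. The triples and the Jaccard-style overlap measure are illustrative assumptions only; they are neither the actual contents of Figures 3 and 7 nor part of the NLPContributions tooling.

def triple_agreement(first_pass, second_pass):
    # Jaccard overlap between two sets of (subject, predicate, object) statements.
    first, second = set(first_pass), set(second_pass)
    union = first | second
    return len(first & second) / len(union) if union else 1.0

# Hypothetical first-pass model of an "Experimental Setup" contribution sequence.
first_pass = [
    ("Contribution", "has", "Experimental setup"),
    ("Experimental setup", "has", "hyperparameters"),
    ("hyperparameters", "optimizer", "Adam"),
    ("hyperparameters", "learning rate", "0.001"),
]

# Hypothetical second-pass model: the "least annotated information" route.
second_pass = [
    ("Contribution", "has", "Experimental setup"),
    ("Experimental setup", "has", "hyperparameters"),
    ("hyperparameters", "optimizer", "Adam"),
]

print(f"Twice-modeling agreement: {triple_agreement(first_pass, second_pass):.2f}")

Under these assumed triples the overlap is 0.75; in practice, such a score could flag information units whose two annotation passes diverge enough to warrant reconciliation.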