=Paper=
{{Paper
|id=Vol-3164/paper16
|storemode=property
|title=Extraction of Competing Models using Distant Supervision and Graph Ranking
|pdfUrl=https://ceur-ws.org/Vol-3164/paper16.pdf
|volume=Vol-3164
|authors=Swayatta Daw,Vikram Pudi
|dblpUrl=https://dblp.org/rec/conf/aaai/DawP22
}}
==Extraction of Competing Models using Distant Supervision and Graph Ranking==
Swayatta Daw, Vikram Pudi
Data Sciences and Analytics Center
IIIT Hyderabad, India
swayatta.daw@research.iiit.ac.in, vikram@iiit.ac.in
Abstract
We introduce the task of detecting competing model entities in scientific documents. We define competing models as those models that solve a particular task that is investigated in the target research document. The task is challenging because contextual information from the entire target document is required to predict the model entities; hence, traditional sequence labelling approaches fail in such settings. Furthermore, model entities themselves are long-tailed in nature, i.e., their prevalence in scientific literature is limited, and labelled data for training supervised learning techniques is scarce. To address these bottlenecks, we combine an unsupervised graph ranking algorithm with a SciBERT-CRF based sequence labeller to predict the entities, and we introduce a strong baseline using this pipeline. To address the label scarcity of long-tailed model entities, we use distant supervision leveraging an external Knowledge Base (KB) to generate synthetic training data. We address the problem of overfitting on small-sized datasets for supervised NER baselines using a simple entity replacement technique. We introduce this model as a starting point for an end-to-end automated framework that extracts relevant model names from research documents and links them with their respective cited papers. We believe this task will serve as an important starting point to map the research landscape of computer science in a scalable manner with minimal human intervention. The code and dataset are available at https://github.com/Swayatta/Competing-Models.
Keywords
NER, Graph Ranking, Distant Supervision, CEUR-WS
1. Introduction

The number of scientific publications in the computer science domain has increased exponentially in the recent past. Hence, it has become increasingly cumbersome for researchers to keep track of the advancement of the research landscape. Often, research papers introduce new models that perform strongly in comparison with the baselines or advance the state-of-the-art. In order to effectively benchmark models and compare their performances, it is important to be able to map the research landscape for similar or related tasks. Papers with Code (PwC, https://paperswithcode.com/) is a community-driven corpus that serves to automatically list models that solve particular subtasks, with links to the scientific research paper that introduced each model. Our aim is to build a similar but automated end-to-end pipeline which detects model names from scientific papers and benchmarks them against other similar models that solve the same task.

In this paper, we introduce the task of extracting competing model names from a research paper. We establish an end-to-end pipeline that extracts all the competing model names from a research paper and links them to their respective citations. While browsing related work for a given task, a researcher has to manually visit every research paper that uses a competing model for the same task. This process is time-consuming if a survey of a research landscape is to be done on a large scale. Our motivation is to automate this process by automatically extracting model names that solve a similar task and linking them to their corresponding cited papers. If executed on a large scale, this pipeline would be able to effectively map the computer science research landscape in an automatic and scalable manner with minimal human intervention.

We introduce a strong baseline for this task by combining an unsupervised document-level graph ranking algorithm and a supervised BERT-based sequence tagger to obtain model entity names. Essentially, we treat the relevant keyphrases extracted by the graph ranker as a superset of candidates for the sequence labeller.

We introduce two datasets for this task. For training the supervised sequence tagger, we create weakly supervised distant labels using an external Knowledge Base and unlabelled corpora. We also release a manually annotated dataset for evaluating the sequence tagger. For evaluating the entire framework of competing model name extraction, we release another dataset with full-paper document-level annotation. Furthermore,
we use a simple entity citation linking technique to link the extracted model names with their respective citations in the research document. We believe this task will be a significant step forward towards mapping the research landscape of computer science.

Our contributions can be summarised as follows:

• We introduce a novel approach of treating ranked keyphrases as a superset of candidates for sequence labellers to solve this task. To the best of our knowledge, this approach has not been used in prior research work. We believe this approach can be extended to other similar tasks that require document-level contextual information for NER.
• We create an annotated dataset of full papers for evaluation of the pipeline. Previous datasets for sequence labelling in the scientific literature focused only on annotating abstracts of scientific papers [1, 2]. We believe incorporating full-length document information is crucial to capture the entire document context, hence we introduce a full-paper annotated dataset for final evaluation.
• We introduce strong baselines while relying only on distantly supervised weak labels to train our sequence labeller. We evaluate the trained model on our annotated evaluation dataset.

2. Related Work

Unsupervised Ranking Algorithms for Keyphrase Extraction: EmbedRank [3] extracts candidate phrases based on POS sequences, uses sentence embeddings (Doc2Vec or Sent2Vec) to represent both the candidate phrases and the document in the same high-dimensional vector space, and ranks the candidates by cosine similarity with respect to the document embedding. [4] propose WikiRank, an unsupervised automatic keyphrase extraction method that links semantic meaning to text. In graph-based ranking algorithms, candidate phrases are treated as nodes and related candidate phrases are connected by edges. TextRank [5] considered related candidates as co-occurring phrases within a given window. SingleRank [6] added weights to the edges between related candidates. SGRank [7] and PositionRank [8] incorporated statistical and positional heuristics into a graph-based algorithm to obtain ranked keyphrases. MultipartiteRank [9] is an advanced version of TextRank that incorporates positional knowledge into the edge weights, leading to state-of-the-art performance on benchmark datasets.

Sequence labelling for Named Entity Recognition: Long-tailed entities are named entities which rarely occur in text documents. For these types of entities, the task of Named Entity Recognition (NER) is non-trivial. Recent approaches have aimed at solving the problem of NER using supervised training with deep learning models. However, supervised learning techniques require a large amount of token-level labelled data for NER tasks. Annotating a large number of tokens can be time-consuming, expensive and laborious. For real-life applications, the lack of labelled data has become a bottleneck for adopting deep learning models for NER tasks.

Most scientific named entities can be classified as long-tailed entities because of the rarity and domain-specificity of their occurrence. Recent work on NER in scientific documents has concentrated on detecting biomedical named entities [10] or scientific entities like tasks, methods and datasets [1, 2, 11]. Some papers, like [12], focus on the detection of a single specific entity type (like dataset names) from scientific documents. Although previous work has focused on identifying methods [1, 2] as named entities, what constitutes a method can vary significantly when it comes to human-annotated data. The authors of [1] report a Kappa score of 76.9% for inter-annotator agreement in the SciERC dataset, which is widely used as a benchmark for scientific entity extraction.

NER has traditionally been treated as a sequence labelling problem, using CRFs [13] and HMMs [14]. Recent approaches have used deep learning based models [15] to address this task, which require a large amount of labelled data to train. The high cost of labelling remains the main challenge for training such models on rare long-tailed entity types, where labelled data is scarce. In order to address the label scarcity problem, several methods like Active Learning [16], Distant Supervision [17, 18, 19] and Reinforcement Learning-based Distant Supervision [20, 21] have been proposed. [12] focused on detecting dataset mentions from scientific text and used data augmentation to overcome the label scarcity problem.

3. Motivation

Papers with Code (PwC, https://github.com/paperswithcode/paperswithcode-data) is a community-driven corpus that serves to automatically list models that solve particular subtasks, with links to the scientific research paper that introduced the model. Our aim is to build a similar but automated end-to-end pipeline that detects model names from scientific papers and benchmarks them against other similar models that solve the same task. We believe the task introduced in this paper (extraction of competing model names from scientific documents) to be a significant step forward towards the whole pipeline.
Table 1: Few examples of competing and non-competing models. The competing models are highlighted in bold, whereas the non-competing models are highlighted in underlined italic.

• Competing. Sentence: "Other transition-based models extend TransE to additionally use projection vectors or matrices to translate head and tail embeddings into the relation vector space, such as: TransH (Wang et al., 2014), TransR (Lin et al., 2015b), TransD (Ji et al., 2015), STransE (Nguyen et al., 2016b) and TranSparse (Ji et al., 2016)." Paper: A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.
• Competing. Sentence: "In Table 2, we compare SCIBERT results with reported BIOBERT results on the subset of datasets included in (Lee et al., 2019)." Paper: SCIBERT: A Pretrained Language Model for Scientific Text.
• Non-competing. Sentence: "TransE [4] is a translation based model inspired by Word2Vec [16]." Paper: On Evaluating Embedding Models for Knowledge Base Completion.
• Non-competing. Sentence: "(Xie et al. 2016) use convolutional neural networks (CNN) to encode word sequences in entity descriptions." Paper: KG-BERT: BERT for Knowledge Graph Completion.
• Non-competing. Sentence: "To find the hyper-parameters, we used HyperOpt (Bergstra et al., 2015), which uses Bayesian optimization." Paper: Tabular Data: Deep Learning is Not All You Need.
4. Task Definition

We define competing models as model names that attempt to solve the same task as investigated by the target research paper. For example, if a research paper investigates the task of producing knowledge base embeddings, TransR [22] is a competing model name, as it was introduced by prior research work to solve the same task. If a research paper investigates the task of Question Answering, some competing model names can be the T5 model [23] or XL-Net [24], because these models have been used to solve this task in prior research work. A non-competing model name would be a model that has not been used directly to solve the same task. We provide a few examples to illustrate the difference between a competing and a non-competing model in Table 1. For the first two examples, the models highlighted in bold are competing models because they directly solve the task investigated in the input research paper. For the third example, TransE is a competing model, but Word2Vec is not. The reason is that TransE directly produces Knowledge Base embeddings that aid in Knowledge Base completion (which is the target task in the research paper), whereas Word2Vec is a language model that TransE is inspired by, as denoted in the sentence. Hence, it only contributes indirectly to the research task, so it is a non-competing model. Similarly, HyperOpt, in the last example, is non-competing, as it is an algorithm the authors used for hyperparameter search and is not a model that contributes directly to solving the task investigated in the input research paper.

Our task in this paper is to detect competing model names given an input research document. Also, after extracting the model names, we link the extracted entities with their respective cited papers.

Figure 1: Example sentences with annotated model name entities, e.g.:
"In this paper, we present SDP-LSTM, a novel neural network to classify the relation of two entities in a sentence."
"Inspired by the unique feature representation learning capability of deep autoencoder, we propose a novel model, named Deep Autoencoder-like NMF (DANMF), for community detection."
"We introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition, building upon advances in differentiable rendering."

5. Annotation Process

We create two datasets for training and evaluation. We annotate sentences from scientific papers according to the token-level BIO tagging scheme to evaluate our sequence labeller, which only uses contextual information from an input sentence for sequence tagging. To evaluate the whole pipeline, we provide document-level annotations with full-length research papers as input and competing model names as the annotated output. We use two different datasets for a more comprehensive evaluation, as our pipeline uses two stages. The first stage involves extracting candidate keyphrases utilising the entire document-level information for keyphrase ranking. The second stage is our sequence labeller, which uses sentence-level information to find model named entities. We describe the annotation process for the dataset creation for sequence labelling first. Considering our end goal
of automating a high-precision framework for extracting related model names, and to minimise ambiguity, we consider only named models as model entities for this task. A few examples are: NMN+LSTM+FT, SpERT (with overlap), B-BOT + Attention and CL loss, SA-FastRCNN, DS-CNNs (Random Walk), Sparse Transformer 59M (strided). We consider model entities that have a unique name or that are formed by a combination of other model names, e.g. NMN+LSTM+FT. A few example sentences with model entities are displayed in Figure 1. We define and annotate the test corpus using the standard BIO tagging scheme. Each model entity was defined to have maximum span length. For acronyms, we consider the full-length entity name instead of the short-form acronym if it occurs in the text, e.g. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. On average, there are 2.5 tokens per entity. We refer to Google Scholar and Semantic Scholar to confirm entity types. We randomly selected a subset of abstracts from the arXiv dataset containing 1.7M+ papers' data and metadata and randomly select sentences from them to annotate. Also, we randomly sample the DBLP citation dataset containing 1,511,035 papers, obtain the full-length versions of the available papers using DOI matching, and draw a random sample of sentences from the full text. We use two different corpora because we want our model to be evaluated on multiple domains within computer science and different publication venues. All the statistics related to our annotated corpus and train set are provided in Table 2.
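For illustration, the following sketch shows how one sentence with a model entity looks under this annotation scheme; the sentence and tokenisation are illustrative and not taken from the released data:

```python
# Illustrative BIO annotation for one sentence containing a model entity.
tokens = ["We", "compare", "against", "Sparse", "Transformer", "59M",
          "(", "strided", ")", "on", "this", "benchmark", "."]
tags   = ["O", "O", "O", "B-Model", "I-Model", "I-Model",
          "I-Model", "I-Model", "I-Model", "O", "O", "O", "O"]

# The entity is annotated with its maximum span: "Sparse Transformer 59M (strided)"
# is one entity, so every token inside the span after the first is tagged I-Model.
assert len(tokens) == len(tags)
```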
For evaluating the whole pipeline, we annotated full-length research papers. We read through the introduction and find the task the paper solves. Then we browse the entire paper and find all mentions of model names that solve a similar task. The process has a low level of ambiguity because a majority of the model mentions occur in the related work section, citation contexts or the experimental results section. It is standard practice among authors to cite the relevant research paper if they mention any model names from prior research work. Hence, we only consider models that the authors cite to be candidates for competing models. We make sure the labelled entities are model names by referring to Google Scholar and Semantic Scholar. If there is any ambiguity regarding whether a labelled entity is a model name or not, we discard the full paper. To infer whether a model is a competing model or not, we find the task or the problem the paper solves. This is usually mentioned clearly in the introduction and the related work section. We label the model entities (that the authors mention as solving a similar problem or task as the original paper) as competing models. To further verify that the claim by the authors is indeed true, we visit the cited research paper and ensure that the model is solving a similar task. Furthermore, we only consider papers where the "competing" relation among models is clear and discard any paper where there is ambiguity regarding this relation. Hence, we ensure that ambiguity in our annotations is significantly low. The statistical details about the annotations are provided in Table 3. As we ensure a negligible level of ambiguity, we use only one human annotator (one of the authors of this paper) for our annotation process. We believe the need for multiple annotators and an inter-annotator agreement study is insignificant for our task, as a low level of ambiguity is ensured by considering only named models and clearly defined tasks with competing model names.

Table 2: Overall statistics of the train and evaluation datasets for sequence labeller evaluation.
                               Train     Test     Total
# sentences                     7800     1000      8800
# tokens                      232600    22873    255473
# entities                     19012     3647     22659
# unique entities              14748     1249     15672
avg # tokens per sentence      29.82    22.873    29.03
avg # entities per sentence     2.44     3.65      2.57

Table 3: Overall statistics of the document-level annotated dataset for evaluation of the entire pipeline.
# total papers                 75
# total sentences           34656
# avg sentences per paper  462.08
# entities                    622
# unique entities             473
# avg entities per paper     8.29

6. Method

Our entire pipeline has two components. Firstly, we extract all citation sentences from the input research paper. We combine all the citation sentences to create a mini-document. We use a graph ranking algorithm to extract all the candidate keyphrases from this mini-document. This graph ranking algorithm utilises document-level information to rank keyphrases. Secondly, we use a sequence labeller for extracting named entities from the positively labelled citation sentences. Lastly, we merge the results of the graph ranker and the sequence labeller to output the final competing model entities. In the subsection Sequence Tagging, we provide details about the training process and the model for our sequence tagger. In the subsection Graph-Ranking Algorithm, we provide details about the unsupervised graph ranking algorithm for keyphrase extraction.
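As a rough sketch of the first component, citation sentences can be collected into a mini-document before keyphrase ranking. The citation-marker patterns below are our own illustrative assumption, not the exact heuristic used in the pipeline:

```python
import re

# Assumed citation-marker patterns: numeric markers like "[12]" and
# author-year markers like "(Lin et al., 2015b)". Illustrative only.
CITATION_RE = re.compile(r"\[\d+(?:,\s*\d+)*\]|\([A-Z][A-Za-z-]+ et al\.,? \d{4}[a-z]?\)")

def build_mini_document(sentences):
    """Keep only sentences containing a citation marker and join them."""
    citation_sentences = [s for s in sentences if CITATION_RE.search(s)]
    return " ".join(citation_sentences)

sentences = [
    "TransE [4] is a translation based model inspired by Word2Vec [16].",
    "We train our model for 10 epochs.",
]
print(build_mini_document(sentences))  # only the first sentence survives
```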
6.1. Graph-Ranking Algorithm

We use Multipartite Rank [9], as it has proved to be the state-of-the-art among keyphrase ranking algorithms, performing particularly well on longer scholarly documents. We briefly describe how we use this algorithm for unsupervised keyphrase extraction.

Let C be the set of all citation sentences in a document d. C forms an ordered set of citation sentences, which is collectively treated as a document. We build a graph representation of C. A set of candidate keyphrases K is extracted from C. The candidate keyphrases K are grouped into topics based on the stem forms of the words they share, using hierarchical agglomerative clustering with average linkage. The candidate keyphrases are used to build a multipartite graph, where the nodes are keyphrase candidates that are connected only if they belong to different topics. The edge between two nodes is weighted by the inverse of the distance between the two keyphrases K_i, K_j in C. The weight w_ij is calculated as the sum of the inverse distances between the occurrences of K_i and K_j:

w_{ij} = \sum_{p_i \in P(K_i)} \sum_{p_j \in P(K_j)} \frac{1}{|p_i - p_j|}

where P(K_i) is the set of word offset positions of K_i. The first occurring candidates of each topic are promoted more, as they capture higher relevance. The weights of the first occurring candidates of each topic are modified according to:

w_{ij} = w_{ij} + \alpha \cdot e^{1/p_i} \sum_{K_k \in T(K_j) \setminus \{K_j\}} w_{ki}

where \alpha is a hyperparameter that controls the strength of the weight adjustment, T(K_j) is the set of candidates belonging to the same topic as K_j, and p_i is the offset position of the first occurrence of candidate K_i. After the graph is built, a ranking algorithm is used to order each keyphrase candidate K_i. We adopt the popular TextRank algorithm [5] for the ranking mechanism. A final set of top-ranked keyphrases K̃ is obtained.
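The edge weighting above can be sketched in a few lines of Python. Candidate extraction and topic clustering are assumed to have been performed already; the variable names and the default value of alpha are illustrative:

```python
from math import exp

def edge_weight(positions_i, positions_j):
    """w_ij: sum of inverse offset distances between occurrences of K_i and K_j."""
    return sum(1.0 / abs(p_i - p_j)
               for p_i in positions_i for p_j in positions_j if p_i != p_j)

def promote_first_candidate(w, i, j, topic_of_j, first_offset_i, alpha=1.1):
    """Boost w_ij when K_i is the first occurring candidate of its topic:
    w_ij += alpha * exp(1 / p_i) * sum of w_ki over K_k in K_j's topic (K_k != K_j)."""
    incoming = sum(w.get((k, i), 0.0) for k in topic_of_j if k != j)
    return w[(i, j)] + alpha * exp(1.0 / first_offset_i) * incoming

# Toy usage with three candidates "A", "B", "C" and their word offsets.
w = {
    ("A", "B"): edge_weight([3, 40], [10, 25]),
    ("C", "A"): edge_weight([12, 55], [3, 40]),
}
# Promote the edge towards "A" (first offset 3), using the candidates
# that share "B"'s topic, here assumed to be {"B", "C"}.
w[("A", "B")] = promote_first_candidate(w, "A", "B", topic_of_j={"B", "C"},
                                        first_offset_i=3)
print(w[("A", "B")])
```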
6.2. Sequence Tagging

For training our sequence tagger, we rely only on distant labels created using an external Knowledge Base and an unlabelled research text corpus. We also demonstrate that, for long-tailed entity types, there is a need to ensure a fairer distribution of entity occurrences in order to prevent overfitting, which occurs in the form of the model memorising certain popular entity names. The details about the training set creation are provided in section Training Set Creation with Entity Replacement. The details about the model and the results on the evaluation set are provided in section Distantly Supervised NER Model. The training process overview for the sequence labeller is shown in Figure 2.

6.2.1. Training Set Creation with Entity Replacement

We utilise the publicly available Papers with Code (PwC) corpus as a Knowledge Base. We crawl PwC and obtain all the model names occurring in the metadata for each task and subtask. We obtain a total of 14,748 model names. For the unlabelled corpora, we use a total of 227,000 abstracts from arXiv and obtain all sentences (7800) containing a model name mention. We find that the occurrence of some model names is much more frequent in the literature (e.g. CNN). Due to the small dataset size and the large imbalance among entity mentions, the model is prone to overfitting. To mitigate this, we use a simple entity replacement technique, where we find all model entity mentions and randomly replace them with other names to ensure a fairer distribution. The distribution pre-replacement is shown in Figure 4. We use all 14,748 model entities at least once and limit an entity's occurrences to at most 2 in the train dataset, after replacement.
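A rough sketch of distant labelling against the PwC model list and the subsequent replacement step is given below; the exact matching and sampling strategy in our pipeline may differ, so treat this as illustrative:

```python
import random

def distant_bio_labels(tokens, kb_models):
    """Tag any exact token-sequence match of a KB model name with B-/I-Model."""
    tags = ["O"] * len(tokens)
    for name in kb_models:
        parts = name.split()
        for start in range(len(tokens) - len(parts) + 1):
            if tokens[start:start + len(parts)] == parts:
                tags[start] = "B-Model"
                for k in range(start + 1, start + len(parts)):
                    tags[k] = "I-Model"
    return tags

def replace_entities(tokens, tags, kb_models, rng=random):
    """Swap each labelled entity span with a randomly drawn KB model name,
    so that no single model name dominates the training distribution."""
    out_tokens, out_tags, i = [], [], 0
    while i < len(tokens):
        if tags[i] == "B-Model":
            j = i + 1
            while j < len(tags) and tags[j] == "I-Model":
                j += 1
            new_parts = rng.choice(kb_models).split()
            out_tokens += new_parts
            out_tags += ["B-Model"] + ["I-Model"] * (len(new_parts) - 1)
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags

kb = ["CNN", "TransE", "Sparse Transformer"]
toks = "We use CNN features as input .".split()
tags = distant_bio_labels(toks, kb)
print(replace_entities(toks, tags, kb))
```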
6.2.2. Distantly Supervised NER Model

We treat NER as a sequence labelling problem. Given a sequence of N tokens X = [x_1, ..., x_N], we aim to find entities, i.e. spans of tokens s = [x_i, ..., x_j] (0 ≤ i ≤ j ≤ N) associated with the entity type model name. We formulate this as a sequence labelling task of assigning a sequence of labels Y = [y_1, ..., y_N]. The aim of our sequence labeller is to classify each token as a certain entity type as per the BIO tagging scheme.

We consider K train sentences denoted as {(X_k, Y_k)}_{k=1}^{K} with distant token-level annotations. We aim to learn a function f(X, θ) which can correctly predict the entity labels for a train sentence X_k. We minimise the loss

\theta^{*} = \arg\min_{\theta} \frac{1}{K} \sum_{k=1}^{K} l(Y_k, f(X_k, \theta))

over {(X_k, Y_k)}_{k=1}^{K}, where θ denotes the model parameters and l is the cross-entropy loss.

We experiment with multiple baselines which are standard for the sequence labelling process:

• A BiLSTM + CRF model, where the bidirectional contextual representations are captured by the BiLSTM model, and the resultant representations are passed to a Conditional Random Field (CRF) that produces the sequence labels as output.
• A BERT + CRF model, where the contextualised embeddings are captured by a pre-trained BERT base uncased model and passed on to the CRF layer to produce token labels.
• A SciBERT + CRF model, where the domain-specific contextualised embeddings are captured by a pre-trained SciBERT [25] model. SciBERT is a BERT-based language model trained on large unlabelled scientific corpora using the MLM objective. The output embeddings are passed to the linear CRF layer, which predicts token labels from the contextual representations (a minimal sketch of this model is given after the list).
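The sketch below assumes the HuggingFace transformers checkpoint allenai/scibert_scivocab_uncased and the pytorch-crf package for the CRF layer; neither is named explicitly in this paper, so both are assumptions. Training here minimises the CRF negative log-likelihood rather than a plain token-wise cross-entropy:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from torchcrf import CRF  # assumption: pytorch-crf provides the CRF layer

class SciBertCrfTagger(nn.Module):
    """SciBERT embeddings -> dropout -> linear projection -> linear-chain CRF."""

    def __init__(self, num_tags=3, model_name="allenai/scibert_scivocab_uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.5)                       # dropout as in the paper
        self.proj = nn.Linear(self.encoder.config.hidden_size, num_tags)  # 768 -> tags
        self.crf = CRF(num_tags, batch_first=True)           # tags: O, B-Model, I-Model

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.proj(self.dropout(hidden))
        mask = attention_mask.bool()
        if labels is not None:                               # training: CRF negative log-likelihood
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)         # inference: best tag sequence

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = SciBertCrfTagger()
batch = tokenizer(["We fine-tune XL-Net for question answering ."],
                  return_tensors="pt", truncation=True)
print(model(batch["input_ids"], batch["attention_mask"]))   # predicted tag ids per token
```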
Figure 2: Training pipeline for the Sequence Labeller

Figure 3: Inference Pipeline of the end-to-end framework

Figure 4: Distribution of entity occurrence frequency in the training dataset pre-replacement

We evaluate our baselines using our evaluation dataset, and the results are displayed in Table 4. We demonstrate that entity replacement provides a significant boost in performance for each of these models. The reason is that the model does not memorise entity names for the replaced dataset and uses the context to predict the entity types. The results also show that standard NER approaches can provide decent results on the evaluation dataset while relying only on weakly labelled training data.

Table 4: Results on the evaluation dataset.
                                     P      R      F1
BiLSTM + CRF (w/o replacement)     0.205  0.519  0.294
BERT + CRF (w/o replacement)       0.389  0.310  0.345
SciBERT + CRF (w/o replacement)    0.391  0.312  0.346
BERT + CRF (with replacement)      0.575  0.563  0.569
BiLSTM + CRF (with replacement)    0.628  0.631  0.629
SciBERT + CRF (with replacement)   0.641  0.632  0.636

Table 5: Results of the evaluation on the document-level annotated dataset.
                                     P      R      F1
TextRank                           0.063  0.273  0.098
PositionRank                       0.098  0.841  0.162
SingleRank                         0.105  0.863  0.179
MultipartiteRank                   0.123  0.834  0.214
SciBERT-CRF                        0.290  0.764  0.420
TextRank + SciBERT-CRF             0.512  0.235  0.322
PositionRank + SciBERT-CRF         0.608  0.661  0.633
SingleRank + SciBERT-CRF           0.609  0.679  0.642
MultipartiteRank + SciBERT-CRF     0.639  0.672  0.655
7. Combining Graph-Ranker and Sequence Tagger

We used the unsupervised keyphrase extraction algorithm to capture only those keyphrases that are most relevant to the document. Although the sequence tagger performs well at detecting model name mentions using sentences as the contextual information, we need to capture document-level relevance as well in order to extract competing models. The reason is that not all model name mentions are relevant to the task the given target research paper aims to solve. Hence, we predict only those entities which are common to both the top-ranked keyphrases and the extracted model names from our distantly supervised sequence tagger. More formally,

Y″ = Ỹ ∩ K̃

where Ỹ is the set of entities predicted by the sequence tagger, K̃ is the set of top-ranked keyphrases, and Y″ is the final set of predicted entities. The entire inference pipeline is illustrated in Figure 3.
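This merge step can be sketched as a simple set intersection under a case-insensitive string normalisation; the normalisation is an assumption for illustration:

```python
def normalise(phrase):
    return " ".join(phrase.lower().split())

def competing_models(tagged_entities, top_keyphrases):
    """Y'' = intersection of sequence-tagger predictions and top-ranked keyphrases."""
    keyphrase_set = {normalise(k) for k in top_keyphrases}
    return [e for e in tagged_entities if normalise(e) in keyphrase_set]

print(competing_models(["TransE", "Word2Vec"], ["transE", "knowledge base completion"]))
# -> ['TransE']
```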
8. Results

We use the evaluation metrics of micro-averaged Precision, Recall and F1-score to evaluate the performance of the different baselines investigated. We use the full document-level annotated dataset for this evaluation. We report the results in Table 5. We compare the performance of four unsupervised graph-rankers for keyphrase extraction: TextRank [5], SingleRank [26], PositionRank [8] and MultipartiteRank [9]. We observe that the recall is highest for SingleRank, as it extracts most of the relevant candidate keyphrases and ensures a high amount of entity coverage. For the SciBERT-CRF model, we notice that even though the recall is high, the precision is significantly low. This is due to the fact that, although it detects model entity mentions with good accuracy while considering sentences as contextual information, as reported in Table 4, not all detected models are competing. In order to discern which of the extracted candidate entities are competing models, document context is needed. Hence, we find that combining the two approaches leads to a significant boost in precision while maintaining a decent recall. The highest performance is yielded by the combination of Multipartite Rank with SciBERT-CRF, despite Multipartite Rank having a slightly lower recall than SingleRank. The reason can be attributed to the higher precision of Multipartite Rank among all the unsupervised keyphrase extraction algorithms investigated. The higher precision of Multipartite Rank can be attributed to the fact that it aims to select the most relevant phrases by incorporating positional information into the edge weights among the candidate keyphrases. Hence, its combination with the sequence labeller yields the highest F1-score among all combinations.

9. Entity Citation Linker

The entity citation linker is inspired by the prior work of [27]. The aim of this algorithm is to link the entities with their corresponding citations. The first step is to obtain all the possible entities and citations. Then, a closeness score is calculated for each entity-citation pair, which is the string distance between the entity and the citation. Then, we take all the citations and keep only the closest citation per entity. Finally, we take all the entities and keep the closest entity per citation. As demonstrated by the authors, this technique is able to accurately map most entities to their corresponding citations. We use this technique to link all the extracted model entities with their respective citations.
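A rough sketch of this greedy linking is given below, using offset distance within a sentence as the closeness score; the exact string-distance measure of [27] may differ:

```python
def link_entities_to_citations(entity_spans, citation_spans):
    """Greedy linking: keep the closest citation per entity, then the closest
    entity per citation. Spans are (text, start_offset) pairs in one sentence."""
    closest_citation = {}
    for ent, e_pos in entity_spans:
        cit = min(citation_spans, key=lambda c: abs(c[1] - e_pos))
        closest_citation[(ent, e_pos)] = cit
    links = {}
    for (ent, e_pos), (cit, c_pos) in closest_citation.items():
        best = links.get((cit, c_pos))
        if best is None or abs(e_pos - c_pos) < abs(best[1] - c_pos):
            links[(cit, c_pos)] = (ent, e_pos)
    return [(ent, cit) for (cit, _), (ent, __) in links.items()]

# "The authors use CNN [1] layer on top of BERT [2] embeddings."
print(link_entities_to_citations([("CNN", 16), ("BERT", 40)],
                                 [("[1]", 20), ("[2]", 45)]))
# -> [('CNN', '[1]'), ('BERT', '[2]')]
```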
10. Error Analysis

We conduct error analysis for the unsupervised keyphrase extraction, the model entity extraction using sequence labelling, the two-stage framework, and the entity citation linking. For the keyphrase extraction, the graph-ranker extracts most of the relevant model candidates. However, precision suffers significantly, as most of the extracted keyphrases are not model names: a few examples are domain names like 'Information Retrieval' and 'Networking architecture', dataset names like 'SQuAD 1.1', or other terms that are relevant to the research paper.

For the sequence labeller, we observe mainly two types of error. First, we notice a precision error being introduced into the model because, in the training set, we consider the maximum span of each entity, so the occurrence of I-Model (a token lying inside a named entity) is relatively high. However, in the evaluation test set of the sequence labeller, the occurrence of singular B-Model entities is much higher. This leads to the misclassification of O as I by the model. Also, although the model is able to detect model entities reasonably well given the sentence as the context, it is unable to discern competing models from unrelated ones. This leads to a significant precision decrease when evaluated on the document-level annotated evaluation set.

Furthermore, after evaluating the performance of the two-stage pipeline on the document-level annotated dataset, we find that the model often mistakes dataset names for model entity mentions. This can be attributed to the high relevance of datasets with respect to the research paper.

Lastly, for the entity citation linker, sometimes an entity that is associated with a citation marker occurs in the initial part of a sentence and is not the closest to the citation. This can lead to missed or incorrect linking.

11. Implementation details

We implement the NER model in PyTorch. For tokenization, we use the pre-trained SciBERT tokenizer. The embedding layer is the output from the pre-trained SciBERT model. We include a dropout layer with a dropout probability of 0.5 to reduce overfitting. The learning rate is set to 1e-5 and we train all models for a total of 10 epochs. The output from the dropout layer is passed through a linear layer with input dimension equal to the hidden dimension of SciBERT (768). For all unsupervised graph rankers, we use the same hyperparameter settings as specified in their respective papers.

12. Conclusion and Future work

We have introduced the task of extraction of competing models from a research paper. We use a novel approach of treating relevant keyphrases extracted using an unsupervised graph ranking algorithm as the superset of candidates for a BERT-based sequence labeller. We also use distant supervision to train our sequence labeller. We test our sequence labeller and the entire pipeline on two annotated datasets. We also utilise a simple entity replacement technique to reduce overfitting in the sequence labeller. Finally, we use the entity-citation linking technique to link all the extracted model entities with their respective citations. We believe this work to be a significant step forward in mapping the research landscape of Computer Science in an automated and scalable manner.
Conference on Lexical and Computational Se-
mantics, Association for Computational Linguis-
tics, Denver, Colorado, 2015, pp. 117–126. URL:
https://aclanthology.org/S15-1013. doi:10.18653/ distant supervision for low-resource named
v1/S15- 1013 . entity recognition, CoRR abs/2102.13129
[8] C. Florescu, C. Caragea, PositionRank: An unsuper- (2021). URL: https://arxiv.org/abs/2102.13129.
vised approach to keyphrase extraction from schol- arXiv:2102.13129 .
arly documents, in: Proceedings of the 55th An- [20] F. Nooralahzadeh, J. T. Lønning, L. Øvrelid,
nual Meeting of the Association for Computational Reinforcement-based denoising of distantly super-
Linguistics (Volume 1: Long Papers), Association vised NER with partial annotation, in: Proceedings
for Computational Linguistics, Vancouver, Canada, of the 2nd Workshop on Deep Learning Approaches
2017, pp. 1105–1115. URL: https://aclanthology.org/ for Low-Resource NLP (DeepLo 2019), Association
P17-1102. doi:10.18653/v1/P17- 1102 . for Computational Linguistics, Hong Kong, China,
[9] F. Boudin, Unsupervised keyphrase extraction 2019, pp. 225–233. URL: https://aclanthology.org/
with multipartite graphs, in: Proceedings of the D19-6125. doi:10.18653/v1/D19- 6125 .
2018 Conference of the North American Chapter [21] Y. Yang, W. Chen, Z. Li, Z. He, M. Zhang, Dis-
of the Association for Computational Linguistics: tantly supervised NER with partial annotation
Human Language Technologies, Volume 2 (Short learning and reinforcement learning, in: Proceed-
Papers), Association for Computational Linguistics, ings of the 27th International Conference on Com-
New Orleans, Louisiana, 2018, pp. 667–672. URL: putational Linguistics, Association for Computa-
https://aclanthology.org/N18-2105. doi:10.18653/ tional Linguistics, Santa Fe, New Mexico, USA,
v1/N18- 2105 . 2018, pp. 2159–2169. URL: https://aclanthology.org/
[10] V. Kocaman, D. Talby, Biomedical named entity C18-1183.
recognition at scale, CoRR abs/2011.06315 [22] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning en-
(2020). URL: https://arxiv.org/abs/2011.06315. tity and relation embeddings for knowledge graph
arXiv:2011.06315 . completion, in: AAAI, 2015.
[11] S. Mesbah, C. Lofi, M. V. Torre, A. Bozzon, G.-J. [23] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang,
Houben, Tse-ner: An iterative approach for long- M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the
tail entity extraction in scientific publications, in: limits of transfer learning with a unified text-to-
International Semantic Web Conference, Springer, text transformer, Journal of Machine Learning Re-
2018, pp. 127–143. search 21 (2020) 1–67. URL: http://jmlr.org/papers/
[12] Q. Liu, P. cheng Li, W. Lu, Q. Cheng, Long-tail v21/20-074.html.
dataset entity recognition based on data augmenta- [24] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhut-
tion, in: EEKE@JCDL, 2020. dinov, Q. V. Le, Xlnet: Generalized autoregres-
[13] J. Lafferty, A. McCallum, F. Pereira, Conditional sive pretraining for language understanding, in:
random fields: Probabilistic models for segmenting NeurIPS, 2019.
and labeling sequence data, in: ICML, 2001. [25] I. Beltagy, K. Lo, A. Cohan, Scibert: A pretrained
[14] H. L. Chieu, H. Ng, Named entity recognition with language model for scientific text, arXiv preprint
a maximum entropy approach, in: CoNLL, 2003. arXiv:1903.10676 (2019). URL: https://www.aclweb.
[15] J. Li, A. Sun, J. Han, C. Li, A survey on deep learning org/anthology/D19-1371/.
for named entity recognition, ArXiv abs/1812.09449 [26] X. Wan, J. Xiao, Single document keyphrase ex-
(2018). traction using neighborhood knowledge, in: AAAI,
[16] S. Goldberg, D. Z. Wang, C. Grant, A probabilisti- 2008.
cally integrated system for crowd-assisted text la- [27] S. Ganguly, V. Pudi, Competing algorithm detection
beling and extraction, J. Data and Information Qual- from research papers, in: Proceedings of the 3rd
ity 8 (2017). URL: https://doi.org/10.1145/3012003. IKDD Conference on Data Science, 2016, CODS ’16,
doi:10.1145/3012003 . Association for Computing Machinery, New York,
[17] X. Wang, Y. Guan, Y. Zhang, Q. Li, J. Han, Pattern- NY, USA, 2016. doi:10.1145/2888451.2888473 .
enhanced named entity recognition with distant
supervision, in: 2020 IEEE International Confer-
ence on Big Data (Big Data), 2020, pp. 818–827.
doi:10.1109/BigData50022.2020.9378052 .
[18] C. Liang, Y. Yu, H. Jiang, S. Er, R. Wang, T. Zhao,
C. Zhang, BOND: bert-assisted open-domain
named entity recognition with distant supervision,
CoRR abs/2006.15509 (2020). URL: https://arxiv.org/
abs/2006.15509. arXiv:2006.15509 .
[19] M. A. Hedderich, L. Lange, D. Klakow, ANEA: