Joint Extraction of Methods from Scholarly Papers

Jianfan Ge, Ting Jiang
Nanjing University of Finance & Economics, Nanjing, China

Abstract
Entities and relations concisely reflect important information related to the subject matter of the literature, which is essential for understanding and analyzing it. In scientific research, methods are indispensable tools and important research objects for solving scientific problems (here, "methods" include tasks, discipline-specific methods, models, algorithms, metrics, etc.). The notion of "method" is therefore indispensable for understanding and analyzing academic literature. This paper aims to extract method-like entities and relations from scientific abstracts using a semantically enhanced deep learning model. We explore the impact of linguistic information on the entity and relation extraction tasks: we append POS tag information to the word vectors obtained from a pre-trained model, which proves superior to the pre-trained word vectors alone. In addition, the entity recognition component treats the token-sequence length of an entity as a feature, and in the relation extraction component, max pooling over the context between entity candidates proves better than pooling over the full context; the distance between entity candidates is embedded as a further feature, and the entity type is also supplied as an additional input. The resulting sequences of rich token representations constitute spans, over which entities and relations are learned jointly. Results on several datasets show that embedding this rich semantic information outperforms the original span-based model.

Keywords
entity extraction, relation extraction, method, POS tag, entity distance, token length, entity type

1. Introduction
In today's data-intensive era, a large number of papers are published daily [1-3].
For most academic researchers, given the diversity and explosive growth of research in their field, the speed of reading is far slower than the speed of publication, making it impossible to always learn the latest methods from the most recent literature; traditional manual searches for scientific methods thus become challenging [4]. To help scholars form a methodological system for their research directions and obtain the most cutting-edge methods related to their research while greatly saving labor and time, it is essential to study how to extract methods from large-scale academic literature. Entity recognition (ER) and relation extraction (RE) are essential and challenging tasks in natural language processing (NLP) that benefit information extraction from academic literature. ER and RE from academic texts refer to identifying academic entities, such as Task, Method, and Metric, and extracting semantic relations among these entities, e.g., Evaluate-For and Used-For. ER and RE from the literature serve a wide range of academic applications, including academic information retrieval, knowledge graph construction, question answering, and article recommendation. A joint model of ER and RE from scholarly texts aims to extract entity-relation-entity tuples. For example, sentence S1 shown in Figure 1 contains two entities; we delineate each mention with square brackets and its entity type as a suffix:

AHPCAI2022@2nd International Conference on Algorithms, High Performance Computing and Artificial Intelligence
EMAIL: gejianfan189@gmail.com (Jianfan Ge); * Corresponding author: 1259850279@qq.com (Ting Jiang)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Fig. 1 An example sentence from the SciERC dataset. "Generalized LR parsing" and "approach" are entities of types Method and Generic, respectively. The relation (or relation type) pointing from "generalized LR parsing" to "approach" is Used-for.

Previous research has shown that method entities identified by complex rules better meet searchers' needs than those identified by matching terms in academic literature [5]. As a result, researchers have proposed increasingly complex rules for the extraction task, based on cue words, language patterns, part of speech, word position, etc. [6-10]. Duck et al. [11] created bioNerDS, a named entity recognizer for bioinformatics that extracts software and dataset entities from papers. Noun phrases are extracted from academic papers and scored against different rules. In the first round, candidate entities are checked against a generated dictionary. In the second round, strong rules extracted from the article, such as version information, references, and URLs, are classified into positive and negative rules and assigned different scores. In the last round, weaker clues, such as specific verbs and indicative but ambiguous titles, are combined and assigned scores. Candidate entities are scored according to their compliance with these rules and judged to be method entities based on their final scores.

In this paper, we propose a semantically enhanced term entity-relation extraction model to jointly extract method entities and relations from the abstracts of scientific papers. We use the recent SPERT model as the baseline, which uses a pre-trained transformer. A shallow entity classifier and a shallow relation classifier extract entities and relations, respectively. The transformer generates embeddings of the tokens in the abstract, and the embeddings of a span of tokens are merged into one.
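The multi-round rule scoring used by bioNerDS can be made concrete with a small sketch. The specific rules, weights, and threshold below are illustrative assumptions, not the original bioNerDS configuration.

```python
# Minimal sketch of a bioNerDS-style rule scorer for candidate method entities.
# Rules, weights, and the threshold are made-up stand-ins for illustration.

def score_candidate(phrase, context, dictionary, threshold=2):
    """Score a noun-phrase candidate over three rounds of rules."""
    score = 0
    # Round 1: dictionary lookup of known method/software names.
    if phrase.lower() in dictionary:
        score += 2
    # Round 2: strong rules, e.g. nearby version numbers or URLs.
    if any(any(ch.isdigit() for ch in tok) and "." in tok for tok in context.split()):
        score += 1          # looks like a version number nearby
    if "http" in context:
        score += 1          # a URL often accompanies software mentions
    # Round 3: weak cues such as indicative verbs near the mention.
    if any(v in context for v in ("using", "implemented", "developed")):
        score += 1
    return score, score >= threshold

dictionary = {"blast", "bowtie"}
s, is_method = score_candidate("BLAST", "aligned using BLAST v2.2.1", dictionary)
```

Here the candidate accumulates points from the dictionary hit, the version-number cue, and the verb cue, and is accepted as a method entity once the total passes the threshold.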
Many natural language processing tasks benefit from linguistic information such as part-of-speech (POS) tags, but such information is less explored in deep neural models for NER and RE. Petasis et al. [12] regarded named entities as proper nouns (PN) serving as the name of someone or something. From the perspective of ontology, Alfonseca and Manandhar [13] proposed that named entities are objects used to solve specific problems. Borrega et al. [14] defined named entities in detail from a linguistic perspective, stipulating that only nouns and noun phrases can be named entities. Although these definitions are not uniform, it is safe to say that named entities, at least the vast majority, are nouns or noun phrases. The relations between entity pairs also appear to be related to the entity types; for example, we often find "Used-for" relations between "Generic" entities and "Method" entities. In relation extraction, the contextual information between candidate entities has proven superior to the global context. Besides the information carried by the semantics itself, the distance between candidate entities should also be an important feature. To investigate this, we counted the distance between relation arguments, as shown in Table 1. The data show that Conjunction and Feature-of relations mostly occur between nearby entities, while the other relation types are more evenly distributed. By counting the percentage of each relation for different subject and object entity types, we found that entity type also has an important influence on the relation; Table 2 shows one such breakdown. We therefore propose a semantically enhanced model.
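The distance statistics behind Table 1 amount to bucketing the token distance between the two arguments of each relation. A minimal sketch, using a made-up annotated sample rather than SciERC data:

```python
# Sketch: bucketing the word distance between relation arguments, as in Table 1.
# The tiny annotated sample below is invented for illustration only.
from collections import Counter, defaultdict

def bucket(dist):
    if dist <= 3:
        return "[0-3]"
    if dist <= 7:
        return "[4-7]"
    if dist <= 11:
        return "[8-11]"
    return "[>11]"

# (relation type, word distance between the two entity spans) -- toy data
relations = [("Conjunction", 1), ("Conjunction", 2), ("Used-for", 5),
             ("Used-for", 14), ("Feature-of", 3)]

stats = defaultdict(Counter)
for rel, dist in relations:
    stats[rel][bucket(dist)] += 1

def percentages(rel):
    """Per-relation percentage of instances in each distance bucket."""
    total = sum(stats[rel].values())
    return {b: round(100 * n / total, 1) for b, n in stats[rel].items()}
```

Running `percentages` over the full SciERC annotations (instead of the toy list) would reproduce rows of Table 1.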
The main contributions of our work are as follows:
• We propose an improved joint model of NER and RE for academic texts, in which the entity types and the distance between candidate entities that potentially form a relation are considered in RE, and the token-sequence length of an entity is considered in NER.
• We enrich the initial embeddings by augmenting them with semantic and syntactic information.
• Experiments on real datasets validate the effectiveness of our proposed method.

2. Related works
With the development of technology, deep learning has become a focus of research in machine learning. Deep learning models emphasize the capability of the machine and abandon complex feature engineering, greatly reducing the labor and time cost of feature engineering compared to statistical machine learning models; deep learning has therefore become a very important research direction. Supervised deep learning methods for relation extraction can address the main problems of manual feature extraction and error propagation found in classical methods, combining low-level features into more abstract high-level features. Current supervised relation extraction methods mainly comprise pipeline approaches and joint approaches.

2.1 Pipeline approaches
Pipeline approaches extract relations between entities on top of entity recognition that has already been completed. Early pipeline approaches mainly used two types of architecture: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs with diverse convolutional kernels are good at recognizing structural features of the target, while RNNs fully consider long-range dependencies between words, and their memory makes them well suited to sequences.
Zeng et al. [15] were the first to use a CNN to extract word-level and sentence-level features, improving the accuracy of relation extraction by using a hidden layer and a softmax layer for relation classification. Socher et al. [16] first applied an RNN to entity relation extraction, using recurrent neural networks to syntactically parse the sentences in the annotated text and then, after repeated iterations, obtaining a vector representation that reflects the syntactic structure of the sentence. As research progressed, CNN and RNN methods were continuously improved and refined, and many variants emerged, such as long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM), which mitigate the vanishing gradient problem. Xu et al. [18] proposed an LSTM-based relation extraction method built on the shortest path in the syntactic dependency tree, incorporating features such as word vectors, POS tags, WordNet, and syntax, and using a max-pooling layer and a softmax layer. In addition, with the application of graph convolutional networks (GCNs) to natural language processing, GCNs have been increasingly used to mine and exploit latent information between entities, providing new ideas for handling overlapping relations and overlapping entities and further advancing relation extraction. Schlichtkrull et al. [19] proposed relational graph convolutional networks (R-GCNs) on two standard knowledge bases for link prediction and entity classification, where link prediction recovers missing relations and entity classification completes the missing attributes of entities; Zhang et al. [20] proposed an extended graph convolutional network that handles arbitrary dependency structures in parallel, facilitating entity relation extraction and making effective use of negative-class data.
Although the pipeline approach is easy to implement, and the entity model and relation model can use independent datasets so that no dataset needs to be labeled with both entities and relations, it has several disadvantages: a) error accumulation: errors in entity extraction degrade the subsequent relation extraction; b) entity redundancy: since the extracted entities are first paired and relations are then extracted between the pairs, the redundant candidate pairs without relations raise the error rate and increase the computational complexity; c) lacking interaction: the intrinsic connection and dependency between the two tasks are ignored.

Table 1: Relation entity distance (in words) on SciERC. Ave[>11]*4 is the average percentage per four-word interval beyond distance 11.

Type          | [0-3]  | [4-7]  | [8-11] | [>11]  | Ave[>11]*4
Conjunction   | 84.5%  | 10.5%  | 3.3%   | 1.8%   | 0.6%
Feature-of    | 69.4%  | 24.3%  | 5.8%   | 0.6%   | 0.3%
Hyponym-of    | 46.3%  | 29.2%  | 11.7%  | 12.8%  | 3.0%
Used-for      | 45.2%  | 31.0%  | 12.8%  | 11.0%  | 1.1%
Part-of       | 48.0%  | 28.5%  | 11.7%  | 11.7%  | 2.1%
Compare       | 36.1%  | 28.3%  | 19.9%  | 15.7%  | 4.5%
Evaluate-for  | 34.2%  | 32.6%  | 15.7%  | 17.6%  | 3.1%
All           | 50.1%  | 27.8%  | 11.7%  | 10.3%  | 1.0%

Table 2: Relation types in SciERC: the percentage of each relation when the subject is of type Task, broken down by the six object types. It is obvious that when the subject is a Task and the object is a Material, the probability that the relation is Used-for is 100%. We can also find that when the subject is a Task, the relations are mainly Used-for and Part-of.

Type (object)        | Conjunction | Feature-of | Hyponym-of | Used-for | Part-of | Compare | Evaluate-for
Task                 | 42.8%       | 0.5%       | 18.8%      | 20.7%    | 11.4%   | 2.5%    | 3.3%
Method               | 8.0%        | 12.0%      | \          | 46.0%    | \       | \       | 34.0%
Metric               | \           | \          | \          | 14.3%    | \       | \       | 85.7%
Material             | \           | \          | \          | 100.0%   | \       | \       | \
OtherScientificTerm  | 5.9%        | \          | \          | 85.3%    | \       | \       | 8.8%
Generic              | \           | 1.8%       | 34.9%      | 26.0%    | 7.1%    | \       | 30.2%
2.2 Joint approaches
Joint approaches further exploit the latent information shared between the two tasks to mitigate error propagation. Their difficulty lies in how to strengthen the interaction between the entity model and the relation model. Early work relied heavily on fine-grained feature engineering to establish the interaction between NER and RE [21-23]. Recently, end-to-end neural networks have proven successful in extracting relational triples [15,24-26] and have become the mainstream for joint entity and relation extraction. Based on how they encode task-specific features, most existing approaches fall into two categories: sequential encoding and parallel encoding. Sequential encoding generally encodes task features in the order NER then RE; this keeps the later-encoded features from directly influencing the earlier-encoded ones, resulting in unbalanced inter-task interaction. Zeng et al. [27] and Wei et al. [28] are typical examples of this category: they extract features for the different tasks in a predefined order. Parallel encoding uses two independent encoders to generate task features, which interact only through the shared input, leading to insufficient inter-task interaction; in contrast to sequential encoding, models built on this scheme need not worry about the effect of encoding order. The entity and relation information is encoded separately, and task-specific features are finally extracted in two separate submodels.

Fig. 2 Framework of the model. Given a corpus of academic texts, the goal of academic entity recognition and relation extraction is to obtain entity-relation triples. The input to the model is a list of academic texts, which are converted into a sequence of tokens by the pre-trained model.
The output of the tasks is shown in Figure 2. Both encoding methods have their own drawbacks: the inter-task interaction in sequential encoding is one-way with a specific order, while parallel encoding retains only the shared features and discards the features that would separately benefit each task.

3. Model
3.1 Model Architecture
In this section, we explain our model in detail; the framework is shown in Figure 2. The joint model comprises five components: 1) an embedding layer, which converts tokens into embedding vectors; 2) a POS encoder, which converts tokens into part-of-speech tags; 3) a fusion module, which fuses the word embedding and the part-of-speech embedding into one vector; 4) a shallow entity classifier, which classifies every possible sequence of consecutive tokens; and 5) a shallow relation classifier, which classifies the relation for any given pair of entities.

3.2 Embedding Layer
Given a sequence of sentences S = {s_1, s_2, …, s_m}, the embedding layer transforms the sentences into a vector matrix in which each token is represented by a pre-trained embedding. The embedding of each token consists of two parts: a pre-trained transformer embedding and a POS embedding.

a) Pre-trained transformer: The recognition of academic term entities and the extraction of their relations differ from conventional entity and relation extraction: the entities are more specialized and the relations between terms are more abstract.
Traditional methods are based on manually engineered features, whereas pre-training techniques are now widely used in deep learning and have achieved good performance in computer vision, natural language processing, and other fields. A model pre-trained on large-scale data can usually reach satisfactory performance with simple additional training, i.e., fine-tuning. Obtaining high-quality initial parameter values through pre-training not only reduces the training burden but also improves the model's generalization ability. However, because BERT was pre-trained on general texts from Wikipedia and book corpora, its performance on domain-specific tasks has proven suboptimal in several previous works [29-30]. These empirical findings have driven the development of domain-specific pre-trained language models, for example SciBERT in the scientific domain and BioBERT in the biomedical domain. We therefore use the domain-specific pre-trained models SciBERT and BioBERT to obtain better feature representations of academic entities.

We split each sentence s into a sequence of tokens T = {[CLS], t_1, t_2, …, t_n, [SEP]}, where [CLS] and [SEP] are special symbols: [CLS] captures the contextual information of the text, while [SEP] acts as a separator between adjacent sentences. We use the transformer to generate pre-trained embeddings as in (1):

Transformer(T) = (b_{CLS}, b_1, b_2, …, b_n, b_{SEP})    (1)

where b_i ∈ R^{d1}, and d1 is the embedding dimension.

b) POS tag: The part of speech is important information carried by a word, reflecting the syntactic role the word plays. Dependency parsing in natural language processing converts sentences into trees based on POS tags.
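Because sub-word tokenization splits a word into several tokens, the word-level POS tag has to be propagated down to each piece. A toy sketch of this step, with an invented tagger and tokenizer standing in for a real POS tagger and WordPiece tokenization:

```python
# Sketch of POS tag propagation: each word's POS tag is assigned to every
# sub-word token the tokenizer produces for it. The dict-based tagger and the
# 4-character "wordpiece" splitter are toy stand-ins, not real components.

toy_pos = {"generalized": "JJ", "parsing": "NN"}        # assumed tags

def toy_wordpiece(word):
    """Pretend sub-word split: chop into 4-char pieces with '##' prefixes."""
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]

def tag_tokens(words):
    tokens, tags = ["[CLS]"], ["[CLS]"]
    for w in words:
        for piece in toy_wordpiece(w):
            tokens.append(piece)
            tags.append(toy_pos.get(w, "NN"))           # child inherits parent tag
    return tokens + ["[SEP]"], tags + ["[SEP]"]

tokens, tags = tag_tokens(["generalized", "parsing"])
```

In practice the tokenizer and tagger would come from the pre-trained model and an off-the-shelf POS tagger; only the inheritance of the parent word's tag is the point here.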
Xu et al. [18] proposed an LSTM-based relation extraction method built on the shortest path in the syntactic dependency tree, incorporating features such as word vectors, part of speech, WordNet, and syntax, and using a max-pooling layer and a softmax layer for relation classification; the addition of POS ultimately improved the results. We therefore take POS tags into consideration. We generate POS tags for the input sentences and assign the POS tag of the parent word to each child token it generates. Using a learned embedding matrix, we generate an embedding sequence P = {p_{CLS}, p_1, p_2, …, p_n, p_{SEP}} for the POS tags, each of dimension d2. The BERT embedding of each token is then concatenated with its POS embedding to obtain a new vector representation:

C = (c_{CLS}, c_1, c_2, …, c_n, c_{SEP})    (2)

where each output embedding c_i = b_i ∘ p_i is the concatenation (∘) of the two vectors above, of dimension d1 + d2.

3.3 Span Classification
To detect entities, each possible sequence of k consecutive tokens is represented by max-pooling its embedding representations c_i, …, c_{i+k-1}:

V(s) = maxpool(c_i, …, c_{i+k-1}) ∈ R^{d1+d2}    (3)

where c_i ∈ R^{d1+d2}, and d1 + d2 is the embedding dimension. To study the influence of entity span length, we counted the lengths of all entity spans, as shown in Table 3. Overall, more than half of all entity spans (58.7%) fall in the interval [1-3], and the data for the intervals [4-6], [7-10], and [>10] show that the likelihood of a span being an entity is inversely proportional to its length.
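The embedding fusion of Eq. (2) and the span max-pooling of Eq. (3) can be sketched in a few lines. The dimensions, POS table, and random "contextual" vectors below are toy assumptions, not the paper's settings.

```python
# Sketch of the embedding layer and span representation (Eqs. 2-3): each token
# embedding is the concatenation of a contextual vector (dim d1) and a POS tag
# embedding (dim d2); a span vector is the element-wise max over its tokens.
import random

random.seed(0)
d1, d2 = 4, 2                                      # toy dimensions
pos_table = {"NN": [0.1, 0.9], "JJ": [0.7, 0.2]}   # assumed POS embedding matrix

def fuse(bert_vec, pos_tag):
    """Eq. 2: c_i = b_i concatenated with p_i."""
    return bert_vec + pos_table[pos_tag]

def span_maxpool(token_vecs):
    """Eq. 3: element-wise max over the span's token embeddings."""
    return [max(col) for col in zip(*token_vecs)]

tokens = [fuse([random.random() for _ in range(d1)], t) for t in ("JJ", "NN")]
v_span = span_maxpool(tokens)                      # a (d1 + d2)-dim span vector
```

The pooled vector keeps, per coordinate, the strongest activation among the span's tokens; here the POS coordinates of the result come from whichever tag dominates each dimension.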
In terms of the breakdown by entity type, the proportions in each interval also differ greatly across entity types; for example, the Generic type has 96.9% of its spans in the [1-3] interval, so it can almost be considered to occur only there, while the Method and Task types are relatively evenly distributed. Entity span length is thus an important feature for entity classification. We train a width embedding matrix to obtain an embedding w_k ∈ R^{d3} for a span of length k:

V'(s) = V(s) ∘ w_k ∈ R^{d1+d2+d3}    (4)

Finally, b_{CLS}, which represents the sentence context, is concatenated with V'(s) to obtain the vector V''(s), which is passed through a softmax classifier to predict the entity type:

V''(s) = V'(s) ∘ b_{CLS} ∈ R^{2d1+d2+d3}    (5)

e(s) = softmax(W_e · V''(s) + b_e) ∈ R^{E+1}    (6)

where b_{CLS} ∈ R^{d1}, so 2d1 + d2 + d3 is the dimension of V''(s); E is the number of entity types, and the "+1" is due to the 'null' entity φ that denotes the absence of an entity.

3.4 Relation classification
Spans classified as φ by the entity classifier are filtered out. For the remaining spans, the task is to identify the relation between every pair of them. Consider a pair of spans (s1, s2) where s1 occurs before s2 in the input sentence. We assume relations to be asymmetric, so the relation directed from s1 to s2 may differ from that directed from s2 to s1, and each of them must be classified separately.
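A minimal sketch of this directional relation scoring, combining the max-pooled context between the two spans with entity-type and distance features as in Section 3.4. All weights, dimensions, distance buckets, and the threshold alpha are toy assumptions.

```python
# Sketch of the relation classifier: span vectors are joined with a max-pooled
# context vector, the two entity-type vectors, and a distance embedding, then
# scored with one sigmoid per relation type. Everything here is toy-sized.
import math
import random

random.seed(2)
dist_emb = {d: [random.random()] for d in range(8)}   # assumed distance buckets

def relation_scores(v1, v2, ctx_vecs, e1, e2, dist, W, b, alpha=0.5):
    ctx = [max(col) for col in zip(*ctx_vecs)]        # context max-pool
    feats = v1 + ctx + v2 + e1 + e2 + dist_emb[min(dist, 7)]  # concatenation
    scores = [1 / (1 + math.exp(-(sum(w * x for w, x in zip(row, feats)) + bi)))
              for row, bi in zip(W, b)]               # sigmoid per relation type
    return [s > alpha for s in scores], scores        # keep relations above alpha

dim = 2 + 2 + 2 + 3 + 3 + 1                           # toy feature dimension
W = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(7)]
b = [0.0] * 7
keep, scores = relation_scores([0.1, 0.2], [0.3, 0.4],
                               [[0.5, 0.6], [0.7, 0.2]],
                               [0.9, 0.05, 0.05], [0.1, 0.8, 0.1],
                               dist=4, W=W, b=b)
```

Swapping the roles of the two spans (and of their type vectors) yields the score for the reverse direction, mirroring the asymmetric treatment of relations.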
We take the representations c_i, …, c_j, where c_i is the embedding of the first token following s1 and c_j is that of the last token preceding s2 in the sentence, and max-pool them:

c(s1, s2) = maxpool(c_i, …, c_j) ∈ R^{d1+d2}    (7)

where c_i ∈ R^{d1+d2}, and d1 + d2 is the embedding dimension. The candidate relations from s1 to s2 and from s2 to s1 are separately encoded as in (8) and (9):

R_{1→2} = V'(s1) ∘ c(s1, s2) ∘ V'(s2)    (8)
R_{2→1} = V'(s2) ∘ c(s1, s2) ∘ V'(s1)    (9)

The results are passed through a simple classifier with a sigmoid activation function and a confidence threshold α (scores above α are taken to indicate the relation) to predict the relation type. We further incorporate the entity-type information and the distance between the candidate entities into the relation prediction. We train a distance embedding matrix to obtain an embedding w_d for the distance between the candidate entities:

R'_{1→2} = R_{1→2} ∘ e(s1) ∘ e(s2)    (10)
R'_{2→1} = R_{2→1} ∘ e(s2) ∘ e(s1)    (11)
R''_{1→2} = R'_{1→2} ∘ w_d    (12)
R''_{2→1} = R'_{2→1} ∘ w_d    (13)
y = σ(W_r · R'' + b_r)    (14)

The loss function of the joint model is the sum of the cross-entropy losses of the entity classifier and the relation classifier. End-to-end training is performed by back-propagation, and the transformer is fine-tuned during training. To train the entity classifier, we use the real entity spans as positive samples and add some non-entity spans as negative samples.

4. Experiments and Results
4.1 Datasets
Our goal is to extract method entities and relations from the scientific literature, so we evaluate our model on SciERC, a dataset from the scientific literature that contains both entities and relations.
Table 3. Length of tokenized entities. Ave[>10]*4 means the average percentage per four-length interval beyond length 10.

Type                 | [1-3]  | [4-6]  | [7-10] | [>10]  | Ave[>10]*4
Task                 | 47.0%  | 31.9%  | 12.9%  | 8.3%   | 1.8%
Method               | 44.1%  | 36.0%  | 12.7%  | 7.3%   | 1.4%
Metric               | 72.3%  | 20.3%  | 4.3%   | 3.0%   | 1.7%
Material             | 52.6%  | 33.8%  | 9.4%   | 4.2%   | 1.9%
OtherScientificTerm  | 55.7%  | 32.8%  | 8.7%   | 2.8%   | 0.7%
Generic              | 96.9%  | 2.0%   | 1.1%   | 0.0%   | 0.0%
All                  | 58.7%  | 27.8%  | 9.0%   | 4.5%   | 0.6%

Table 4. Performance on SciERC (P / R / F1, in %).

Model       | NER (P / R / F1)    | Boundaries RE (P / R / F1) | Strict RE (P / R / F1)
SCIIE [17]  | 67.2 / 61.5 / 64.2  | 47.6 / 33.5 / 39.2         | - / - / -
PURE [31]   | - / - / 66.6        | - / - / 48.2               | - / - / 35.6
DYGIE [32]  | 68.6 / 67.8 / 68.2  | 46.2 / 38.5 / 42.0         | - / - / -
DYGIE+ [33] | - / - / 67.5        | - / - / 48.4               | - / - / -
PFN [34]    | 64.8 / 69.0 / 66.8  | - / - / -                  | 40.6 / 36.5 / 38.4
SPERT [35]  | 70.9 / 69.8 / 70.3  | 53.4 / 48.5 / 50.8         | 40.5 / 36.8 / 38.6
Ours        | 70.0 / 71.4 / 70.7  | 52.6 / 51.5 / 52.1         | 41.7 / 39.9 / 40.8

Table 5. Ablation study on SciERC. SpanL denotes the token span length feature, Type the entity type feature, and Dist the distance between candidate entities.

Model   | NER (P / R / F1)    | Boundaries RE (P / R / F1) | Strict RE (P / R / F1)
Ours    | 70.0 / 71.4 / 70.7  | 52.6 / 51.5 / 52.1         | 41.7 / 39.9 / 40.8
-SpanL  | 69.5 / 71.0 / 70.3  | 51.3 / 51.5 / 51.4         | 40.0 / 40.3 / 40.1
-Type   | 69.4 / 70.5 / 70.0  | 51.3 / 50.0 / 50.7         | 39.4 / 38.6 / 39.0
-Dist   | 69.5 / 71.5 / 70.5  | 51.6 / 50.8 / 50.9         | 40.2 / 39.3 / 39.8

The SciERC dataset is constructed from 500 abstracts of papers in the field of artificial intelligence, with a total of 2687 sentences. It contains six scientific entity types and seven relation types. The six entity types are Task, Method, Metric, Material, Other-Scientific-Term, and Generic, and the seven relations are Compare, Conjunction, Evaluate-For, Used-For, Feature-of, Part-of, and Hyponym-of. We follow the official split: train (1861), dev (275), and test (551).

4.2 Evaluation Metrics
We use the standard Precision (P) (15), Recall (R) (16), and F1-score (17) to evaluate model performance:

P = TP / (TP + FP)    (15)
R = TP / (TP + FN)    (16)
F1 = 2 · P · R / (P + R)    (17)

where TP, FP, and FN stand for true positives, false positives, and false negatives, respectively.

4.3 Results
Performance on SciERC. We report the performance on the SciERC dataset in Table 4.
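The metrics of Eqs. (15)-(17) can be written out as a small helper, with the usual guard against division by zero when a count is empty:

```python
# Precision, recall, and F1 (Eqs. 15-17) from true/false positive/negative counts.

def prf1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0   # Eq. 15
    r = tp / (tp + fn) if tp + fn else 0.0   # Eq. 16
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # Eq. 17
    return p, r, f1
```

For example, 8 correct predictions with 2 false positives and 2 misses give P = R = F1 = 0.8.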
We compare the experimental results of six different models; the F1 values on all three tasks improve over the baseline SPERT. Compared with the 0.36% F1 improvement on the NER task, the improvement on the RE tasks is relatively significant, with Boundaries RE and Strict RE improving by 1.26% and 2.23%, respectively.

4.4 Ablation study
The ablation study in Table 5 shows the effect of removing the entity span length, entity type, and relation distance features on the final classification scores. In the ablation experiments, we take the average of the three best results out of 20 runs, with the aim of comparing the upper limits of each feature's effect. We observe that removing the entity span length decreases the F1 value of NER by 0.4% and also affects the subsequent RE tasks. Removing the entity type feature reduces the F1 scores of Boundaries RE and Strict RE by 1.3% and 1.8%, respectively, and removing the relation distance reduces them by 1.16% and 1.0%, respectively. We conclude that entity type has a significant effect on relation extraction, especially on Strict RE, and that, comparing the two similar distance-style features, the gain from relation distance for RE is clearly larger than the gain from entity span length for NER.

5. Conclusion
We propose a semantically enhanced deep learning model for extracting entities and relations from the scientific literature. We explored the impact of linguistic information on the entity and relation extraction tasks, adding POS information to the word vectors obtained from the pre-trained model, which proved superior to the pre-trained word vectors alone.
In the entity recognition part, the length of the entity's token sequence is used as a feature, while in the relation extraction part, entity type and relation distance are added, which also improves the accuracy of the task. In the future, extending the in-sentence feature information to inter-sentence contextual information is a promising challenge.

6. Funding Statement
This research was supported by the Young Scientists Fund of the National Natural Science Foundation of China under Grant [71904078], the Natural Science Foundation of Jiangsu Province of China under Grant [BK20190793], and the Postgraduate Research and Practice Innovation Program of Jiangsu Province of China under Grant [GJFXW21001].

7. References
[1] Jinha A E. Article 50 million: an estimate of the number of scholarly articles in existence[J]. Learned Publishing, 2010, 23(3): 258-263.
[2] Hassan S U, Safder I, Akram A, et al. A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis[J]. Scientometrics, 2018, 116(2): 973-996.
[3] Xie I, Babu R, Lee T H, et al. Enhancing usability of digital libraries: Designing help features to support blind and visually impaired users[J]. Information Processing & Management, 2020, 57(3): 102110.
[4] Ding Y, Stirling K. Data-driven discovery: A new era of exploiting the literature and data[J]. Journal of Data and Information Science, 2016, 1(4): 1-9.
[5] Bhatia S, Mitra P, Giles C L. Finding algorithms in scientific articles[C]//Proceedings of the 19th International Conference on World Wide Web (WWW 2010), Raleigh, North Carolina, USA, April 26-30, 2010. ACM, 2010.
[6] Katsurai M, Joo S. Adoption of data mining methods in the discipline of library and information science[J]. Journal of Library & Information Studies, 2021, 19(1).
[7] Lam C, Lai F C, Wang C H, et al. Text mining of journal articles for sleep disorder terminologies[J]. PLoS ONE, 2016, 11(5): e0156031.
[8] Li K, Yan E.
Co-mention network of R packages: Scientific impact and clustering structure[J]. Journal of Informetrics, 2018, 12(1): 87-100.
[9] Wang Y, Zhang C. Finding more methodological entities from academic articles via iterative strategy: A preliminary study[J]. training, 2019, 2787: 2.73.
[10] Zhu G, Yu Z, Li J. Discovering relationships between data structures and algorithms[J]. J. Softw., 2013, 8(7): 1726-1735.
[11] Duck G, Nenadic G, Brass A, et al. bioNerDS: exploring bioinformatics' database and software use through literature mining[J]. BMC Bioinformatics, 2013, 14(1): 1-13.
[12] Petasis G, Cucchiarelli A, Velardi P, et al. Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods[C]//Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2000: 128-135.
[13] Alfonseca E, Manandhar S. An unsupervised method for general named entity recognition and automated concept discovery[C]//Proceedings of the 1st International Conference on General WordNet, Mysore, India. 2002: 34-43.
[14] Borrega O, Taulé M, Martí M A. What do we mean when we speak about named entities[C]//Proceedings of Corpus Linguistics. 2007.
[15] Zeng D, Liu K, Lai S, et al. Relation classification via convolutional deep neural network[C]//Proceedings of the 25th International Conference on Computational Linguistics. Stroudsburg: ACL, 2014.
[16] Socher R, Huval B, Manning C D, et al. Semantic compositionality through recursive matrix-vector spaces[C]//Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012.
[17] Luan Y, He L, Ostendorf M, et al. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction[J]. arXiv preprint arXiv:1808.09602, 2018.
[18] Xu K, Feng Y, Huang S, et al.
Semantic relation classification via convolutional neural networks with simple negative sampling[J]. Computer Science, 2015, 71(7): 941-9.
[19] Schlichtkrull M, Kipf T N, Bloem P, et al. Modeling relational data with graph convolutional networks[J]. 2017.
[20] Zhang Y, Guo Z, Lu W. Attention guided graph convolutional networks for relation extraction[J]. 2019.
[21] Yu X, Lam W. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. 2010.
[22] Li Q, Ji H. Incremental joint extraction of entity mentions and relations[C]//Meeting of the Association for Computational Linguistics. 2014.
[23] Miwa M, Sasaki Y. Modeling joint entity and relation extraction with table representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1858-1869.
[24] Gupta A, Eral H B, Hatton T A, et al. Nanoemulsions: formation, properties and applications[J]. Soft Matter, 2016, 12(11): 2826-2841.
[25] Katiyar A, Cardie C. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 917-928.
[26] Shen X, Tang H, McDanal C, et al. SARS-CoV-2 variant B.1.1.7 is susceptible to neutralizing antibodies elicited by ancestral spike vaccines[J]. Cell Host & Microbe, 2021, 29(4): 529-539.e3.
[27] Wu Q, Zeng Y, Zhang R. Joint trajectory and communication design for multi-UAV enabled wireless networks[J]. IEEE Transactions on Wireless Communications, 2018, 17(3): 2109-2121.
[28] Wei Z, Su J, Wang Y, et al. A novel cascade binary tagging framework for relational triple extraction[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
[29] Beltagy I, Lo K, Cohan A. SciBERT: A pretrained language model for scientific text[J]. arXiv preprint arXiv:1903.10676, 2019.
[30] Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
[31] Zhong Z, Chen D. A frustratingly easy approach for entity and relation extraction[J]. arXiv preprint arXiv:2010.12812, 2020.
[32] Luan Y, Wadden D, He L, et al. A general framework for information extraction using dynamic span graphs[J]. arXiv preprint arXiv:1904.03296, 2019.
[33] Wadden D, Wennberg U, Luan Y, et al. Entity, relation, and event extraction with contextualized span representations[J]. arXiv preprint arXiv:1909.03546, 2019.
[34] Yan Z, Zhang C, Fu J, et al. A partition filter network for joint entity and relation extraction[J]. arXiv preprint arXiv:2108.12202, 2021.
[35] Eberts M, Ulges A. Span-based joint entity and relation extraction with transformer pre-training[J]. arXiv preprint arXiv:1909.07755, 2019.