<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentiment Classification of Scientific Citation Based on Modified BERT Attention by Sentiment Dictionary</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dahai Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bolin Hua</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Management, Peking University</institution>
          ,
          <addr-line>Beijing, 100871</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Citation analysis methods mainly focus on quantitative indicators, such as the cited number and the H-index, while ignoring deeper information such as citation function and citation sentiment. Studying and analyzing the functions and sentiments of citations can therefore evaluate an article more effectively and uncover its underlying information. As for data, this study investigated the existing datasets for citation sentiment classification (CSC) and collected and organized a high-quality, usable dataset. As for the model, based on the pre-trained language model BERT and its variants, a model called DictSentiBERT is proposed that modifies the attention mechanism using a sentiment dictionary, and a series of baseline models are designed for comparative experiments. The experimental results show that, compared with the original BERT and baseline models such as RNN and TextCNN, DictSentiBERT improves the accuracy of CSC and maintains the highest Macro-F1 score.</p>
      </abstract>
      <kwd-group>
        <kwd>sentiment classification</kwd>
        <kwd>informetrics</kwd>
        <kwd>NLP</kwd>
        <kwd>BERT</kwd>
        <kwd>pre-trained language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Authors of academic articles establish various relations
between different papers by citing concepts, methods,
conclusions, and experimental processes to support their
work, or introduce their own work by pointing out
shortcomings in previous works. Studying these relations is
therefore of great significance for exploring implicit
information and for evaluating the quality and influence
of papers. The analysis and mining of citation behavior
can help reveal knowledge structures, research hotspots,
research trends, and academic exchange networks within
a research field.</p>
      <p>
        The demand for and functions of citation analysis in the
academic community are gradually increasing, and citation
analysis is no longer used only to evaluate the academic value
of research results. However, traditional citation
analysis methods mainly focus on quantitative indicators,
such as the cited number and the H-index, ignoring
deeper information such as citation function and citation
sentiment. The work of Baird[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Radicchi[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
further demonstrates the limitations of the cited number:
for example, flawed or controversial papers tend to receive
higher citations, yet the cited number cannot reflect this.
Therefore, studying and analyzing the functions and
sentiments of citations can evaluate an article more
effectively and uncover its underlying information. Moreover,
researchers need to review and analyze existing papers to
understand the current research status and development
trends. Scientific citation sentiment classification can
help researchers better understand others’ attitudes and
perspectives towards specific research fields, which helps
to determine the quality and reliability of existing
research and to evaluate research trends in the field.
Citation sentiment refers to the author’s emotional attitude
towards the cited paper, such as approval, opposition, or
neutrality. Citation sentiment analysis reveals this
emotional attitude through various methods, such as SVM,
Naive Bayes, TextCNN, BERT, etc. The dataset, code and logs
have been uploaded to GitHub
(https://github.com/UFOdestiny/DictSentiBERT).
Research on the sentiment classification of text gradually
emerged and increased significantly after 2009. Product
reviews, social media conversations, news, and blogs are
the fields of greatest interest[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. According to Yousif[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
et al.’s research, sentiment classification of scientific
citations first appeared around 2011.
      </p>
      <p>
        Sentiment dictionaries, machine learning, and deep
learning are the three most common methods. Small[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
used one to three sentences as citation context to assist
in analyzing citation sentiment, in order to understand
the structure and underlying cognitive processes of
citation. He used a dataset composed of a large number of
cue words and phrases to analyze in detail the functions
and sentiments of 20 papers. Athar[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
classified citations into three categories (positive,
negative and neutral) using SVM classifiers with different citation
      </p>
      <p>
        sentiment detection features, and constructed a corpus
containing 8736 instances. Poria[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] et al. proposed using
CNN to extract features from multi-modal content and
feeding these features to a multiple-kernel learning
classifier for sentiment detection, which also achieved
good results on different datasets.
      </p>
      <p>
        The method of pre-trained models is gradually becoming
popular. Beltagy[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] et al. pre-trained BERT on a large scientific
corpus of 1.14 million scientific papers from the
biomedical (82%) and computer science (12%) domains,
rather than on a general corpus. To some extent, the
resulting SCIBERT is more suitable for NLP tasks on
scientific papers, significantly improving the
classification of scientific citations.
      </p>
      <p>
        This study focuses on integrating prior knowledge into
pre-trained models. Tingyu Xia[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] et al. found through
analysis that the first layer of BERT captures semantic
similarity worst and lacks synonym information, and
therefore guided the attention of the first BERT layer
directly with prior knowledge. This method improves the
performance of semantic matching, especially on small
data. Weijie Liu[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] et al. applied a knowledge graph to BERT and
created K-BERT to solve the problem of BERT’s poor
performance in specialized fields, addressing the two
major problems of heterogeneous embedding space and
knowledge noise at once.
      </p>
      <p>
        In summary, these studies have made outstanding
contributions to the field of CSC. The sentiment-dictionary
method is relatively simple, but it is limited by the
quality and coverage of the dictionary, making it difficult
to adapt to constantly changing themes. Machine learning
methods can achieve high accuracy but rely heavily on
manual feature selection and feature engineering, and may
face challenges in efficiency and generalization when
processing large-scale data. Deep learning methods perform
well, but their application may be limited for tasks that
lack large-scale annotated data. Pre-trained models do not
require large-scale data, but if there is a significant
difference between the pre-training corpus and the task
corpus, their effectiveness is also greatly reduced.
Integrating prior knowledge into the CSC task can further
improve effectiveness; in particular, integrating knowledge
graphs or constructing domain ontologies with BERT can
achieve better results. However, there are also problems:
building and maintaining knowledge graphs requires a large
amount of domain expert knowledge and data, resulting in
high maintenance costs. In addition, updating a knowledge
graph is relatively complex and time-consuming, so it may
not adapt to new fields or topics in a timely manner,
limiting the model’s adaptability to constantly changing
text data.
      </p>
      <p>
        Based on the shortcomings and gaps of the aforementioned
research, this study explores the application of
pre-trained models guided by prior knowledge to CSC. Using
a sentiment dictionary to annotate the emotional intensity
of each word in a sentence and adjust the attention matrix
accordingly, DictSentiBERT is introduced to combine the
advantages of emotional knowledge and pre-trained models,
improving classification performance without requiring a
large amount of additional annotated data.
      </p>
      <sec id="sec-1-2">
        <title>3. Data</title>
        <p>
          The processing of data includes three stages: Source and
Supplement, Preprocessing, and Manual Screening, as shown
in Figure 1. This study first investigated the existing
datasets through academic papers, search engines, etc. It
was found that the existing publicly available datasets
have neither good quality nor good quantity. Therefore, we
selected the two datasets with relatively higher quality
and better usability. Then, we used SCICite to supplement
the citation sentiment corpus proposed by Athar.
Afterwards, the dataset was subjected to a series of
processing steps, including data deduplication. Finally,
we manually filtered the data.
        </p>
        <p>Figure 1: Flow Chart of Data Processing (1. Source and
Supplement: collect the Athar corpus and supplement it with
SciCite; 2. Preprocessing: remove missing values, duplicate
instances, parenthesized text and special symbols;
3. Manual Screening: remove improper and incomplete
instances and repair wrong labels).</p>
        <sec id="sec-1-2-1">
          <title>3.1. Source and Supplement</title>
          <p>
            After conducting detailed research, it was found that
although there are many studies on CSC, such as the
dataset collected by Xu[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] et al., the dataset annotated
by Budi[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] et al., the dataset studied by Yaniasih[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] et
al., and the sentiment citation corpus proposed by Athar,
these datasets are either not publicly available or of
poor quality. This may be due to the lack of unified and
standardized annotation guidelines for collecting and
labeling scientific citation texts, which makes automation
difficult, or to a lack of research in this field.
          </p>
          <p>It is natural to consider transfer learning, because
obtaining data on movie reviews, social media reviews, or
e-commerce reviews is simple, direct and easy. But this is
problematic, because there are significant differences in
language style, purpose, structure, etc. between the texts
of scientific papers and those of film reviews.</p>
          <p>1) Scientific papers usually adopt a formal,
professional and objective language style and try to
avoid subjective and emotional expression. Film reviews,
on the other hand, place more emphasis on emotional
expression and personal subjective opinions.
2) Scientific papers usually adopt standard structured
forms, while film reviews are freer and typically include
content such as movie introductions, personal impressions,
and ratings, without a fixed structure or format.
3) The subject ranges of scientific papers and film
reviews also differ. Scientific papers cover various
professional fields in academia, including biology,
chemistry, physics, etc., while film reviews mainly
involve the film and television industry and related
topics.</p>
          <p>
            To sum up, the differences between scientific papers
and film reviews are multifaceted, involving the purpose,
mode, intensity, object and audience of emotional
expression, so transfer learning is not effective here.
Due to the lack of other solutions, this study still uses
the dataset proposed by Athar[
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] (https://cl.awaisathar.com/citation-sentiment-corpus/).
This corpus contains 8736 pieces of data, with each
citation manually annotated as positive, negative, or
neutral. These citation sentences were extracted from the
ACL Anthology Network corpus.
          </p>
          <p>
            In order to further improve the accuracy of training
at the content level, after comprehensive research on
multiple publicly available datasets, we used the SCICite
dataset proposed by Arman[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]
et al. for data supplementation. SCICite contains a
training set of approximately 10000 citation sentences and
a test set of approximately 1000 sentences, divided into
three categories by intent: method, background, and
result. This dataset also provides another classification
scheme, supportive versus not supportive, which fits this
task very well. As a result, we extracted approximately
1000 sentences from SCICite to supplement the corpus
proposed by Athar (not every sentence carries that
classification scheme).
          </p>
        </sec>
        <sec id="sec-1-2-2">
          <title>3.2. Preprocessing</title>
          <p>
            The study by Mercier[
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] et al. indicates that the dataset
contains many duplicate instances, incorrect data
segmentation, and poor label consistency, which may be
caused by an unclear division of labor in manual
annotation. We therefore cleaned the dataset and performed
the following preprocessing, building on their work.
          </p>
          <p>1) Remove missing values.
2) Remove instances with the same text but different
labels.
3) Remove instances with duplicate text and labels.
4) Remove text within parentheses using regular
expressions, because such content is unrelated to
sentiment analysis, e.g., "The two systems we use are
ENGCG (Karlsson et al., 1994)".
5) Remove special symbols, retaining only English text and
numbers. Symbols such as question marks and exclamation
marks can carry emotional information, and some network
symbols may also reflect emotion. However, BERT appears
insensitive to punctuation and other symbols according to
Adam’s[
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] work, and the information carried by symbols is
not very evident in this dataset. Therefore, we excluded
special symbols from the whole process.</p>
        </sec>
        <sec id="sec-1-1-1">
          <title>3.3. Manual Screening</title>
          <p>Due to the low quality of the dataset, some obviously
problematic data were still found after preprocessing, as
listed in Table 1.</p>
          <p>
            For classification tasks, the accuracy of machine
learning models depends on the quality of the training
data. Therefore, we attempted to maximize data quality
through manual review and screening. For problems (1), (2)
and (3) above, the original data were deleted directly;
for problem (4), the sentences were re-labeled. To be
precise, there were around 134 sentences with wrong
labels, and I re-labelled them all myself. Finally, the
compiled dataset consists of 7912 sentences: 1237
positive, 347 negative, and 6328 neutral.
          </p>
        </sec>
      </sec>
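      <p>The preprocessing and screening steps above can be sketched as follows. This is a minimal illustration rather than the authors' released code; the helper names and the sample sentences are our own.</p>

```python
import re

def clean_text(text):
    """Steps 4 and 5: drop parenthesized spans such as citation
    markers, keep only English letters, digits and spaces, and
    collapse the remaining whitespace."""
    text = re.sub(r"\([^)]*\)", " ", text)      # step 4: remove "(...)"
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)  # step 5: strip symbols
    return re.sub(r"\s+", " ", text).strip()

def preprocess(instances):
    """Steps 1-3 on (text, label) pairs: drop missing values,
    texts with conflicting labels, and exact duplicates."""
    instances = [(t, l) for t, l in instances if t and l]   # step 1
    labels_per_text = {}
    for t, l in instances:
        labels_per_text.setdefault(t, set()).add(l)
    instances = [(t, l) for t, l in instances
                 if len(labels_per_text[t]) == 1]           # step 2
    seen, result = set(), []
    for t, l in instances:
        if (t, l) not in seen:                              # step 3
            seen.add((t, l))
            result.append((clean_text(t), l))
    return result
```

      <p>For example, clean_text turns "The two systems we use are ENGCG (Karlsson et al., 1994)" into "The two systems we use are ENGCG".</p>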
    </sec>
    <sec id="sec-2">
      <title>4. Model Design</title>
      <p>The idea of DictSentiBERT is to integrate the prior
information of a sentiment dictionary into BERT, adjusting
the attention mechanism to better capture and understand
the emotional information of scientific citations and to
achieve higher classification accuracy. As shown in Figure
2, the model adopts the following architecture: an input
layer, a BERT layer, a modified attention layer, and an
output layer.</p>
      <sec id="sec-2-2">
        <title>4.1. Input Layer</title>
        <p>In the input layer, the coefficients for adjusting
attention weights are calculated in advance using
SentiWordNet and pos_tag. SentiWordNet is a dictionary for
sentiment analysis that assigns each word in WordNet an
intensity score along three dimensions: positivity,
negativity, and objectivity. However, the dictionary itself
cannot handle polysemy, so we introduce NLTK’s tagging
tool pos_tag. First, we use BERT’s tokenizer for word
segmentation, converting the sentences into the standard
BERT input form. Next, we tag each word and assign it a
weight from SentiWordNet according to its part of speech.
If a word has only a neutral intensity score, its weight
is set to 1. If it has positive or negative intensity, the
two are added together, plus 1, to obtain the final score.
For example, the word “book” has no polarity, so its
weight is 1, while the word “good” has a positive
intensity of 0.5 and a negative intensity of 0, resulting
in a final weight of 1.5.</p>
      </sec>
      <sec id="sec-2-4">
        <title>4.2. BERT Layer</title>
        <p>The BERT layer consists of two main structures:
embedding and encoder. The input vector is the sum of
three different embeddings: wordpiece embedding, position
embedding, and segment embedding. Each transformer encoder
consists of a multi-head attention layer, a layer
normalization layer, and a feed-forward layer. The
standard BERT model has 12 encoder layers and a word
vector dimension of 768. In this study we use the vanilla
BERT-base.</p>
      </sec>
      <sec id="sec-2-1">
        <title>4.3. Modified Attention Layer</title>
        <p>Because words and features vary in importance within a
text, the attention mechanism is introduced to learn the
dependency relationships between words and to pay special
attention to the important ones. Classification accuracy
can therefore be further improved by assigning different
weights to focus on important parts of the context. The
standard attention score matrix is

A = QK^T / sqrt(d_k)  (1)

Attention(Q, K, V) = softmax(A) V  (2)

For a token sequence x_1 ... x_n, the weights w_1 ... w_n
are calculated by SentiWordNet in the input layer, and
every row of the weight matrix W equals (w_1, ..., w_n):

W = [w_1 ... w_n; w_1 ... w_n; ...; w_1 ... w_n]  (n × n)  (3)

The weight matrix is then applied element-wise to the
original attention score matrix, so DictSentiBERT
processes the input sentence and calculates attention
scores as

A′ = W ⊙ A  (4)

Attention′(Q, K, V) = softmax(A′) V  (5)

where ⊙ denotes element-wise (Hadamard) multiplication.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4.4. Output Layer</title>
        <p>The output layer is a fully connected layer that
transforms the model’s output and uses the softmax function
to calculate a probability score for each category. The
final output is the label of the input: neutral, positive,
or negative. Some examples are listed in the appendix.</p>
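        <p>Concretely, the output layer reduces to a softmax over three logits followed by an argmax; a minimal numpy sketch (the label order here is illustrative):</p>

```python
import numpy as np

LABELS = ("neutral", "positive", "negative")  # assumed order

def classify(logits):
    """Softmax over the 3-way fully connected output, then pick
    the highest-probability label."""
    p = np.exp(logits - np.max(logits))  # numerically stable softmax
    p = p / p.sum()
    return LABELS[int(np.argmax(p))], p
```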
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Experiment</title>
      <sec id="sec-3-1">
        <title>5.1. Baseline</title>
        <p>Two basic pre-trained models, BERT and SCIBERT, are
used. On this basis, FeedForward NN (FNN), LSTM, TextCNN,
Self-Attention and the DictSentiBERT proposed in this
paper are designed for comparative experiments.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.2. Arguments</title>
        <p>The code was written with PyTorch v1.10 in Python v3.7,
and the model was trained on a 16GB RTX A4000 for 50
epochs with an 80%/20% train/test split. The batch size
was set to 32 and the learning rate to 5e-6. The AdamW
optimizer with a warm-up rate of 0.1 and the cross-entropy
loss function were used for optimization.</p>
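        <p>The warm-up rate of 0.1 corresponds to a learning-rate schedule along the following lines; this is a hypothetical sketch of one common linear warm-up and decay recipe, since the exact scheduler is not specified above.</p>

```python
def warmup_lr(step, total_steps, base_lr=5e-6, warmup_rate=0.1):
    """Linear ramp from 0 to base_lr over the first 10% of steps,
    then linear decay back to 0 (a common AdamW recipe)."""
    warmup_steps = max(1, int(total_steps * warmup_rate))
    if step >= warmup_steps:
        # after warm-up: decay linearly towards zero
        remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
        return base_lr * max(0.0, remaining)
    # during warm-up: ramp up linearly
    return base_lr * step / warmup_steps
```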
      </sec>
      <sec id="sec-3-3">
        <title>5.3. Results</title>
        <p>As shown in Table 2, the average accuracy of native
BERT is 91.23%, with an average Macro-F1 score of 75%.
SCIBERT performs better, with an average accuracy of
94.80% and an average Macro-F1 score of 85%. This
indicates that SCIBERT, trained on scientific texts, is
more suitable for CSC. It can also be observed that, under
the same basic pre-trained model, the performance of
DictSentiBERT improves to a certain extent, which shows
that a pre-trained model incorporating a sentiment
dictionary is better at extracting emotional
information.</p>
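        <p>Macro-F1, reported alongside accuracy above, is the unweighted mean of per-class F1 scores, so the rare positive and negative classes count as much as the dominant neutral class; a minimal sketch:</p>

```python
def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

        <p>On a heavily neutral test set, a classifier that predicts neutral for everything can still score high accuracy while its Macro-F1 collapses, which is why both metrics are reported.</p>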
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>This study proposes DictSentiBERT, which adjusts the
attention mechanism based on a sentiment dictionary, and
applies it to the sentiment classification of scientific
citations. We collected and organized a high-quality CSC
dataset, and designed the DictSentiBERT model and a series
of baseline models for comparative experiments. The
results indicate that pre-trained models can effectively
classify the sentiments of scientific citations, and that
SCIBERT performs better than native BERT on this task.
Furthermore, DictSentiBERT improves classification
accuracy while maintaining the highest Macro-F1 score. In
summary, this study provides a high-quality CSC dataset
and a new model for the sentiment classification of
scientific citations. However, this study is still limited
by the quantity and quality of the dataset, and a larger
corpus is needed for further improvement and experiments.
In the future, we can imitate the training process of
SCIBERT: collect large-scale scientific citation texts and
adjust the MASK mechanism so that MLM tasks focus on
emotional words. We could then use the official tool set
provided by Google to train BERT from scratch.
Alternatively, we can rely on syntax trees and other
methods to capture the characteristics of sentiment
analysis from the perspectives of syntax, grammar, and
morphology. Finally, the latest large GPT models can also
be combined, using AIGC to modify and guide pre-trained
models.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research is supported by the High-Performance
Computing Platform of Peking University. The work is also
supported by the National Social Science Foundation of
China project "Big Data-Driven Research on the Semantic
Evaluation System of Scientific and Technological
Literature" (Grant No. 21&amp;ZD329).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Baird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Oppenheim</surname>
          </string-name>
          , Do citations matter?,
          <source>Journal of information Science</source>
          <volume>20</volume>
          (
          <year>1994</year>
          )
          <fpage>2</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Radicchi</surname>
          </string-name>
          ,
          <article-title>In science “there is no bad publicity”: Papers criticized in comments have high scientific impact</article-title>
          ,
          <source>Scientific reports 2</source>
          (
          <year>2012</year>
          )
          <fpage>815</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Piryani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Madhavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Analytical mapping of opinion mining and sentiment analysis research during 2000-2015</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>53</volume>
          (
          <year>2017</year>
          )
          <fpage>122</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yousif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Tarus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <article-title>A survey on sentiment analysis of scientific citations</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>52</volume>
          (
          <year>2019</year>
          )
          <fpage>1805</fpage>
          -
          <lpage>1838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Small</surname>
          </string-name>
          ,
          <article-title>Interpreting maps of science using citation context sentiments: A preliminary investigation</article-title>
          ,
          <source>Scientometrics</source>
          <volume>87</volume>
          (
          <year>2011</year>
          )
          <fpage>373</fpage>
          -
          <lpage>388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Athar</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis of citations using sentence structure-based features</article-title>
          ,
          <source>in: Proceedings of the ACL 2011 student session</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Chaturvedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <article-title>Convolutional mkl based multimodal emotion recognition and sentiment analysis</article-title>
          ,
          <source>in: 2016 IEEE 16th international conference on data mining (ICDM)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>448</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>Scibert: A pretrained language model for scientific text</article-title>
          , arXiv preprint arXiv:1903.10676 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Using prior knowledge to guide bert's attention in semantic textual matching tasks</article-title>
          ,
          <source>in: Proceedings of the Web Conference</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>2466</fpage>
          -
          <lpage>2475</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>K-bert: Enabling language representation with knowledge graph</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>2901</fpage>
          -
          <lpage>2908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Citation sentiment analysis in clinical trial papers</article-title>
          ,
          <source>in: AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2015</volume>
          , American Medical Informatics Association,
          <year>2015</year>
          , p.
          <fpage>1334</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Budi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yaniasih</surname>
          </string-name>
          ,
          <article-title>Understanding the meanings of citations using sentiment, role, and citation function classifications</article-title>
          ,
          <source>Scientometrics</source>
          <volume>128</volume>
          (
          <year>2023</year>
          )
          <fpage>735</fpage>
          -
          <lpage>759</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yaniasih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Budi</surname>
          </string-name>
          ,
          <article-title>Systematic design and evaluation of a citation function classification scheme in Indonesian journals</article-title>
          ,
          <source>Publications</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>27</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ammar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Zuylen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cady</surname>
          </string-name>
          ,
          <article-title>Structural scaffolds for citation intent classification in scientific publications</article-title>
          ,
          <source>arXiv preprint arXiv:1904.01608</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mercier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T. R.</given-names>
            <surname>Rizvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rajashekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>ImpactCite: an XLNet-based method for citation impact analysis</article-title>
          ,
          <source>arXiv preprint arXiv:2005.06611</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Bernardy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chatzikyriakidis</surname>
          </string-name>
          ,
          <article-title>How does punctuation affect neural models in natural language inference</article-title>
          ,
          <source>in: Proceedings of the Probability and Meaning Conference (PaM 2020)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>