<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extraction of Sentences Describing Originality from Conclusion in Academic Papers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bolin Hua</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>YoungKug Shin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Management, Peking University</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Citation analysis-based measures such as SCI, impact factor, and h-index reveal the influence of scientific papers, but they struggle to demonstrate originality. With advances in text mining and deep learning, it has become feasible to extract the segments of a paper that illustrate its originality (hereafter “originality points”) and compare them with the originality points of previous literature, so as to assess the originality of a focal paper. Extracting originality points is the first step in judging the originality of a paper. Based on a summary of the writing conventions of conclusion sections, this paper characterizes how sentences describing originality (SDO) are expressed in conclusions, builds a vocabulary of guiding words for SDO, and then uses rules to identify and extract SDO. In the experiment, we download the full text of artificial intelligence papers from arXiv; the recognition accuracy and recall rates are 83.3% and 72.2%, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>academic literature</kwd>
        <kwd>originality point recognition</kwd>
        <kwd>originality feature words</kwd>
        <kwd>knowledge extraction</kwd>
        <kwd>originality evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        For decades, scientometricians have proposed many sophisticated measurements to
characterize the impact of scientific publications, such as the number of citations of a
specific publication (Bornmann 2008) and the impact factor of the journal in which the
paper is published
        <xref ref-type="bibr" rid="ref4">(Garfield 1955)</xref>
        . Yet these measures often fail to reflect the originality
and innovation of publications. Although later science-of-science researchers
have used citing relations to estimate these qualities
        <xref ref-type="bibr" rid="ref10 ref12">(e.g., Uzzi et al., 2013; Wang et al., 2020)</xref>
        ,
current practice mainly relies on peer review.
      </p>
      <p>Text mining techniques can be employed for evaluating the originality of a paper,
which requires much less time and human effort compared to peer reviewing. The
judgment of the originality of a paper includes subjective and objective reviews. Subjective
reviews may come from the authors themselves (i.e., self-evaluation) or other scholars:
The former is embodied in the description of originality and research conclusion of a
paper, while the latter is mainly distributed in the citing content of citations. According
to whether the cited literature appears in the reference or the main body of the citing
literature, Ding and colleagues (2013) defined the “count one” and “count x” indices.</p>
      <p>He et al. (2010) presented a prototype system, CiteSeerX, which aims to build a context-aware
citation recommendation system that recommends a set of high-quality citations
for a paper.</p>
      <p>Although measurements such as the number of citations, impact factor, and h-index
have been introduced to reflect the influence/popularity of research papers, it is difficult
to reflect the originality. To detect the originality of a research work and a paper, the
current practice mainly relies on peer review. Peer reviews are subjective, and it is
difficult to handle the evaluation task for a considerable number of scientific papers. While
citation content analyses have been proposed to address this issue, most existing
practices have purely focused on the motivation and sentiment of citations instead of the
detection of the originality of a paper.</p>
      <p>In the current paper, we address this gap and aim to develop methods for the
automatic identification of SDO (“originality points”) in scientific
publications. Extracting the originality points of a paper is the first step in judging its
originality. This paper uses full-text data from arXiv for the experiment and studies the
recognition and extraction of SDO in the conclusion sections of scientific
publications.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>The expression of originality in academic literature is diverse, and originality may
appear in various parts of a paper in different forms. Therefore, it is necessary to
identify and extract SDO in academic literature. Current methods of extracting
information about originality from academic literature can be divided into rule-based
methods and machine learning-based methods.</p>
      <sec id="sec-2-1">
        <title>Rule-based methods</title>
        <p>The core idea of rule-based methods is to analyze the language features of
originality points and then either select sentence features for extraction or specify
extraction rules. Kirschner et al. (2015) present the results of an annotation study that
focused on the fine-grained analysis of argumentation structures in scientific
publications by specifying four types of binary argumentative relations between sentences.
Zhang et al. (2011) proposed a method of extracting sentence-level originality in the
field of scientific and technological literature based on the relationship between
domain-wise vocabulary and the ontology. Wen (2019) proposed a method of semantic
recognition and classification that divides scientific and technological
abstracts into six categories according to syntactic and semantic function, followed by a
statistical analysis of the distributions of categories, sentence positions,
sentence types, and sentence semantic positions. Li (2005) proposed an approach to
originality detection based on the identification of sentence-level patterns. Zhang (2011)
addressed the problem of multilingual sentence categorization and originality mining.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Machine learning-based methods</title>
        <p>With the substantial increase in computing power and the rapid expansion of data scale,
it has become possible to use deep learning methods in big data environments to
expand the semantics of text features and calculate content similarity.
Computational efficiency has also improved significantly, and scholars have adopted deep
learning methods for originality detection research. Markou (2003) reviewed various
neural network methods (such as MLP, ART, and RBF) that can be used for novelty
detection from a theoretical perspective. Kim et al. (2018) presented a
network-based method to detect the originality of a research paper, in which an autoencoder neural
network is used as the originality detection model. Among the constructed networks,
keyword-level graph features exhibit the best performance, with regression analysis as the
metric. Safder et al. (2020) proposed a set of methods to automatically identify and extract
algorithmic pseudo-codes and the sentences that convey related algorithmic metadata
using a set of machine-learning techniques.</p>
        <p>These studies have advanced the extraction and evaluation of originality in papers, but
some shortcomings remain:
1. Machine learning-based methods usually need labeled training data, but there is
no corresponding dataset on the originality of papers;
2. Rule-based methods target small amounts of data, and designing a method that
processes large amounts of data is quite challenging;
3. Most existing studies focus on the abstract of the paper, with only a few exploring
the full text of scientific publications.</p>
        <p>To tackle these problems, we design a method that combines rules and statistics to
extract SDO from the conclusion section. The method finds
features through statistical analysis of descriptions of originality points and then
transforms these features into regular expressions, avoiding the large-scale
annotated data required by machine learning.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <sec id="sec-3-1">
        <title>Research framework</title>
        <p>Our technical framework for extracting SDO from academic literature
includes three modules, namely data preparation, text processing, and extraction of
SDO, as shown in Figure 1.</p>
        <p>The main processing part includes the following steps:
1. Using a web crawler, we obtained the full-text information of the papers from arXiv.
2. We converted the papers from PDF to TXT format.
3. We summarized the characteristics of conclusion sections in the literature in order to
extract them.
4. Each conclusion section was split into sentences according to the characteristics of
the text or the "full stop" character.
5. We processed the sentence set with word segmentation, stemming, stop-word removal,
part-of-speech tagging, and synonym merging.
6. In the SDO recognition module, we collected and organized the words
that characterize SDO through literature research, word frequency
analysis, a domain dictionary, and literature keyword collection.
7. We labeled the originality-related words and serialized the sentences in the
conclusion section according to the originality guide vocabulary.
8. Based on the results of sentence serialization, extraction rules for SDO
were constructed and implemented as regular expressions.
9. We extracted the sentences describing originality from the conclusions of the papers by
applying these rules.</p>
        <p>Steps 1 and 2 belong to the data preparation module; steps 3, 4, and
5 constitute the text processing module; and steps 6, 7, 8, and 9 belong to the extraction
module.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data collection and processing</title>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1 Document format conversion and preprocessing</title>
        <p>To extract SDO, the documents must first be converted from PDF
to TXT format. In practice, we adopt the pdfminer3k library in Python, an
open-source package that extracts text from PDF files. When a PDF is parsed into a
corresponding TXT document, some issues are often
encountered, such as missing paragraph marks, the disappearance of
first-line indentation, and words forcibly split across lines. Therefore, a
combination of line breaks, punctuation marks, hyphenation symbols, and
sentence length was used to identify the paragraphs in the TXT files.</p>
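        <p>As a minimal sketch of the paragraph reconstruction described above, the following Python function merges hard-wrapped lines by using blank lines, end-of-line hyphens, terminal punctuation, and line length. The function name rebuild_paragraphs and the 60-character threshold are illustrative assumptions rather than the original implementation, and the sketch also removes legitimate end-of-line hyphens, which a fuller version would need to handle.</p>
        <preformat><![CDATA[
```python
import re


def rebuild_paragraphs(raw_text, short_line=60):
    """Merge hard-wrapped lines from PDF extraction into paragraphs.

    short_line is a hypothetical threshold: a line shorter than this that
    ends in terminal punctuation is treated as the end of a paragraph.
    """
    paragraphs, current = [], ""
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the paragraph collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # The previous line ended with a hyphen: rejoin the split word.
            current = current[:-1] + line
        elif current:
            current = current + " " + line
        else:
            current = line
        # A short line ending in ., ! or ? likely ends the paragraph.
        if len(line) < short_line and re.search(r"[.!?]$", line):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```
]]></preformat>
        <p>Each returned string is one reconstructed paragraph, which can then be passed to section detection and sentence splitting.</p>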
        <p>After extracting the conclusion section from each paper, we used the
spaCy natural language processing package for word segmentation,
part-of-speech tagging, stemming, and stop-word processing. To improve the accuracy of
word segmentation, we introduce a keyword list and a domain glossary before word
segmentation and use bi-gram and tri-gram methods to identify phrases in the
literature.</p>
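        <p>The glossary-assisted phrase identification above can be illustrated with a short standard-library sketch. It is a stand-in for the spaCy-based processing, not a reproduction of it: the GLOSSARY entries are a few topic terms taken for illustration, and the function name tokenize_with_phrases is hypothetical. Tri-grams are tried before bi-grams so that the longest glossary phrase wins.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical excerpt of a domain glossary (multi-word terms only).
GLOSSARY = {"transfer learning", "beam search", "recommender system"}


def tokenize_with_phrases(sentence, glossary=GLOSSARY):
    """Lowercase a sentence, split it into words, and merge glossary phrases."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    tokens, i = [], 0
    while i < len(words):
        for n in (3, 2):  # prefer the longest phrase
            window = words[i:i + n]
            if len(window) == n and " ".join(window) in glossary:
                tokens.append("_".join(window))
                i += n
                break
        else:
            # No glossary phrase starts here: keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens
```
]]></preformat>
        <p>Merging phrases into single tokens before tagging keeps a term such as "transfer learning" from being labeled as two unrelated words.</p>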
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2 Extracting conclusions from academic literature</title>
        <p>We recognized the conclusion or summary based on the chapter title, then
split the text into sentences according to sentence length and punctuation. After
this, we divided sentences into words using professional dictionaries, keyword
vocabulary, N-grams, and other word segmentation methods, and finally generated a
dataset in sentence units. The structure and function of most academic texts can be
identified by chapter titles. For example, "Introduction" and "Introduction and Motivation"
can be directly judged as the introduction; "Related Work" and "Context of this
Research" can be directly judged as related research. Because the
conclusion section is titled differently across the literature, we manually screened the chapter titles
of the experimental data and derived the characteristic vocabulary of
conclusion chapter titles shown in Table 1.</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Chapter</th>
                <th>Chapter Title Featured Vocabulary</th>
                <th>Chapter End Featured Vocabulary</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Conclusion</td>
                <td>Conclusion, conclusions, discussion, summary, future, perspective, limitations, outlook, work, directions, results, concluding, remarks, suggestions, recommendations, comments, discussions</td>
                <td>Acknowledgement[s], acknowledge, reference[s], \n\n</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-3-4-5">
          <title>Conclusion</title>
          <p>Conclusion chapter extraction rules were constructed from the above starting feature
vocabulary and ending feature vocabulary. Applying these rules to the
experimental data yielded 18,563 conclusions, of which 17,653
were finally selected through manual inspection.</p>
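          <p>The start and end vocabularies can be turned into an extraction rule along the following lines. This is a sketch under stated assumptions, not the rule set used in the experiment: it uses only a subset of the title vocabulary (generic words such as "work" or "results" are left out to limit false matches), it ignores the blank-line end marker, and the helper names heading_words and extract_conclusion are hypothetical.</p>
          <preformat><![CDATA[
```python
import re

# Subsets of the title and end vocabularies; the full lists are longer.
START_WORDS = {"conclusion", "conclusions", "discussion", "discussions",
               "summary", "concluding", "remarks", "outlook"}
END_WORDS = {"acknowledgement", "acknowledgements", "acknowledge",
             "reference", "references"}


def heading_words(line):
    """Return the lowercase words of a short, heading-like line, else []."""
    # Strip leading section numbering such as "5", "5.", or "4.2".
    text = re.sub(r"^\s*\d+(\.\d+)*\.?\s*", "", line).strip()
    if not text or text.endswith(".") or len(text.split()) > 6:
        return []
    return re.findall(r"[a-z]+", text.lower())


def extract_conclusion(full_text):
    """Return the text between a conclusion heading and the next end heading."""
    lines = full_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        words = set(heading_words(line))
        if start is None and words & START_WORDS:
            start = i + 1                      # body begins after the heading
        elif start is not None and words & END_WORDS:
            return "\n".join(lines[start:i]).strip()
    # No end heading found: take everything after the start heading.
    return "\n".join(lines[start:]).strip() if start is not None else ""
```
]]></preformat>
          <p>Because headings are detected heuristically, the output of such a rule still benefits from the manual inspection step described above.</p>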
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>SDO of the paper extraction</title>
      </sec>
      <sec id="sec-3-6">
        <title>3.3.1 Constructing a dictionary of originative guiding words</title>
        <p>
          Originality in academic literature is mainly reflected in two aspects: characteristic
words (guiding words) and common expressions. Given the linguistic
characteristics and style of scientific literature, rule-based extraction methods can
accurately identify the "knowledge claims" in papers. Approximately 95% of
originality statements in papers are guided by characteristic words
          <xref ref-type="bibr" rid="ref14">(Wen, 2014)</xref>
          . Therefore, this paper
combines previous research results, domain keywords, a domain terminology
database, and word frequency statistics to obtain, through
preliminary and manual screening, a vocabulary list of guiding words for originality points.
We then use WordNet to expand synonyms and finally select the originative feature word
set.
        </p>
        <p>
          The main basis for selecting the guiding words of originality in this paper comes
from the usefulness, originality, enlightenment, scientificity, and other elements
described in scholars' definitions of scientific originality. Drawing on
the research results of Dahl (2009), Trine (2008), and
          <xref ref-type="bibr" rid="ref8">Parkinson (2011)</xref>
          , we selected
originative linguistic feature guiding words and divided them into the following types:
referring to the author, referring to the article, iconic verbs, iconic nouns, and iconic
adjectives. Because the subject terms of a field reveal its research focus, and the
content of originality is closely related to the research subject, the field glossary and the
keywords of the literature were introduced as the subject terms of the literature
collection when constructing the originality guide vocabulary. In addition, the word
frequencies of the conclusion text showed that most
of the originative guiding words were distributed in the high-frequency range.
Therefore, we computed word frequencies for nouns and verbs and selected
originative feature words from them to construct the originative guiding vocabulary table.
        </p>
        <p>After initially identifying the originative feature words, we use WordNet to add
synonyms and obtain the final selection of originative feature guiding words in this
article. According to their linguistic features, the originative feature guiding words are
divided into six categories. The resulting originality point feature guiding
words are shown in Table 2.</p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr>
                <th>Marking symbol</th>
                <th>Word examples</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>RF</td>
                <td>I, We, Our</td>
              </tr>
              <tr>
                <td>TP</td>
                <td>[In this|this|our|the] [paper|article|study|work|]</td>
              </tr>
              <tr>
                <td>VB</td>
                <td>Use, show, propose, provide, present, improve, observe, describe, investigate, prove, define, obtain, represent, design, aim, address, find, analyze, illustrate, conduct, appear, try, drive, and so on</td>
              </tr>
              <tr>
                <td>NN</td>
                <td>problem, method, approach, work, result, performance, experiment, finding, insight, notion, and so on</td>
              </tr>
              <tr>
                <td>AD</td>
                <td>new, novel, unused, caused, resulting, considered, known, observed, predicted, and so on</td>
              </tr>
              <tr>
                <td>TW</td>
                <td>algorithm, data, information, framework, knowledge, Acoustic, Bayesian network, beam search, CNN, RNN, LSTM, ontology, optimization, cluster, bi-lstm, classifier, crf, dnn, deep q-learning, embedding, robotic, transfer learning, recommender system, and so on</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-7">
        <title>3.3.2 Identification of SDO of the papers</title>
        <p>
          Recognition rules are constructed based on the relationship between the domain
thesaurus and the ontology, and a redundancy method based on the degree of overlap
of subject words is used to filter the candidate set of originality points
          <xref ref-type="bibr" rid="ref15">(Zhang and Le,
2014)</xref>
          . The words in each sentence are labeled with the marking symbols in
Table 2, and the symbols are then extracted from the sentence to form
a sequence of marking symbols separated by spaces (for the example in Figure
2, the labeling sequence of the sentence is: TP VB TW TW NN TW).
When constructing rules, we comprehensively consider the labeling sequence and structure of SDO,
the positions of different types of clue words, and the combinations of
different clue words. We also limit the number of repetitions allowed in some rules.
Finally, the rules are written as the following regular expression:
        </p>
        <p>((RF )|(NN )|(TW )|(AD )|(TP )){0,3}(RF )((TP )|(AD )|(TW )|(NN )){0,3}(VB )((AD ){0,1}(TW ){0,6}(NN ){0,2}){0,3}</p>
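        <p>To make the labeling and matching steps concrete, the sketch below tags words with the marking symbols and applies the regular expression above to the resulting sequence. Several points are assumptions for illustration: GUIDE_WORDS holds only a small excerpt of the guiding vocabulary, TP is simplified to single words instead of the bracketed phrase patterns, no stemming is applied, and the rule is applied with a substring search because the matching mode is not specified in the text.</p>
        <preformat><![CDATA[
```python
import re

# Small excerpt of the guiding vocabulary; the real lists are much longer.
GUIDE_WORDS = {
    "RF": {"i", "we", "our"},
    "TP": {"paper", "article", "study"},          # simplified to single words
    "VB": {"use", "show", "propose", "provide", "present", "improve", "design"},
    "NN": {"problem", "method", "approach", "result", "performance"},
    "AD": {"new", "novel"},
    "TW": {"algorithm", "data", "framework", "classifier", "embedding"},
}
WORD_TO_TAG = {word: tag for tag, words in GUIDE_WORDS.items() for word in words}

# The rule from the text; every marking symbol is followed by one space.
RULE = re.compile(
    r"((RF )|(NN )|(TW )|(AD )|(TP )){0,3}(RF )((TP )|(AD )|(TW )|(NN )){0,3}(VB )"
    r"((AD ){0,1}(TW ){0,6}(NN ){0,2}){0,3}"
)


def label_sequence(sentence):
    """Map a sentence to its marking-symbol sequence, e.g. 'RF VB AD TW NN '."""
    tags = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        tag = WORD_TO_TAG.get(word)
        if tag is not None:                        # unlabeled words are skipped
            tags.append(tag)
    return "".join(tag + " " for tag in tags)


def is_sdo(sentence):
    """Return True if the labeled sequence matches the rule (assumed: search)."""
    return RULE.search(label_sequence(sentence)) is not None
```
]]></preformat>
        <p>For example, "We propose a novel algorithm for this problem." is labeled as the sequence RF VB AD TW NN and satisfies the rule, whereas a sentence without an author reference (RF) followed by an iconic verb (VB) does not.</p>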
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation and Results</title>
      <sec id="sec-4-1">
        <title>Experiment Data</title>
        <p>We used a web crawler to obtain publications in the field of "artificial
intelligence" under CS (computer science) on arXiv, which are labelled “cs.AI”. These
experimental data were used for the extraction of SDO. We collected basic
information about each paper, including title, author, publication time, and URL (location
of the PDF document). The requests library was then used to retrieve the documents from
these URLs, finally yielding 22,213 academic papers in PDF format. Figure 2 shows the annual distribution
of literature in the field of artificial intelligence.</p>
        <p>We randomly chose 200 papers from the collected documents and manually annotated
the sentences in their conclusion chapters, obtaining 346 SDO out of 1,227 sentences.
To test the performance of the SDO
recognition rules constructed in this paper, the accuracy and recall rates used in information
retrieval were applied to verify the recognition results. The results are shown in Table 3.</p>
        <p>According to Table 3, the originality point identification rules constructed in this paper
have an accuracy rate of 83.3% and a recall rate of 72.2% in the conclusion section. SDO were then
recognized in the experimental dataset with the recognition rules. A total of 14,234
sentences that match the rules are used as input for originality object and topic mining.
Part of the extracted results is shown in Figure 4.</p>
        <p>A qualitative analysis of the content of SDO was carried out on
approximately 100 randomly selected papers, and the commonly used SDO
patterns in conclusion chapters were summarized. Part of the results is shown in Table
4.</p>
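        <p>The evaluation measures can be computed as in the sketch below, assuming that the reported accuracy rate corresponds to precision in the information-retrieval sense. The function name precision_recall and the sentence identifiers in the usage note are hypothetical; no counts from the experiment are reproduced here.</p>
        <preformat><![CDATA[
```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of sentence identifiers.

    predicted: sentences the rules extracted as SDO
    gold:      sentences manually annotated as SDO
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted sentences that are annotated SDO.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of annotated SDO that the rules extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```
]]></preformat>
        <p>With four extracted sentences of which three are among five annotated SDO, the function returns a precision of 0.75 and a recall of 0.6.</p>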
        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr>
                <th>Type</th>
                <th>SDO of the paper patterns</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>New method class</td>
                <td>[This paper|We] [propose|introduce|present|develop] a [new|first|novel] [model|solution|algorithm|method……] that ……</td>
              </tr>
              <tr>
                <td>Methodology class</td>
                <td>We [presented|introduced] a methodology for ……<break/>We demonstrated …… methodology [for|to] ……</td>
              </tr>
              <tr>
                <td>Concept class</td>
                <td>The [concept|notion] of …… is defined ……</td>
              </tr>
              <tr>
                <td>Viewpoint class</td>
                <td>In this paper we have [redefined|defined|proposed] the [notion|concept] of ……</td>
              </tr>
              <tr>
                <td>Proof class</td>
                <td>[This paper|We] [demonstrate|prove] that</td>
              </tr>
              <tr>
                <td>Problem class</td>
                <td>We considered …… problem ……</td>
              </tr>
              <tr>
                <td>Application class</td>
                <td>In this paper, we [shown|studied] the application of ……</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>From Table 4, we can see that there are mainly seven kinds of descriptions of the
originality of a paper: new method class, methodology class, concept class,
viewpoint class, proof class, problem class, and application class. Among them, the first
two describe method originality, the last two refer to application originality, and
the middle three belong to theory originality. We will analyze the
theme, object, and pattern of sentences describing originality in detail in subsequent
papers.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Analysis of experimental results</title>
        <p>We extracted SDO from the conclusion chapters according to the SDO
recognition rules. High-frequency innovation objects and subject terms are
analyzed below.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3.1 Analysis of core nodes in SDO</title>
        <p>According to the results of the dependency syntax analysis, the core nodes (ROOT)
of the SDO are counted, and the proportion of each core node is shown
in Table 5. In the SDO from the conclusion chapters, the words present, propose, and
introduce account for 23%, 22%, and 11%, respectively, which together amount to more than 50% of
all core nodes, while the remaining core words account for relatively small
proportions. This shows that in the conclusion chapter, researchers mainly use these words
to summarize or introduce the main points and originality of the article.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.3.2 Analysis of Innovation Objects</title>
        <p>This paper takes the direct object of the core node as the innovation object of SDO,
and counts the frequency of the innovation object. The proportion of the innovation
object is shown in Table 6.</p>
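        <p>The counting of core nodes and innovation objects can be sketched as follows. The sketch assumes that each sentence has already been dependency-parsed into (lemma, dependency label, head index) triples, with "ROOT" and "dobj" as the labels for the core node and the direct object; the function name root_and_object_shares and this input format are illustrative assumptions, not the original implementation.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def root_and_object_shares(parsed_sentences):
    """Return the proportions of ROOT lemmas and of their direct objects.

    parsed_sentences: list of sentences, each a list of
    (lemma, dependency_label, head_index) triples (assumed input format).
    """
    roots, objects = Counter(), Counter()
    for tokens in parsed_sentences:
        root_idx = next(
            (i for i, (_, dep, _) in enumerate(tokens) if dep == "ROOT"), None
        )
        if root_idx is None:
            continue                      # skip sentences without a core node
        roots[tokens[root_idx][0]] += 1
        for lemma, dep, head in tokens:
            # Only direct objects attached to the core node are counted.
            if dep == "dobj" and head == root_idx:
                objects[lemma] += 1

    def shares(counter):
        total = sum(counter.values())
        return {word: n / total for word, n in counter.items()} if total else {}

    return shares(roots), shares(objects)
```
]]></preformat>
        <p>The two returned dictionaries map each core word and each innovation object to its share of the total, which is the form of the proportions reported in the tables.</p>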
        <p>In the results of the proportion of innovation objects, approach, method, way and
other words about method have a relatively high proportion. It can be seen that the
innovation of methods in the field of artificial intelligence is the key research direction.
However, the innovation of methodology only accounts for 0.6% of the total, which
shows that there is relatively little research on methodological innovation.</p>
        <p>Using scientific publications from arXiv, this paper constructs recognition rules
for SDO based on originative guiding words, aiming to recognize
sentence-level originality points in academic literature. Our SDO
recognition experiments on arXiv literature on artificial intelligence topics
show that the proposed method extracts SDO effectively.
Once SDO have been obtained, papers can be evaluated by comparing their SDO
with those of other papers.</p>
        <p>The results of this paper show that the constructed method is feasible and effective
for identifying and mining sentence-level originality points. Yet, as this is a
research-in-progress paper, several limitations remain, and we plan
the following related studies in the future:
1. Although the SDO recognition rules constructed in this article are
effective, hand-formulated and hand-maintained
rules cannot cover all papers. Therefore, to improve the accuracy of
SDO recognition, the labeled sequences will be used in follow-up work as
training data, and machine learning methods will be used to recast the extraction problem
as a classification problem.
2. The current paper only identifies sentences that reflect the originative content of a
paper. Next, the SDO will be analyzed and mined, and the
originative objects, topics, and specific methods will be extracted.
3. In this paper, we only extract the descriptions of originality in the conclusion
section. In future research, we will extract information describing originality from the research
objectives, related work, and methodology sections of papers.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported in part by The National Social Science Foundation of
China (Number: 17BTQ066).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amplayo</surname>
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Network-based approach to detect novelty of scholarly literature</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>422</volume>
          ,
          <fpage>542</fpage>
          -
          <lpage>557</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dahl</surname>
            <given-names>T.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>The Linguistic Representation of Rhetorical Function: A Study of How Economists Present Their Knowledge Claims</article-title>
          .
          <source>Written Communication</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ),
          <fpage>370</fpage>
          -
          <lpage>391</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ding</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>X.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cronin</surname>
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>The distribution of references across texts: Some implications for citation analysis</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>583</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Garfield</surname>
            <given-names>E</given-names>
          </string-name>
          (
          <year>1955</year>
          ).
          <article-title>Citation indexes for science. A new dimension in documentation through association of ideas</article-title>
          .
          <source>Science</source>
          ,
          <volume>122</volume>
          (
          <issue>3159</issue>
          ),
          <fpage>108</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kifer</surname>
            <given-names>D.</given-names>
          </string-name>
          , et al.(
          <year>2010</year>
          ).
          <article-title>Context-aware citation recommendation</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web (WWW '10)</source>
          .
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kirschner</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eckle-Kohler</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <source>Linking the Thoughts: Analysis of Argumentation Structures in Scientific Publications. 2nd Workshop on Argumentation Mining (ARG-MINING</source>
          <year>2015</year>
          ) Denver, Colorado, USA, June 4.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Markou</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Novelty detection: A review-part 2: Neural network based approaches</article-title>
          .
          <source>Signal Processing</source>
          ,
          <volume>83</volume>
          , (
          <issue>12</issue>
          ),
          <fpage>2499</fpage>
          -
          <lpage>2521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Parkinson</surname>
            <given-names>J</given-names>
          </string-name>
          . (
          <year>2011</year>
          ).
          <article-title>The Discussion section as argument: The language used to prove knowledge claims</article-title>
          .
          <source>English for Specific Purposes</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ),
          <fpage>164</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Safder</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassan</surname>
            <given-names>S.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Visvizi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noraset</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuarob</surname>
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>57</volume>
          ,
          <fpage>102269</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Shibayama</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Measuring originality in science</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>122</volume>
          ,
          <fpage>409</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Trine</surname>
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Contributing to the Academic Conversation: A Study of New Knowledge Claims in Economics and Linguistics</article-title>
          .
          <source>Journal of Pragmatics</source>
          ,
          <volume>40</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1184</fpage>
          -
          <lpage>1201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Uzzi</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stringer</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Atypical Combinations and Scientific Impact</article-title>
          .
          <source>Science</source>
          ,
          <volume>342</volume>
          ,
          <fpage>468</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wen</surname>
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Semantic Recognition and Classification Method of Originality Points in Scientific and Technological</article-title>
          .
          <source>Journal of The China Society for Scientific and Technical Information</source>
          ,
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wen</surname>
            <given-names>Y.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>G.Y.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Dynamic Mining of Fragmented Scientific Research Originality Points</article-title>
          .
          <source>Digital Library Forum</source>
          ,
          <volume>7</volume>
          ,
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhang</surname>
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Le</surname>
            <given-names>X.Q.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Research on Originality Points Extraction from Scientific Research Paper Based on Field Thesaurus</article-title>
          .
          <source>New Technology of Library and Information Service</source>
          , (
          <issue>9</issue>
          )
          ,
          <fpage>15</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsai</surname>
            <given-names>F.S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kwee</surname>
            <given-names>A.T.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Multilingual sentence categorization and novelty mining</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>47</volume>
          ,
          <fpage>667</fpage>
          -
          <lpage>675</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>