<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>CIST@CLSciSumm-17: Multiple Features Based Citation Linkage, Classification and Summarization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lei Li</string-name>
          <email>leili@bupt.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yazhao Zhang</string-name>
          <email>yazhao@bupt.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liyuan Mao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junqi Chi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moye Chen</string-name>
          <email>moyec@bupt.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zuying Huang</string-name>
          <email>zoehuang@bupt.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Intelligence Science and Technology (CIST), School of Computer Beijing University of Posts and Telecommunications (BUPT)</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P.R.China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes our methods and experiments for CLSciSumm-17. We try a Convolutional Neural Network, word vectors and sentence similarities for citation linkage. For facet classification, we explore additional useful features, rules, SVM and a fusion method. We use a linear combination of five classical features and a DPPs-based diversity sampling method to compute the structured summary. Test-data results show that we obtain the best performance for both facet classification and summarization.</p>
      </abstract>
      <kwd-group>
        <kwd>CNN</kwd>
        <kwd>word vector</kwd>
        <kwd>SVM</kwd>
        <kwd>hLDA</kwd>
        <kwd>DPP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the rapid development of Computational Linguistics (CL), a rich, complex
and continually expanding body of resources has flooded into the scientific literature of
this domain. Literature surveys and review articles in CL do help readers
obtain a gist of the state of the art in research on a topic. However, literature
survey writing is labor-intensive, and a specific literature survey is not always
available for every topic of interest. CLSciSumm-17 has highlighted the
challenges and relevance of the scientific summarization problem.
In this paper, we describe our strategies, methods and experiments for
CLSciSumm-17 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The training dataset involves two tasks and 30 topics, each topic consisting of one
reference paper (RP) and some citing papers (CPs).
Given a set of CPs that all contain citations to an RP for each topic, the tasks are
defined as follows. Task 1A requires that for each citance (the set of citation
sentences), we identify the spans of text (cited text spans, CTS) in the
RP. We try a Convolutional Neural Network (CNN) with word vectors
to match the citance and possible CTS. Besides, we also have existing methods
using sentence similarity based on various traditional features and rules. We
search for a good hybrid method which takes advantage of both the CNN and
the existing methods.
      </p>
      <p>Based on the results of Task 1A, for each cited text span, Task 1B requires us to identify
which facet the CTS belongs to, from a predefined set of facets.
We plan to explore additional useful features and machine learning methods besides
SVM.</p>
      <p>Finally, for Task 2, we generate a structured summary of the RP and of the
community discussion of the paper represented in the citances. The length of
the summary should not exceed 250 words. We aim to extract summaries with high
quality, diversity and low redundancy. The hLDA (hierarchical Latent Dirichlet
Allocation) topic model is adopted for content modeling, as it is able to organize
topics into a tree-like structure. We combine hLDA knowledge with several other
classical features using different weights and proportions to evaluate quality,
and the summary diversity is enhanced using Determinantal Point Processes
(DPPs).</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Interest in information extraction and retrieval has increased recently, and
there are shared tasks in this domain such as Online Forum Summarization
(OnForumS) in MultiLing 2017 and the Computational Linguistics Scientific
Document Summarization Shared Task. Through these shared tasks, many methods
have been proposed for content linking. Ziqiang Cao et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] simplify it into a
ranking problem which just needs to select the first item, and adopt SVM
Rank to handle it. Bruno et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] transform each word into its WordNet synonym group
(synset) before the idf (inverse document frequency) calculation step.
Kun Lu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use WordNet to compute the concept similarity between
sentences. Tadashi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] combines TF-IDF and a single-layer neural network
for content linking.
      </p>
      <p>
        In order to efficiently identify the linkage between a paper citation and its cited
text spans in the RP, we need to capture the latent information of natural
language sentences. More and more methods have been proposed for this purpose.
CNN [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has shown good performance in sentence classification, word
embedding [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is useful for mining semantic information, and WordNet can be used to calculate
the similarity between words. All of the above are used in our experiments.
In the case of CL summary generation, we consider not only the original
paper, but also the information provided by citances. Li et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
provide a method based on a linear combination of multiple features. Amjad et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
focus on the coherence and readability of generated citation summaries. Yet
their summarization targets differ from ours, because in this paper we focus
on summarizing the reference texts while considering the citation texts.
It is a common view that summaries should consider redundancy as well as
quality. Unfortunately, most previous methods divide the summary generation
problem into two sequential procedures. First, they use machine learning
methods [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to select a subset with higher quality. Then, they control the
redundancy of summaries [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In our method, we consider diversity, redundancy
and quality at the same time, and we use a new method based on DPPs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
to enhance the diversity of summaries.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Citation Linkage</title>
        <p>For citation linkage, we need to identify the CTS in the RP that most
accurately reflect the citation in the CP; we call this content linking. In fact, it
comes down to finding the linkage between sentences, which is represented by similarity
of meaning between sentences. Hence computing sentence similarity based on
various features is our major work. We use traditional features
like Jaccard similarity and idf to capture syntactic information. Besides, word
vectors, WordNet and a CNN are used for digging out deeper semantic information.
Finally, we fuse the above features to obtain the result.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Feature Extraction</title>
        <p>1) Three lexicons: a) we manually picked the words with high frequency from
the reference text in the training corpus, and expanded them through WordNet
and word vectors to form Lexicon 1 (high-frequency lexicon). b) we used the LDA (Latent
Dirichlet Allocation) model, trained on the reference paper and citing papers, to
obtain a lexicon of 30 latent topics for the files of each topic independently, as Lexicon
2 (LDA lexicon). c) we obtained the co-occurrence degree between words from the
word frequency statistics of citation text and its reference text in the training
corpus, as Lexicon 3 (co-occurrence lexicon).</p>
        <p>2) Two sentence similarities: One is idf similarity, where we add up the idf values
of the words shared by two sentences. The other is Jaccard similarity, which
uses the ratio of the intersection to the union of the words in two
sentences as the similarity.</p>
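The two similarities above can be sketched as follows (whitespace tokenization and lowercasing are assumptions; the paper does not specify its preprocessing):

```python
# Sketch of the two sentence similarities: Jaccard over word sets, and the
# sum of idf values of shared words. Tokenization is hypothetical.
import math

def jaccard_similarity(s1, s2):
    """Ratio of shared words to total distinct words in two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not (w1 | w2):
        return 0.0
    return len(w1 & w2) / len(w1 | w2)

def idf_similarity(s1, s2, doc_freq, n_docs):
    """Sum of idf values of the words shared by the two sentences."""
    shared = set(s1.lower().split()) & set(s2.lower().split())
    return sum(math.log(n_docs / doc_freq[w]) for w in shared if w in doc_freq)
```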
        <p>3) Two context similarities: We calculate the context similarity for idf
similarity and Jaccard similarity. The F1 performances of the above features on the
training corpus are shown in Table 5 of Appendix B.</p>
        <p>
          4) Word vector: We trained every word as a vector with fixed dimensions
using Word2Vec. Then we add the word similarities together to represent the sentence
similarity [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The 400-dimensional word vector with window size 15 performs
best, as shown in Table 6 of Appendix B, so we choose it for the following
experiments.
        </p>
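One plausible reading of this aggregation (the exact scheme in [8] may differ) is to sum, for each word of one sentence, its best cosine match in the other sentence; toy vectors stand in for the trained Word2Vec embeddings:

```python
# Sketch: word-vector-based sentence similarity as the sum of each word's
# best cosine match in the other sentence. Toy vectors replace Word2Vec.
import numpy as np

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def sentence_similarity(words1, words2, vectors):
    """Sum the best cosine match in words2 for every word of words1."""
    total = 0.0
    for w1 in words1:
        if w1 not in vectors:
            continue  # out-of-vocabulary words contribute nothing
        sims = [cosine(vectors[w1], vectors[w2]) for w2 in words2 if w2 in vectors]
        total += max(sims, default=0.0)
    return total
```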
        <p>5) WordNet: WordNet is an English lexical database which contains nouns,
verbs, adjectives and adverbs. It can calculate the similarity between two words
with the same part of speech. We use six similarity methods: jcn, lin, lch, res, wup and
path similarity. Since WordNet only supports calculating
word-level similarity, to compute the sentence similarity we use the same
algorithm as in the word vector section, replacing the cosine
similarity with the WordNet word similarity. The F1 performances can be seen in
Table 7 of Appendix B.</p>
        <p>
          6) CNN: A CNN can find deep semantic information in sentences and is
widely used in the natural language processing domain [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. We use word vectors
as the input of the CNN and obtain the probability of content linking from its output.
We use this output probability to represent the similarity of the input sentences. We
investigated the lengths of reference sentences and citation sentences; almost
all of them are shorter than 80 words. So we fix the length of a sentence at 80, and the
length of two sentences at 160. For a sentence whose length is less than 80,
we add zero vectors at the corresponding locations. We use word vectors with 200
dimensions. As for the CNN structure, we tried several configurations
involving different convolution kernels and other parameters. Finally, through
the experiments on the training corpus, we chose one of them for the testing corpus.
Method for Linkage
        </p>
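The fixed-length input construction described above can be sketched as follows (toy vectors stand in for the 200-dimensional Word2Vec embeddings; the CNN itself is omitted):

```python
# Sketch of the CNN input: each sentence is truncated/zero-padded to 80
# tokens of 200-dim vectors, and the reference sentence and citance are
# stacked into a 160 x 200 input matrix.
import numpy as np

MAX_LEN, DIM = 80, 200

def pad(sent_vectors):
    """Truncate or zero-pad a list of word vectors to MAX_LEN rows."""
    m = np.zeros((MAX_LEN, DIM))
    for i, v in enumerate(sent_vectors[:MAX_LEN]):
        m[i] = v
    return m

def cnn_input(ref_vectors, cit_vectors):
    """Concatenate the padded reference sentence and citance."""
    return np.vstack([pad(ref_vectors), pad(cit_vectors)])  # shape (160, 200)
```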
        <p>1) For the voting method, we tried different weights (the weights of different
features) and proportions (the number of sentences we choose) to combine them
through experiments, and then got four results (run 1, run 2, run 5, run 6)
through a voting system. The text span with the highest number of votes is
chosen as the set of sentences in the reference paper corresponding to the citation text.
The Jaccard Focused Method (run 3) chose Jaccard similarity as the major feature, and
added other features as a supplement. The Jaccard Cascade Method (run 4) chose
the sentences with the top two Jaccard values as the basic answer, then combined
other features to find the two further sentences with the highest values as the improved
answer. Finally, we choose 4 sentences as the answer through experiments. Table 1
shows the parameters for every run; W means weight and P means proportion.
2) For the method with CNN: We use the answers in the training data as positive
samples, and randomly choose sentences outside the answers as negative samples,
keeping their counts balanced. For the testing set, we combine every sentence
in the reference paper with one citation text as input, taking the output as the feature
value. Using only this feature, the performance is poor, so we choose the
following method for the CNN: take the top 40 sentences according to the CNN output, then
combine Jaccard similarity and idf similarity using Equation (1). All
parameters were obtained through experiments on the training corpus.
similarity = 10 × Jaccard_similarity + 0.1 × idf_similarity
(1)
Finally, we choose the top 4 sentences with the highest similarities as the answers.</p>
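The CNN-based linkage step can be sketched as follows; the dictionaries of per-sentence scores are hypothetical inputs standing in for the CNN output and the two similarities:

```python
# Sketch of the CNN-based linkage: keep the top 40 reference sentences by
# CNN output, rerank them with Equation (1)
# (similarity = 10 * Jaccard + 0.1 * idf), and keep the top 4 as answers.
def link_with_cnn(cnn_scores, jaccard, idf, k_cnn=40, k_ans=4):
    """cnn_scores/jaccard/idf: dicts mapping sentence id -> score."""
    top = sorted(cnn_scores, key=cnn_scores.get, reverse=True)[:k_cnn]
    combined = {s: 10 * jaccard[s] + 0.1 * idf[s] for s in top}
    return sorted(combined, key=combined.get, reverse=True)[:k_ans]
```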
      </sec>
      <sec id="sec-3-3">
        <title>Facet Classi cation</title>
        <p>The Subtitle Rule obtains the facet classification according to the subtitle of a sentence
pair. The High Frequency Word Rule uses the number of high-frequency words in a sentence pair to obtain the facet
classification. In the Subtitle and High Frequency
Word Combining Rule, we combine the two methods for facet classification.
The SVM Classifier uses features of the sentence pair to obtain the classification. The
features are Location of Paragraph, Document Position Ratio, Paragraph
Position Ratio, and Number of Citations or References. The ideas of the Voting Method
and the Fusion Method are similar.</p>
        <p>
          The details of the above methods are mostly the same as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], except for some details of the
High Frequency Word Rule, the SVM Classifier and the Fusion Method: we update
the high-frequency words and retrain the SVM Classifier, and in the
Fusion Method we combine all the results to obtain a fused result. We provide
the details of the features used in the SVM Classifier in Appendix A.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Summarization</title>
        <p>
[Figure: Feature Combination and DPPs-based sampling]
Pre-processing The source documents provided by CLSciSumm-17 have
relatively high quality, but there also exist some XML-coding errors. Besides, we need a
specific data format to train our hLDA feature.
1. Document Merging: merge the content of the RP and the citations into one
document. It is worth mentioning that we do not include a sentence from the
abstract of the RP unless it is selected by Task 1A. All documents are
converted to lowercase letters.
2. Sentence filtering: the corpus is generated from CL papers, so there must
be some equations, figures, tables and so on. However, these contents make
little contribution to summary generation. First, we use WordNet to check
whether a word is useful. Then we filter out those sentences whose proportion
of useless words is greater than 0.5.
3. Input file generation for hLDA: For the remaining words, we build a
dictionary for each document, which contains words and their corresponding
frequencies, with indices running from 1 to the word list size. Finally, we
generate an input file for hLDA modeling, in which each line represents a sentence
as word index-frequency pairs, such as:
[number of words in sentence_i] [word index A:frequency A] ...
Feature Selection According to our previous work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], Sentence Length (SL),
Sentence Position (SP) and CTS are useful features for summary generation of
CL papers, and we reuse them in this paper. Besides, we upgrade the
hierarchical Topic Model (HTM) and apply Title Similarity (TS) in our experiments.
1. hierarchical Topic Model (HTM): hLDA organizes a document into a
tree-like structure. Each word is assigned to a node, and each sentence is assigned
to a path that goes from the root node to a leaf node. Thus, each node
is regarded as a topic, which is a probability distribution over words, and
each path is regarded as a theme. Based on this, we believe that
both topics and themes carry valuable information for summary generation.
We adapt the method in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] to use the information of hLDA.
        </p>
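Step 3 of the pre-processing above can be sketched as follows (whitespace tokenization is an assumption):

```python
# Sketch of hLDA input-file generation: turn each filtered sentence into the
# line "[distinct word count] [index:frequency] ..." with 1-based word indices.
from collections import Counter

def hlda_line(sentence, word_index):
    """word_index: dict mapping word -> 1-based index in the dictionary."""
    counts = Counter(w for w in sentence.lower().split() if w in word_index)
    pairs = " ".join(f"{word_index[w]}:{c}" for w, c in sorted(
        counts.items(), key=lambda wc: word_index[wc[0]]))
    return f"{len(counts)} {pairs}".strip()
```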
        <p>
          Different from the method proposed by Taiwen [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], we also consider the
theme distribution to measure the contribution of a sentence to the summary.
We use Equation (2) to calculate it.
        </p>
        <p>q_i = ∑_{j=1}^{m} (λ_j T_j + Freq_j) + tp_i (2)</p>
        <p>where T_j represents the distribution score of word_j in the i-th sentence
calculated by hLDA, λ_j is the pre-defined weight of T_j according to our former
experiments, Freq_j is the frequency of word_j in the current hLDA node, and tp_i is
the theme distribution of the i-th sentence.
2. Title Similarity (TS): Title Similarity is the cosine similarity of each sentence
and the document title. We use Equation (3) to calculate it.</p>
        <p>q_i = (tf_{s_i} · tf_{s_title}) / (|tf_{s_i}| |tf_{s_title}|) (3)</p>
        <p>where s_title and s_i represent the title and the i-th sentence respectively, and tf_s denotes the term-frequency vector of a sentence.
Structured Summary Generation To control the redundancy of the
generated summary, we first use Jaccard similarity to measure the similarity between
sentences. Then we use DPPs to enhance the diversity.</p>
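Equation (3) amounts to a cosine similarity over term-frequency vectors; a minimal sketch, assuming whitespace tokenization:

```python
# Sketch of Equation (3): title similarity as the cosine of the
# term-frequency vectors of a sentence and the document title.
from collections import Counter
import math

def title_similarity(sentence, title):
    tf_s = Counter(sentence.lower().split())
    tf_t = Counter(title.lower().split())
    dot = sum(tf_s[w] * tf_t[w] for w in tf_s)  # shared-term contributions
    norm = (math.sqrt(sum(c * c for c in tf_s.values()))
            * math.sqrt(sum(c * c for c in tf_t.values())))
    return dot / norm if norm else 0.0
```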
        <p>
          Determinantal Point Processes (DPPs) are elegant probabilistic models of global,
negative correlations, mostly used in quantum physics to study reflected
Brownian motions. In our method, we only consider discrete DPPs and follow
the definition of Kulesza [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          The DPPs-based subset sampling method considers diversity, quality and redundancy
at the same time, and the subsets sampled using DPPs tend to be diverse. Thanks
to the contribution of [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], we can construct a DPP using a positive semi-definite
matrix L.
        </p>
        <p>Using this representation, the entries of kernel L can be written as</p>
        <p>L_ij = q_i ϕ_i⊤ ϕ_j q_j
where q_i ∈ R+ measures the quality of an element i, and ϕ_i⊤ ϕ_j measures the
similarity between elements i and j.</p>
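The kernel construction above can be sketched as follows; unit-normalizing the feature vectors keeps L positive semi-definite (the toy inputs stand in for the real quality scores and similarity features):

```python
# Sketch of the DPP kernel: L_ij = q_i * (phi_i . phi_j) * q_j, with phi_i
# unit-normalized so that L is positive semi-definite.
import numpy as np

def dpp_kernel(quality, features):
    """quality: (n,) positive scores; features: (n, d) row vectors."""
    phi = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = phi @ phi.T                      # similarity matrix, entries in [-1, 1]
    q = np.asarray(quality)
    return q[:, None] * S * q[None, :]   # L_ij = q_i S_ij q_j
```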
        <p>In our method, we use the combination of the previous five features (SP, SL, TS,
CTS, HTM) to measure the quality, and Jaccard similarity to measure the
redundancy, as shown below.</p>
        <p>q_i = ∑_{k=1}^{5} φ_k q_ki (4)
where φ_k ∈ {0, 1} is the combination proportion of the corresponding feature, and q_ki
represents the k-th quality feature of s_i. For example, with zero-based indexing, q_11 represents the second
feature (SL) of the second sentence in the article.</p>
        <p>Given the matrix L, we adapt the DPPs-based sampling method to compute a
sentence subset D′ of the corresponding paper, shown in Table 2.
Finally, in order to extract a high-quality structured summary, we make full use
of the prior knowledge that a structured summary contains four parts:
Introduction, Methods, Results and Conclusion. So we use the facet obtained in Task
1B to help extract the summary sentences from D′. We extract two or three
sentences for each part of the summary, if it exists. Furthermore, we remove
redundant candidate sentences.</p>
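As a stand-in for the DPP-based selection in Table 2 (the paper follows Kulesza's sampling; greedy MAP maximization of the subset determinant is a common approximation, not the authors' exact procedure), a sketch:

```python
# Greedy MAP approximation of DPP subset selection: repeatedly add the
# sentence index that yields the largest determinant of the selected
# submatrix of L (diversity and quality are traded off by det(L_Y)).
import numpy as np

def greedy_dpp(L, k):
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            gain = np.linalg.det(L[np.ix_(idx, idx)])
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected
```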
        <p>Besides, we also use q_i calculated using Equation (4), without DPPs, to select the top-N
sentences to compute a summary. We call this the Feature Combination
(FC) method. It is a comparison system to examine the effectiveness of the DPPs-based
sampling method.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        From Run 1 to Run 6, we use the above-mentioned methods respectively for Task
1A. In Run 7, we use the CNN method; since it uses the whole corpus for training, we
have no training result for it [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>We evaluate our methods on the training dataset using the evaluation scripts
offered on the CL-SciSumm 2017 Shared Task official website. We evaluate six
methods for facet classification on four results of Task 1A, and select the best
method for each result of Task 1A. We find that the best facet classification
is obtained by the voting method for every Task 1A result except that of
Jaccard-Focused, for which the High Frequency Word Rule is best.</p>
      <p>From the official results in Table 3, we can see that our methods perform
very well on the test data; Run 4 performed best for Macro-Average F1.
According to our experiments on the training dataset, we selected seven different
parameter settings to process the testing dataset and obtained seven runs for
the CLSciSumm-17 competition. We select the human-written summary as the
only gold summary, and the manual ROUGE values of our experiments on the
training dataset are shown in Table 4.
The results on the official test data have proved that the performance of the methods
we proposed is excellent. In Task 1A we tried to find a similarity calculation
method to represent the true relationship between two sentences. For Task 1B we
used a fusion method to obtain a fused facet classification. Finally, we considered
the quality features, redundancy feature and diversity of the RP and cited text
spans in Task 2. In future work, we will try to find better ways to use more
semantic features for citance linkage, and we will continue searching for a
better method to choose the fewest sentences covering the most information.</p>
    </sec>
    <sec id="sec-5">
      <title>Appendix A</title>
      <p>The meaning of the features in Task 1B:
1) Location of Paragraph: the order number of the paragraph in which the
sentence is located.
2) Document Position Ratio: the ratio of the sentence ID (Sid) to the total sentence
number of the corresponding document.
3) Paragraph Position Ratio: the ratio of the within-paragraph sentence ID (Ssid) to the total sentence
number of the corresponding paragraph.
4) Number of Citations or References: the number of Citation Offsets or
Reference Offsets.</p>
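The 8-dimensional pair vector used by the SVM Classifier (described in the next paragraph) can be sketched as follows; function and parameter names are illustrative, not from the paper:

```python
# Sketch of the SVM input: the four positional features are computed for the
# citance sentence and the reference sentence and concatenated into an
# 8-dimensional vector. Feature order follows the list in Appendix A.
def sentence_features(para_no, sent_id, doc_len, sent_in_para, para_len, n_cites):
    return [para_no,                  # Location of Paragraph
            sent_id / doc_len,        # Document Position Ratio
            sent_in_para / para_len,  # Paragraph Position Ratio
            n_cites]                  # Number of Citations or References

def pair_vector(cit_feats, ref_feats):
    return cit_feats + ref_feats      # 8-dimensional vector
```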
      <p>The details of methods:
1 Rule-based method
1.1 Subtitle Rule: First of all, we examine whether the subtitle of the reference
sentences and the citance contains the following facet words: Hypothesis, Implication,
Aim, Results and Method. If the subtitle contains exactly one of these words, the sentence
is directly classified as the corresponding facet. If it contains more than one of
these words, it is classified into all the corresponding facets. If it contains none of
them, we just classify it as the Method facet.
1.2 High Frequency Word Rule: According to the High Frequency Word Rule,
we first count the high-frequency words of the five facets from the Training Set
and the Development Set. In order to improve the coverage of sentences, we
expanded the high-frequency words with some similar words for each facet. We
set an appropriate threshold for each facet. If the number of high-frequency
words of any facet in the sentence is more than the corresponding facet
threshold, we use the facet whose coverage is the highest as the final class. If
some facets' coverage is the same, we classify according to the order
Hypothesis, Implication, Aim, Results and Method. If no facet has reached
its threshold, we classify the sentence as Method.
1.3 Combined Subtitle and High Frequency Word Rule: We first use the Subtitle
Rule to classify the testing set. If the result is not one of the five facets
Hypothesis, Implication, Aim, Results and Method, we use the High Frequency
Word Rule to get the final facet.
2 SVM Classifier
We extract the four features above for each sentence. The features from the citance
sentence and the reference sentence then form an 8-dimensional vector for each pair of reference
sentence and citation sentence. We train SVM to get five classifiers. To address
the problem of unbalanced training data, we set different weights for different
classes. If we cannot get any class among the five facets, we classify the pair as the Method
class.
3 Voting Method
We combine the results from the Subtitle Rule, the High Frequency Word Rule and the SVM
classifier to generate the voting result with the most votes.
4 Fusion Method
We run the above methods for each run result obtained in Task 1A and
choose the best one as the final result. We also tried a fusion method that
combines all the run results of the above methods obtained in Task 1A: we counted
the occurrences of Method, Results, Aim, Hypothesis and Implication, and set an
appropriate threshold for each facet class to get a final facet result.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix B</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Kokil</given-names>
            <surname>Jaidka</surname>
          </string-name>
          , Muthu Kumar Chandrasekaran, Devanshu Jain, and
          <string-name>
            <surname>Min-Yen Kan</surname>
          </string-name>
          .
          <article-title>"Overview of the CL-SciSumm 2017 Shared Task"</article-title>
          ,
          <source>In Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL</source>
          <year>2017</year>
          ), Tokyo, Japan, CEUR. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>PolyU at CL-SciSumm 2016</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <fpage>132</fpage>
          -
          <lpage>138</lpage>
          .
          <string-name>
            <surname>Newark</surname>
          </string-name>
          , NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Malenfant</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapalme</surname>
          </string-name>
          , G.:
          <article-title>RALI System Description for CL-SciSumm 2016 Shared Task</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <fpage>146</fpage>
          -
          <lpage>155</lpage>
          .
          <string-name>
            <surname>Newark</surname>
          </string-name>
          , NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Recognizing reference spans and classifying their discourse facets</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <fpage>139</fpage>
          -
          <lpage>145</lpage>
          .
          <string-name>
            <surname>Newark</surname>
          </string-name>
          , NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nomoto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>NEAL: A neurally enhanced approach to linking citation and reference</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <fpage>168</fpage>
          -
          <lpage>174</lpage>
          .
          <string-name>
            <surname>Newark</surname>
          </string-name>
          , NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional Neural Networks for Sentence Classification</article-title>
          [J].
          <source>Eprint Arxiv</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>Computer Science</source>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y</given-names>
          </string-name>
          , et al.
          <article-title>Computational linguistics literature and citations oriented citation linkage, classification and summarization</article-title>
          [J].
          <source>International Journal on Digital Libraries</source>
          :
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Abu-Jbara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Coherent citation-based summarization of scientific papers</article-title>
          .
          <source>In: Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          , pp.
          <fpage>500</fpage>
          -
          <lpage>509</lpage>
          . Portland, Oregon
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Multilingual multi-document summarization based on multiple feature combination</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Collobert</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            <given-names>L</given-names>
          </string-name>
          , et al.
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kulesza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Determinantal point processes for machine learning</article-title>
          .
          <source>arXiv preprint arXiv:1207.6083</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gambhir</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Recent automatic text summarization techniques: a survey</article-title>
          .
          <source>Artificial Intelligence Review</source>
          <volume>47</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kam-Fai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mingli</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wenjie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Extractive summarization using supervised and semi-supervised learning</article-title>
          .
          <source>In: Proceedings of the 22nd International Conference on Computational Linguistics-Volume</source>
          <volume>1</volume>
          . pp.
          <fpage>985</fpage>
          -
          <lpage>992</lpage>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Borodin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Determinantal point processes</article-title>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>