<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CIST@CLSciSumm-19: Automatic Scientific Paper Summarization with Citances and Facets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lei Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yingqi Zhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Xie</string-name>
          <email>xieyangsp@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zuying Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wei Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingyuan Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yinan Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing University of Posts and Telecommunications (BUPT)</institution>
          <addr-line>No. 10 Xitucheng Road, Haidian District, Beijing</addr-line>
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Starting from its former version, CIST@CLSciSumm-18, our CIST@CLSciSumm-19 system participates in the shared Task 1A (citation linkage), Task 1B (facet classification) and Task 2 (summarization) of CLSciSumm-19@SIGIR2019. We mainly try to improve its methods for all the shared tasks. We build a new Word2vec H feature for the CNN model to calculate sentence similarity for citation linkage. We plan to adopt CNN and RNN variants for facet classification. And in order to improve summarization performance, we develop more semantic representations for sentences based on neural network language models to construct the new kernel matrix used in Determinantal Point Processes (DPPs).</p>
      </abstract>
      <kwd-group>
        <kwd>Citation Linkage</kwd>
        <kwd>Facet Classification</kwd>
        <kwd>Summarization</kwd>
        <kwd>Word2vec H</kwd>
        <kwd>Neural Network Language Model</kwd>
        <kwd>Determinantal Point Processes (DPPs)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As scientific papers, computational linguistics articles have characteristics such
as professional knowledge, rigorous writing and strong logic. Reading such
articles is very meaningful, but manual reading takes a lot of time, so we need to
study how to extract good article summaries to reduce the workload of readers.
The main work of CLSciSumm-19 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is to explore automatic summarization methods
based on the characteristics of papers in the field of computational linguistics,
and to provide a comprehensive and readable summary for each paper.
      </p>
      <p>We tried to solve the three tasks contained in CLSciSumm-19: Task 1A, Task
1B and Task 2. The data set we use consists of papers in the field of computational
linguistics provided by the organizers, grouped into topics. Each
topic consists of a Reference Paper (RP) and Citing Papers (CPs) that all
contain citations to the RP. In each CP, the text spans (i.e., citances) that
pertain to a particular citation to the RP have been identified. Task 1A: For
each citance, identify the spans of text (cited text spans, CTS) in the RP that
most accurately reflect the citance. These are of the granularity of a sentence
fragment, a full sentence, or several consecutive sentences (no more than 5).
Task 1B: For each cited text span, identify what facet of the paper it belongs to,
from a predefined set of facets. Task 2 (optional bonus task): Finally, generate a
structured summary of the RP from the cited text spans of the RP. The length
of the summary should not exceed 250 words.</p>
      <p>In this paper, based on previous work, we add the Word2vec H feature to
the Task 1A method and use CNN to obtain the content linkage result. For
Task 1B, we use improved CNN and RNN structures for classification. For
Task 2, we develop more semantic representations for sentences based on neural
network language models to construct the new kernel matrix used in Determinantal
Point Processes (DPPs).</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Task 1A is a content linkage task, and the common method is to calculate
similarity, which includes not only the Cosine similarity, the Jaccard similarity,
and so on, but also some semantic similarity calculation methods, such as BM25
and VSM [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In addition, the various characteristics of words are also very
important, such as the position of a word, its part of speech and frequency, etc.
When the characteristics of the words in the two sentences are added to the similarity
calculation for a sentence pair, the similarity of the two sentences can be
judged at the word level [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. With the continuous expansion of the corpus and
the increasing number of features, machine learning methods have begun to
emerge for the task. First, researchers tried basic classifiers, such as SVM
using a radial basis function kernel, Decision Tree and Logistic Regression, to
identify the reference span [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Various classifiers can learn different text features,
and integrating them together can reveal more text features, so researchers use
ensemble models such as the Random Forest [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Besides, in order to explore
the meaning of the sentence more deeply, deep neural networks are also applied, such
as CNN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Siamese Deep Learning Networks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For Task 1B, both the
rule-based method [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the classification method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] can be used, both of
which focus on exploring good text features. Rule-based methods, such as
building a dictionary for each discourse facet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], are less adaptive. Most studies
combine the features of categories with classification algorithms to improve
classification accuracy. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] use a multi-feature random forest classifier.
Others use a supervised topic model, XGBOOST [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and SVM with
tf-idf and naive Bayes features [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Task 2 is a summarization task. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] focus on exploring the sampling process. They
use WMD sentence similarity to construct the new kernel matrix used in
Determinantal Point Processes (DPPs). [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] divide all sentences into three categories
(motivations, methods, and conclusions), and then extract sentences from each
cluster based on rules and several features to form a summary. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] generate a
summary by selecting the most relevant sentences from the RP using
linguistic and semantic features from the RP and CPs. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] built a summary generation
system using the OpenNMT tool.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>
        In our approach, we first obtain the CTS, i.e. the sentences in the RP (RT)
related to the sentences in the CPs (CT), through feature extraction and a content
linkage method in the Citation Linkage step. Then we judge the facet of the CTS by feature
extraction and classification methods in the Facet Classification step. Finally, a
summary of the article is obtained through pre-processing, feature selection, sentence
sampling, and post-processing in the summary generation step. The framework of our
system is shown in Fig. 1.
The Citation Linkage task consists of two stages: feature extraction and content
linkage. In feature extraction, we have kept some of the well-performing
methods of the past, continuing to use word-cos, Word Vector, sentence similarity
(IDF similarity and Jaccard similarity), context similarities, and WordNet. Besides,
we add a CNN (Convolutional Neural Network) method and LDA-Jaccard. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
the LDA vectors of sentences are sparse, that is, the distribution of sentences
over topics is sparse, and the LDA vectors pay more attention to whether two
sentences belong to the same topic. So we use Jaccard's idea to express the
relatedness of a sentence pair by the ratio of the intersection and union of the topics of the
two sentences, namely LDA-Jaccard.
      </p>
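A minimal sketch of the LDA-Jaccard idea described above; the topic threshold `eps` and the dense vector layout are our assumptions for illustration, since the paper only states that the ratio of the topic intersection to the topic union is used:

```python
import numpy as np

def lda_jaccard(theta_a, theta_b, eps=1e-3):
    """Jaccard similarity over the topic supports of two sentences.

    theta_a, theta_b: LDA topic-distribution vectors of equal length.
    eps: threshold below which a topic is treated as absent (an assumption).
    """
    topics_a = set(np.flatnonzero(np.asarray(theta_a) > eps))
    topics_b = set(np.flatnonzero(np.asarray(theta_b) > eps))
    union = topics_a | topics_b
    if not union:
        return 0.0
    # Ratio of shared topics to all topics touched by either sentence.
    return len(topics_a & topics_b) / len(union)
```

Because the LDA vectors are sparse, this score only asks which topics two sentences share, not how their full distributions compare.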
      <p>This paper uses the Word2vec H feature as the input of the CNN. It is based on
word embeddings, maps CT and RT information into a dense feature space, and
adds sentence similarity to better guide neural network training. Specifically, CT
is represented as an n x d matrix CT_Matrix = [wv_1; ...; wv_i; ...; wv_n], where
n is the number of words in CT, d is the word embedding size, and wv_i is the
word vector of the i-th word in CT. First, we decompose CT_Matrix by SVD
to obtain three matrices U, S, and V. We take the top min(n, d) values on the diagonal
of S as the weight set I_1 = {i_1, i_2, ..., i_min(n,d)}, and take the top min(n, d) rows
of V to form CT_V. RT_V and I_2 of RT are obtained in the same way.
Then the cosine similarity is calculated between each row of CT_V and each row of RT_V to
obtain Word2vec V. The calculation process is shown in Fig. 2.</p>
      <p>wv_{i,j} = cosine(l^1_i, l^2_j), where l^1_i and l^2_j are row vectors of CT_V and RT_V
and cosine similarity is used.</p>
      <p>Finally, we use I_1 and I_2 to assign weights to the rows and columns of Word2vec V
to get Word2vec H: entry (i, j) is scaled by the weight val_{i,j} = i^1_i * i^2_j, as shown in Fig. 3(a).</p>
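The construction above can be sketched with NumPy as follows; the exact way Word2vec V is weighted by I_1 and I_2 is our reading of the description, so treat this as an illustrative assumption rather than the authors' implementation:

```python
import numpy as np

def word2vec_h(ct_matrix, rt_matrix):
    """Sketch of the Word2vec H feature for a CT/RT sentence pair.

    ct_matrix: (n, d) matrix of word embeddings for CT.
    rt_matrix: (m, d) matrix of word embeddings for RT.
    """
    def reduce(mat):
        # SVD: keep the top min(n, d) singular values as the weight set
        # and the top min(n, d) rows of V.
        _, s, vt = np.linalg.svd(mat, full_matrices=False)
        k = min(mat.shape)
        return s[:k], vt[:k]

    i1, ct_v = reduce(ct_matrix)
    i2, rt_v = reduce(rt_matrix)
    # Word2vec V: cosine similarity between every row pair of CT_V and RT_V.
    a = ct_v / np.linalg.norm(ct_v, axis=1, keepdims=True)
    b = rt_v / np.linalg.norm(rt_v, axis=1, keepdims=True)
    w2v_v = a @ b.T
    # Word2vec H: scale entry (i, j) by the weight val_ij = i1[i] * i2[j].
    return w2v_v * np.outer(i1, i2)
```

The result is a fixed small matrix (min(n, d) by min(m, d)) regardless of sentence lengths, which is what makes it usable as a CNN input.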
      <p>In content linkage, this paper uses multi-feature fusion methods and
a binary classification method based on CNN. The multi-feature fusion methods include
voting1.1, voting2.0, Jaccard-Focused-new, and Jaccard-Cascade. We use the
Word2vec H feature computed from CT and RT as the input of the CNN, and the
output is the related or unrelated category that the CT-RT pair belongs to. The
structure of the CNN is shown in Fig. 3(b).</p>
      <sec id="sec-3-1">
        <title>Task 1B</title>
        <p>Facet Classification: Our system uses rule-based methods and machine
learning methods for Task 1B. Rule-based methods construct rules based on features
extracted from the CTS, RP and CPs. According to last year's results, we
only use the Subtitle and High Frequency Word Combining Rule (SubHFW) this
time. As for machine learning methods, we apply Random Forest (RF), a
Voting Classifier consisting of 3 Gradient Boosting (GB) models and a Convolutional Neural
Network (CNN) to assign each CTS single or multiple facets. RF and GB take
Location of Paragraph, Document Position Ratio, Paragraph Position Ratio and
Number of Citations or References as input features, while the CNN takes the
matrix of word embeddings of the CTS as input. Finally, we combine all the results from
the rule-based methods and machine learning methods to obtain a fusion result,
which is called the Fusion method.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Task 2</title>
        <p>
          For Task 2, we would like to present an original Quality-Diversity model for
extractive automatic summarization based on the DPP sampling algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
In general, a document can be represented as a ground set of items. Each
sentence is a minimum item, and the extractive summary can be regarded as a
subset of the ground set with high quality and low redundancy. Figure 4 shows
the framework of our system. The main process for summary generation consists
of pre-processing, feature selection, sentence sampling and post-processing.
Pre-processing First, we need to correct some XML coding errors manually.
Later, we make some preparations such as document merging, sentence
filtering and input file generation for hierarchical Latent Dirichlet Allocation
(hLDA). We merge the content of the RP and the citations into one document for
the CTS feature described below. Besides, all documents are converted to lowercase
letters. Then we filter the corpus to remove equations, figures and tables,
and generate the input file for the hLDA model, which contains word indexes and their
corresponding frequencies.
        </p>
        <p>
          Feature Selection When it comes to document representation, we try to build
the matrix L from both partial (Statistical Feature Method) and holistic (Neural
Network Language Model) perspectives to ensure better sentence sampling for
summaries. First, we build the matrix L through L_ij = q_i * S_ij * q_j. Concretely, we adopt
Sentence Length (SL), Sentence Position (SP), Title Similarity (TS), CTS, and
Hierarchical Topic Model (HTM) as features according to the work of Li L [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] for
quality, and Jaccard similarity for diversity. We look forward to finding
the best linear combination of the designed qualities in order to capture more salient
characteristics for a high-quality summary. Furthermore, we construct the matrix L
through L_ij = B_i^T B_j, with the vectors B representing sentences taken directly from Sent2Vec
and LSA, and call this framework the Neural Network Language Model.
Sentence Sampling We use DPPs to select sentences. DPPs are elegant
probabilistic models of global, negative correlations, mostly used in quantum
physics to study reflected Brownian motions. In our method, we only
consider discrete DPPs and follow the definition of Kulesza A et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We can
enhance the diversity of the summary by using DPPs. In this way, given the L matrix
constructed on document sentences, the sampling method based on DPPs [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
can automatically choose diverse sentences of high quality as candidate
summary sentences.
        </p>
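A minimal sketch of the quality-diversity kernel and subset selection. We substitute a greedy MAP approximation for the exact DPP sampler, which the paper takes from Kulesza et al.; the function names and the approximation choice are ours:

```python
import numpy as np

def build_kernel(S, q):
    """L_ij = q_i * S_ij * q_j: q holds per-sentence quality scores,
    S holds pairwise similarities; diversity comes from det(L_Y)."""
    q = np.asarray(q, dtype=float)
    return q[:, None] * np.asarray(S, dtype=float) * q[None, :]

def greedy_dpp_select(L, k):
    """Greedily add the sentence that most increases det(L_Y) --
    a common MAP approximation to DPP sampling, not the exact sampler."""
    n = L.shape[0]
    chosen = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(n):
            if i in chosen:
                continue
            idx = chosen + [i]
            d = np.linalg.det(L[np.ix_(idx, idx)])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        chosen.append(best)
    return chosen
```

Two near-duplicate high-quality sentences repel each other here: once one is selected, adding the other barely increases the determinant, so a dissimilar sentence is preferred.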
        <p>Post-processing Since we already have the candidate summary sentences,
we can truncate the output summary with the sentences ranking highest in quality, limit
the summary to 250 words, and remove some white space in post-processing.</p>
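The post-processing step can be sketched as follows; the (quality, text) input format is an assumption for illustration:

```python
def truncate_summary(ranked_sentences, limit=250):
    """Keep sentences in descending quality order until the word budget
    is exhausted; collapse extra white space along the way.

    ranked_sentences: list of (quality, text) pairs, in any order.
    limit: word budget for the summary (250 in CLSciSumm).
    """
    out, used = [], 0
    for _, text in sorted(ranked_sentences, key=lambda p: -p[0]):
        words = text.split()
        if used + len(words) > limit:
            continue  # skip sentences that would break the word limit
        out.append(" ".join(words))  # normalize internal white space
        used += len(words)
    return out
```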
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Implementation and Experiments</title>
      <p>
        In our previous work, we obtained a lot of features. As shown in Table 2,
"Features number" indicates the number of features a method contains. The four
methods in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have different effects on the test data and the training data,
and the more features with good performance are used, the more stable the
performance on the testing set is. Therefore, we
removed the features with poor performance on the training set, retaining the
features with good performance for the fusion methods, and adjusted the parameters of
the four fusion methods in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The four new fusion methods are voting1.2,
voting2.1, Jaccard-Focus-1.1, and Jaccard-Cascade-1.1. Since LDA can discover
topic information and the LDA vector is sparse, lexicon (LDA) and LDA-cos
are removed and LDA-Jaccard is added. Since the lexicon (co-occurrence) only
includes words selected from the training set, when the difference between the
testing set and the training set is great, the lexicon (co-occurrence) is ineffective.
In the experiments, we chose 600 dimensions for the LDA vector and 200 dimensions
for the word vector. Table 1 shows the parameter settings of our methods.
      </p>
      <p>In addition, with the increasing training data, we begin to try to solve Task 1A
with CNN. In this paper, we build the Word2vec H feature for the sentence
pair, so that we can reduce the dimensionality of the input and add the
cosine similarity to it. We use V-1.2, V-2.1, J-F-1.1, J-C-1.1, and W H-C to
represent Voting-1.2, Voting-2.1, Jaccard-Focused-1.1, Jaccard-Cascade-1.1 and
Word2vec H-CNN respectively. In Table 1, W and P are Weight and Proportion
respectively. JS means 10-fold Jaccard Similarity.</p>
      <p>
        According to Table 2, we predict that V-1.2 and J-F-1.1 will be more stable
on the testing set. W H-C uses the data in "Training-Set-2019", and its
effect is the worst due to some problems, such as data imbalance in the training set
and the complex structure of the CNN.
From Table 2 and Table 3 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we can draw three conclusions:
      </p>
      <p>The number of features used in V-1.2 is smaller than in V-2.1 and J-F-1.1, but
the result of V-1.2 is similar to those of V-2.1 and J-F-1.1. The number of features used
in V-1.2 is about the same as in J-C-1.1, and the result of V-1.2 is better than
that of J-C-1.1. This shows that the features used in V-1.2 play a leading role.</p>
      <p>The results of the runs in 2019 verify our prediction, that is, the more
features that are used, the more stable the performance on the test set is. So the
performance of V-2.1 on the testing set and the training set is very stable, as
is that of J-F-1.1.</p>
      <p>After removing the co-occurrence dictionary, the (F-train) - (F-test) results are
smaller, which indicates that the co-occurrence dictionary has limitations and should
be removed.</p>
      <sec id="sec-4-1">
        <title>Task 1B</title>
        <p>In this section, we introduce our methods applied to Task 1B in detail.
Rule-based Methods: Subtitle Rule: We use the subtitles of the CTS and citance to
determine which facet they belong to. If the subtitles contain one of the five predefined classes,
we categorize the CTS and citance into the corresponding facet. High Frequency Word
Rule: We use the high frequency words of each class to classify the CTS and citance. We
first remove common words, and then set a threshold for each facet. Subtitle and
High Frequency Word Combining Rule: We first apply the Subtitle Rule to obtain
the facet. If it does not give an explicit answer, we then use the High Frequency Word
Rule to obtain the facet.</p>
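A sketch of the combining rule described above. The facet names, word lexicon and thresholds below are placeholders; the paper's actual facet inventory and word lists are not reproduced here:

```python
FACETS = ["Aim", "Hypothesis", "Implication", "Method", "Result"]  # placeholders

def subtitle_rule(subtitle):
    """Return a facet if the subtitle names one of the predefined classes."""
    low = subtitle.lower()
    for facet in FACETS:
        if facet.lower() in low:
            return facet
    return None

def hfw_rule(text, hfw_lexicon, thresholds):
    """High Frequency Word rule: count facet-specific words and keep the
    facet whose count clears its threshold with the highest score."""
    tokens = text.lower().split()
    best, best_score = None, 0
    for facet, words in hfw_lexicon.items():
        score = sum(tokens.count(w) for w in words)
        if score >= thresholds.get(facet, 1) and score > best_score:
            best, best_score = facet, score
    return best

def subhfw(subtitle, text, hfw_lexicon, thresholds):
    """Subtitle rule first; fall back to the HFW rule when it is silent."""
    return subtitle_rule(subtitle) or hfw_rule(text, hfw_lexicon, thresholds)
```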
        <p>Machine Learning Methods: First, we extract features from the CTS and
citance consisting of Location of Paragraph, Document Position Ratio, Paragraph
Position Ratio and Number of Citations or References, and concatenate these
features into an 8-dimensional vector. Then we train RF and GB on these
features. As for the CNN, the content of the CTS is transformed into a matrix whose i-th
row corresponds to the word embedding of the i-th word and whose j-th column represents
the j-th dimension of the embedding. Then, we stack a convolutional layer
with multiple kernel sizes followed by a max-pooling layer. The architecture of
the CNN is shown in Fig. 5.</p>
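A NumPy sketch of the convolution-plus-max-pooling step on the CTS embedding matrix; the filter shapes and the ReLU choice are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def textcnn_features(emb, kernels, biases):
    """Forward pass of a minimal textCNN feature extractor.

    emb: (n_words, d) word-embedding matrix of a CTS.
    kernels: list of (h, d, f) filters -- window size h over words, f filters.
    biases: list of (f,) bias vectors, one per kernel size.
    """
    feats = []
    for W, b in zip(kernels, biases):
        h, d, f = W.shape
        n = emb.shape[0] - h + 1
        conv = np.empty((n, f))
        for i in range(n):
            # Convolve the window of h word vectors with every filter.
            conv[i] = np.tensordot(emb[i:i + h], W, axes=([0, 1], [0, 1])) + b
        conv = np.maximum(conv, 0.0)    # ReLU
        feats.append(conv.max(axis=0))  # max-over-time pooling
    return np.concatenate(feats)
```

Max-over-time pooling makes the output length depend only on the number of filters, so CTS of different lengths all map to a fixed-size feature vector for the classifier.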
        <p>Results on Train-Set-2019 are shown in Table 4. We find that the Voting and
SubHFW methods have better performance. CNN performs worse than we
expected, since the training data set is too small for a neural network to learn from,
and the dataset is imbalanced, with the Method facet having more samples than the other
facets.</p>
        <p>As for Task 1B, the results on Test-Set-2019 show that the SubHFW method
outperforms the other methods and ranks first among all methods, which indicates
that subtitle and high frequency word features are crucial to determining
the facet of each CTS. Moreover, the textCNN method performs worse than we
expected due to its demand for a larger dataset.
The results below use Manual ROUGE values to evaluate our system
summaries. During the evaluation phase, CL-SciSumm 2018 provided three kinds
of gold standard for comparison: the collection of citation sentences (the community
summary), faceted summaries of the traditional self-summary (the abstract), and
summaries written by well-trained annotators (the human summary).</p>
        <p>Taking the community summary for instance, we first test each feature SP ('0), SL
('1), TS ('2), HTM ('3), and CTS ('4) described in subsection 3.3 on the statistical
feature model independently to figure out its individual contribution. As the
CTS feature ('4) is specially designed, we do not present its individual
performance, but record and observe its binary combination with every other
basic feature.</p>
        <p>From Table 5, the best binary combination comes from the TS ('2) and CTS
('4) features. One possible explanation is that the community summary itself
already includes these citation sentences. With the title containing the essence
of a paper, sentences selected by this ranking rule will definitely guarantee
overlap with the gold summaries.</p>
        <p>Analogously, we conduct experiments on the other two kinds of gold
summaries, where the weights of the parameters are slightly different. Table 6
and Table 7 present the results for the community summary and human
summary separately: the best binary combination follows the same tendency.
The phenomenon of the same best combination may be interpreted as follows: no
matter whether the sentences are cited or the summaries are written
by annotators, both are from the perspective of readers. Community
summaries consist of citation sentences, and the sentences themselves are
extracted from the original documents, so it is no wonder the ROUGE
evaluation is far higher than for the other kinds of summaries. However, the human summary
is based on the comprehension of readers. In this case we perform extra experiments on
human summaries beyond the same parameter setting as community summaries.
The best new combination, as Table 7 shows, is a little different from the
previous mere copies of community summaries. When it comes to the human
summary, the more parameters are involved, the higher the ROUGE F-score reached.
Unfortunately, for the community summary, when we attempt a further exploration
beyond the binary combination, any additional attribute performs adversely. There are
a thousand Hamlets in a thousand people's eyes.</p>
        <p>As for the self-summary (the abstract), the results presented in Table 8 are
the opposite. Every binary combination with the CTS ('4) feature is not that satisfactory,
so we present the individual contribution of each of the other statistical and topic features.
Also, we try the best parameter settings for the community summary and human
summary on the abstract summary. Perhaps, although we have tried our best
to follow the writers, there always exists a narrow gap between our readers'
comprehension and the writers' original intention. This part of the experiment follows
a simple but practical principle: under the condition that we cannot fully
understand the latent semantics the writers want to express, we can still make use
of some statistical features which help to extract important sentences. If a
summarizer is developed through this approach, it is not limited to a familiar
language and does not require any additional linguistic knowledge or complex
linguistic processing.</p>
        <p>Furthermore, when extracting sentences with the Neural Network Language
Model (using Sent2Vec/LSA representations for sentences), we choose the best
quality combination for the community summary, human summary and abstract
summary. Table 9 shows the Neural Network Language Model performance.
Besides, Table 10 shows the best results of several runs in BIRNDL 2019.
Among all the systems in the competition, our system won first place for the
human summary and second place for the abstract and community summaries.
This year, we have added neural networks to the methods for all three tasks. We hope
to make use of a large training corpus to exploit the advantage of neural networks,
that is, deeply mining the meaning of the text. Rule-based and statistics-based
methods have achieved good performance, so we try to combine them with neural
networks. In future work, for Task 1A we expect to automatically adjust the
weights of features through neural networks and combine multiple features better.
For Task 1B, more study should be done to reduce the impact of imbalanced data
on neural networks. Besides, more crucial features are expected to be found, since
the performance of the machine learning methods is the best so far. In Task 2, we
expect the neural network language models to contribute more
meaningful semantic representations for sentences compared with statistical features.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported in part by the Beijing Municipal Commission of
Science and Technology under Grant Z181100001018035; the National Social Science
Foundation of China under Grant 16ZDA055; the National Natural Science
Foundation of China under Grant 91546121; and the Engineering Research Center of Information
Networks, Ministry of Education.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. CL-SciSumm 2019 Homepage, http://wing.comp.nus.edu.sg/cl-scisumm2019/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wang</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>T</given-names>
          </string-name>
          , et al.
          <source>NUDT@ CLSciSumm-18 In: Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for</source>
          Digital Libraries[C]//BIRNDL@ SIGIR.
          <year>2018</year>
          :
          <fpage>102</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Davoodi</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madan</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu J. CLSciSumm Shared</surname>
          </string-name>
          <article-title>Task: On the Contribution of Similarity measure and Natural Language Processing Features for Citing Problem</article-title>
          [C]//BIRNDL@ SIGIR.
          <year>2018</year>
          :
          <fpage>96</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ma</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>J</given-names>
          </string-name>
          , et al. NJUST@ CLSciSumm-18[C]//BIRNDL@ SIGIR.
          <year>2018</year>
          :
          <fpage>114</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kim</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Convolutional neural networks for sentence classification[J]</article-title>
          .
          <source>arXiv preprint arXiv:1408.5882</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Agrawal</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <string-name>
            <surname>IIIT-H@</surname>
          </string-name>
          CLScisumm-18
          <source>In: Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for</source>
          Digital Libraries[C]//BIRNDL@ SIGIR.
          <year>2018</year>
          :
          <fpage>130</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Baruah</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolla M. Klick Labs at</surname>
          </string-name>
          CL-SciSumm
          <year>2018</year>
          [C]//BIRNDL@ SIGIR.
          <year>2018</year>
          :
          <fpage>134</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Karimi</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moraes L F T</surname>
            , Das
            <given-names>A</given-names>
          </string-name>
          , et al. University of Houston@ CL-SciSumm
          <year>2017</year>
          :
          <article-title>Positional language Models, Structural Correspondence Learning</article-title>
          and Textual Entailment[C]//BIRNDL@ SIGIR (2).
          <year>2017</year>
          :
          <fpage>73</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Aburaed</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bravo</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            <given-names>L</given-names>
          </string-name>
          , et al.
          <article-title>LaSTUS/TALN+INCO@CL-SciSumm 2018: Using Regression and Convolutions for Cross-document Semantic Linking and Summarization of Scholarly Literature</article-title>
          [C]//
          <source>Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018)</source>
          . Ann Arbor, Michigan, July
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Debnath</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Achom</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pakray</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>NLP-NITMZ@CLScisumm-18</article-title>
          [C]//
          <source>Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL@SIGIR)</source>
          .
          <year>2018</year>
          :
          <fpage>164</fpage>
          -
          <lpage>171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Li</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>CIST@CLSciSumm-18: Methods for Computational Linguistics Scientific Citation Linkage, Facet Classification and Summarization</article-title>
          [C]//BIRNDL@SIGIR.
          <year>2018</year>
          :
          <fpage>84</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kulesza</surname>
            <given-names>Alex</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taskar</surname>
            <given-names>Ben</given-names>
          </string-name>
          (
          <year>2012</year>
          ),
          <article-title>Determinantal Point Processes for Machine Learning</article-title>
          ,
          <source>Foundations and Trends in Machine Learning</source>
          : Vol.
          <volume>5</volume>
          : No.
          <issue>2-3</issue>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>286</lpage>
          . http://dx.doi.org/10.1561/2200000044.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <article-title>UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics</article-title>
          [M]//
          <source>Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data</source>
          . Springer,
          <year>2017</year>
          :
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Chandrasekaran</surname>
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yasunaga</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitag</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            <given-names>M.-Y.</given-names>
          </string-name>
          .
          <article-title>"Overview and Results: CL-SciSumm Shared Task 2019"</article-title>
          ,
          <source>In Proceedings of the 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019) @ SIGIR</source>
          ,
          <year>2019</year>
          , Paris, France.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>