<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CIST@CLSciSumm-18: Methods for Computational Linguistics Scientific Citation Linkage, Facet Classification and Summarization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lei Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junqi Chi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moye Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zuying Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yingqi Zhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiangling Fu</string-name>
          <email>fuxianglingg@bupt.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing University of Posts and Telecommunications (BUPT)</institution>
          <addr-line>No. 10 Xitucheng Road, Haidian District, Beijing</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Our system contributes to Task 1A (citation linkage), Task 1B (facet classification) and Task 2 (summarization) of CLSciSumm-18@SIGIR2018. It builds on our former system, CIST@CLSciSumm-17 [7], and we try to improve the methods for all the shared tasks. For citation linkage, we adopt Word Mover's Distance (WMD) and improve the LDA model to calculate sentence similarity. For facet classification, we try more methods. To improve the performance of summarization, we also add WMD sentence similarity to construct a new kernel matrix for Determinantal Point Processes (DPPs).</p>
      </abstract>
      <kwd-group>
        <kwd>WMD</kwd>
        <kwd>LDA</kwd>
        <kwd>DPPs</kwd>
        <kwd>Random Forest</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        With the development of science and network technology, more and more
scientific literature appears, especially in the Computational Linguistics (CL) domain.
Researchers survey the literature on a specific topic to obtain
inspiration and novel approaches. However, it is time-consuming for humans to analyze
all the related content. The goal of CLSciSumm-18 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is to explore
summarization of scientific research in the CL domain, support research in automatic scientific
document summarization and provide evaluation resources to push the current
state of the art [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>CLSciSumm-18 contains Task 1A, Task 1B and Task 2. Each topic of
the training and test datasets consists of a Reference Paper (RP) and
several Citing Papers (CPs) with citations to the RP. Task 1A is to identify, for
each citance, the spans of text (cited text spans, CTS) in the RP, given the RP
and CPs. A CTS may be a sentence fragment, a full sentence, or
several consecutive sentences (no more than 5). Task 1B is to identify, for each
CTS, which facet it belongs to from a predefined set of
facets (Aim Citation, Method Citation, Implication Citation, Results Citation
and Hypothesis Citation). Task 2 is to generate a structured summary of the RP,
of which there are two types: the faceted summary of the traditional
self-summary and the community summary (the collection of citation sentences,
'citances').</p>
      <p>
        In this paper we introduce our methods, strategies and experiments for
Task 1A, Task 1B and Task 2, based on our former system,
CIST@CLSciSumm-17 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For Task 1A, we apply new sentence similarities computed from WMD and
from an improved LDA (Latent Dirichlet Allocation) model with better topic features.
In Task 1B, we use more classification methods to obtain the facet
of the CTS. In Task 2, we use WMD sentence similarity to construct the kernel matrix
and so improve the quality of Determinantal Point Processes (DPPs) sampling, on
the basis of our former work on summarization [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Methods of information extraction and content linkage have sprung up recently
and have attracted the interest of researchers, especially in the last two years.
The methods and results of CLSciSumm-2016 and CLSciSumm-2017 are described
in [<xref ref-type="bibr" rid="ref4">4</xref>] [<xref ref-type="bibr" rid="ref5">5</xref>].
The methods used for Task 1A are closely tied to
methods of calculating similarity. For example, Ma S et al. [<xref ref-type="bibr" rid="ref6">6</xref>] combine
similarity-based features (LDA/Jaccard/IDF/TF-IDF/Doc2Vec similarity) with rule-based
features to obtain citation linkage. Li L et al. [<xref ref-type="bibr" rid="ref7">7</xref>] also propose many similarity
methods. Zhang D et al. [<xref ref-type="bibr" rid="ref8">8</xref>] utilize search-based similarity scoring and a
supervised method. Cosine similarity is used in [<xref ref-type="bibr" rid="ref9">9</xref>]. Aburaed
et al. [<xref ref-type="bibr" rid="ref10">10</xref>] use a voting system to obtain the best result from a Word Embeddings
Distance system, a Modified Jaccard system and a BabelNet Embeddings Distance
system. Methods based on measuring semantic textual similarity are used in [<xref ref-type="bibr" rid="ref11">11</xref>].
Other methods are also applied for citation linkage. Task 1A is
transformed into a query problem in [<xref ref-type="bibr" rid="ref12">12</xref>], where different ranking models and query generation
strategies are applied. Karimi et al. [<xref ref-type="bibr" rid="ref13">13</xref>] use
structural correspondence learning, positional language models and
textual entailment.
      </p>
      <p>
        Task 1B is treated as a classification problem, so many
classification methods are used for it. They fall mainly
into two groups: rule-based methods and supervised machine learning
methods [<xref ref-type="bibr" rid="ref6">6</xref>] [<xref ref-type="bibr" rid="ref7">7</xref>] [<xref ref-type="bibr" rid="ref13">13</xref>] [<xref ref-type="bibr" rid="ref11">11</xref>].
Some other methods are also used in Task 1B.
For example, Felber et al. [<xref ref-type="bibr" rid="ref12">12</xref>] transform the span of text into a query,
and then conduct a majority vote on the top five retrieved results to determine
the discourse facet. Prasad et al. [<xref ref-type="bibr" rid="ref14">14</xref>] use classification and ranking methods.
      </p>
      <p>
        As for summary generation in Task 2, some teams submitted their results to
BIRNDL 2017. Ma S et al. [<xref ref-type="bibr" rid="ref6">6</xref>] divide the process into two main steps. They group
sentences into different clusters by bisecting K-means, and then use maximal
marginal relevance (MMR) to extract sentences from each cluster and combine
them into a summary. Aburaed et al. [<xref ref-type="bibr" rid="ref10">10</xref>] score each sentence using multiple features
with different weights, and then build the summary according to the scores. Li L
et al. [<xref ref-type="bibr" rid="ref7">7</xref>] make a linear combination of multiple features to compute sentence
quality, and sample sentences based on Jaccard similarity and
sentence quality. We try a new similarity method to construct a new kernel
matrix for DPPs in order to obtain better summaries.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Methods</title>
      <p>The framework of our system is shown in Fig. 1. We first obtain the CTS in the RP
for each citance in the CPs, then use features extracted from the CTS to determine its
facet, and finally use the CTS and its facet to generate a summary (no more
than 250 words).</p>
      <p>[Fig. 1. Framework of our system. Recoverable labels include: Task, Citation Linkage, Facet Classification, Feature Extraction, Content Linkage Methods, RP and CPs.]</p>
      <sec id="sec-4-4">
        <title>Methods</title>
        <p>
Word Mover's Distance (WMD) is a method for calculating the distance between two
sentences or texts based on word vectors and the Earth Mover's Distance (EMD).
WMD measures the dissimilarity between two text documents as
the minimum cumulative distance that the embedded words of one document
need to "travel" to reach the embedded words of the other document [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We
apply WMD to measure the similarity of two sentences and of two texts
in our system. Here, N and M are the numbers of words in the two text documents D
and D', w is a word vector, and dim is the word vector dimension. d and d'
are the normalized bag-of-words (nBOW) vectors of D and D'.
        </p>
        <p>[Fig. 2. Representation of D as an N × dim matrix of word vectors (D' is represented analogously).]
          <disp-formula><tex-math><![CDATA[D = \begin{pmatrix} w_{11} & \cdots & w_{1,dim} \\ w_{21} & \cdots & w_{2,dim} \\ \vdots & \ddots & \vdots \\ w_{N1} & \cdots & w_{N,dim} \end{pmatrix}]]></tex-math></disp-formula>
        </p>
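        <p>For reference, the optimisation problem that WMD solves, following the definition in [<xref ref-type="bibr" rid="ref15">15</xref>], can be written as below. This is a sketch in the notation of this section: the flow matrix T is a symbol introduced here for illustration, and c(i, j) is the distance between word i of D and word j of D'.</p>
        <preformat>
```latex
\mathrm{WMD}(D, D') \;=\; \min_{T \geq 0} \; \sum_{i=1}^{N} \sum_{j=1}^{M} T_{ij} \, c(i, j)
\quad \text{subject to} \quad
\sum_{j=1}^{M} T_{ij} = d_i \;\; (i = 1, \dots, N),
\qquad
\sum_{i=1}^{N} T_{ij} = d'_j \;\; (j = 1, \dots, M)
```
        </preformat>
        <p>T<sub>ij</sub> is the share of the weight of word i in D that is moved to word j in D'. The two constraints require that all the weight in d leaves D and that exactly the weight in d' arrives at D'.</p>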
        <p>After removing stop words, we first represent D and D' as two nBOW vectors
d and d'. We then obtain the word vector w of each word in D and D'. Finally, we
obtain the representations of D and D' shown in Fig. 2. WMD
incorporates the semantic similarity between individual word pairs (e.g.
President and Obama) through the Euclidean distance between the two words in the word2vec
embedding space. The distance between word i and word j is c(i, j) = ||w<sub>i</sub> − w<sub>j</sub>||,
where word i and word j are from D and D' respectively. After obtaining d, d' and c(i, j),
we use the EMD algorithm to obtain the minimum WMD.</p>
        <p>Citation Linkage (Task 1A): The main processes are extracting features
from the RP and CPs, and using Content Linkage Methods to obtain the CTS for each
citance.</p>
        <p>
          Feature Extraction: We extract features from the RP and CPs. They
comprise lexicons (high-frequency lexicon, LDA lexicon and co-occurrence lexicon),
sentence similarities (WMD similarity, IDF similarity and Jaccard similarity),
context similarities, word vectors, WordNet similarities (jcn, lin, lch, res, wup and path
similarity) and CNN (Convolutional Neural Network) similarity. For WordNet similarity, we calculate
the similarity between the words in the two sentences to obtain a matrix.
We then select the maximum value in the matrix and remove its
row and column, repeating until the matrix is empty.
Finally, we sum the maximum values selected in all iterations and
divide the sum by √(length<sub>1</sub> · length<sub>2</sub>) to obtain the similarity between the sentences.
Word vector similarity is computed in the same way as WordNet
similarity. The CNN takes word vectors as input and outputs the probability of
content linking, which represents the similarity
of the input sentences [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Most features were used in our former work [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], except for the
lexicon obtained by the LDA model and the WMD applied for calculating sentence
similarities and context similarities.
        </p>
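        <p>The matrix procedure described for WordNet and word vector similarity can be sketched as follows. This is a minimal illustration, not our system's code: the function name and the toy similarity matrix are assumptions, and the two lengths are taken to be the numbers of rows and columns of the word-level similarity matrix.</p>
        <preformat>
```python
import math


def greedy_match_similarity(sim):
    """Sentence similarity from a word-level similarity matrix.

    sim[i][j] is the similarity between word i of sentence 1 and
    word j of sentence 2. The largest remaining entry is selected,
    its row and column are removed, and this repeats until no row
    or column is left.
    """
    n_rows = len(sim)
    n_cols = len(sim[0]) if sim else 0
    if n_rows == 0 or n_cols == 0:
        return 0.0
    rows = set(range(n_rows))
    cols = set(range(n_cols))
    total = 0.0
    while rows and cols:
        i, j = max(
            ((r, c) for r in rows for c in cols),
            key=lambda rc: sim[rc[0]][rc[1]],
        )
        total += sim[i][j]
        rows.remove(i)
        cols.remove(j)
    # Normalize by the square root of the product of the two lengths.
    return total / math.sqrt(n_rows * n_cols)


# Toy matrix: sentence 1 has three words, sentence 2 has two words.
toy = [
    [0.9, 0.2],
    [0.3, 0.8],
    [0.1, 0.4],
]
score = greedy_match_similarity(toy)
```
        </preformat>
        <p>In the toy matrix the procedure selects 0.9 and then 0.8, so the score is their sum divided by √6.</p>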
        <p>In our previous work, we only used the LDA model trained on the RP and CPs to
obtain an LDA lexicon of 20 latent topics for the files in each topic. We improve
the LDA model to obtain better topic features. According to the LDA model,
we denote a sentence S as an n-dimensional vector (LDA vector) S =
(x<sub>1</sub>, ..., x<sub>i</sub>, ..., x<sub>n</sub>), where x<sub>i</sub> is the probability that S belongs to the i-th
topic. Every citance and CTS can be represented as an n-dimensional vector
in this way, so that we can calculate their cosine similarity. We call the cosine
similarity of LDA vectors LDA-cos. The larger the cosine similarity is, the more
similar the sentences are. Compared with the old LDA method, the new LDA method
not only considers the number of shared words belonging to the same topic in
the citance and CTS, but also preserves the cohesion of the topic distribution in them.</p>
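        <p>As an illustration of LDA-cos, the sketch below ranks candidate sentences by the cosine similarity of their topic vectors to that of a citance. The 3-topic vectors and the sentence ids are made-up values for illustration; in practice the vectors would come from a trained LDA model.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_lda_cos(citance_vec, candidates):
    """Return candidate sentence ids sorted by decreasing LDA-cos."""
    return sorted(
        candidates,
        key=lambda sid: cosine(citance_vec, candidates[sid]),
        reverse=True,
    )


# Hypothetical topic distributions over three topics.
citance = [0.70, 0.20, 0.10]
candidates = {
    "s1": [0.65, 0.25, 0.10],
    "s2": [0.10, 0.10, 0.80],
    "s3": [0.30, 0.60, 0.10],
}
ranking = rank_by_lda_cos(citance, candidates)
```
        </preformat>
        <p>Here s1, whose topic distribution is closest to that of the citance, is ranked first, followed by s3 and s2.</p>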
        <p>Besides, we use WMD to calculate the similarity of two texts for enriching
similarity features.</p>
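        <p>A minimal sketch of how nBOW vectors and word distances feed a WMD-style score is given below. It is not our system's implementation: instead of solving the full EMD problem, it computes a relaxed lower bound in which every word moves all of its weight to its nearest word in the other text. The function names and the toy 2-dimensional vectors are assumptions made for illustration.</p>
        <preformat>
```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words vector: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_distance(u, v):
    """Euclidean distance c(i, j) between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_bound(source, target, vectors):
    """Cost when each source word sends all its weight to its nearest target word."""
    target_words = set(target)
    return sum(
        weight * min(word_distance(vectors[w], vectors[t]) for t in target_words)
        for w, weight in nbow(source).items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Lower bound on WMD: the larger of the two one-sided relaxations."""
    return max(
        one_sided_bound(tokens_a, tokens_b, vectors),
        one_sided_bound(tokens_b, tokens_a, vectors),
    )


# Toy 2-dimensional embeddings (hypothetical values, not trained vectors).
vectors = {
    "obama": (0.9, 0.1),
    "president": (1.0, 0.0),
    "speaks": (0.0, 1.0),
    "greets": (0.1, 0.9),
}
distance = relaxed_wmd(["obama", "speaks"], ["president", "greets"], vectors)
```
        </preformat>
        <p>The two toy sentences share no words, yet the distance is small because each word has a close neighbour in the other sentence. Both token lists are assumed to be non-empty and covered by the embedding table.</p>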
        <p>Content Linkage Methods: We use two methods, the Voting Method
and the WMD Method. In the Voting Method, the final results are obtained by
voting over all runs (the results given by the features described in Feature
Extraction). In the WMD Method, the results come from the similarity
calculated by WMD (which we call WMD similarity). In the WMD similarity
method, we first represent sentences as word vectors. We then calculate the
WMD similarity between the citance and the CTS using the word vectors. WMD is
the distance required to transform one specific sentence into another, so the smaller
the WMD is, the more similar the two sentences are.</p>
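        <p>The voting idea can be sketched as follows. This is only an illustration under stated assumptions: each run is taken to propose a list of candidate sentence ids for a citance, each run has a weight, and the highest-scoring sentences are kept. The run names, weights and the top_k parameter are hypothetical, and the Proportion parameter of our methods is not modelled.</p>
        <preformat>
```python
from collections import defaultdict


def weighted_vote(runs, weights, top_k=2):
    """Combine candidate sentence ids proposed by several runs.

    runs maps a run name to the sentence ids it proposes; weights maps
    a run name to its vote weight (1.0 if missing). Ties are broken by
    the smaller sentence id.
    """
    scores = defaultdict(float)
    for name, sentence_ids in runs.items():
        for sid in sentence_ids:
            scores[sid] += weights.get(name, 1.0)
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:top_k]


# Hypothetical runs for one citance.
runs = {
    "idf": [12, 13],
    "jaccard": [12, 40],
    "lda_cos": [40, 13],
}
weights = {"idf": 1.0, "jaccard": 1.0, "lda_cos": 0.5}
selected = weighted_vote(runs, weights)
```
        </preformat>
        <p>Sentence 12 collects two full votes, while sentences 13 and 40 tie at 1.5 and the tie is resolved in favour of the smaller id.</p>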
        <sec id="sec-4-4-1">
          <title>Facet Classification (Task 1B)</title>
          <p>Our system mainly uses rule-based methods and machine learning methods based on multiple features for Task 1B.
Rule-based methods comprise the Subtitle Rule (Sub), the High Frequency Word Rule
(HFW) and the Subtitle and High Frequency Word Combining Rule (SubHFW).
They construct rules based on features obtained from the CTS, RP and
CPs. As for machine learning methods, we apply SVM, Decision Trees (DT)
and K-Nearest Neighbor (KNN) to obtain the facet. We also train Random
Forest (RF), Gradient Boosting (GB) and Voting methods, which
are based on the idea of ensemble learning. The features used in the machine
learning methods are Location of Paragraph, Document Position Ratio, Paragraph
Position Ratio and Number of Citations or References. Finally, we combine all
the results to obtain a fusion result, which we call the Fusion method.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Task 2</title>
          <p>The main process for summary generation consists of Pre-processing, Feature Selection, Sentence Sampling and Post-processing.</p>
        </sec>
        <sec id="sec-4-4-3">
          <p>Pre-processing: We first correct some XML encoding errors. We then
make some preparations: document merging, sentence filtering
and input file generation for hierarchical Latent Dirichlet Allocation (hLDA). We
merge the content of the RP and the citations into one document. We do not
extract sentences from the abstract of the RP unless they are selected in Task
1A. All documents are converted to lowercase. We filter the corpus
to remove equations, figures, tables and so on. We then generate the input
file for hLDA, which contains word indices and their corresponding frequencies.</p>
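          <p>The last step, turning filtered sentences into word indices with frequencies, might look like the sketch below. The exact input format of the hLDA implementation is not specified here, so the line layout (number of unique terms followed by index:count pairs) is an assumption, as are the function name and the toy sentences.</p>
          <preformat>
```python
from collections import Counter


def to_hlda_lines(sentences):
    """Map tokenized sentences to '[unique terms] [index]:[count] ...' lines.

    Word indices are assigned in order of first appearance.
    Returns the vocabulary (word -> index) and one line per sentence.
    """
    vocab = {}
    lines = []
    for tokens in sentences:
        indices = [vocab.setdefault(token, len(vocab)) for token in tokens]
        counts = Counter(indices)
        pairs = " ".join(
            f"{index}:{count}" for index, count in sorted(counts.items())
        )
        lines.append(f"{len(counts)} {pairs}")
    return vocab, lines


# Toy input: two lowercased, filtered sentences.
vocab, lines = to_hlda_lines([["topic", "model", "topic"], ["model", "summary"]])
```
          </preformat>
          <p>For the toy input, the first sentence becomes a line with two unique terms, where index 0 (topic) has frequency 2 and index 1 (model) has frequency 1.</p>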
          <p>
            Feature Selection: Following the work of Li L et al. [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], we choose Sentence Length (SL), Sentence Position
(SP), CTS, Title Similarity (TS) and Hierarchical Topic Model (HTM) as
features in our system. We use these features to
calculate sentence quality. We use WMD similarity as the sentence
similarity, and combine it with sentence quality to construct the kernel matrix of the DPPs.
          </p>
          <p>
            Sentence Sampling: We use DPPs to select sentences. DPPs are elegant
probabilistic models of global, negative correlations and are mostly used in quantum
physics to study reflected Brownian motions. In our method, we only consider
discrete DPPs and follow the definition of Kulesza A et al. [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Using DPPs enhances
the diversity of the summary. Furthermore, we also use Jaccard
similarity to construct the kernel matrix, as a comparison for the effectiveness of
DPPs based on WMD similarity.
          </p>
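          <p>To illustrate how sentence quality and sentence similarity combine in a DPP kernel, the sketch below builds the kernel entry for sentences i and j as quality<sub>i</sub> · similarity<sub>ij</sub> · quality<sub>j</sub> and selects sentences greedily by the determinant of the selected submatrix. This is a simplified stand-in rather than our sampling procedure: greedy selection replaces DPP sampling, the quality scores and the similarity matrix are made-up values, and the similarity matrix is taken as given.</p>
          <preformat>
```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def build_kernel(quality, similarity):
    """Kernel entry (i, j) = quality[i] * similarity[i][j] * quality[j]."""
    n = len(quality)
    return [
        [quality[i] * similarity[i][j] * quality[j] for j in range(n)]
        for i in range(n)
    ]


def greedy_select(kernel, max_items):
    """Greedily add the item that maximizes the determinant of the selected submatrix."""
    selected = []
    remaining = list(range(len(kernel)))
    while remaining and len(selected) != max_items:
        best, best_det = None, 0.0
        for i in remaining:
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            det = determinant(sub)
            if det > best_det:
                best, best_det = i, det
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scores: sentences 0 and 1 are near-duplicates.
quality = [1.0, 0.9, 0.8]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
kernel = build_kernel(quality, similarity)
chosen = greedy_select(kernel, max_items=2)
```
          </preformat>
          <p>Although sentence 1 has higher quality than sentence 2, it is nearly a duplicate of sentence 0, so the determinant favours the pair of sentences 0 and 2. This is the diversity effect that motivates using DPPs for summarization.</p>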
          <p>Post-processing: We truncate the output summary to 250 words and
remove some white spaces.</p>
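          <p>A minimal sketch of this post-processing step is shown below. Which white spaces are removed is not specified, so collapsing every run of white space into a single space is an assumption, as is the function name.</p>
          <preformat>
```python
def post_process(summary, max_words=250):
    """Collapse runs of white space and keep at most max_words words."""
    words = summary.split()
    return " ".join(words[:max_words])


cleaned = post_process("We  propose \n a new   method .", max_words=4)
```
          </preformat>
          <p>With max_words set to 4, the toy input is reduced to its first four words separated by single spaces.</p>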
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Implementation and Experiments</title>
      <p>We implement our system and use the official scripts to evaluate it on the training
data using ten-fold cross-validation in Task 1. Training-Set-2018 and Test-Set-2018,
provided by the organizers, are the training data and test data respectively in our system.</p>
      <sec id="sec-5-1">
        <title>Task 1A</title>
        <p>
          In our previous work, for syntactic information, we used three lexicons, two
sentence similarities and two context similarities, all of which can measure
sentence similarity [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. For semantic information, we used word vectors [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], WordNet
and CNN. In this paper, we combine two feature representations (LDA vector
and word vector) with two similarity calculation methods (EMD similarity and
cosine similarity), and obtain two new methods: LDA-cos and WMD. We train
the word embeddings on a corpus crawled from The Guardian (https://www.theguardian.com);
the size of the corpus is 835 MB. In our experiments, we
choose 600 dimensions for the LDA vector and 300 dimensions for the word vector. The
Task 1A methods are unsupervised. We ran experiments with
different numbers of sentences in the result, and chose for our runs the
number that shows the best performance.
        </p>
        <p>
          We also improve two feature fusion methods, Voting-1.0 and
Jaccard-Focused, from Li L et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Besides some parameter changes, we add and delete
some features of the methods. Based on Voting-1.0 we obtain Voting-1.1, which
replaces Jaccard context similarity with LDA-cos similarity. Based on
Jaccard-Focused we obtain Jaccard-Focused-new, which adds jcn similarity and LDA-cos
similarity. Table 1 shows the parameter settings of our methods.
        </p>
        <p>In Table 1, W and P are Weight and Proportion respectively. V-1.1, V-1.0,
V-2.0, J-F-new, J-F and J-C are the Voting-1.1, Voting-1.0, Voting-2.0,
Jaccard-Focused-new, Jaccard-Focused and Jaccard-Cascade methods respectively. JS means
tenfold Jaccard similarity. Because the performance of WMD similarity is very
poor on the training data, WMD similarity is not adopted in our feature fusion
methods.</p>
        <p>From Table 2, we find that the Voting-1.1 method performs better
than Voting-1.0, which shows the validity of LDA-cos similarity. Compared
with the Jaccard-Focused method, Jaccard-Focused-new also performs much
better.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Task 1B</title>
        <p>Here, we mainly apply Rule-based Methods and Machine Learning Methods.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Rule-based Methods:</title>
        <p>Subtitle Rule: We use the subtitles of the CTS and citance to determine the facet.
If the subtitles contain words of the five predefined classes, we categorize the CTS and
citance as the corresponding facet.</p>
        <p>High Frequency Word Rule: We apply high frequency words obtained from
the five classes to classify the CTS and citance. We first remove the common words, and
then set a threshold for each facet.</p>
        <p>Subtitle and High Frequency Word Combining Rule: We first apply the Subtitle
Rule to obtain the facet. If the subtitles fail, we use the High Frequency Words to obtain
the final facet.</p>
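        <p>The three rules can be sketched as follows. This is an illustration only: the keyword sets, the high frequency word lists, the thresholds and the function names are hypothetical, and counting keyword hits against a per-facet threshold is one possible reading of the High Frequency Word Rule.</p>
        <preformat>
```python
def subtitle_rule(subtitle, subtitle_words):
    """Return the first facet whose keywords occur in the subtitle, else None."""
    lowered = subtitle.lower()
    for facet, words in subtitle_words.items():
        if any(word in lowered for word in words):
            return facet
    return None


def hfw_rule(tokens, hfw, thresholds):
    """Return the facet with the most high frequency word hits that reaches
    its threshold, else None."""
    best_facet, best_hits = None, 0
    for facet, words in hfw.items():
        hits = sum(1 for token in tokens if token.lower() in words)
        if hits >= thresholds[facet] and hits > best_hits:
            best_facet, best_hits = facet, hits
    return best_facet


def sub_hfw_rule(subtitle, tokens, subtitle_words, hfw, thresholds):
    """Apply the Subtitle Rule first and fall back to the HFW Rule."""
    facet = subtitle_rule(subtitle, subtitle_words)
    if facet is not None:
        return facet
    return hfw_rule(tokens, hfw, thresholds)


# Hypothetical resources for two of the five facets.
SUBTITLE_WORDS = {
    "Method Citation": {"method", "approach"},
    "Results Citation": {"result", "evaluation"},
}
HFW = {
    "Aim Citation": {"propose", "goal"},
    "Results Citation": {"accuracy", "outperforms"},
}
THRESHOLDS = {"Aim Citation": 1, "Results Citation": 1}

by_subtitle = sub_hfw_rule(
    "3 Our Approach", ["we", "parse"], SUBTITLE_WORDS, HFW, THRESHOLDS
)
by_words = sub_hfw_rule(
    "Introduction", ["We", "propose", "a", "parser"], SUBTITLE_WORDS, HFW, THRESHOLDS
)
```
        </preformat>
        <p>The first call is decided by the subtitle alone, while the second falls back to the high frequency words because the subtitle matches no facet keyword.</p>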
      </sec>
      <sec id="sec-5-4">
        <title>Machine Learning Methods:</title>
        <p>First, we extract features from the CTS and citance. The features are Location
of Paragraph, Document Position Ratio, Paragraph Position Ratio and Number
of Citations or References, computed for both the CTS and the citance and put together in
an 8-dimensional vector. Second, we train SVM, DT, KNN, RF, GB and Voting
models with Training-Set-2016 and Training-Set-2017.</p>
        <p>From Table 3, we find that the Sub, SubHFW, RF and Voting methods show
better performance in our experiments. Because the Sub method depends heavily
on subtitles, it is full of uncertainty. In our submitted runs, we use the RF,
SubHFW, Voting and Fusion methods as our final methods for Task 1B.</p>
        <p>Because some citance XML files are missing from the Test-Set-2018 released by
the organizers, we cannot extract features of the CTS for them. In this situation, we set fixed
initial values as the features for Task 1B in the submitted Test-Set-2018 runs.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Task 2</title>
        <p>
          In this part, our system provides a sampling method based on DPPs [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to extract
sentences when constructing a brief summary of no more than 250 words.
Determinantal point processes (DPPs) are elegant probabilistic models of
repulsion that originate in quantum physics and random matrix theory. The essential
characteristic of a DPP is that its binary variables are negatively correlated.
As a result, the sampled subset is a set of diverse items, which
suits a number of techniques that work with diverse sets, especially in the
information retrieval community. A summary generated by an automatic system
requires analogous principles: coverage of information, information
significance, redundancy in information and cohesion in text. Thus, we bring these
two together and build informative summaries through a sampling method
based on DPPs that selects diverse sentences from documents. It takes into account not only
the ranking of the sentences by quality, but also the
correlation between the sentences. This approach is fully described
in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and proved competitive according to the results of
CLSciSumm-17.
        </p>
        <p>As Task 2 requires a structured summary generated from the CTSs identified
in Task 1A, we treat the CTS as one crucial feature, as described in Section 3.3,
to help select sentences. The SP, SL, TS and HTM features are also included.
We try two specific metrics to measure cohesion quantitatively: Jaccard
calculates the exact proportion of shared words, while WMD reflects the
transition cost from one sentence to another. In our contrast experiments, we
look for the best linear combination of qualities in order
to capture the characteristics of a high-quality summary, and explore the
relationship between sentences by comparing the different metrics with respect to
redundancy.</p>
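        <p>For comparison with the WMD metric, Jaccard similarity between two tokenized sentences can be computed as in the sketch below. The function name and the toy sentences are illustrative, and tokenization by white space is an assumption.</p>
        <preformat>
```python
def jaccard(tokens_a, tokens_b):
    """Proportion of shared distinct words among all distinct words."""
    a, b = set(tokens_a), set(tokens_b)
    union = a.union(b)
    if not union:
        return 0.0
    return len(a.intersection(b)) / len(union)


score = jaccard("the parser is fast".split(), "the parser is accurate".split())
```
        </preformat>
        <p>The two toy sentences share three of five distinct words. Unlike WMD, this metric gives no credit to different words with similar meanings.</p>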
        <p>The results below use Manual ROUGE values to evaluate our summaries.
For the evaluation phase, CLSciSumm-18 provides three kinds of
gold summaries: the collection of citation sentences (the community summary),
faceted summaries of the traditional self-summary (the abstract), and summaries
written by well-trained annotators (the human summary).</p>
        <p>Taking the community summary as an example, we first test the SP (φ<sub>0</sub>), SL (φ<sub>1</sub>), TS (φ<sub>2</sub>),
HTM (φ<sub>3</sub>) and CTS (φ<sub>4</sub>) features independently to determine their individual contributions.
As the CTS feature (φ<sub>4</sub>) is specifically designed, we do not present
its individual performance, but record and observe its binary combination with
every other basic feature.</p>
        <p>From Table 4, the best binary combination for the WMD metric consists of the TS (φ<sub>2</sub>) and CTS
(φ<sub>4</sub>) features. One possible explanation is that the community
summary itself already includes these citation sentences. Since the title
contains the essence of a paper, sentences selected by this ranking rule
are likely to overlap with the gold summaries.</p>
        <p>Similarly, we conduct experiments on the other two kinds of gold
summaries, where the weights of the parameters are slightly different. Tables 5-7
present the weights and results for the three gold summaries: the best binary
combinations follow the same tendency. However, for the human
summary, the more parameters are involved, the higher the ROUGE F-score.
For the community summary, in contrast, any attribute added to the
binary combination lowers performance. That the same combination
is best in both cases may be because, whether
the sentences are cited by other papers or the summaries are written by
annotators, both reflect the perspective of readers. There are a thousand
Hamlets in a thousand people's eyes. As for the self-summary (the abstract),
no binary combination with the CTS (φ<sub>4</sub>) feature is satisfactory, so we
present the individual contribution of each of the other statistical or topic features.
Although we have tried our best to follow the writers, there is perhaps always a
narrow gap between the readers' comprehension and the writers' original intention.
In general, although the two diversity metrics are roughly evenly matched on this
dataset, the best result, in the first row of Table 5, comes from the WMD metric. We
therefore firmly believe that the newly proposed algorithm still has
considerable potential to be explored.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we propose some new methods to improve the performance of
Task 1 and Task 2 based on our former work, especially in similarity calculation.
We apply the WMD method and LDA-cos to calculate similarity and generate
summaries. In the future, we will continue to improve these methods and incorporate
new methods based on the official results of CLSciSumm-18.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was supported by National Social Science Foundation of China [grant
number 16ZDA055]; National Natural Science Foundation of China [grant
numbers 91546121, 71231002]; EU FP7 IRSES MobileCloud Project [grant number
612212]; the 111 Project of China [grant number B08004]; Engineering Research
Center of Information Networks, Ministry of Education; Beijing BUPT
Information Networks Industry Institute Company Limited; the project of Beijing
Institute of Science and Technology Information; the project of CapInfo
Company Limited.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>CL-SciSumm 2018 Homepage</surname>
          </string-name>
          , http://wing.comp.nus.edu.sg/ birndl-sigir2018/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chandrasekaran M K</surname>
            , Jaidka
            <given-names>K</given-names>
          </string-name>
          , Mayr P. Joint Workshop on Bibliometric-enhanced
          <source>Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL</source>
          <year>2017</year>
          )[C]//Proceedings of the 40th
          <source>International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM</source>
          ,
          <year>2017</year>
          :
          <fpage>1421</fpage>
          -
          <lpage>1422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Li</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <source>UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics[M]//Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data</source>
          . Springer, Cham,
          <year>2017</year>
          :
          <fpage>343</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jaidka</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            <given-names>M K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rustagi</surname>
            <given-names>S</given-names>
          </string-name>
          , et al.
          <article-title>Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task</article-title>
          [J].
          <source>International Journal on Digital Libraries</source>
          ,
          <year>2017</year>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jaidka</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            <given-names>M K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            <given-names>D</given-names>
          </string-name>
          , et al.
          <article-title>The CL-SciSumm shared task 2017: results and key insights[C]//Proceedings of the Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm 2017), organized as a part of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL</article-title>
          <year>2017</year>
          ).
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ma</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <source>NJUST@ CLSciSumm-17[C]//Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Li</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            <given-names>L</given-names>
          </string-name>
          , et al.
          <article-title>CIST@CLSciSumm-17: Multiple Features Based Citation Linkage, Classification and Summarization</article-title>
          [C]//
          <source>Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhang</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>S</given-names>
          </string-name>
          .
          <article-title>PKU@CLSciSumm-17: Citation Contextualization</article-title>
          [C]//
          <source>Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pramanick</surname>
            <given-names>Aniket</given-names>
          </string-name>
          , et al.
          <article-title>SciSumm 2017: Employing Word Vectors for Identifying, Classifying and Summarizing Scientific Documents</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Aburaed</surname>
            <given-names>Ahmed</given-names>
          </string-name>
          , et al.
          <article-title>LaSTUS/TALN@CLSciSumm-17: cross-document sentence matching and scientific text summarization systems</article-title>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lauscher</surname>
            <given-names>Anne</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavaš</surname>
            <given-names>Goran</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Eckert</surname>
            <given-names>Kai</given-names>
          </string-name>
          .
          <article-title>University of Mannheim@CLSciSumm-17: Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity</article-title>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Felber</surname>
            <given-names>Thomas</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kern</surname>
            <given-names>Roman</given-names>
          </string-name>
          .
          <article-title>Graz University of Technology at CL-SciSumm 2017: Query Generation Strategies</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Karimi</surname>
            <given-names>Samaneh</given-names>
          </string-name>
          , et al.
          <article-title>University of Houston@CL-SciSumm 2017: Positional Language Models, Structural Correspondence Learning and Textual Entailment</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Prasad</surname>
            <given-names>Animesh</given-names>
          </string-name>
          .
          <article-title>WING-NUS at CL-SciSumm 2017: Learning from syntactic and semantic similarity for citation contextualization</article-title>
          .
          <source>Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kusner</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolkin</surname>
            <given-names>N</given-names>
          </string-name>
          , et al.
          <article-title>From word embeddings to document distances</article-title>
          [C]//
          <source>International Conference on Machine Learning</source>
          .
          <year>2015</year>
          :
          <fpage>957</fpage>
          -
          <lpage>966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kulesza</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taskar</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>Determinantal point processes for machine learning</article-title>
          [J].
          <source>Foundations and Trends in Machine Learning</source>
          ,
          <year>2012</year>
          ,
          <volume>5</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>123</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>