<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The CL-SciSumm Shared Task 2017: Results and Key Insights</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kokil Jaidka</string-name>
          <email>jaidka@sas.upenn.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muthu Kumar Chandrasekaran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Devanshu Jain</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Min-Yen Kan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing, National University of Singapore</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Smart Systems Institute, National University of Singapore</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Pennsylvania</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2017, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers from the open access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2017 Shared Task, organized as a part of the 40th Annual Conference of the Special Interest Group on Information Retrieval (SIGIR), held in Tokyo, Japan in August 2017. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        CL-SciSumm explores summarization of scientific research in the domain of
computational linguistics research. It encourages the incorporation of new kinds of
information in automatic scientific paper summarization, such as the facets of
research information being summarized in the research paper. CL-SciSumm also
encourages the use of citing mini-summaries written in other papers, by other
scholars, when they refer to the paper. The Shared Task dataset comprises the
set of citation sentences (i.e., "citances") that reference a specific paper as a
(community-created) summary of a topic or paper [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Citances for a reference
paper are considered synopses of its key points and also its key contributions
and importance within an academic community [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The advantage of using
citances is that they are embedded with meta-commentary and offer a contextual,
interpretative layer to the cited text. Citances offer a view of the cited paper
which could complement the reader's context, possibly as a scholar [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or a writer
of a literature review [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The CL-SciSumm Shared Task is aimed at bringing together the
summarization community to address challenges in scientific communication
summarization. Over time, we anticipate that the Shared Task will spur the creation of
new resources, tools and evaluation frameworks.</p>
      <p>
        A pilot CL-SciSumm task was conducted at TAC 2014, as part of the larger
BioMedSumm Task (http://www.nist.gov/tac/2014). In 2016, a second CL-SciSumm Shared Task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was held
as part of the Joint Workshop on Bibliometric-enhanced Information Retrieval
and Natural Language Processing for Digital Libraries (BIRNDL) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
at the Joint Conference on Digital Libraries (JCDL, http://www.jcdl2016.org/). This paper provides the
results and insights from CL-SciSumm 2017, which was held as part of the
subsequent BIRNDL 2017 workshop [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] at the annual ACM Conference on Research
and Development in Information Retrieval (SIGIR, http://sigir.org/sigir2017/).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task</title>
      <p>CL-SciSumm defined two serially dependent tasks that participants could
attempt, given a canonical training and testing set of papers.</p>
      <p>Given: A topic consists of a Reference Paper (RP) and ten or more Citing
Papers (CPs) that all contain citations to the RP. In each CP, the text spans
(i.e., citances) have been identified that pertain to a particular citation to the
RP. Additionally, the dataset provides three types of summaries for each RP:
- the abstract, written by the authors of the research paper;
- the community summary, collated from the reference spans of its citances;
- a human-written summary, written by the annotators of the CL-SciSumm
annotation effort.</p>
      <p>Task 1A: For each citance, identify the spans of text (cited text spans) in the
RP that most accurately reflect the citance. These are of the granularity of a
sentence fragment, a full sentence, or several consecutive sentences (no more than 5).
Task 1B: For each cited text span, identify what facet of the paper it belongs
to, from a predefined set of facets.</p>
      <p>Task 2: Finally, generate a structured summary of the RP from the cited text
spans of the RP. The length of the summary should not exceed 250 words. This
was an optional bonus task.</p>
    </sec>
    <sec id="sec-3">
      <title>Development</title>
      <p>We built the CL-SciSumm corpus by randomly sampling research papers
(Reference Papers, RPs) from the ACL Anthology corpus and then downloading the
citing papers (CPs) for those which had at least ten citations. The prepared
dataset then comprised annotated citing sentences for a research paper, mapped
to the sentences in the RP which they referenced. Summaries of the RP were
also included.</p>
      <p>The CL-SciSumm 2017 corpus included a refined version of the CL-SciSumm
2016 corpus of 30 RPs as a training set, in order to encourage teams from the
previous edition to participate. The test set was an additional corpus of 10 RPs.</p>
      <p>
        Based on feedback from CL-SciSumm 2016 task participants, we refined the
training set as follows:
- In cases where the annotators could not place the citance to a sentence in
the referred paper, the citance was discarded. In prior versions of the task,
annotators were required to reference the title (Reference Offset: ['0']), but
participants complained that this resulted in a drop in system performance.
- Citances were deleted if they mentioned the referred paper in a clause as a
part of multiple references and did not cite specific information about it.
For details of the general procedure followed to construct the CL-SciSumm
corpus, and changes made to the procedure in CL-SciSumm 2016, please see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Annotation:
The annotation scheme was unchanged from the one followed in previous
editions of the task and the original BiomedSumm task developed by Cohen et
al. (http://www.nist.gov/tac/2014): Given each RP and its associated CPs, the annotation group was instructed
to find citations to the RP in each CP. Specifically, the citation text, citation
marker, reference text, and discourse facet were identified for each citation of
the RP found in the CP.</p>
    </sec>
    <sec id="sec-4">
      <title>Overview of Approaches</title>
      <p>Nine systems participated in Task 1 and a subset of five also participated in
Task 2. The following paragraphs discuss the approaches followed by the
participating systems, in lexicographic order by team name.</p>
      <p>
        The Beijing University of Posts and Telecommunications team from their
Center for Intelligence Science and Technology (CIST, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) followed an approach
similar to their 2016 system submission [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. They calculated a set of similarity
metrics between reference spans and citances: idf similarity, Jaccard similarity,
and context similarity. They submitted six system runs which combined the
similarity scores using a fusion method, a Jaccard Cascade method, a Jaccard Focused
method, an SVM method, and two ensemble methods using voting.
      </p>
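      <p>To illustrate the kind of citance-to-sentence scoring described above, the following is a minimal Python sketch of Jaccard similarity over token sets, with invented toy data; it is not the CIST team's code, and their idf and context similarities are not reproduced here.</p>
      <preformat>
# Jaccard similarity between token sets, one of the scores CIST combined.
def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0

citance = "we follow the alignment model of brown et al".split()
candidates = [
    "the models of brown et al are word alignment models".split(),
    "we evaluate on the news test sets".split(),
]
# Rank candidate reference-paper sentences by overlap with the citance.
best = max(candidates, key=lambda s: jaccard(citance, s))
      </preformat>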
      <p>
        The Jadavpur University team (Jadavpur, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) participated in all of the tasks.
For Task 1A, they defined a cosine similarity between texts. The reference paper's
sentence with the highest score is selected as the reference span. For Task 1B,
they represent each discourse facet as a bag of words of all the sentences having
that facet. Only words with the highest tf-idf values are chosen. To identify the
facet of a sentence, they calculated the cosine similarity between a candidate
sentence vector and each bag's vector. The bag with the highest similarity is
deemed the chosen facet. For Task 2, a similarity score was calculated between
pairs of sentences belonging to the same facets. If the resultant score is high,
only a single sentence of the two is added to the summary.
      </p>
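      <p>The tf-idf cosine-similarity selection step can be sketched as follows; the sentences, vectorizer settings and selection rule are illustrative assumptions, not the Jadavpur team's implementation.</p>
      <preformat>
# Select the RP sentence most similar to a citance under tf-idf cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

rp_sentences = [
    "We present a new model for statistical parsing.",
    "Results improve over the baseline by two points.",
]
citance = "Their statistical parsing model outperforms prior work."

vec = TfidfVectorizer()
matrix = vec.fit_transform(rp_sentences + [citance])
# Similarity of the citance (last row) to every RP sentence.
scores = cosine_similarity(matrix[-1], matrix[:-1])[0]
reference_span = rp_sentences[scores.argmax()]
      </preformat>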
      <p>
        The Nanjing University of Science and Technology team (NJUST, [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ])
participated in all of the tasks (Tasks 1A, 1B and 2). For Task 1A, they used a weighted
voting-based ensemble of classifiers (linear support vector machine (SVM), SVM
using a radial basis function kernel, Decision Tree and Logistic Regression) to
identify the reference span. For Task 1B, they created a dictionary for each
discourse facet and labeled the reference span with the facet if its dictionary
contained any of the words in the span. For Task 2, they used bisecting
K-means to group sentences into different clusters and then used maximal marginal
relevance to extract sentences from each cluster and combine them into a summary.
      </p>
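      <p>A weighted-voting ensemble of this kind can be sketched with scikit-learn; the weights and hyperparameters below are assumptions for illustration, not the NJUST configuration.</p>
      <preformat>
# Hard-voting ensemble over the four classifier families named above.
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

ensemble = VotingClassifier(
    estimators=[
        ("linear_svm", LinearSVC()),      # linear SVM
        ("rbf_svm", SVC(kernel="rbf")),   # SVM with RBF kernel
        ("tree", DecisionTreeClassifier()),
        ("logreg", LogisticRegression()),
    ],
    voting="hard",         # majority vote over predicted labels
    weights=[2, 2, 1, 1],  # assumed per-classifier weights
)
# ensemble.fit(X_train, y_train); ensemble.predict(X_candidates)
      </preformat>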
      <p>
        National University of Singapore WING (NUS WING, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]) participated in
Tasks 1A and 1B. They followed a joint-scoring approach, weighting surface-level
similarity using tf-idf and longest common subsequence (LCS), and semantic
relatedness using a pairwise neural network ranking model. For Task 1B, they
retrofitted their neural network approach, applying it to the output of Task 1A.
      </p>
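      <p>The LCS signal mentioned above can be computed with standard dynamic programming; this is a generic sketch, not the team's code.</p>
      <preformat>
# Longest-common-subsequence length over token lists.
def lcs_len(a, b):
    # prev holds LCS lengths for a processed prefix of a against b.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[-1]))
        prev = cur
    return prev[-1]

# "statistical" and "parsing" appear in order in both sequences.
assert lcs_len("statistical parsing model".split(),
               "a model for statistical parsing".split()) == 2
      </preformat>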
      <p>
        The Peking University team (PKU, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) participated in Task 1A. They
computed features based on sentence-level and character-level tf-idf scores and
word2vec similarity and used logistic regression to classify sentences as being
reference spans or not.
      </p>
      <p>
        The Graz University of Technology team (TUGRAZ, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) participated in
Tasks 1A and 1B. They followed an information retrieval style approach for
Task 1A, creating an index of the reference papers and treating each citance as
a query. Results were ranked according to a vector space model and BM25. For
Task 1B, they created an index of cited text along with the discourse facet(s).
To identify the discourse facet of the query, a majority vote was taken among
the discourse facets found in the top 5 results.
      </p>
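      <p>The citance-as-query formulation can be sketched with an off-the-shelf BM25 implementation; this assumes the third-party rank_bm25 package and toy sentences, whereas the team built their own index.</p>
      <preformat>
# Treat each citance as a query against the RP's sentences (Task 1A).
from rank_bm25 import BM25Okapi

rp_sentences = [
    "we train a maximum entropy model for tagging",
    "the corpus contains one million tokens",
    "tagging accuracy reaches 96 percent",
]
index = BM25Okapi([s.split() for s in rp_sentences])
citance = "their maximum entropy tagging model".split()
scores = index.get_scores(citance)  # one BM25 score per RP sentence
top = max(range(len(rp_sentences)), key=scores.__getitem__)
      </preformat>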
      <p>
        The University of Houston team (UHouston, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) used a combination of lexical
and syntactic features for Task 1A, based on the position of text and textual
entailment. They tackled Task 1B using WordNet expansion.
      </p>
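      <p>WordNet expansion broadens the vocabulary a span is matched against; the following sketch uses NLTK's WordNet interface, and the expand helper is a hypothetical illustration rather than the UHouston code.</p>
      <preformat>
# Expand a word with the lemmas of all its WordNet senses.
from nltk.corpus import wordnet as wn

def expand(word):
    # Collect synonym lemmas across every sense of the word.
    lemmas = {l.name().replace("_", " ")
              for syn in wn.synsets(word) for l in syn.lemmas()}
    return lemmas | {word}

# expand("method") adds lemmas from all WordNet senses of the word,
# broadening the dictionary used when matching a cited text span.
      </preformat>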
      <p>
        The University of Mannheim team (UniMA, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) also participated in all of
the tasks. For Task 1A, they used a supervised learning-to-rank paradigm to rank
the sentences in the reference paper using features such as lexical similarity,
semantic similarity, entity similarity and others. They formulated Task 1B as a
one-versus-all multi-class classification. They trained an SVM and a
convolutional neural network (CNN) for each of the five binary classification tasks.
For Task 2, they clustered the sentences using a single-pass clustering algorithm
with a Word Mover's similarity measure and sorted the sentences in each cluster
according to their TextRank score. They then ranked the clusters according to
the average TextRank score. Top sentences were picked from the clusters and
added to the summary until the word limit of 250 words was reached.
      </p>
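      <p>Single-pass clustering can be sketched generically as below; the threshold and the seed-comparison rule are assumptions, and any similarity callable (UniMA used a Word Mover's similarity) can be plugged in.</p>
      <preformat>
# Generic single-pass clustering: each sentence joins the most similar
# existing cluster, or starts a new one if no cluster is similar enough.
def single_pass(sentences, sim, threshold=0.5):
    clusters = []  # each cluster is a list of sentences
    for s in sentences:
        best, best_sim = None, threshold
        for c in clusters:
            # Compare against the cluster's first sentence (its seed).
            score = sim(s, c[0])
            if score >= best_sim:
                best, best_sim = c, score
        if best is None:
            clusters.append([s])
        else:
            best.append(s)
    return clusters
      </preformat>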
      <p>
        Finally, the Universitat Pompeu Fabra team (UPF, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) participated in Tasks 1A,
1B and 2. For Task 1A, they used a weighted voting ensemble of systems that
used word embedding distance, modified Jaccard distance and BabelNet
embedding distance. They formulated Task 1B as a one-versus-all multi-class
classification. For Task 2, they trained a linear regression model to learn the scoring
function (approximated as cosine similarity between the reference paper's sentence
vector and the summary vector) of each sentence.
      </p>
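      <p>The regression-based scoring step can be sketched as follows; the feature rows and targets are toy assumptions, with the team's actual features described in their paper.</p>
      <preformat>
# Learn to predict a sentence's score (its cosine similarity to the
# summary vector), then rank RP sentences by the predicted score.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]])  # per-sentence features
y = np.array([0.35, 0.80, 0.50])  # target: sentence-summary cosine

model = LinearRegression().fit(X, y)
# At test time, take top-scoring sentences until the 250-word budget.
scores = model.predict(X)
      </preformat>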
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>An automatic evaluation script was used to measure system performance for
Task 1A, in terms of the sentence ID overlaps between the sentences identified
in system output and the gold standard created by human annotators. The
raw number of overlapping sentences was used to calculate the precision, recall
and F1 score for each system. We followed the approach of most SemEval tasks
in reporting the overall system performance as its micro-averaged performance
over all topics in the blind test set.</p>
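      <p>Micro-averaging pools the overlap counts across topics before computing the ratios; a minimal sketch, assuming per-topic counts of overlapping, system-returned and gold sentences are available:</p>
      <preformat>
# Micro-averaged F1: sum counts over topics, then compute P, R, F1 once.
def micro_f1(counts):
    # counts: iterable of (n_overlap, n_system, n_gold) per topic
    tp = sum(c[0] for c in counts)
    sys_total = sum(c[1] for c in counts)
    gold_total = sum(c[2] for c in counts)
    p = tp / sys_total if sys_total else 0.0
    r = tp / gold_total if gold_total else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Two toy topics: (overlapping, returned, gold) sentence counts.
print(micro_f1([(2, 4, 3), (1, 2, 4)]))  # pooled P = 3/6, R = 3/7
      </preformat>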
      <p>
        Additionally, we calculated lexical overlaps in terms of the ROUGE-2 and
ROUGE-SU4 scores [
        <xref ref-type="bibr" rid="ref12">12</xref>
] between the system output and the human-annotated
gold standard reference spans.
      </p>
      <p>
        ROUGE scoring was used for CL-SciSumm 2017, for Tasks 1A and 2.
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of
metrics used to automatically evaluate summarization systems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] by measuring
the overlap between computer-generated summaries and multiple human-written
reference summaries. In previous studies, ROUGE scores have significantly
correlated with human judgments on summary quality [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Different variants
of ROUGE differ according to the granularity at which overlap is calculated.
For instance, ROUGE-2 measures the bigram overlap between the candidate
computer-generated summary and the reference summaries. More generally, ROUGE-N
measures the n-gram overlap. ROUGE-L measures the overlap in Longest
Common Subsequence (LCS). ROUGE-S measures overlaps in skip-bigrams, or
bigrams with arbitrary gaps in between. ROUGE-SU uses skip-bigram plus
unigram overlaps. CL-SciSumm 2017 uses ROUGE-2 and ROUGE-SU4 for its
evaluation.
      </p>
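      <p>The core of ROUGE-2 reduces to clipped bigram counting; a simplified sketch follows (the official ROUGE toolkit additionally handles stemming, multiple references and the skip-bigrams of ROUGE-SU4).</p>
      <preformat>
# Simplified ROUGE-2: precision, recall and F1 over clipped bigram counts.
from collections import Counter

def bigrams(tokens):
    return Counter(zip(tokens, tokens[1:]))

def rouge2(candidate, reference):
    c, r = bigrams(candidate.split()), bigrams(reference.split())
    overlap = sum(min(c[g], r[g]) for g in c)  # clipped matches
    prec = overlap / max(sum(c.values()), 1)
    rec = overlap / max(sum(r.values()), 1)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
      </preformat>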
      <p>Task 1B was evaluated as the proportion of discourse facets correctly
classified by the system, contingent on the expected response of Task 1A. As it is a
multi-label classification, this task was also scored based on the precision, recall
and F1 scores.</p>
      <p>Task 2 was optional, and also evaluated using the ROUGE-2 and ROUGE-SU4
scores between the system output and three types of gold standard
summaries of the research paper: the reference paper's abstract, a community
summary, and a human summary.</p>
      <p>The evaluation scripts have been provided at the CL-SciSumm GitHub
repository (https://github.com/WING-NUS/scisumm-corpus), where participants may run their own evaluation and report the results.</p>
    </sec>
    <sec id="sec-6">
<title>Results</title>
      <p>This section compares the participating systems in terms of their performance.
Five of the nine systems that did Task 1 also did the bonus Task 2. Below we
present their performance measured by ROUGE-2 and ROUGE-SU4
against the three gold standard summary types. The results are provided in Table 1
and Figure 1. The detailed implementations of the individual runs are described
in the system papers included in this proceedings volume.</p>
      <p>
        For Task 1A, the best performance was shown by three of the five runs from
NJUST [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Their performance was closely followed by TUGRAZ [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The third
best system was CIST [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which was also the best performer for Task 1B.
      </p>
      <p>
        For Task 2, CIST had the best performance against the abstract, community
and human summaries [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. UPF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] had the next best performance against the
abstract and community summaries, while NJUST [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and UniMA [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] were close
runners-up against the human summaries.
      </p>
      <p>In this edition of the task, we used ROUGE-1 as a more lenient way to
evaluate Task 1A; however, as Figure 1 shows, many systems' performance
on ROUGE scores was lower than on the exact-match F1. The reasons for this
aberration are discussed in Section 7.</p>
    </sec>
    <sec id="sec-7">
      <title>Error Analysis</title>
      <p>
        We carefully considered participant feedback from the CL-SciSumm 2016 task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and made a few changes to the annotation rules and evaluation procedure. We
discuss the key insights from Task 1A, followed by Task 2.
      </p>
      <p>Task 1A: In 2017, we introduced the ROUGE metric to evaluate Task 1A,
which we anticipated would be a more lenient way to score the system runs,
especially since it would consider bigrams separated by up to four words.
However, we found that system performance on ROUGE was not always more
lenient than sentence overlap F1 scores. Table 3 provides some examples to
demonstrate how the ROUGE score is biased to prefer shorter sentences over
longer ones. ROUGE scores are calculated for candidate reference spans (RS)
from system submissions against the gold standard (GS) reference span (Row 1
of Table 3). Here, we consider three examples, each with a pair of RS compared
with one another. In the first example, the RS of Submission 2 is shorter than that of Submission 1.
Both systems retrieve one correct sentence (overlapping with the GS) and one incorrect
sentence. Although the F1 score for exact sentence match is the same for both,
the ROUGE score for Submission 2 (shorter) is greater than that
of Submission 1. In the next example, neither system retrieves a correct match.
Submission 1 is shorter than Submission 2. The exact match score for both
systems is the same: 0. However, the ROUGE score for Submission 1 (shorter)
is higher than that of Submission 2. In the last example, both submissions
correctly retrieve the GS. However, they also retrieve an additional false positive
sentence. Submission 1's RS is longer than Submission 2's. Similar to the
previous examples, the ROUGE score for Submission 1 (longer) is less than that of
Submission 2.</p>
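      <p>A worked toy example of this length bias, with invented bigram counts rather than the actual figures from Table 3: both submissions recall all 10 gold bigrams, yet the shorter one receives a much higher ROUGE-style F1.</p>
      <preformat>
# Equal recall, different lengths: F1 rewards the shorter submission.
def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

gold_bigrams = 10
for name, returned in [("longer submission", 40), ("shorter submission", 20)]:
    p, r = gold_bigrams / returned, gold_bigrams / gold_bigrams
    print(name, round(f1(p, r), 3))  # 0.4 vs. 0.667
      </preformat>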
      <p>
        Evaluation on ROUGE recall instead of ROUGE F1 would prevent longer
candidate summaries from being penalized. However, there is a caveat: a system
could retrieve the entire article (RP) as the reference span and achieve the
highest ROUGE recall. On sorting all the system runs by their average recall
measure, we find that the submission by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] ranked first. Considering the
overall standing of this system, we infer that there were probably a lot of false
positives due to the lack of a stringent word limit. In future tasks, we
will impose a limit on the length of the reference span that can be retrieved.
Although our documentation advised participants to return reference spans of
three sentences or fewer, we did not penalize longer outputs in our evaluation.
      </p>
      <p>
        On the other hand, evaluation on ROUGE precision would encourage systems
to return single sentences with high information overlap. A large body of work
in information retrieval and summarization has measured system performance
in terms of task precision. In fact, as argued by Felber and Kern [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Task 1A can
be considered akin to an information retrieval or a question answering task. We
can then use standard IR performance measures such as Mean Average Precision
(MAP) over all the reference spans. We plan to pilot this measure in the next
edition of the Shared Task.
      </p>
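      <p>The standard textbook formulation of MAP, sketched below for one ranked list per citance with relevance judged against the gold reference span; the data is illustrative.</p>
      <preformat>
# Mean Average Precision over citances for Task 1A as retrieval.
def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for i, sent_id in enumerate(ranked, 1):
        if sent_id in relevant:
            hits += 1
            score += hits / i  # precision at each relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked sentence ids, set of gold sentence ids)
    return sum(average_precision(r, g) for r, g in runs) / len(runs)

print(mean_average_precision([([3, 1, 7], {1}), ([2, 5], {5, 9})]))
      </preformat>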
      <p>Task 1A topic-level meta-analysis: We conducted a meta-analysis of
system performances for Task 1A over all the topics in the test set. We observed
that for only one of the ten test topics (specifically, W09-0621), the average F1
score was one standard deviation away from the mean average F1 score of all the
topics taken together. At the topic level, we observed that the largest variances
in system performance were for W11-0815, W09-0621, D10-1058 and P07-1040,
for which nearly two-thirds of all the submitted runs had a ROUGE or an
overlap F1 score that was more than one standard deviation from the average
F1 score for that topic. We note that since most of the participants submitted
multiple runs, some of these variances are isolated to all the runs submitted by
a couple of teams (specifically, NJUST and UniMA) and may not necessarily
reflect an aberration with the topic itself. All participants were encouraged to
closely examine the outputs for these and other topics during their error analysis.
They can refer to the topic-level results posted in the GitHub repository of the
CL-SciSumm dataset (https://github.com/WING-NUS/scisumm-corpus).</p>
      <p>Task 1B: Systems reported difficulty in classifying discourse facets (classes)
with few datapoints. The class distribution, in general, is skewed towards the
'Method' facet. Systems reported that the class imbalance could not be countered
effectively by class weights. This suggests that the 'Method' facet is composed
of other sub-facets which need to be identified and annotated as ground truth.</p>
      <p>
        Task 2: While considering the results from Task 2, we observed that
ensemble approaches were the most effective against all three sets of gold standard
summaries. Some systems, for instance the system by Abura'Ed et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
tailored their summary generation approach to improve on one type of summary
at a time. We plan to discourage this approach in future tasks, as we envision
that systems would converge towards a general, optimal method for generating
salient scientific summaries. Based on the results from CL-SciSumm 2016, we
had expected that approaches that did well against human summaries would
also do well against community summaries. However, no such inferences could
be made from the results of CL-SciSumm 2017. In the case of NJUST [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], one
of their approaches (Run 2) was among the top approaches against abstract and
human summaries, but was a poor performer against the community summaries.
On the other hand, different runs by UPF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] performed well against different
summaries. One of their runs ('UPF acl com') was among the top against human
and community summaries but was near the bottom against abstract summaries.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>
        Nine systems participated in the CL-SciSumm 2017 shared tasks. The tasks provided
a larger corpus with further refinements over 2016. Compared with 2016, the task
attracted additional submissions that attempted neural network-based methods.
Participants also experimented with the use of word embeddings trained on the
shared task corpus, as well as on other domain corpora. We recommend that
future approaches go beyond off-the-shelf deep learning methods and also
exploit the structural and semantic characteristics that are unique to scientific
documents, perhaps as an enrichment device for word embeddings. The results
from 2016 suggest that the scientific summarization task lends itself as a suitable
problem for transfer learning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        For CL-SciSumm 2018, we are planning to collaborate with Yale University
and introduce semantic concepts from the ACL Anthology Network [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Acknowledgement. We would like to thank Microsoft Research Asia
for their generous funding. We would also like to thank Vasudeva Varma
and colleagues at IIIT-Hyderabad, India and the University of Hyderabad
for their efforts in convening and organizing our annotation workshops.
We acknowledge the continued advice of Hoa Dang, Lucy Vanderwende
and Anita de Waard from the pilot stage of this task. We would also
like to thank Rahul Jha and Dragomir Radev for sharing their software
to prepare the XML versions of papers. We are grateful to Kevin B.
Cohen and colleagues for their support, and for sharing their annotation
schema, export scripts and the Knowtator package implementation on
the Protégé software, all of which have been indispensable for this shared
task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Abura'Ed,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Accuosto</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          , Bravo, À.:
          <article-title>LaSTUS/TALN @ CL-SciSumm-17: Cross-document Sentence Matching and Scientific Text Summarization Systems</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Conroy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Vector space and language models for scientific document summarization</article-title>
          .
          <source>In: NAACL-HLT</source>
          . pp.
          <volume>186</volume>
          -
          <fpage>191</fpage>
          . Association for Computational Linguistics, Newark, NJ, USA (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Dipankar</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.M.</given-names>
            ,
            <surname>Pramanick</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Employing Word Vectors for Identifying, Classifying and Summarizing Scientific Documents</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Felber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
          </string-name>
          , R.:
          <article-title>Query Generation Strategies for CL-SciSumm 2017 Shared Task</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rustagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
          </string-name>
          , M.Y.:
          <article-title>Insights from CL-SciSumm 2016: the faceted scientific document summarization shared task</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          . pp. 1-9 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khoo</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Na</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Deconstructing human literature reviews: a framework for multi-document summarization</article-title>
          .
          <source>In: Proc. of ENLG</source>
          . pp.
          <volume>125</volume>
          -
          <issue>135</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>K.S.:</given-names>
          </string-name>
          <article-title>Automatic summarising: The state of the art</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>43</volume>
          (
          <issue>6</issue>
          ),
          <volume>1449</volume>
          -
          <fpage>1481</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Karimi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moraes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          : University of Houston at CL-SciSumm
          <year>2017</year>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lauscher</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
          </string-name>
          , H.:
          <article-title>CIST System for CL-SciSumm 2016 Shared Task</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometricenhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <volume>156</volume>
          -
          <fpage>167</fpage>
          .
          <string-name>
            <surname>Newark</surname>
          </string-name>
          , NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          : CIST@CLSciSumm-17
          <source>: Multiple Features Based Citation Linkage</source>
          ,
          <article-title>Classification and Summarization</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          :
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          .
          <source>Text summarization branches out: Proceedings of the ACL-04 workshop 8</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Correlation between rouge and human evaluation of extractive meeting summaries</article-title>
          .
          <source>In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers</source>
          . pp.
          <volume>201</volume>
          -
          <fpage>204</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , S.,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <surname>C.</surname>
          </string-name>
          : NJUST@CLSciSumm-
          <fpage>17</fpage>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mayr</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaidka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Editorial for the 2nd joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL) at SIGIR 2017</article-title>
          .
          <source>In: Proceedings of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL</source>
          <year>2017</year>
          )
          <article-title>co-located with the 40th</article-title>
          <source>International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2017</year>
          ), Tokyo, Japan,
          <year>August 11</year>
          ,
          <year>2017</year>
          . pp.
          <volume>1</volume>
          -
          <issue>6</issue>
          (
          <issue>2017</issue>
          ), http://ceur-ws.org/Vol-1888/editorial.pdf
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mayr</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabanac</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfram</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Editorial for the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at JCDL 2016</article-title>
          .
          <source>In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2016)</source>
          . pp.
          <volume>1</volume>
          -
          <issue>5</issue>
          . Newark, NJ, USA (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Citances:
          <article-title>Citation sentences for semantic analysis of bioscience text</article-title>
          .
          <source>In: Proceedings of the SIGIR'04 workshop on Search and Discovery in Bioinformatics</source>
          . pp.
          <volume>81</volume>
          -
          <issue>88</issue>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Prasad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>WING-NUS at CL-SciSumm 2017: Learning from Syntactic and Semantic Similarity for Citation Contextualization</article-title>
          .
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Qazvinian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Scientific paper summarization using citation summary networks</article-title>
          .
          <source>In: Proceedings of the 22nd International Conference on Computational Linguistics-Volume</source>
          <volume>1</volume>
          . pp.
          <volume>689</volume>
          -
          <fpage>696</fpage>
          .
          <string-name>
            <surname>ACL</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muthukrishnan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qazvinian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The ACL Anthology Network corpus</article-title>
          .
          <source>In: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries</source>
          . pp.
          <volume>54</volume>
          -
          <fpage>61</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , D.: PKU @ CLSciSumm-17: Citation Contextualization.
          <source>In: Proc. of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL2017)</source>
          . Tokyo, Japan (
          <year>August 2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>