=Paper=
{{Paper
|id=Vol-2414/paper17
|storemode=property
|title=Overview and Results: CL-SciSumm Shared Task 2019
|pdfUrl=https://ceur-ws.org/Vol-2414/paper17.pdf
|volume=Vol-2414
|authors=Muthu Kumar Chandrasekaran,Michihiro Yasunaga,Dragomir Radev,Dayne Freitag,Min-Yen Kan
|dblpUrl=https://dblp.org/rec/conf/sigir/ChandrasekaranY19
}}
==Overview and Results: CL-SciSumm Shared Task 2019==
Muthu Kumar Chandrasekaran 1, Michihiro Yasunaga 2, Dragomir Radev 2, Dayne Freitag 1, and Min-Yen Kan 3

1 SRI International, USA
2 Yale University, USA
3 School of Computing, National University of Singapore, Singapore

cmkumar087@gmail.com

Abstract. The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised the 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are open-access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2019 Shared Task, organized as a part of the 42nd Annual Conference of the Special Interest Group on Information Retrieval (SIGIR), held in Paris, France in July 2019. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.

1 Introduction

CL-SciSumm explores summarization of scientific research in the domain of computational linguistics research. It encourages the incorporation of new kinds of information in automatic scientific paper summarization, such as the facets of research information being summarized in the research paper. CL-SciSumm also encourages the use of citing mini-summaries written in other papers, by other scholars, when they refer to the paper. The Shared Task dataset comprises the set of citation sentences (i.e., "citances") that reference a specific paper as a (community-created) summary of a topic or paper [19]. Citances for a reference paper are considered synopses of its key points and also of its key contributions and importance within an academic community [16]. The advantage of using citances is that they are embedded with meta-commentary and offer a contextual, interpretative layer to the cited text. Citances offer a view of the cited paper which could complement the reader's context, possibly as a scholar [8].

The CL-SciSumm Shared Task is aimed at bringing together the summarization community to address challenges in scientific communication summarization. Over time, we anticipate that the Shared Task will spur the creation of new resources, tools and evaluation frameworks. A pilot CL-SciSumm task was conducted at TAC 2014, as part of the larger BioMedSumm Task (http://www.nist.gov/tac/2014). In 2016, a second CL-SciSumm Shared Task [6] was held as part of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) [15] at the Joint Conference on Digital Libraries (JCDL 2016). Subsequent editions were held as part of the BIRNDL workshops [14] co-located with the annual ACM Conference on Research and Development in Information Retrieval (SIGIR, http://sigir.org/sigir2017/). This paper provides the results and insights from CL-SciSumm 2019, held as part of the BIRNDL 2019 workshop at SIGIR 2019.

2 Task

CL-SciSumm defined two serially dependent tasks that participants could attempt, given a canonical training and testing set of papers.
Given: A topic consists of a Reference Paper (RP) and ten or more Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (i.e., citances) have been identified that pertain to a particular citation to the RP. Additionally, the dataset provides three types of summaries for each RP:

– the abstract, written by the authors of the research paper.
– the community summary, collated from the reference spans of its citances.
– a human-written summary, written by the annotators of the CL-SciSumm annotation effort.

Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. These are of the granularity of a sentence fragment, a full sentence, or several consecutive sentences (no more than 5).

Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets.

Task 2: Finally, generate a structured summary of the RP from the cited text spans of the RP. The length of the summary should not exceed 250 words. This was an optional bonus task.

3 Development

We built the CL-SciSumm corpus by randomly sampling research papers (Reference Papers, RPs) from the ACL Anthology corpus and then downloading the citing papers (CPs) for those which had at least ten citations. The prepared dataset then comprised annotated citing sentences for a research paper, mapped to the sentences in the RP which they referenced. Summaries of the RP were also included.

The CL-SciSumm 2019 corpus consisted of 40 annotated RPs and their CPs. These are the same as described in our overview paper for CL-SciSumm 2018 [7]. The test set was blind. We reused the blind test set from CL-SciSumm 2018 so that CL-SciSumm 2019 systems, which have additional training data (see Section 3.1), can be evaluated comparably. For details of the general procedure followed to construct the CL-SciSumm corpus, and changes made to the procedure in CL-SciSumm 2016, please see [6]. In 2017, we made revisions to the corpus to remove citances from passing citations. These are described in [5].

3.1 Annotation

The first annotated CL-SciSumm corpus was released for the CL-SciSumm 2016 shared task. It was annotated based on the annotation scheme followed in previous editions of the task and the original BiomedSumm task developed by Cohen et al. (http://www.nist.gov/tac/2014): Given each RP and its associated CPs, the annotation group was instructed to find citations to the RP in each CP. Specifically, the citation text, citation marker, reference text, and discourse facet were identified for each citation of the RP found in the CP.

CL-SciSumm 2017 and CL-SciSumm 2018 then incrementally added more annotated RPs, up to the current size of 40 annotated RPs. For CL-SciSumm 2019, we augmented this dataset for both Task 1A and Task 2 so that each has approximately 1000 data points, as opposed to 40 in previous years. Specifically, for Task 1, we used the method proposed by [17] to prepare noisy training data for about 1000 unannotated papers. This method automatically matches a citance in a CP with approximately similar reference spans in its RP. The number of reference spans per citance is a hyperparameter that can be set as input. For Task 2, we used the SciSummNet corpus proposed by [23].
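For concreteness, the following is a minimal sketch of this kind of similarity-based matching, not the exact implementation of [17]: each citance is paired with the k most lexically similar RP sentences (here via Jaccard overlap of word sets, with k playing the role of the reference-spans-per-citance hyperparameter). The function and variable names are illustrative only.

```python
# Illustrative sketch of similarity-based noisy labelling (not the method of [17] verbatim):
# label the k RP sentences most similar to a citance as its noisy "cited text spans".
import re


def tokens(text: str) -> set:
    """Lowercased word set for a crude lexical comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_noisy_pairs(citance: str, rp_sentences: list, k: int = 2):
    """Return (index, sentence) for the k RP sentences most similar to the citance."""
    c = tokens(citance)
    scored = [(jaccard(c, tokens(s)), i, s) for i, s in enumerate(rp_sentences)]
    scored.sort(reverse=True)
    return [(i, s) for score, i, s in scored[:k] if score > 0.0]


if __name__ == "__main__":
    rp = [
        "We propose a phrase-based model for statistical machine translation.",
        "Experiments show a large improvement in BLEU over the baseline.",
        "Related work includes word-based alignment models.",
    ]
    citance = "Their phrase-based translation model improved BLEU substantially."
    for idx, sent in build_noisy_pairs(citance, rp, k=2):
        print(idx, sent)
```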
4 Overview of Approaches

Nine of the seventeen registered systems submitted their output for evaluation in Task 1; a subset of five of these also participated in Task 2. We include these system papers in the BIRNDL 2019 proceedings. We now briefly summarise their methods and key results in lexicographic order by team name.

System 1 is from Nanjing University of Science and Technology [13]. For Task 1A, they use multiple classifiers and integrate their results via a voting system. Compared with previous work, this year they make a new selection of features based on correlation analysis, apply a similarity-based negative sampling strategy when creating the training dataset, and add deep learning models for classification. For Task 1B, they first calculate the probability that each word belongs to a specific facet based on the training corpus, and then add prior rules to obtain the final result. For Task 2, to obtain a logical summary, they group sentences in two ways: first based on their relevance to abstract segments, and second arranged by the facet recognized in Task 1B. They then pick out important sentences via ranking.

System 2 is from Beijing University of Posts and Telecommunications (BUPT) [10]. They build a new Word2vec-H feature for the CNN model to calculate sentence similarity for citation linkage. In addition to the methods used last year, they also apply a CNN for facet classification. In order to improve the performance of summarization, they develop more semantic representations for sentences based on neural network language models to construct a new kernel matrix used in Determinantal Point Processes (DPPs).

System 3 is from the University of Manchester [24]. For Task 1 they investigated supervised and semi-supervised approaches. They explored the potential of fine-tuning bidirectional transformers for the identification of cited passages. They further formalised the task as a similarity ranking problem and implemented bilateral multi-perspective matching for natural language sentences. For Task 2, they used hybrid summarisation methods to create a summary from the content of the paper and the cited text spans.

System 4 is from the University of Toulouse [18]. They focus on Task 1A. They first identify candidate sentences in the reference paper and compute their similarities to the citing sentence using tf-idf and embedding-based methods, as well as other features such as POS tags. They submitted 15 runs with different configurations.

System 7 is from IIIT Hyderabad and Adobe Research [21]. Their architecture incorporates transfer learning by utilising a combination of pretrained embeddings which are subsequently used for building models for the given tasks. In particular, for Task 1A, they locate the related text spans referred to by the citation text by creating paired text representations and employ pretrained embedding mechanisms in conjunction with XGBoost, a gradient boosted decision tree algorithm, to identify textual entailment. For Task 1B, they make use of the same pretrained embeddings and use the RAKEL algorithm for multi-label classification.

System 8 is from Universitat Pompeu Fabra and Universidad de la República [2]. They propose a supervised system based on recurrent neural networks and an unsupervised system based on sentence similarity for Task 1A, one supervised approach for Task 1B, and one supervised approach for Task 2. The approach for Task 2 follows the method of the winning approach in CL-SciSumm 2018.

System 9 is from Politecnico di Torino [20].
Their approach to Tasks 1A and 1B relies on an ensemble of classification and regression models trained on the annotated pairs of cited and citing sentences. Facet assignment is based on the relative positions of the cited sentences, locally within the corresponding section and globally within the entire paper. Task 2 is addressed by predicting the overlap (in terms of units of text) between the selected text spans and the summary generated by the domain experts. The output summary consists of the subset of sentences maximizing the predicted overlap score.

System 12 is from Nanjing University and Kim Il Sung University [9]. They propose a novel listwise ranking method for cited text identification. Their method has two stages: similarity-based ranking and supervised listwise ranking. In the first stage, they select the top-5 sentences per citation text according to a modified Jaccard similarity. These top-5 sentences are then ranked by CitedListNet, a listwise ranking model based on deep learning, using 36 similarity features and 11 section-information features. Finally, they select two sentences from the sentence list ranked by CitedListNet.

System 17 is from the National Technical University of Athens, the Athens University of Economics and Business, and the Athena Research and Innovation Center [4]. Their approach is twofold. First, they classify the sentences of an abstract into predefined classes called "zones". They use sentences from selected zones to find the most similar sentences among the remaining sentences of the paper, which constitute the "candidate sentences". Second, they employ a siamese bi-directional GRU neural network with a logistic regression layer to classify whether a citation sentence cites a candidate sentence.

5 Evaluation

An automatic evaluation script was used to measure system performance for Task 1A, in terms of the sentence ID overlaps between the sentences identified in the system output and the gold standard created by human annotators. The raw number of overlapping sentences was used to calculate the precision, recall and F1 score for each system. We followed the approach of most SemEval tasks in reporting the overall system performance as its micro-averaged performance over all topics in the blind test set. Additionally, we calculated lexical overlaps in terms of the ROUGE-2 and ROUGE-SU4 scores [11] between the system output and the human-annotated gold standard reference spans. We have been reporting ROUGE scores since CL-SciSumm 2017, for Tasks 1A and 2.

Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a set of metrics used to automatically evaluate summarization systems [11] by measuring the overlap between computer-generated summaries and multiple human-written reference summaries. In previous studies, ROUGE scores have correlated significantly with human judgments of summary quality [12]. Different variants of ROUGE differ in the granularity at which overlap is calculated. For instance, ROUGE-2 measures the bigram overlap between the candidate computer-generated summary and the reference summaries. More generally, ROUGE-N measures the n-gram overlap. ROUGE-L measures the overlap in the Longest Common Subsequence (LCS). ROUGE-S measures overlaps in skip-bigrams, i.e., bigrams with arbitrary gaps in between. ROUGE-SU uses skip-bigram plus unigram overlaps. CL-SciSumm 2019 uses ROUGE-2 and ROUGE-SU4 for its evaluation.
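The following is a minimal sketch of the Task 1A sentence-overlap scoring described at the start of this section: precision, recall and F1 over sentence IDs, micro-averaged by pooling counts across all citances and topics. The data layout (a mapping from citance keys to sets of RP sentence IDs) is assumed for illustration; the official scripts in the shared-task repository are authoritative.

```python
# Sketch of Task 1A sentence-overlap evaluation: micro-averaged P/R/F1
# over sentence IDs, with counts pooled across all citances and topics.


def micro_f1(predictions, gold):
    """predictions/gold: dict mapping (topic, citance_id) -> set of RP sentence IDs."""
    tp = fp = fn = 0
    for key, gold_ids in gold.items():
        pred_ids = predictions.get(key, set())
        tp += len(pred_ids & gold_ids)   # overlapping sentence IDs
        fp += len(pred_ids - gold_ids)   # predicted but not in gold
        fn += len(gold_ids - pred_ids)   # gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("RP1", 1): {3, 4}, ("RP1", 2): {10}}
    pred = {("RP1", 1): {4, 5}, ("RP1", 2): {10, 11}}
    print(micro_f1(pred, gold))  # pooled counts: tp=2, fp=2, fn=1
```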
Task 1B was evaluated as the proportion of discourse facets correctly classified by the system, contingent on the expected response of Task 1A. As it is a multi-label classification task, it was also scored based on precision, recall and F1.

Task 2 was optional, and was also evaluated using the ROUGE-2 and ROUGE-SU4 scores between the system output and three types of gold standard summaries of the research paper: the reference paper's abstract, a community summary, and a human summary.

The evaluation scripts have been provided at the CL-SciSumm GitHub repository (https://github.com/WING-NUS/scisumm-corpus), where the participants may run their own evaluation and report the results.

6 Results

This section compares the participating systems in terms of their performance. Five of the nine systems that did Task 1 also did the bonus Task 2. Their performance is measured by sentence overlap and by ROUGE-2 and ROUGE-SU4 against the three gold standard summary types. The results are provided in Tables 1 and 2 and Figures 1 and 2. The detailed implementation of the individual runs is described in the system papers included in this proceedings volume.

For Task 1A, the best performance was shown by System 3 (Team UoM) [24]. Their performance was closely followed by System 12 [9]. Both teams implemented deep learning-based systems. One of the key goals of CL-SciSumm '19 was to boost the performance of deep learning models by adding more training data, so it is encouraging, though not surprising, to see the best performance from deep learning models. The third best system was System 2 (Team CIST-BUPT), which was also the best performer for Task 1B, the classification task. The second best performance on Task 1B was by System 4 (Team IRIT-IRIS).

On the summarisation task, Task 2, System 3 (Team UoM) had the best performance against the abstract. System 2 (Team CIST-BUPT) had the best performance against the community and human summaries. Again, both are deep learning-based systems. The additional 1000 summaries from SciSummNet used as training data have resulted in the improved performance. System 2 was the second best against abstract summaries, and System 3 was the second best against human summaries.
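As a rough illustration of what the ROUGE-2 numbers in Tables 1 and 2 measure, the following toy bigram-overlap computation may help; the official scores are produced with the standard ROUGE toolkit [11], which additionally handles stemming, multiple references and the ROUGE-SU4 skip-bigram variant, all omitted here.

```python
# Toy ROUGE-2 F1 (clipped bigram overlap) for illustration only.
from collections import Counter


def bigrams(text: str) -> Counter:
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())      # bigrams shared by both, clipped
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the model improves translation quality on all test sets"
    cand = "the model improves quality on all test sets"
    print(round(rouge2_f1(cand, ref), 3))
```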
System | Run | Task 1A: Sentence Overlap (F1) | Task 1A: ROUGE-SU4 (F1) | Task 1B (F1)
system 3 | Run 2 | 0.126 | 0.075 | 0.312
system 12 | Run 1 | 0.124 | 0.090 | 0.221
system 3 | Run 5 | 0.120 | 0.072 | 0.303
system 3 | Run 6 | 0.118 | 0.079 | 0.292
system 12 | Run 2 | 0.118 | 0.061 | 0.266
system 3 | Run 10 | 0.110 | 0.073 | 0.276
system 3 | Run 4 | 0.110 | 0.062 | 0.283
system 2 | run15-Voting-1.1-SubtitleAndHfw-QD_method_1 | 0.106 | 0.034 | 0.389
system 2 | run13-Voting-1.1-SubtitleAndHfw-LSA_method_3 | 0.106 | 0.034 | 0.389
system 2 | run14-Voting-1.1-SubtitleAndHfw-LSA_method_4 | 0.106 | 0.034 | 0.389
system 2 | run16-Voting-1.1-SubtitleAndHfw-SentenceVec_method_2 | 0.106 | 0.034 | 0.389
system 2 | run23-Voting-2.0-Voting-QD_method_1 | 0.104 | 0.036 | 0.341
system 2 | run24-Voting-2.0-Voting-SentenceVec_method_2 | 0.104 | 0.036 | 0.341
system 2 | run20-Voting-2.0-TextCNN-SentenceVec_method_2 | 0.104 | 0.036 | 0.342
system 2 | run21-Voting-2.0-Voting-LSA_method_3 | 0.104 | 0.036 | 0.341
system 2 | run18-Voting-2.0-TextCNN-LSA_method_4 | 0.104 | 0.036 | 0.342
system 2 | run22-Voting-2.0-Voting-LSA_method_4 | 0.104 | 0.036 | 0.341
system 2 | run19-Voting-2.0-TextCNN-QD_method_1 | 0.104 | 0.036 | 0.342
system 2 | run17-Voting-2.0-TextCNN-LSA_method_3 | 0.104 | 0.036 | 0.342
system 12 | Run 3 | 0.104 | 0.041 | 0.286
system 2 | run10-Jaccard-Focused-Voting-LSA_method_4 | 0.103 | 0.038 | 0.294
system 2 | run7-Jaccard-Focused-SubtitleAndHfw-QD_method_1 | 0.103 | 0.038 | 0.385
system 2 | run5-Jaccard-Focused-SubtitleAndHfw-LSA_method_3 | 0.103 | 0.038 | 0.385
system 2 | run9-Jaccard-Focused-Voting-LSA_method_3 | 0.103 | 0.038 | 0.294
system 2 | run12-Jaccard-Focused-Voting-SentenceVec_method_2 | 0.103 | 0.038 | 0.294
system 2 | run6-Jaccard-Focused-SubtitleAndHfw-LSA_method_4 | 0.103 | 0.038 | 0.385
system 2 | run11-Jaccard-Focused-Voting-QD_method_1 | 0.103 | 0.038 | 0.294
system 2 | run8-Jaccard-Focused-SubtitleAndHfw-SentenceVec_method_2 | 0.103 | 0.038 | 0.385
system 12 | Run 4 | 0.098 | 0.030 | 0.315
system 3 | Run 3 | 0.097 | 0.062 | 0.251
system 4 | WithoutEmb Training20182019 Test2019 3 0.1 | 0.097 | 0.071 | 0.286
system 4 | WithoutEmb Training2018 Test2019 3 0.1 | 0.097 | 0.071 | 0.286
system 4 | WithoutEmb Training2019 Test2019 3 0.1 | 0.097 | 0.071 | 0.286
system 3 | Run 1 | 0.093 | 0.060 | 0.255
system 9 | Run 2 | 0.092 | 0.034 | 0.229
system 9 | Run 3 | 0.092 | 0.034 | 0.229
system 9 | Run 1 | 0.092 | 0.034 | 0.229
system 9 | Run 4 | 0.092 | 0.034 | 0.229
system 4 | WithoutEmbTopsim Training20182019 Test2019 0.15 5 0.05 | 0.090 | 0.044 | 0.351
system 4 | WithoutEmbTopsim Training2019 Test2019 0.15 5 0.05 | 0.090 | 0.044 | 0.351
system 4 | WithoutEmbTopsim Training2018 Test2019 0.15 5 0.05 | 0.090 | 0.044 | 0.351
system 4 | WithoutEmbPOS Training20182019 Test2019 3 0.1 | 0.089 | 0.065 | 0.263
system 4 | WithoutEmbPOS Training2019 Test2019 3 0.1 | 0.089 | 0.065 | 0.263
system 4 | WithoutEmbPOS Training2018 Test2019 3 0.1 | 0.089 | 0.065 | 0.263
system 4 | WithoutEmbTopsimPOS Training2019 Test2019 0.15 5 0.05 | 0.088 | 0.044 | 0.346
system 4 | WithoutEmbTopsimPOS Training2018 Test2019 0.15 5 0.05 | 0.088 | 0.044 | 0.346
system 4 | WithoutEmbTopsimPOS Training20182019 Test2019 0.15 5 0.05 | 0.088 | 0.044 | 0.346
system 2 | run1-Jaccard-Cascade-Voting-LSA_method_3 | 0.087 | 0.033 | 0.274
system 2 | run3-Jaccard-Cascade-Voting-QD_method_1 | 0.087 | 0.033 | 0.274
system 2 | run4-Jaccard-Cascade-Voting-SentenceVec_method_2 | 0.087 | 0.033 | 0.274
system 2 | run2-Jaccard-Cascade-Voting-LSA_method_4 | 0.087 | 0.033 | 0.274
system 1 | Run 26 | 0.086 | 0.041 | 0.245
system 1 | Run 4 | 0.086 | 0.042 | 0.241
system 1 | Run 30 | 0.081 | 0.036 | 0.242
system 1 | Run 27 | 0.081 | 0.040 | 0.207
system 1 | Run 8 | 0.081 | 0.036 | 0.242
system 1 | Run 10 | 0.081 | 0.036 | 0.242
system 1 | Run 23 | 0.081 | 0.036 | 0.242
system 1 | Run 17 | 0.080 | 0.035 | 0.236
system 3 | Run 7 | 0.078 | 0.048 | 0.218
system 1 | Run 12 | 0.078 | 0.093 | 0.098
system 1 | Run 15 | 0.078 | 0.093 | 0.110
system 1 | Run 28 | 0.078 | 0.093 | 0.098
system 1 | Run 2 | 0.078 | 0.093 | 0.110
system 1 | Run 9 | 0.078 | 0.093 | 0.110
system 1 | Run 25 | 0.078 | 0.093 | 0.098
system 1 | Run 13 | 0.078 | 0.040 | 0.205
system 1 | Run 24 | 0.078 | 0.093 | 0.110
system 1 | Run 22 | 0.078 | 0.093 | 0.098
system 1 | Run 3 | 0.078 | 0.093 | 0.098
system 1 | Run 5 | 0.078 | 0.093 | 0.113
system 1 | Run 6 | 0.078 | 0.093 | 0.110
system 1 | Run 1 | 0.078 | 0.093 | 0.113
system 1 | Run 14 | 0.078 | 0.093 | 0.113
system 1 | Run 7 | 0.078 | 0.093 | 0.098
system 1 | Run 16 | 0.078 | 0.093 | 0.098
system 1 | Run 29 | 0.078 | 0.093 | 0.110
system 1 | Run 18 | 0.077 | 0.033 | 0.232
system 4 | unweightedPOS W2v Training2018 Test2019 3 0.05 | 0.076 | 0.045 | 0.201
system 4 | unweightedPOS W2v Training20182019 Test2019 3 0.05 | 0.076 | 0.047 | 0.201
system 4 | unweightedPOS W2v Training2019 Test2019 3 0.05 | 0.076 | 0.045 | 0.201
system 1 | Run 11 | 0.075 | 0.091 | 0.106
system 3 | Run 8 | 0.074 | 0.051 | 0.221
system 1 | Run 19 | 0.073 | 0.031 | 0.218
system 8 | Run 4 | 0.070 | 0.025 | 0.122
system 8 | Run 2 | 0.066 | 0.026 | 0.277
system 3 | Run 11 | 0.062 | 0.052 | 0.150
system 1 | Run 20 | 0.061 | 0.032 | 0.178
system 1 | Run 21 | 0.048 | 0.048 | 0.083
system 8 | Run 3 | 0.031 | 0.021 | 0.078
system 8 | Run 1 | 0.020 | 0.015 | 0.070
system 7 | – | 0.020 | 0.031 | 0.045
system 17 | ntua-ilsp-RUN-NNT | 0.013 | 0.021 | 0.016
system 3 | Run 9 | 0.012 | 0.018 | 0.039
system 2 | run25-Word2vec-H-CNN-SubtitleAndHfw-QD_method_1 | 0.009 | 0.009 | 0.047
system 2 | run26-Word2vec-H-CNN-SubtitleAndHfw-SentenceVec_method_2 | 0.009 | 0.009 | 0.047
system 17 | ntua-ilsp-RUN_NNF | 0.007 | 0.013 | 0.013

Table 1: Systems' performance in Tasks 1A and 1B, ordered by their F1 scores for sentence overlap on Task 1A.
System | Run | Abstract R-2 | Abstract R-SU4 | Community R-2 | Community R-SU4 | Human R-2 | Human R-SU4
system 3 | Run 1 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 11 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 6 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 2 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 7 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 10 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 8 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 5 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 3 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 4 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 3 | Run 9 | 0.514 | 0.295 | 0.106 | 0.062 | 0.265 | 0.180
system 2 | run3-Jaccard-Cascade-Voting-QD_method_1_human | 0.389 | 0.210 | 0.122 | 0.063 | 0.278 | 0.200
system 2 | run3-Jaccard-Cascade-Voting-QD_method_1_abstract | 0.389 | 0.210 | 0.122 | 0.063 | 0.278 | 0.200
system 2 | run23-Voting-2.0-Voting-QD_method_1_human | 0.386 | 0.227 | 0.121 | 0.063 | 0.257 | 0.189
system 2 | run19-Voting-2.0-TextCNN-QD_method_1_human | 0.386 | 0.227 | 0.121 | 0.063 | 0.257 | 0.189
system 2 | run19-Voting-2.0-TextCNN-QD_method_1_abstract | 0.386 | 0.227 | 0.121 | 0.063 | 0.257 | 0.189
system 2 | run23-Voting-2.0-Voting-QD_method_1_abstract | 0.386 | 0.227 | 0.121 | 0.063 | 0.257 | 0.189
system 2 | run15-Voting-1.1-SubtitleAndHfw-QD_method_1_human | 0.381 | 0.211 | 0.119 | 0.062 | 0.267 | 0.191
system 2 | run15-Voting-1.1-SubtitleAndHfw-QD_method_1_abstract | 0.381 | 0.211 | 0.119 | 0.062 | 0.267 | 0.191
system 2 | run10-Jaccard-Focused-Voting-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run2-Jaccard-Cascade-Voting-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run18-Voting-2.0-TextCNN-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run6-Jaccard-Focused-SubtitleAndHfw-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run14-Voting-1.1-SubtitleAndHfw-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run22-Voting-2.0-Voting-LSA_method_4_community | 0.368 | 0.186 | 0.096 | 0.053 | 0.252 | 0.170
system 2 | run11-Jaccard-Focused-Voting-QD_method_1_human | 0.367 | 0.201 | 0.121 | 0.062 | 0.258 | 0.184
system 2 | run7-Jaccard-Focused-SubtitleAndHfw-QD_method_1_abstract | 0.367 | 0.201 | 0.121 | 0.062 | 0.258 | 0.184
system 2 | run11-Jaccard-Focused-Voting-QD_method_1_abstract | 0.367 | 0.201 | 0.121 | 0.062 | 0.258 | 0.184
system 2 | run7-Jaccard-Focused-SubtitleAndHfw-QD_method_1_human | 0.367 | 0.201 | 0.121 | 0.062 | 0.258 | 0.184
system 9 | Run 1 | 0.364 | 0.196 | 0.196 | 0.104 | 0.218 | 0.144
system 9 | Run 3 | 0.359 | 0.194 | 0.195 | 0.104 | 0.211 | 0.141
system 9 | Run 2 | 0.346 | 0.176 | 0.209 | 0.112 | 0.215 | 0.140
system 2 | run5-Jaccard-Focused-SubtitleAndHfw-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 2 | run13-Voting-1.1-SubtitleAndHfw-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 2 | run1-Jaccard-Cascade-Voting-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 2 | run9-Jaccard-Focused-Voting-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 2 | run21-Voting-2.0-Voting-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 2 | run17-Voting-2.0-TextCNN-LSA_method_3_community | 0.343 | 0.171 | 0.097 | 0.049 | 0.254 | 0.174
system 9 | Run 4 | 0.340 | 0.174 | 0.206 | 0.111 | 0.208 | 0.138
system 8 | Run 1 | 0.329 | 0.172 | 0.149 | 0.090 | 0.241 | 0.171
system 2 | run12-Jaccard-Focused-Voting-SentenceVec_method_2_abstract | 0.318 | 0.171 | 0.142 | 0.075 | 0.239 | 0.167
system 2 | run12-Jaccard-Focused-Voting-SentenceVec_method_2_human | 0.318 | 0.171 | 0.142 | 0.075 | 0.239 | 0.167
system 2 | run8-Jaccard-Focused-SubtitleAndHfw-SentenceVec_method_2_abstract | 0.318 | 0.171 | 0.142 | 0.075 | 0.239 | 0.167
system 2 | run8-Jaccard-Focused-SubtitleAndHfw-SentenceVec_method_2_human | 0.318 | 0.171 | 0.142 | 0.075 | 0.239 | 0.167
system 8 | Run 2 | 0.316 | 0.167 | 0.169 | 0.101 | 0.245 | 0.169
system 8 | Run 3 | 0.311 | 0.156 | 0.153 | 0.093 | 0.252 | 0.170
system 2 | run20-Voting-2.0-TextCNN-SentenceVec_method_2_abstract | 0.296 | 0.152 | 0.128 | 0.067 | 0.252 | 0.177
system 2 | run24-Voting-2.0-Voting-SentenceVec_method_2_human | 0.296 | 0.152 | 0.128 | 0.067 | 0.252 | 0.177
system 2 | run20-Voting-2.0-TextCNN-SentenceVec_method_2_human | 0.296 | 0.152 | 0.128 | 0.067 | 0.252 | 0.177
system 2 | run24-Voting-2.0-Voting-SentenceVec_method_2_abstract | 0.296 | 0.152 | 0.128 | 0.067 | 0.252 | 0.177
system 1 | Run 26 | 0.296 | 0.145 | 0.193 | 0.108 | 0.224 | 0.150
system 1 | Run 4 | 0.294 | 0.144 | 0.191 | 0.108 | 0.235 | 0.151
system 2 | run4-Jaccard-Cascade-Voting-SentenceVec_method_2_abstract | 0.287 | 0.155 | 0.121 | 0.066 | 0.247 | 0.175
system 2 | run4-Jaccard-Cascade-Voting-SentenceVec_method_2_human | 0.287 | 0.155 | 0.121 | 0.066 | 0.247 | 0.175
system 2 | run16-Voting-1.1-SubtitleAndHfw-SentenceVec_method_2_human | 0.277 | 0.150 | 0.124 | 0.064 | 0.246 | 0.179
system 2 | run16-Voting-1.1-SubtitleAndHfw-SentenceVec_method_2_abstract | 0.277 | 0.150 | 0.124 | 0.064 | 0.246 | 0.179
system 2 | run25-Word2vec-H-CNN-SubtitleAndHfw-QD_method_1_abstract | 0.277 | 0.158 | 0.115 | 0.059 | 0.238 | 0.167
system 2 | run25-Word2vec-H-CNN-SubtitleAndHfw-QD_method_1_human | 0.277 | 0.158 | 0.115 | 0.059 | 0.238 | 0.167
system 1 | Run 8 | 0.277 | 0.137 | 0.200 | 0.115 | 0.229 | 0.151
system 1 | Run 30 | 0.276 | 0.137 | 0.204 | 0.117 | 0.237 | 0.154
system 1 | Run 10 | 0.276 | 0.137 | 0.204 | 0.117 | 0.237 | 0.154
system 1 | Run 18 | 0.262 | 0.127 | 0.196 | 0.113 | 0.223 | 0.149
system 2 | run26-Word2vec-H-CNN-SubtitleAndHfw-SentenceVec_method_2_human | 0.261 | 0.145 | 0.126 | 0.066 | 0.222 | 0.153
system 2 | run26-Word2vec-H-CNN-SubtitleAndHfw-SentenceVec_method_2_abstract | 0.261 | 0.145 | 0.126 | 0.066 | 0.222 | 0.153
system 8 | Run 4 | 0.246 | 0.147 | 0.131 | 0.084 | 0.170 | 0.141
system 1 | Run 20 | 0.239 | 0.122 | 0.177 | 0.102 | 0.231 | 0.158
system 2 | run15-Voting-1.1-SubtitleAndHfw-QD_method_1_community | 0.207 | 0.123 | 0.126 | 0.070 | 0.215 | 0.153
system 2 | run3-Jaccard-Cascade-Voting-QD_method_1_community | 0.205 | 0.118 | 0.130 | 0.069 | 0.201 | 0.144
system 2 | run4-Jaccard-Cascade-Voting-SentenceVec_method_2_community | 0.204 | 0.123 | 0.140 | 0.077 | 0.221 | 0.159
system 2 | run24-Voting-2.0-Voting-SentenceVec_method_2_community | 0.203 | 0.126 | 0.138 | 0.076 | 0.225 | 0.164
system 2 | run20-Voting-2.0-TextCNN-SentenceVec_method_2_community | 0.203 | 0.126 | 0.138 | 0.076 | 0.225 | 0.164
system 2 | run8-Jaccard-Focused-SubtitleAndHfw-SentenceVec_method_2_community | 0.199 | 0.115 | 0.131 | 0.073 | 0.222 | 0.156
system 2 | run16-Voting-1.1-SubtitleAndHfw-SentenceVec_method_2_community | 0.199 | 0.116 | 0.131 | 0.071 | 0.207 | 0.156
system 2 | run12-Jaccard-Focused-Voting-SentenceVec_method_2_community | 0.199 | 0.115 | 0.131 | 0.073 | 0.222 | 0.156
system 2 | run19-Voting-2.0-TextCNN-QD_method_1_community | 0.198 | 0.114 | 0.135 | 0.072 | 0.226 | 0.156
system 2 | run23-Voting-2.0-Voting-QD_method_1_community | 0.198 | 0.114 | 0.135 | 0.072 | 0.226 | 0.156
system 2 | run11-Jaccard-Focused-Voting-QD_method_1_community | 0.197 | 0.108 | 0.134 | 0.069 | 0.220 | 0.154
system 2 | run7-Jaccard-Focused-SubtitleAndHfw-QD_method_1_community | 0.197 | 0.108 | 0.134 | 0.069 | 0.220 | 0.154
system 1 | Run 12 | 0.184 | 0.111 | 0.192 | 0.110 | 0.194 | 0.151
system 1 | Run 2 | 0.183 | 0.111 | 0.192 | 0.112 | 0.193 | 0.150
system 1 | Run 6 | 0.183 | 0.111 | 0.192 | 0.112 | 0.193 | 0.150
system 1 | Run 14 | 0.183 | 0.112 | 0.192 | 0.112 | 0.193 | 0.150
system 2 | run26-Word2vec-H-CNN-SubtitleAndHfw-SentenceVec_method_2_community | 0.180 | 0.106 | 0.112 | 0.063 | 0.211 | 0.147
system 1 | Run 28 | 0.167 | 0.104 | 0.192 | 0.108 | 0.194 | 0.150
system 1 | Run 22 | 0.167 | 0.104 | 0.192 | 0.108 | 0.194 | 0.150
system 1 | Run 16 | 0.167 | 0.104 | 0.192 | 0.108 | 0.194 | 0.150
system 1 | Run 24 | 0.166 | 0.104 | 0.193 | 0.109 | 0.194 | 0.150
system 2 | run25-Word2vec-H-CNN-SubtitleAndHfw-QD_method_1_community | 0.151 | 0.097 | 0.126 | 0.069 | 0.201 | 0.138
system 1 | Run 13 | 0.144 | 0.077 | 0.148 | 0.087 | 0.146 | 0.111
system 1 | Run 17 | 0.119 | 0.063 | 0.149 | 0.095 | 0.128 | 0.098
system 1 | Run 11 | 0.114 | 0.066 | 0.145 | 0.088 | 0.099 | 0.085
system 1 | Run 5 | 0.112 | 0.067 | 0.136 | 0.085 | 0.140 | 0.103
system 1 | Run 27 | 0.110 | 0.064 | 0.136 | 0.089 | 0.142 | 0.098
system 1 | Run 25 | 0.107 | 0.061 | 0.145 | 0.092 | 0.113 | 0.086
system 1 | Run 1 | 0.107 | 0.061 | 0.156 | 0.096 | 0.139 | 0.098
system 1 | Run 15 | 0.105 | 0.062 | 0.128 | 0.080 | 0.097 | 0.077
system 1 | Run 9 | 0.101 | 0.063 | 0.147 | 0.091 | 0.121 | 0.091
system 1 | Run 29 | 0.093 | 0.057 | 0.139 | 0.086 | 0.120 | 0.093
system 1 | Run 19 | 0.091 | 0.056 | 0.157 | 0.083 | 0.113 | 0.085
system 1 | Run 23 | 0.090 | 0.058 | 0.165 | 0.094 | 0.108 | 0.084
system 1 | Run 3 | 0.089 | 0.051 | 0.146 | 0.084 | 0.116 | 0.088
system 1 | Run 7 | 0.082 | 0.050 | 0.162 | 0.096 | 0.121 | 0.095
system 1 | Run 21 | 0.075 | 0.050 | 0.109 | 0.063 | 0.121 | 0.083

Table 2: Systems' performance for Task 2, ordered by their ROUGE-2 (R-2) and ROUGE-SU4 (R-SU4) F1 scores against the abstract, community and human summaries.
Fig. 1: Performances on (a) Task 1A in terms of sentence overlap and ROUGE-SU4, and (b) Task 1B conditional on Task 1A. Plots correspond to the numbers in Table 1.

Fig. 2: Task 2 performances on (a) abstract, (b) community and (c) human summaries. Plots correspond to the numbers in Table 2.
7 Research questions and discussions

For CL-SciSumm '19, we augmented the CL-SciSumm '18 training datasets for both Task 1A and Task 2 so that they have approximately 1000 data points as opposed to 40 in previous years. Specifically, for Task 1, we used the method proposed by [17] to prepare noisy training data for about 1000 unannotated papers; for Task 2, we used the SciSummNet corpus proposed by [23]. For CL-SciSumm '19 we used the same blind test data as in CL-SciSumm '18. Based on this, we propose the following research questions to comparatively analyse results from CL-SciSumm '18 with those from CL-SciSumm '19.

RQ1. Did data augmentation help systems achieve better performance?

The best Task 1A performance (sentence overlap F1) this year is 0.126, from System 3 [24], a deep learning system trained on the augmented data. This is about 0.02 lower than the best CL-SciSumm '18 system [22], which was at 0.145. It appears that the data augmentation has helped deep learning methods.
The only fully deep learning system from CL-SciSumm '18 [3] achieved 0.044, so increasing the training data is clearly the way forward. Traditional machine learning-based systems such as [10] seem to suffer from noise in the augmented data. We propose to use a better data generation method that produces cleaner data than the naive similarity-based cut-off method [17] used this time. Note that there was no data augmentation for Task 1B, so the performance of traditional methods across CL-SciSumm '18 and CL-SciSumm '19 is largely the same.

The best CL-SciSumm '19 Task 2 performance against human-written summaries, in terms of ROUGE-2, is 0.278, by [10]. This is higher than the best CL-SciSumm '18 system, which scored 0.252 [1]. This suggests that the additional 1000 SciSummNet summaries are useful for furthering performance. It also indicates that SciSummNet is relatively cleaner than the automatically annotated data used for Task 1A.

RQ2. CL-SciSumm '19 encouraged participants to use deep learning-based methods; do they perform better than traditional machine learning methods?

In Task 1A, the best performing CL-SciSumm '19 system scored lower than the best performing CL-SciSumm '18 system [22], which used traditional models, including random forests and ranking models, trained on the CL-SciSumm '18 training data. This implies that for Task 1A, traditional models trained on clean data perform better than deep learning models trained on noisy data. However, if we look at the CL-SciSumm '19 systems' performances, we notice that deep learning models perform better than traditional machine learning models when trained on the augmented data.

On Task 1B, systems using traditional methods perform better than deep learning systems. Note that the winner of Task 1A, System 3, is not the best system for Task 1B, although it is not far behind. We also did not add any additional training data for Task 1B. So, we cannot rule out that deep learning systems would perform better than traditional methods when trained on enough data.

On Task 2, the best performing system on human summaries, System 2, using neural representations trained on the 1000-plus summaries, does the best with a ROUGE-2 score of 0.278. This is higher than the CL-SciSumm '18 top system using traditional methods. System 3, the second best CL-SciSumm '19 system and an end-to-end deep learning model, with a score of 0.265, is also higher than the CL-SciSumm '18 top system. With a score of 0.514, System 3 also improves the state of the art against abstracts by 0.2 in ROUGE-2 score. System 3 is also the top system on community summaries with a ROUGE-2 score of 0.204.

In summary, deep learning models do well across the board for summaries. Traditional methods do better on Task 1A with small but clean training data. Deep learning methods take over on large but noisy data.

8 Conclusion

Nine systems participated in the CL-SciSumm 2019 shared tasks. The systems were provided with a larger but noisier corpus with automatic annotation. Nearly all the teams used neural methods, and many employed transfer learning. Participants also experimented with the use of word embeddings trained on the shared task corpus, as well as on other domain corpora. We found that data augmentation for Task 1A may have helped deep learning models but not traditional machine learning methods. It also appears that deep learning methods perform better than traditional methods across the board when they have enough training data. We will explore methods to obtain cleaner training data for Task 1 without, or with minimal, human annotation effort.
We recommend that future approaches go beyond off-the-shelf deep learning methods and also exploit the structural and semantic characteristics that are unique to scientific documents, perhaps as an enrichment device for word embeddings.

The committee also observes that the CL-SciSumm series has, over the past five years, catalysed research in the area of scientific document summarisation. We observe that a number of papers outside of the BIRNDL workshop, published at prominent NLP and IR venues, evaluate on the CL-SciSumm gold standard data. Creating a reference corpus for the task was a key goal of the series, and this goal has now been achieved. We will consider newer tasks to push the effort towards automated literature reviews. We will also consider switching the format of the shared evaluation from a shared task to a leaderboard to which systems can submit evaluations asynchronously throughout the year.

Acknowledgement. We would like to thank SRI International for their generous funding of CL-SciSumm '19 and BIRNDL '19. We thank the Chan Zuckerberg Initiative for sponsoring the invited talk. We would also like to thank Vasudeva Varma and colleagues at IIIT Hyderabad, India, and the University of Hyderabad for their efforts in convening and organizing our annotation workshops in 2016-17. We acknowledge the continued advice of Hoa Dang, Lucy Vanderwende and Anita de Waard from the pilot stage of this task. We would also like to thank Rahul Jha and Dragomir Radev for sharing their software to prepare the XML versions of papers. We are grateful to Kevin B. Cohen and colleagues for their support, and for sharing their annotation schema, export scripts and the Knowtator package implementation on the Protege software, all of which have been indispensable for this shared task.

Bibliography

[1] Aburaed, A., Bravo, A., Chiruzzo, L., Saggion, H.: LaSTUS/TALN+INCO @ CL-SciSumm 2018: Using regression and convolutions for cross-document semantic linking and summarization of scholarly literature. In: Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018), Ann Arbor, Michigan (July 2018)
[2] AbuRaed, A., Chiruzzo, L., Bravo, A., Saggion, H.: LaSTUS-TALN+INCO @ CL-SciSumm 2019. In: BIRNDL 2019 (2019)
[3] De Moraes, L.F., Das, A., Karimi, S., Verma, R.M.: University of Houston @ CL-SciSumm 2018. In: BIRNDL@SIGIR. pp. 142–149 (2018)
[4] Fergadis, A., Pappas, D., Papageorgiou, H.: Siamese recurrent bi-directional neural network for scientific summarization @ CL-SciSumm 2019. In: BIRNDL 2019 (2019)
[5] Jaidka, K., Chandrasekaran, M.K., Jain, D., Kan, M.Y.: The CL-SciSumm shared task 2017: Results and key insights. In: BIRNDL@SIGIR (2). vol. 2002, pp. 1–15. CEUR (2017)
[6] Jaidka, K., Chandrasekaran, M.K., Rustagi, S., Kan, M.Y.: Insights from CL-SciSumm 2016: The faceted scientific document summarization shared task. International Journal on Digital Libraries pp. 1–9 (2017)
[7] Jaidka, K., Yasunaga, M., Chandrasekaran, M.K., Radev, D., Kan, M.Y.: The CL-SciSumm shared task 2018: Results and key insights. In: BIRNDL@SIGIR (2). vol. 2132, pp. 74–83. CEUR (2018)
[8] Jones, K.S.: Automatic summarising: The state of the art. Information Processing and Management 43(6), 1449–1481 (2007)
[9] Kim, H., Ou, S.: Ranking-based identification of cited text with deep learning.
In: BIRNDL 2019 (2019)
[10] Li, L., Zhu, Y., Xie, Y., Huang, Z., Liu, W., Li, X., Liu, Y.: CIST@CLSciSumm-19: Automatic scientific paper summarization with citances and facets. In: BIRNDL 2019 (2019)
[11] Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. vol. 8 (2004)
[12] Liu, F., Liu, Y.: Correlation between ROUGE and human evaluation of extractive meeting summaries. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. pp. 201–204. Association for Computational Linguistics (2008)
[13] Ma, S., Zhang, H., Xu, T., Xu, J., Hu, S., Zhang, C.: IRTM-NJUST @ CLSciSumm-19. In: BIRNDL 2019 (2019)
[14] Mayr, P., Chandrasekaran, M.K., Jaidka, K.: Editorial for the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at SIGIR 2017. In: Proceedings of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017) co-located with the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), Tokyo, Japan, August 11, 2017. pp. 1–6 (2017), http://ceur-ws.org/Vol-1888/editorial.pdf
[15] Mayr, P., Frommholz, I., Cabanac, G., Wolfram, D.: Editorial for the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) at JCDL 2016. In: Proc. of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016). pp. 1–5. Newark, NJ, USA (June 2016)
[16] Nakov, P.I., Schwartz, A.S., Hearst, M.: Citances: Citation sentences for semantic analysis of bioscience text. In: Proceedings of the SIGIR'04 Workshop on Search and Discovery in Bioinformatics. pp. 81–88 (2004)
[17] Nomoto, T.: Resolving citation links with neural networks. Frontiers in Research Metrics and Analytics 3, 31 (2018)
[18] Pitarch, Y., Pinel-Sauvagnat, K., Hubert, G., Cabanac, G., Fraisier-Vannier, O.: IRIT-IRIS at CL-SciSumm 2019: Matching citances with their intended reference text spans from the scientific literature. In: BIRNDL 2019 (2019)
[19] Qazvinian, V., Radev, D.: Scientific paper summarization using citation summary networks. In: Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1. pp. 689–696. ACL (2008)
[20] Quatra, M.L., Cagliero, L., Baralis, E.: Poli2Sum@CL-SciSumm 2019: Identify, classify, and summarize cited text spans by means of ensembles of supervised models. In: BIRNDL 2019 (2019)
[21] Syed, B., Indurthi, V., Srinivasan, B.V., Varma, V.: Transfer learning for effective scientific research comprehension. In: BIRNDL 2019 (2019)
[22] Wang, P., Li, S., Wang, T., Zhou, H., Tang, J.: NUDT @ CLSciSumm-18. In: BIRNDL@SIGIR. pp. 102–113 (2018)
[23] Yasunaga, M., Kasai, J., Zhang, R., Fabbri, A., Li, I., Friedman, D., Radev, D.: ScisummNet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks. In: Proceedings of AAAI 2019 (2019)
[24] Zerva, C., Nghiem, M.Q., Nguyen, N.T., Ananiadou, S.: UoM@CL-SciSumm 2019. In: BIRNDL 2019 (2019)