<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>WITS: Wikipedia for Italian Text Summarization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvia Casola</string-name>
          <email>scasola@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Lavelli</string-name>
          <email>lavelli@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>. Fondazione Bruno Kessler</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>. Universita` degli studi di Padova</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Abstractive text summarization has recently improved its performance due to the use of sequence-to-sequence models. However, while these models are extremely data-hungry, datasets in languages other than English are few. In this work, we introduce WITS (Wikipedia for Italian Text Summarization), a large-scale dataset built by exploiting the structure of Wikipedia articles. WITS contains almost 700,000 Wikipedia articles, together with their human-written summaries. Compared to existing data for text summarization in Italian, WITS is more than an order of magnitude larger and more challenging given its lengthy sources. We explore WITS' characteristics and present some baselines for future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Automatic text summarization aims at
condensing one or more source documents into a shorter
output, which contains their most salient
information. The underlying task can be framed in two
different manners: extractive summarizers select
the most relevant segments from the input and
produce a summary which is a concatenation of such
segments; as a result, the output is a subset of the
original text, reproduced verbatim. On
the other hand, abstractive summarizers
aim to encode the whole source into an internal
representation from which they generate the
summary; thus, they produce a new piece of text that
condenses the source without necessarily using its
vocabulary and expressions.</p>
      <p>Recently, abstractive summarization has
attracted growing interest in the Natural Language
Processing (NLP) community. Sequence-to-sequence
models have been increasingly used for
the task, with pre-trained encoder-decoder
transformers becoming the de facto state of the art
for abstractive text summarization. Normally
pretrained in an unsupervised manner, these models
are then fine-tuned in a supervised way on the
downstream dataset; during fine-tuning, the model
learns to generate the summary from the source
document.</p>
      <p>While various datasets for abstractive
summarization exist for English, resources in other
languages are limited. This paper introduces WITS
(Wikipedia for Italian Text Summarization), a
large-scale dataset for abstractive summarization
in Italian, built by exploiting Wikipedia. Taking
advantage of the structure of Wikipedia pages, which
contain a lead section (Figure 1), giving an
overview of the article’s topic, followed by the
full-length article, describing the topic in detail,
we create a large and challenging dataset for
abstractive summarization in Italian, which we will
make publicly available.</p>
      <p>WITS is particularly challenging, given the
length of its sources and its high abstractiveness. In this
paper, we describe the dataset, its statistics and
characteristics, and report some preliminary
experiments that might be used as baselines for
future work.</p>
      <p>This paper is organized as follows: in Section 2,
we describe the state of the art in text
summarization, focusing on resources for Italian. We then
present the dataset and its related task (Section 3.1);
we describe the data collection and preprocessing
process in Sections 3.2 and 3.3. In Section 4, we
show our results when summarizing the dataset
using some existing extractive baseline models.
Finally, we draw our conclusions in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2 State of the Art</title>
      <p>Automatic text summarization has recently
attracted increasing attention from the NLP
community. However, the majority of the research work
still focuses on English.</p>
      <p>
        By way of example, out of all the papers
published at the Association for Computational
Linguistics (ACL) conference in 2021, 46
explicitly refer to summarization in their title; 38 of these
dealt with English only, while 7 presented
experiments with one or more other languages
(including 2 on source code summarization). For
reference, only one paper
        <xref ref-type="bibr" rid="ref11">(Mastronardo and Tamburini,
2019)</xref>
        on text summarization (in English) has been
published at the Italian Conference on Computational
Linguistics (CLiC-it) since its first edition, and
none experimented with Italian.
      </p>
      <p>In this section, we present the state of the art
in abstractive text summarization. We first present
the available datasets for the task; then, we
discuss some relevant learning models. We focus on
the significant gap between English and Italian, for
which very few resources exist.
</p>
      <sec id="sec-2-1">
        <title>2.1 Datasets for Automatic Text Summarization</title>
        <p>A typical dataset for text summarization is
composed of some source documents (which need
to be summarized) and their corresponding
summaries, used as the gold standard. A minority
of datasets (e.g., the DUC 2004 dataset,
https://duc.nist.gov/duc2004/) provide
multiple gold standards; however, such datasets
tend to be small and are mostly used for
evaluation.</p>
        <p>
          In general, summaries exploit a human-written
abstract. For example, the CNN/Daily Mail
Corpus
          <xref ref-type="bibr" rid="ref13">(Nallapati et al., 2016)</xref>
          (https://huggingface.co/datasets/cnn_dailymail) leverages the
bullet-point summaries published on the newspapers’ websites. A
similar rationale is used in datasets constructed
from scientific papers
          <xref ref-type="bibr" rid="ref4">(Cohan et al., 2018)</xref>
          (https://huggingface.co/datasets/arxiv_dataset) or
patents
          <xref ref-type="bibr" rid="ref23">(Sharma et al., 2019)</xref>
          (https://huggingface.co/datasets/big_patent). In contrast, Rush
et al. (2015) (https://huggingface.co/datasets/gigaword) frame the task of news
summarization as headline generation.
        </p>
        <p>
          To the best of our knowledge, WikiLingua
          <xref ref-type="bibr" rid="ref16">(Ladhak et al., 2020)</xref>
          (https://huggingface.co/datasets/wiki_lingua) is the only summarization
dataset that contains data in Italian. WikiLingua is
a cross-lingual dataset for abstractive text
summarization built on top of WikiHow. WikiHow
contains tutorials on how to perform specific tasks in
the form of step-by-step instructions. The dataset
constructs a summary by concatenating the first
sentence of each step and using the remaining text
as the source. WikiLingua contains data in 18
languages, including Italian (50,943 source-summary
pairs). Both summaries and sources are relatively
short (on average, 44 and 418 tokens, respectively,
for the Italian split).
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Models for Abstractive Text Summarization</title>
        <p>
          Abstractive text summarization is one of the most
challenging tasks in NLP: it requires understanding
(encoding) very long inputs, finding salient passages,
and generating text under constraints.
Technically, models for abstractive text summarization
are generally sequence-to-sequence: they encode
the input and then generate the output through a
neural network. While some previous work used
Recurrent Neural Networks
          <xref ref-type="bibr" rid="ref3">(Chung et al., 2014)</xref>
          ,
with the possible addition of an encoder-decoder
attention mechanism
          <xref ref-type="bibr" rid="ref2">(Chopra et al., 2016)</xref>
          ,
transformer models
          <xref ref-type="bibr" rid="ref25">(Vaswani et al., 2017)</xref>
          have later
become pervasive, following a similar trend in
many other NLP areas. Using self-attention, these
models have proved superior to Recurrent
Neural Networks, as they are better able to deal
with long-range dependencies, which are critical in text
summarization.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 WITS</title>
      <sec id="sec-3-1">
        <title>3.1 Task and Rationale</title>
        <p>
          Following another recent trend in NLP, many
summarization models use a transfer-learning
approach: after a pre-training phase, in which they
are trained in an unsupervised way on a huge
amount of text, they are fine-tuned for the specific
downstream task on a relatively limited amount
of supervised data. Summarization models either
exploit encoders and decoders previously trained
for other tasks or are pre-trained from scratch on
a specific objective tailored for summarization.
Rothe et al. (2020), for example, leveraged
previously existing pre-trained models (BERT in
Devlin et al. (2019); RoBERTa in Liu et al. (2019);
and GPT-2 in Radford et al. (2019)) as encoders
or decoders of the sequence-to-sequence
summarizer and showed high performance improvement
with respect to random initialization. More
recently, summarization models
          <xref ref-type="bibr" rid="ref24 ref7">(Song et al., 2019;
Lewis et al., 2020)</xref>
          have been pre-trained with an
objective specific to Natural Language Generation
tasks. For example, authors of Pegasus
          <xref ref-type="bibr" rid="ref27">(Zhang et
al., 2020)</xref>
          used two objectives: Masked Language
Model
          <xref ref-type="bibr" rid="ref5">(Devlin et al., 2019)</xref>
          has been widely used
in previous work, and consists of masking a
percentage of the tokens in the text, which are later
predicted from their context; Gap Sentences Generation is instead a new
pre-training objective, in which a percentage of
the original sentences are masked, and the model
needs to generate them in accordance with the
context.
        </p>
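        <p>As an illustration, the following minimal Python sketch (ours, not the Pegasus code) builds one Gap Sentences Generation training pair. Note that Pegasus selects the gap sentences by importance, while we sample them at random for brevity, and the mask token name is purely illustrative.</p>
        <preformat>
import random

MASK = "[MASK_SENT]"  # illustrative sentence-level mask token

def gsg_pair(sentences, gap_ratio=0.3, seed=0):
    """Mask a fraction of the sentences in the input; the removed
    sentences, in their original order, become the target."""
    rng = random.Random(seed)
    n_gaps = max(1, int(len(sentences) * gap_ratio))
    gap_ids = set(rng.sample(range(len(sentences)), n_gaps))
    source = " ".join(MASK if i in gap_ids else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(gap_ids))
    return source, target

src, tgt = gsg_pair(["Prima frase.", "Seconda frase.", "Terza frase."])
        </preformat>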
        <p>
          Following common practice, most
summarization models have first been trained and evaluated
for English only. In some cases, a subsequent
multilingual version of the model was also created
          <xref ref-type="bibr" rid="ref26">(Xue et al., 2021)</xref>
          . To the best of our knowledge,
few sequence-to-sequence models in Italian exist
to date, and while they might be fine-tuned for
summarization, no full-scale evaluation has been
performed yet.
        </p>
        <p>
          Given a Wikipedia article, we extract the lead
section (which we sometimes refer to as “Summary”
in the remainder of the paper) and propose the
following task:
        </p>
        <p>Given all the sections of an article, summarize their
content to produce the article’s lead section.</p>
          <p>The task is rather natural given the structure of
Wikipedia pages. According to the Wikipedia Manual of Style
(https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style), the
lead section is, in fact, a high-quality summary of
the body of the article. The lead “serves as an
introduction to the article and a summary of its most
important contents” and “gives the basics in a
nutshell and cultivates interest in reading on—though
not by teasing the reader or hinting at what
follows”. Moreover, it should “stand on its own as a
concise overview of the article’s topic”.</p>
          <p>As for the content, according to Wikipedia, the
lead must define the topic, explaining its
importance and the relevant context; then, it must
summarize the most prominent points of the article,
emphasizing the most important material.</p>
          <p>
            Moreover, the lead should only cover
information that is contained in the article: “significant
information should not appear in the lead if it
is not covered in the remainder of the article”.
This is particularly relevant for abstractive
summarization, as models are more prone to produce
summaries that are not faithful to the source
(often called hallucinations) when they are trained
to generate summaries containing information not
in the source
            <xref ref-type="bibr" rid="ref14">(Nan et al., 2021)</xref>
            . The problem
of factuality in abstractive summarization is
currently an active area of research, as previous work
has shown that up to 30% of generated summaries
contain non-factual information
            <xref ref-type="bibr" rid="ref1">(Cao et al., 2018)</xref>
            .
          </p>
          <p>Linguistically, the lead “should be written in
a clear, accessible style with a neutral point of
view”. It is worth noting that, in contrast to
WikiLingua, where the summary is constructed as a
concatenation of sentences from different parts of
the article, the summary in WITS is a stand-alone
piece of text, with a coherent discourse structure.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Data Collection</title>
        <p>This section describes the process of data
collection and preprocessing.</p>
        <p>[Table 1: number of documents, average number of
sentences and tokens, and average compression ratio
for WITS and IT-WikiLingua.]</p>
        <p>We downloaded the latest XML dump of
Wikipedia in Italian
(https://dumps.wikimedia.org/itwiki/latest/itwiki-latest-pages-articles.xml.bz2),
which contains text only.
We used Python and the Gensim library to
process the file
(https://radimrehurek.com/gensim/scripts/segment_wiki.html).
The original number of documents
was 1,454,884. We applied the following
exclusion criteria: we removed pages whose title
contains numbers only (as they mostly describe years
and contain lists of events and references), lists
(titles starting with “Lista d”), pages whose summary
is shorter than 80 characters, and pages
for which the article is less than 1.5 times longer
than the lead.</p>
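        <p>A minimal sketch of this filtering step, assuming the dump has been segmented with Gensim’s segment_wiki script (which emits one JSON object per article, with title, section_titles, and section_texts fields); the file name and the assumption that the first section is the lead are ours:</p>
        <preformat>
import gzip, json

def keep(article, min_summary_chars=80, min_ratio=1.5):
    """Apply the exclusion criteria; the first section is
    assumed to be the lead, as produced by segment_wiki."""
    title = article["title"]
    lead = article["section_texts"][0]
    body = " ".join(article["section_texts"][1:])
    return (not title.isdigit()                  # year/number-only pages
            and not title.startswith("Lista d")  # list pages
            and len(lead) >= min_summary_chars
            and len(body) >= min_ratio * len(lead))

with gzip.open("itwiki-latest.json.gz", "rt", encoding="utf-8") as fh:
    kept = [a for a in (json.loads(line) for line in fh) if keep(a)]
        </preformat>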
        <p>We then preprocessed the text in the following
way: from the summary, we removed the content
of parentheses (as they often contain alternative
names or names in a different language, which
cannot be inferred from the article). For the
article, we further excluded the following sections,
which are not relevant for our task: Note
(Footnotes), Bibliografia (References), Voci correlate
(See also), Altri progetti (Other projects),
Collegamenti esterni (External links), Galleria di
Immagini (Images).</p>
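        <p>A sketch of this preprocessing, under the same assumptions as above; the regular expression is a simplification that ignores nested parentheses:</p>
        <preformat>
import re

EXCLUDED = {"Note", "Bibliografia", "Voci correlate",
            "Altri progetti", "Collegamenti esterni",
            "Galleria di immagini"}

PARENS = re.compile(r"\([^)]*\)")

def to_pair(article):
    """Turn a kept article into a (source, summary) pair."""
    summary = PARENS.sub("", article["section_texts"][0]).strip()
    source = "\n".join(
        text for title, text in zip(article["section_titles"][1:],
                                    article["section_texts"][1:])
        if title.strip() not in EXCLUDED)
    return source, summary
        </preformat>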
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Dataset Statistics</title>
        <p>Table 1 shows some statistics on the dataset and
compares WITS with the Italian split of
WikiLingua (which we will refer to as IT-WikiLingua).</p>
        <p>IT-WikiLingua contains documents from
17,673 WikiHow pages, but some of these pages
describe more than one method related to the same
topic. For example, the page “How to Reduce the
Redness of Sunburn” contains several methods:
“Healing and Concealing Sunburns”, “Lessening
Your Pain and Discomfort”, and “Preventing
a Sunburn”. We consider distinct methods as
separate documents, as they can be summarized
in isolation. Notice that WITS is more than an
order of magnitude larger than IT-WikiLingua.</p>
        <p>[Table 2: average number of named entities (PER,
LOC, ORG, MISC, and overall) per document in WITS
and IT-WikiLingua.]</p>
        <p>We computed the number of tokens and the
number of sentences with the spaCy
it_core_news_lg model (https://spacy.io/models/it).
Compared to IT-WikiLingua,
documents in WITS contain more tokens both in
their summary and in their source (which is more
than double in length), making the dataset
particularly challenging. Note how the sentences are
also longer (and thus more complex) on average. For
example, summaries in WITS contain on average
fewer than 4 sentences, but more than 70 words;
in contrast, IT-WikiLingua’s summaries consist of
more than 5 sentences but contain on average 44
tokens. Not surprisingly, WITS’ compression
ratio is larger than IT-WikiLingua’s and very high
in absolute value. Finally, we also notice that the
dataset is very rich in named entities. Table 2
reports the named entities extracted with spaCy
from WITS and IT-WikiLingua.</p>
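        <p>The per-document statistics can be reproduced along the following lines (a sketch; the counters match the quantities reported in Tables 1 and 2):</p>
        <preformat>
from collections import Counter
import spacy

nlp = spacy.load("it_core_news_lg")

def stats(text):
    """Tokens, sentences, and named entities by label
    (PER, LOC, ORG, MISC) for one document."""
    doc = nlp(text)
    n_tokens = sum(1 for t in doc if not t.is_space)
    n_sents = sum(1 for _ in doc.sents)
    entities = Counter(ent.label_ for ent in doc.ents)
    return n_tokens, n_sents, entities
        </preformat>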
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Baselines</title>
      <p>We tested some preliminary baseline
methods on the dataset; results are reported in Table 3.</p>
      <p>
        The non-neural methods reported are unsupervised: we
obtained the summary from the
source without supervision and then used the lead as the gold standard
for evaluation. We evaluated the summaries using
Recall-Oriented Understudy for Gisting
Evaluation (ROUGE)
        <xref ref-type="bibr" rid="ref8">(Lin, 2004)</xref>
        . ROUGE is an n-gram
based, recall-oriented metric for summary quality
evaluation. Following previous work
        <xref ref-type="bibr" rid="ref10">(Lloret et al.,
2018)</xref>
        , we report ROUGE-1 (R-1), ROUGE-2
(R-2), and ROUGE-L (R-L) (recall).
      </p>
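      <p>For instance, with the rouge-score package (one of several ROUGE implementations; the metric names below are its own identifiers):</p>
      <preformat>
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])

def rouge_recall(lead, candidate):
    """ROUGE-1/2/L recall of a candidate summary against the lead."""
    scores = scorer.score(lead, candidate)  # (target, prediction)
    return {name: s.recall for name, s in scores.items()}
      </preformat>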
      <p>
        We considered the following baselines:
Lead-3 We extract the first three sentences from
the source. Previous work has shown that
this baseline is often hard to beat
        <xref ref-type="bibr" rid="ref22">(See et
al., 2017)</xref>
        , especially in news summarization,
which presents an “inverted pyramid”
structure and tends to report the most important
content at the start.
      </p>
      <sec id="sec-4-1">
        <title>TextRank (Mihalcea and Tarau, 2004)</title>
        <p>
          TextRank is an unsupervised algorithm
that extracts the most relevant sentences
in the source. The algorithm constructs a
graph with sentences as nodes and sentence
similarity (in terms of shared vocabulary)
as edges. The sentences are then ranked
by using the PageRank
          <xref ref-type="bibr" rid="ref17">(Page et al., 1999)</xref>
          algorithm.
        </p>
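        <p>A sketch of this ranking, using networkx for PageRank; the edge weight is a shared-vocabulary similarity in the spirit of the original paper (overlap normalized by the log-lengths of the two sentences):</p>
        <preformat>
import math
import networkx as nx

def textrank(sentences, top_k=3):
    """Rank sentences with PageRank over a similarity graph."""
    words = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(words[i].intersection(words[j]))
            if overlap:
                norm = (math.log(len(words[i]) + 1)
                        + math.log(len(words[j]) + 1))
                graph.add_edge(i, j, weight=overlap / max(norm, 1e-8))
    rank = nx.pagerank(graph, weight="weight")
    best = sorted(rank, key=rank.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]  # original order
        </preformat>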
      </sec>
      <sec id="sec-4-2">
        <title>LexRank (Erkan and Radev, 2004)</title>
        <p>LexRank works in a similar way to TextRank.
However, instead of computing sentence
similarity on normalized shared vocabulary,
it uses the cosine similarity of their TF-IDF
vectors.</p>
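        <p>The ranking loop sketched above can be reused for LexRank by swapping the edge weights for TF-IDF cosine similarities, e.g. with scikit-learn:</p>
        <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_similarity(sentences):
    """Pairwise TF-IDF cosine similarity matrix for LexRank."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    return cosine_similarity(tfidf)
        </preformat>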
      </sec>
      <sec id="sec-4-3">
        <title>SumBasic (Nenkova and Vanderwende, 2005)</title>
        <p>SumBasic extracts sentences based on their
word probabilities. Specifically, it scores
each sentence as the mean of the probability
of the words it contains (based on their
frequency in the document). Iteratively, the
sentence with the best score among the ones
containing the most probable word is chosen.
The probability of the words in the chosen
sentence is then squared to limit redundancy.</p>
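        <p>A sketch of the SumBasic loop as described above; tokenization is naively whitespace-based:</p>
        <preformat>
from collections import Counter

def sumbasic(sentences, n_sents=3):
    """Pick sentences by mean word probability, squaring the
    probabilities of used words to limit redundancy."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}
    chosen = []
    while len(chosen) != n_sents and len(chosen) != len(sentences):
        top_word = max(prob, key=prob.get)
        # candidate sentences: unused ones containing the top word
        cand = [i for i, toks in enumerate(tokenized)
                if top_word in toks and i not in chosen]
        if not cand:
            prob.pop(top_word)  # word exhausted, try the next one
            continue
        best = max(cand, key=lambda i: sum(prob[w] for w in tokenized[i])
                                        / max(len(tokenized[i]), 1))
        chosen.append(best)
        for w in tokenized[best]:  # squaring dampens reused words
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in sorted(chosen)]
        </preformat>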
      </sec>
      <sec id="sec-4-4">
        <title>IT5-small (Raffel et al., 2020)</title>
        <p>
          The Text-to-Text Transfer Transformer (T5) is a pre-trained
sequence-to-sequence language model,
trained treating both input and output as
text strings; the rationale is to use the same
models for all NLP tasks, unifying them
under the sequence-to-sequence framework.
We use a small version of the original model
(60 million parameters,
https://huggingface.co/gsarti/it5-small), pretrained on
Clean Italian mC4
(https://huggingface.co/datasets/gsarti/clean_mc4_it),
the Italian split of
the multilingual cleaned version of Common
Crawl’s corpus (mC4)
          <xref ref-type="bibr" rid="ref19">(Raffel et al., 2020)</xref>
          .
We extracted 10,000 summary-source pairs
from the dataset for the validation set, and
10,000 for the test set. We trained the model
on the rest of the data for 100,000 steps; this
accounts for around 30% of the training data.
        </p>
        <p>We trained on two GeForce RTX 2080 GPUs
and kept the batch size per GPU to 1. We
kept the summary length to 75 tokens, and
the source text length to 1000 tokens.</p>
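        <p>A sketch of this fine-tuning setup with Hugging Face Transformers (the hyperparameters mirror the ones above; the toy dataset stands in for the WITS pairs and the output path is arbitrary):</p>
        <preformat>
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-small")

train_ds = Dataset.from_dict({      # stand-in for the WITS pairs
    "source": ["testo completo dell'articolo ..."],
    "summary": ["sezione iniziale ..."],
})

def encode(batch):
    """Truncate sources to 1000 tokens and summaries to 75."""
    enc = tokenizer(batch["source"], max_length=1000, truncation=True)
    enc["labels"] = tokenizer(batch["summary"], max_length=75,
                              truncation=True)["input_ids"]
    return enc

args = Seq2SeqTrainingArguments(
    output_dir="it5-small-wits",
    per_device_train_batch_size=1,  # batch size 1 per GPU
    max_steps=100_000,
)
trainer = Seq2SeqTrainer(
    model=model, args=args,
    train_dataset=train_ds.map(encode, batched=True,
                               remove_columns=["source", "summary"]),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
        </preformat>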
        <p>[Table 3: ROUGE scores (R-1, R-2, R-L) of Lead-3,
TextRank, LexRank, SumBasic, and IT5-small on WITS.]</p>
          <p>Results show that the Lead-3 baseline
performance is low; this is likely due to the structure of
Wikipedia articles, which contain several thematic
sections without a general introduction outside the
lead section. Extracting the first sentence(s) from
each section would likely produce better results
and could be investigated in future work.</p>
          <p>In contrast, TextRank is the best non-neural
baseline, with a ROUGE-2 score of 6.57; LexRank
performs comparably. SumBasic metrics are even
lower than those obtained with the Lead-3
baseline, suggesting that a purely frequency-based
approach is insufficient given the dataset complexity.</p>
        <p>Finally, the neural baseline achieves the best
results in terms of ROUGE-2, even if it is
relatively small and likely severely under-trained,
since only around 30% of the data were used for
fine-tuning, due to computational constraints. This
suggests that sequence-to-sequence neural models
have great potential on the dataset and should be
investigated further in future work. Surprisingly,
however, results in terms of ROUGE-1 are instead
below most of the other baselines. Future work
should investigate this discrepancy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions</title>
      <p>We have presented WITS, the first large-scale
dataset for abstractive summarization in Italian.
We have exploited the structure of Wikipedia articles
to build a challenging, non-technical dataset, with
high-quality human-written abstracts. Given the
lengthy source documents, the short summaries,
and the short extractive fragments, the dataset calls
for an abstractive approach. In the paper, we
have explored some standard non-neural extractive
baselines and a neural abstractive baseline. Future
work will investigate further neural baselines for
the dataset. Moreover, given Wikipedia’s structure,
the dataset can be easily extended
by applying the procedure described in the
paper to more languages, including low-resource
ones. We are confident
that research in summarization in languages other
than English will become more active in the near
future and hope that WITS can be a valuable step
in this direction.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Ziqiang</given-names>
            <surname>Cao</surname>
          </string-name>
          , Furu Wei,
          <string-name>
            <given-names>Wenjie</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Sujian</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Faithful to the original: Fact aware neural abstractive summarization</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence .</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Sumit</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Auli</surname>
          </string-name>
          , and
          <string-name>
            <surname>Alexander</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Rush</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Abstractive sentence summarization with attentive recurrent neural networks</article-title>
          .
          <source>In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>98</lpage>
          , San Diego, California, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Junyoung</given-names>
            <surname>Chung</surname>
          </string-name>
          , Caglar Gulcehre, Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>In NIPS 2014 Workshop on Deep Learning</source>
          ,
          <year>December 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Arman</given-names>
            <surname>Cohan</surname>
          </string-name>
          , Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim,
          <string-name>
            <given-names>Walter</given-names>
            <surname>Chang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nazli</given-names>
            <surname>Goharian</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A discourse-aware attention model for abstractive summarization of long documents</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (
          <issue>Short Papers)</issue>
          , pages
          <fpage>615</fpage>
          -
          <lpage>621</lpage>
          , New Orleans, Louisiana, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Günes</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <surname>Dragomir R. Radev</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>LexRank: Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>457</fpage>
          -
          <lpage>479</lpage>
          , December.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          , Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed,
          <string-name>
            <surname>Omer Levy</surname>
          </string-name>
          , Veselin Stoyanov, and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>BART: Denoising sequence-to-sequence pretraining for natural language generation, translation, and comprehension</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          , Online, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Chin-Yew Lin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          .
          <source>In Text Summarization Branches Out</source>
          , pages
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          , Barcelona, Spain, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . ArXiv, abs/1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Elena</given-names>
            <surname>Lloret</surname>
          </string-name>
          , Laura Plaza, and
          <string-name>
            <given-names>Ahmet</given-names>
            <surname>Aker</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The Challenging Task of Summary Evaluation: An Overview</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Mastronardo</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Enhancing a text summarization system with ELMo</article-title>
          . In CLiC-it.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tarau</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>TextRank: Bringing order into text</article-title>
          .
          <source>In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>404</fpage>
          -
          <lpage>411</lpage>
          , Barcelona, Spain, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Ramesh</given-names>
            <surname>Nallapati</surname>
          </string-name>
          , Bowen Zhou,
          Cicero dos Santos, Çağlar Gülçehre, and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Xiang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Abstractive text summarization using sequence-tosequence RNNs and beyond</article-title>
          .
          <source>In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning</source>
          , pages
          <fpage>280</fpage>
          -
          <lpage>290</lpage>
          , Berlin, Germany, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Feng</given-names>
            <surname>Nan</surname>
          </string-name>
          , Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang,
          <string-name>
            <surname>Kathleen McKeown</surname>
            ,
            <given-names>and Bing</given-names>
          </string-name>
          <string-name>
            <surname>Xiang</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Entitylevel factual consistency of abstractive text summarization</article-title>
          .
          <source>In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</source>
          , pages
          <fpage>2727</fpage>
          -
          <lpage>2733</lpage>
          , Online, April. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ani</given-names>
            <surname>Nenkova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lucy</given-names>
            <surname>Vanderwende</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>The impact of frequency on summarization</article-title>
          .
          <source>Technical report</source>
          , Microsoft Research.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Faisal</given-names>
            <surname>Ladhak</surname>
          </string-name>
          , Esin Durmus,
          <string-name>
            <given-names>Claire</given-names>
            <surname>Cardie</surname>
          </string-name>
          , and
          <string-name>
            <surname>Kathleen McKeown</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>WikiLingua: A new benchmark dataset for multilingual abstractive summarization</article-title>
          .
          <source>In Findings of EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          , Sergey Brin, Rajeev Motwani, and
          <string-name>
            <given-names>Terry</given-names>
            <surname>Winograd</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>The pagerank citation ranking: Bringing order to the web</article-title>
          .
          <source>Technical Report 1999-66</source>
          ,
          <string-name>
            <surname>Stanford</surname>
            <given-names>InfoLab</given-names>
          </string-name>
          , November.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jeff Wu, Rewon Child, David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
          <source>Technical report.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Colin</given-names>
            <surname>Raffel</surname>
          </string-name>
          , Noam Shazeer, Adam Roberts,
          <string-name>
            <given-names>Katherine</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sharan</given-names>
            <surname>Narang</surname>
          </string-name>
          , Michael Matena,
          <string-name>
            <surname>Yanqi Zhou</surname>
            ,
            <given-names>Wei</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <surname>Peter J. Liu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>21</volume>
          (
          <issue>140</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Sascha</given-names>
            <surname>Rothe</surname>
          </string-name>
          , Shashi Narayan, and
          <string-name>
            <given-names>Aliaksei</given-names>
            <surname>Severyn</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Leveraging pre-trained checkpoints for sequence generation tasks</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>8</volume>
          :
          <fpage>264</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Alexander M. Rush</surname>
            , Sumit Chopra, and
            <given-names>Jason</given-names>
          </string-name>
          <string-name>
            <surname>Weston</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A neural attention model for abstractive sentence summarization</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>379</fpage>
          -
          <lpage>389</lpage>
          , Lisbon, Portugal, September. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Abigail</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <string-name>
            <surname>Peter J. Liu</surname>
            , and
            <given-names>Christopher D.</given-names>
          </string-name>
          <string-name>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Get to the point: Summarization with pointer-generator networks</article-title>
          .
          <source>In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1073</fpage>
          -
          <lpage>1083</lpage>
          , Vancouver, Canada, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Eva</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Lu</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BIGPATENT: A large-scale dataset for abstractive and coherent summarization</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>2204</fpage>
          -
          <lpage>2213</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Kaitao</given-names>
            <surname>Song</surname>
          </string-name>
          , Xu Tan, Tao Qin, Jianfeng Lu, and
          <string-name>
            <surname>Tie-Yan Liu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>MASS: masked sequence to sequence pre-training for language generation</article-title>
          .
          <source>In Proceedings of the 36th International Conference on Machine Learning</source>
          , ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume
          <volume>97</volume>
          <source>of Proceedings of Machine Learning Research</source>
          , pages
          <fpage>5926</fpage>
          -
          <lpage>5936</lpage>
          . PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
          <string-name>
            <surname>Łukasz Kaiser</surname>
            , and
            <given-names>Illia</given-names>
          </string-name>
          <string-name>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          . In I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett, editors,
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          . Curran Associates, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Linting</given-names>
            <surname>Xue</surname>
          </string-name>
          , Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and
          <string-name>
            <given-names>Colin</given-names>
            <surname>Raffel</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>mT5: A massively multilingual pre-trained text-to-text transformer</article-title>
          .
          <source>In NAACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Jingqing</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Yao Zhao,
          <string-name>
            <given-names>Mohammad</given-names>
            <surname>Saleh</surname>
          </string-name>
          , and
          <string-name>
            <surname>Peter J. Liu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>PEGASUS: pre-training with extracted gap-sentences for abstractive summarization</article-title>
          .
          <source>In Proceedings of the 37th International Conference on Machine Learning</source>
          , ICML 2020, 13-18 July 2020, Virtual Event
          , volume
          <volume>119</volume>
          <source>of Proceedings of Machine Learning Research</source>
          , pages
          <fpage>11328</fpage>
          -
          <lpage>11339</lpage>
          . PMLR.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>