<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Idiap Abstract Text Summarization System for German Text Summarization Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shantipriya Parida</string-name>
          <email>shantipriya.parida@idiap.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Motlicek</string-name>
          <email>petr.motlicek@idiap.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Idiap Research Institute</institution>
          ,
          <addr-line>Rue Marconi 19, 1920 Martigny</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Text summarization is considered a challenging task in the NLP community. Datasets for multilingual text summarization are rare and difficult to construct. In this work, we build an abstractive text summarizer for German text using the state-of-the-art “Transformer” model. We propose an iterative data augmentation approach that uses synthetic data along with the real summarization data for German. To generate synthetic data, we exploit the Common Crawl (German) dataset, which covers different domains. The synthetic data is effective in low-resource conditions and is particularly helpful in multilingual scenarios, where the availability of summarization data remains a challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Automatic text summarization is considered a
challenging task because, to summarize a piece of text,
we must first read it in its entirety to understand it
and then highlight its main points.
Because computers lack human knowledge and language
processing abilities, automatic text
summarization is a major non-trivial task
        <xref ref-type="bibr" rid="ref1">(Allahyari et al., 2017)</xref>
        .
      </p>
      <p>
        The two major approaches to automatic text
summarization are abstractive and extractive.
The extractive summarization approach produces
summaries by choosing a subset of sentences from
the original text. The abstractive text
summarization approach aims to shorten a long text into a
human-readable form that contains the most
important facts from the original text
        <xref ref-type="bibr" rid="ref1 ref6">(Allahyari et al.,
2017; Kryściński et al., 2018)</xref>
        .
      </p>
      <p>
        The deep-learning-based neural attention model
performs well on abstractive text summarization
compared to standard learning-based
approaches
        <xref ref-type="bibr" rid="ref12">(Rush et al., 2015)</xref>
        . Abstractive text
summarization using the attentional encoder-decoder
recurrent neural network approach shows
state-of-the-art performance and sets a baseline model
        <xref ref-type="bibr" rid="ref9">(Nallapati et al., 2016)</xref>
        . Further improvements to the baseline model come
from the pointer-generator network with a coverage
mechanism and from a reinforcement-learning-based
training procedure
        <xref ref-type="bibr" rid="ref10 ref13">(See et al., 2017; Paulus et al., 2017)</xref>
        . Natural language
processing tasks such as text summarization are
inherently limited for resource-poor and
morphologically complex languages
owing to a shortage of available quality linguistic data
        <xref ref-type="bibr" rid="ref5 ref7">(Kurniawan and Louvan, 2018)</xref>
        . Using
synthetic data along with real data is a
popular approach in the machine
translation domain for improving translation
quality under low-resource conditions
        <xref ref-type="bibr" rid="ref2 ref3 ref4">(Bojar and
Tamchyna, 2011; Hoang et al., 2018; Chinea-Ríos
et al., 2017)</xref>
        . Iterative back-translation (i.e.
training back-translation systems multiple times)
was also found effective in machine translation
        <xref ref-type="bibr" rid="ref4">(Hoang et al., 2018)</xref>
        . We explore similar
approaches in our experiments for the text
summarization task.
      </p>
      <p>The paper is organized as follows:
Section 2 explains the techniques followed in our
work. Section 3 describes the datasets used in our
experiments. Section 4 explains the experimental
settings: models and their parameters. Section 5
provides evaluation results with analysis and
discussion. Section 6 concludes the
paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Method Description</title>
      <p>
        Across all experiments performed in this paper, we
use the Transformer model as implemented
in OpenNMT-py<sup>1</sup>
        <xref ref-type="bibr" rid="ref13 ref15">(Vaswani et al., 2018; See et al.,
2017)</xref>
        . The Transformer model is based on an
encoder-decoder architecture. In the context of
summarization, it takes text as input and produces its
summary.
      </p>
      <p>We use synthetic data as shown in Figure 1 to
increase the size of the training data.</p>
      <p>We use German wiki data (spread across
different domains) collected from SwissText 2019<sup>2</sup>
(real data) and Common Crawl<sup>3</sup> German data
(synthetic data) in our experiments. The statistics
of all the datasets are shown in Table 1.</p>
      <sec id="sec-2-1">
        <title>SwissText datasets used as real data</title>
        <p>We divide the 100K SwissText dataset
(downloaded from the SwissText 2019 website) into three
subsets: train, dev, and test in a 90:5:5 ratio (i.e.
90K for training, 5K for development, and 5K for
testing). The experiments performed over
these datasets are described in Section 4
(denoted as the S1 experimental setup).</p>
        <p>Footnotes: 1. http://opennmt.net/OpenNMT-py/Summarization.html; 2. https://www.swisstext.org/; 3. http://commoncrawl.org/</p>
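        <p>As an illustration, the 90:5:5 split described above can be sketched as follows. This is a minimal sketch and not the script used for the experiments: the shuffling step, the fixed seed, and the function name split_dataset are assumptions, and a toy list stands in for the 100K SwissText pairs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def split_dataset(pairs, percents=(90, 5, 5), seed=0):
    """Shuffle (text, summary) pairs and cut them into train/dev/test."""
    shuffled = list(pairs)
    # Shuffling with a fixed seed is an assumption made for reproducibility.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_dev = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Toy stand-in for the SwissText (text, summary) pairs.
pairs = [(f"text {i}", f"summary {i}") for i in range(1000)]
train, dev, test = split_dataset(pairs)
print(len(train), len(dev), len(test))
```
]]></preformat>
        <p>With 100K input pairs, the same call would yield the 90K/5K/5K subsets used in this work.</p>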
      </sec>
      <sec id="sec-2-2">
        <title>Common Crawl dataset used as synthetic data</title>
        <p>The data crawled from the Internet (Common
Crawl) is used to prepare synthetic data to boost
training. The steps followed to create the synthetic
data are as follows.</p>
        <p>Step 1: Build vocab: We create a vocabulary from
SwissText consisting of the most frequent (top N)
German words.</p>
        <p>Step 2: Sentence selection: Sentences from
the Common Crawl data are selected if the fraction
of their words found in the vocabulary reaches a
threshold we provide. For example, given a sentence of
10 words and a threshold of 10% (0.1),
at least 1 of the 10 words must be in the
vocabulary for the sentence to be selected.</p>
        <p>Step 3: Filtering: Randomly select sentences (e.g.
100K) from the Common Crawl
data selected in the previous step.</p>
        <p>Step 4: Generate summary: The 100K sentences
obtained from the previous step are treated as
summaries, for which the corresponding texts must be
generated. We use a reverse-trained
model, i.e. one trained with the summary as
source and the text as target. This yields
texts with their corresponding
summaries as additional data to be used
along with the real data (SwissText).</p>
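        <p>Steps 1 to 3 above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the experiments: tokenization by lowercasing and whitespace splitting, the function names, and the toy sentences are all assumptions. Step 4 requires a trained reverse model and is not shown.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import Counter


def build_vocab(texts, top_n):
    """Step 1: keep the top_n most frequent words (assumed tokenization)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word for word, _ in counts.most_common(top_n)}


def in_vocab_ratio(sentence, vocab):
    """Fraction of the sentence's words that occur in the vocabulary."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(word in vocab for word in words) / len(words)


def select_sentences(sentences, vocab, threshold=0.1):
    """Step 2: keep sentences whose in-vocabulary share reaches the threshold."""
    return [s for s in sentences if in_vocab_ratio(s, vocab) >= threshold]


def sample_sentences(selected, k, seed=0):
    """Step 3: draw up to k random sentences from the selected pool."""
    return random.Random(seed).sample(selected, min(k, len(selected)))


# Toy stand-ins for SwissText and Common Crawl.
swisstext = ["Das Schiff liegt im Hafen", "Das Museum liegt in der Stadt"]
common_crawl = ["Das Wetter ist heute sehr gut", "Lorem ipsum dolor sit amet"]

vocab = build_vocab(swisstext, top_n=5)
selected = select_sentences(common_crawl, vocab, threshold=0.1)
synthetic_summaries = sample_sentences(selected, k=100)
print(synthetic_summaries)
```
]]></preformat>
        <p>In this toy run only the first Common Crawl sentence is kept, since one of its six words ("das") is in the vocabulary, while the second sentence shares no word with it.</p>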
        <p>Finally, a 190K dataset is created
(denoted as Train RealSynth) as a combination of the 90K
SwissText train data (real) and the 100K synthetic
data. This dataset is used in the experimental setup
S2 (described in detail in Section 4).</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>DataSet</th><th>#Sent</th><th>#Summ</th></tr>
            </thead>
            <tbody>
              <tr><td>Train Real (SwissText)</td><td>90K</td><td>90K</td></tr>
              <tr><td>Train RealSynth (Swiss+CC)</td><td>190K</td><td>190K</td></tr>
              <tr><td>Train RealSynthRegen (Swiss+CC)</td><td>190K</td><td>190K</td></tr>
              <tr><td>Dev (SwissText)</td><td>5K</td><td>5K</td></tr>
              <tr><td>Test (SwissText)</td><td>5K</td><td>5K</td></tr>
              <tr><td>Test (SwissText Evaluation)</td><td>2K</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>This section describes our experiments conducted
for the text summarization task.</p>
        <p>
          The preprocessing step prepares the
data such that source and target are aligned and
use the same dictionary. Additionally, we
truncate the source length at 400 tokens and the
target length at 100 tokens to expedite training
          <xref ref-type="bibr" rid="ref13">(See
et al., 2017)</xref>
          .
        </p>
        <p>The Transformer model is implemented in
OpenNMT-py. To train the model, we use a
single GPU. To fit the model to the GPU cluster, a
batch size of 4096 is selected for training.
The validation batch size is set to 8. We use an
initial learning rate of 2, a dropout of 0.2, and 8000
warm-up steps. Decoding uses a beam size of 10,
and we do not set a minimum length for the output
summary.</p>
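        <p>To give a sense of how an initial learning rate of 2 and 8000 warm-up steps interact, the sketch below computes the inverse-square-root ("Noam") schedule that is commonly paired with Transformer training. The text above does not state the decay method or the model dimension, so both the use of this schedule and d_model = 512 are assumptions made purely for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def noam_learning_rate(step, base_lr=2.0, warmup_steps=8000, d_model=512):
    """Noam schedule: linear warm-up followed by inverse-square-root decay.

    base_lr and warmup_steps follow the values reported in the text;
    d_model=512 and the schedule itself are assumptions.
    """
    step = max(step, 1)
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * d_model ** -0.5 * scale


for step in (1, 4000, 8000, 100000, 300000):
    print(step, noam_learning_rate(step))
```
]]></preformat>
        <p>Under these assumptions the effective learning rate rises linearly until step 8000, peaks there, and then decays in proportion to the inverse square root of the step number.</p>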
      </sec>
      <sec id="sec-2-3">
        <title>Model Setup</title>
        <p>We use three settings: i) real data (the
baseline for our experiments), ii) real and
synthetic data, and iii) real and regenerated synthetic
data for the summarization task, described as
follows:</p>
        <p>1. S1: Transformer model using Train Real
data: In this setup, we use the “Train Real”
data for training the Transformer model.</p>
        <p>
          2. S2: Transformer model using
Train RealSynth data: In this setup, we
use the “Train RealSynth” data for training
the Transformer model. As the balance
between real and synthetic data is an important
factor, we maintain a ratio of roughly 1:1 (90K real
to 100K synthetic) in our experiment
          <xref ref-type="bibr" rid="ref14">(Sennrich
et al., 2015)</xref>
          .
        </p>
        <p>3. S3: Transformer model using
Train RealSynthRegen data: We propose
an iterative approach to improve the quality
of the synthetic summaries. In this setup, the
system trained on the (real+synthetic)
data is used to regenerate the synthetic data
for the final system. As a result, the input
data to the final system is a combination of
real and regenerated synthetic data, as shown
in Figure 2.</p>
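        <p>The relation between the three training sets can be sketched as below. This is a schematic sketch only: train and generate are hypothetical callables standing in for OpenNMT-py training and decoding, and the reading of S3 used here (the S2 system re-summarizes the synthetic texts) is an assumption, since the description above does not spell out the regeneration step. The toy "model" is a lookup table so that the sketch runs end to end.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def build_train_sets(real_pairs, cc_summaries, train, generate):
    """Return the S1, S2 and S3 training sets as lists of (text, summary)."""
    # S1: real data only.
    s1_data = list(real_pairs)

    # Reverse model: trained with the summary as source and the text as target.
    reverse_model = train([(summary, text) for text, summary in real_pairs])
    synthetic = [(generate(reverse_model, s), s) for s in cc_summaries]

    # S2: real + synthetic data.
    s2_data = s1_data + synthetic
    s2_model = train(s2_data)

    # S3 (assumed reading): the S2 model regenerates the synthetic summaries.
    regenerated = [(text, generate(s2_model, text)) for text, _ in synthetic]
    s3_data = s1_data + regenerated
    return s1_data, s2_data, s3_data


def toy_train(pairs):
    """Hypothetical stand-in for training: a source-to-target lookup table."""
    return dict(pairs)


def toy_generate(model, source):
    """Hypothetical stand-in for decoding: look up, else reverse the string."""
    return model.get(source, source[::-1])


real_pairs = [("long text a", "sum a")]
cc_summaries = ["sum b"]
s1_data, s2_data, s3_data = build_train_sets(
    real_pairs, cc_summaries, toy_train, toy_generate
)
print(len(s1_data), len(s2_data), len(s3_data))
```
]]></preformat>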
      </sec>
      <sec id="sec-2-4">
        <title>Training Procedure</title>
        <p>
          The copying mechanism is applied during
training. It allows the summarizer to fall back on copying
from the source text when it encounters &lt;unk&gt; tokens,
by referring to the softmax of the product
of the attention scores of the output and the
attention scores of the source
          <xref ref-type="bibr" rid="ref13">(See et al., 2017)</xref>
          . The
systems are trained for 300K iterations.
        </p>
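        <p>For intuition, the sketch below mixes a generation distribution with a copy distribution in the pointer-generator formulation of See et al. (2017), where the probability of a word is p_gen times its vocabulary probability plus (1 - p_gen) times the attention mass on source positions holding that word. It is a simplified illustration with made-up numbers; the copy attention implemented in OpenNMT-py may differ in its details.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities, pointer-generator style.

    P(w) = p_gen * P_vocab(w)
           + (1 - p_gen) * (attention mass on source positions equal to w)
    """
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Illustrative values only: a tiny vocabulary and a three-token source.
vocab_dist = {"das": 0.5, "schiff": 0.3, "<unk>": 0.2}
source_tokens = ["das", "feuerschiff", "relandersgrund"]
attention = [0.1, 0.2, 0.7]

dist = final_distribution(0.4, vocab_dist, attention, source_tokens)
best = max(dist, key=dist.get)
print(best, dist[best])
```
]]></preformat>
        <p>Here the out-of-vocabulary source word "relandersgrund" receives the highest probability because most of the attention falls on it, which is how copying lets the model output a rare name instead of an unknown-word token.</p>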
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation and Discussion</title>
      <p>
        We evaluate the results every 10,000
iterations on the dev and test sets. The automatic
evaluation results on the dev and test sets are
shown in Table 2, with sample summaries in
Table 3. To evaluate the proposed approaches, we use
the ROUGE (Recall-Oriented Understudy for Gisting
Evaluation) score, a popular metric for the
text summarization task. It has several variants,
such as ROUGE-N, which measures
the overlap of n-grams between the system and
reference summaries, and ROUGE-L, which is based on
their longest common subsequence
        <xref ref-type="bibr" rid="ref8">(Lin, 2004)</xref>
        . In addition, we
also use the SacreBLEU<sup>4</sup> evaluation metric
        <xref ref-type="bibr" rid="ref11">(Post,
2018)</xref>
        .
      </p>
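      <p>To make the ROUGE-N definition concrete, the following sketch computes recall, precision, and F1 from clipped n-gram overlap. It is a simplified illustration and not the official ROUGE package: whitespace tokenization, lowercasing, and the function names are assumptions, and no stemming or stop-word handling is applied. The two example strings are adapted from the sample summaries in this section.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


reference = "heute dient es als restaurantschiff in helsinki"
candidate = "heute dient es als restaurantschiff in kotka"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```
]]></preformat>
      <p>In this example six of the seven reference unigrams and five of the six reference bigrams are matched, so a single wrong place name lowers ROUGE-2 more than ROUGE-1.</p>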
      <p>Figure 3 presents the learning curves for the
models (S1 and S2) on the development set. The
summaries generated by model S2 differ (e.g. in word
selection and summary length) from those
generated by model S1.
During manual verification, we found that
summaries generated without a minimum length
constraint appear better than summaries generated with
one. Although we do not
explicitly specify a minimum length parameter when
generating summaries, the average
length of the summaries generated by model S2 (41.42
words) is greater than that of model S1 (39.81
words). Some details (e.g. names, years) in the
generated summaries were found to be inconsistent
with the reference. The
summaries generated by model S3 also differ from
those of models S2 and S1. In terms of ROUGE
score, model S3 outperforms model S1 but performs
worse than model S2 (see Table 2).</p>
      <p><sup>4</sup> https://github.com/mjpost/sacreBLEU</p>
      <table-wrap>
        <label>Table 3</label>
        <table>
          <tbody>
            <tr><td>Ref Summary</td><td>“Das Feuerschiff Relandersgrund war ein finnisches Feuerschiff, das von 1888 bis 1914 im Schärenmeer bei Rauma positioniert war. Heute dient es als Restaurantschiff in Helsinki.”</td></tr>
            <tr><td>S1 Summary</td><td>“Die “Rauma”. ist ein 1886—1888 Feuerschiff der norwegischen Reederei “Libauskij”,Das Schiff wurde in den 1930er Jahren gebaut und in den 2000er Jahren als Museumsschiff als”</td></tr>
            <tr><td>S2 Summary</td><td>“Das Feuerschiff Relandersgrund war ein Feuerschiff des das von 1888 bis 1914 im Einsatz war.Heute dient es als Restaurantschiff in Kotka,”</td></tr>
            <tr><td>S3 Summary</td><td>“Die Relandersgrund ist ein 18861888 Schiff der russischen Marine, das furl eine und Wracks gebaut worden ist.”</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>[Figure 3: learning curves of models S1 and S2 on the development set; x-axis: Iteration.]</p>
      <p>
        In this paper, we highlighted the use
of synthetic data for the abstractive text
summarization task under low-resource conditions, which
helps improve the text summarization system in
terms of automatic evaluation metrics. As the next
step, we plan to investigate: i) synthetic
summarization data, and ii) applying transfer learning
to text summarization for multilingual
low-resource datasets with little or no ground-truth
summaries
        <xref ref-type="bibr" rid="ref5">(Keneshloo et al., 2018)</xref>
        .
      </p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The work is supported by a joint research project
(under an InnoSuisse grant) aimed at improving
automatic speech recognition and natural
language understanding technologies for German.
Title: SM2: Extracting Semantic Meaning from Spoken Material.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Allahyari</surname>
          </string-name>
          , Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D Trippe, Juan B Gutierrez, and Krys Kochut
          .
          <year>2017</year>
          .
          <article-title>Text summarization techniques: a brief survey</article-title>
          .
          <source>arXiv preprint arXiv:1707.02268</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Ondřej</given-names>
            <surname>Bojar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aleš</given-names>
            <surname>Tamchyna</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Improving translation model by monolingual data</article-title>
          .
          <source>In Sixth Workshop on Statistical Machine Translation</source>
          . page
          <fpage>330</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Mara</given-names>
            <surname>Chinea-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alvaro</given-names>
            <surname>Peris</surname>
          </string-name>
          , and Francisco Casacuberta.
          <year>2017</year>
          .
          <article-title>Adapting neural machine translation with parallel synthetic data</article-title>
          .
          <source>WMT 2017</source>
          . page
          <fpage>138</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Cong Duy Vu</given-names>
            <surname>Hoang</surname>
          </string-name>
          , Philipp Koehn, Gholamreza Haffari, and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Cohn</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Iterative back-translation for neural machine translation</article-title>
          .
          <source>ACL</source>
          <year>2018</year>
          23(
          <issue>32</issue>
          .5):
          <fpage>18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Yaser</given-names>
            <surname>Keneshloo</surname>
          </string-name>
          , Naren Ramakrishnan, and Chandan K Reddy.
          <year>2018</year>
          .
          <article-title>Deep transfer reinforcement learning for text summarization</article-title>
          .
          <source>arXiv preprint arXiv:1810.06667</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Kryściński</surname>
          </string-name>
          , Romain Paulus, Caiming Xiong, and Richard Socher.
          <year>2018</year>
          .
          <article-title>Improving abstraction in text summarization</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          . pages
          <fpage>1808</fpage>
          -
          <lpage>1817</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Kemal</given-names>
            <surname>Kurniawan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Louvan</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Indosum: A new benchmark dataset for indonesian text summarization</article-title>
          .
          <source>In 2018 International Conference on Asian Language Processing (IALP)</source>
          . IEEE, pages
          <fpage>215</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>C-Y</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Rouge: A package for automatic evaluation of summaries</article-title>
          .
          <source>In Proc. of Workshop on Text Summarization Branches Out, Post Conference Workshop of ACL</source>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Ramesh</given-names>
            <surname>Nallapati</surname>
          </string-name>
          , Bowen Zhou, Caglar Gulcehre,
          <string-name>
            <given-names>Bing</given-names>
            <surname>Xiang</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Abstractive text summarization using sequence-to-sequence rnns and beyond</article-title>
          .
          <source>arXiv preprint arXiv:1602.06023</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Romain</given-names>
            <surname>Paulus</surname>
          </string-name>
          , Caiming Xiong, and Richard Socher.
          <year>2017</year>
          .
          <article-title>A deep reinforced model for abstractive summarization</article-title>
          .
          <source>arXiv preprint arXiv:1705.04304</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Matt</given-names>
            <surname>Post</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A call for clarity in reporting BLEU scores</article-title>
          .
          <source>In Proceedings of the Third Conference on Machine Translation: Research Papers. Association for Computational Linguistics</source>
          , pages
          <fpage>186</fpage>
          -
          <lpage>191</lpage>
          . http://aclweb.org/anthology/W18-6319.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Alexander M</given-names>
            <surname>Rush</surname>
          </string-name>
          , Sumit Chopra, and Jason Weston
          .
          <year>2015</year>
          .
          <article-title>A neural attention model for abstractive sentence summarization</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          . pages
          <fpage>379</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Abigail</given-names>
            <surname>See</surname>
          </string-name>
          , Peter J Liu, and Christopher D Manning
          .
          <year>2017</year>
          .
          <article-title>Get to the point: Summarization with pointer-generator networks</article-title>
          .
          <source>arXiv preprint arXiv:1704.04368</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Rico</given-names>
            <surname>Sennrich</surname>
          </string-name>
          , Barry Haddow, and
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Birch</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Improving neural machine translation models with monolingual data</article-title>
          .
          <source>arXiv preprint arXiv:1511.06709</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner,
          <string-name>
            <given-names>Niki</given-names>
            <surname>Parmar</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Tensor2tensor for neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1803.07416</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>