Benchmarking Natural Language Processing Algorithms for Patent Summarization

Silvia Casola¹,*, Alberto Lavelli¹

¹University of Padua, Fondazione Bruno Kessler


Abstract
The number of patent applications is enormous, and patent documents are long and complex. Methods for automatically obtaining the most salient information in a short text would thus be useful for patent professionals and other practitioners. However, patent summarization is currently under-researched; moreover, the proposed methods are difficult to compare directly, as they are generally tested on different datasets. In this paper, we benchmark several extractive, abstractive, and hybrid summarization methods on the BigPatent dataset, compare automatic metrics, and show qualitative insights.

Keywords
Summarization, Patents, Natural language processing, Natural language generation



PatentSemTech'23: 4th Workshop on Patent Text Mining and Semantic Technologies, July 27th, 2023, Taipei, Taiwan
*Corresponding author.
scasola@fbk.eu (S. Casola); lavelli@fbk.eu (A. Lavelli)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Patents protect inventions that their holders consider important enough to take legal action to obtain a monopoly on using, making, and selling them — and thus to profit from their ingenuity. Thus, patents help in valuing intellectual work. At the same time, inventors must disclose the invention and its characteristics in detail to file a patent application: patents are therefore intended to benefit society and help new knowledge spread, correcting the tendency to keep valuable technical details secret. Patents, however, are difficult to process: the number of patent applications is enormous, and patent documents are long and hard to read, rich in technical and legal language.

To this end, tools that automatically extract or generate summaries from patent documents can be particularly valuable in helping patent agents, R&D groups, and other professionals; using summaries instead of the whole document can also improve the performance of automatic processes, as shown in other domains [1, 2].

In the general domain, summarization tools and methodologies have shown promising results; applications to the patent domain are, however, still relatively limited. Moreover, while previous work has explored methods for automatically generating patent summaries, these methods are hard to compare, as no generally accepted benchmarks exist; thus, conclusions on the pros and cons of each approach are hard to draw. Even the most recent abstractive dataset presents important limitations and issues that might make direct comparisons meaningless [3].

To partially fill this gap, we benchmark existing approaches in the patent domain, specifically on the BigPatent [4] dataset. The dataset is popular in the NLP community, as patents present several challenges in terms of abstractivity, length, and language, among others; moreover, while not exempt from design issues, it is also one of the few patent benchmarks that allow for a direct comparison between approaches. We evaluate extractive, abstractive, and hybrid methods; we also explore transferring summarization methods from the scientific paper domain [5], with limited success. For each method, we discuss strengths and limitations, and provide standard summarization metrics and qualitative insights.

2. Previous work

2.1. Automatic text summarization

Methods for text summarization are generally classified into extractive, abstractive, and hybrid ones.

In extractive text summarization, a subset of sentences from the source document is chosen as the most representative, and the final summary is a simple concatenation of those sentences. Methods can be graph-based [6, 7, 8], rely on token frequency [9], or rely on learned intrinsic features [10, 11, 12].

In contrast, abstractive text summarization aims at generating a new piece of text based on the source, similar to what a person would do; the output can contain novel vocabulary or expressions. Sequence-to-sequence models [13, 14, 15, 16, 17] are popular for this task, with transformer-based ones being particularly performative [18, 19, 20, 21]. Finally, hybrid methods try to fuse both approaches, for example, by extracting and then rewriting sentences [22].
Extractive models are generally simpler than abstractive ones and require fewer computational resources and data; however, the summaries have to contain complete sentences from the source, which often include both central and peripheral information. Moreover, the final summary is a simple concatenation of sentences, with pending references and no discourse structure. Abstractive summaries are more similar to those written by humans: information can easily be condensed, and the generated text is much more natural and easier to read. However, abstractive models might produce non-factual information, i.e., include statements that are not in the source or that directly contradict it. See, e.g., [23] for a comprehensive survey of summarization techniques.

2.2. Patent summarization

Many traditional approaches to patent summarization have been extractive. The document is often segmented into sentences or fragments [24] and preprocessed (e.g., to keep specific parts of speech only [25, 26]); features can then be extracted. General-domain features include keywords [27], title words, cue words, and position. An ontology of technical terms might also be used [26, 28]. Domain-specific approaches [24] are often linguistically motivated. Once extracted, the features are used to score the sentences' relevance to the summary, either heuristically [27] or in a data-driven way [24, 29, 30, 31]. Alternative approaches use the patent discourse structure, which they prune [32].

Recently, [4] introduced the BigPatent dataset, whose associated task is that of summarizing the patent's Detailed Description into its Abstract. As the authors show, patent Abstracts are highly abstractive — with relevant content spread throughout the input — and contain many novel n-grams. The dataset has been used as a testbed for general-purpose systems [19, 33, 34, 21, 35], given the high abstractivity of its targets and the length of its inputs.

For an overview of patent summarization approaches, see [36].

3. Dataset

We use the G (Physics) subsection of the BigPatent dataset [4]. The dataset is associated with the task of generating the patent Abstract from its Description. We are aware of the practical limitations of this setting, as the Abstract contains superficial and general information, but we still consider experimenting on the dataset useful, given its popularity in the Natural Language Processing community.

The dataset exists in two versions [3]. The original version's text is uncased and tokenized, and its input typically contains the Detailed Description only (i.e., a subsection of the Description section). The alternative version contains the full Description with all its subsections, in the original casing. We use this version in this paper. However, we notice two main limitations. First, patents lack section headers due to the performed preprocessing; thus, any structural information is lost. Second, the input often contains the author's Summary of the Invention, which significantly simplifies the task.

To solve both issues, we download the raw data and (i) apply all the original preprocessing steps, excluding the removal of subsection headers and newlines, and (ii) remove the Summary of the Invention section by heuristically matching headers. Table 1 reports some statistics on our version of the dataset.

            # docs               258,935
Summary     # tokens (avg)         121.0
            # sents (avg)            3.6
            sent length (avg)       43.4
Source      # tokens (avg)       4,893.6
            # sents (avg)          161.2
            sent length (avg)       31.3
            compression ratio       45.8
Table 1: Length statistics on the BigPatent/G dataset. The number of tokens, sentences, and tokens per sentence, as well as the compression ratio, are computed per document and then averaged. The compression ratio is the ratio between the number of tokens in the source and the number of tokens in the Abstract.

4. Evaluation Protocol

Evaluating patent summarization results is challenging. On the one hand, automatic summarization outputs (and Natural Language Generation outputs in general) are difficult to evaluate automatically, and the problem is considered open [37, 38]. While automatic metrics such as ROUGE [39] exist, they have known limitations. In the patent domain in particular, some previous work [40, 31] has anecdotally questioned the metric's validity (and its correlation with experts' opinions and practical utility), even if, to the best of our knowledge, no quantitative studies in the patent domain have been performed. More complex metrics, e.g., model-based methods [41, 42], need to be fine-tuned with domain-specific data.

On the other hand, human evaluation is not easier. In fact, it is particularly hard in the patent domain for two main reasons: a) the best way to evaluate a summarization output is to read the whole source document; however, patents are extremely long and hard to read; b) patent documents and Abstracts are extremely complex and should be evaluated by legal and technical experts, but hiring such experts is very expensive and impractical in most scenarios.
Aware of these limitations, we will use two main evaluation methods:

• Automatic evaluation: we will select hyperparameters and automatically evaluate outputs using ROUGE [39] (a minimal scoring sketch follows this list). We also experimented with factuality-related metrics, e.g., QAEval [41]; however, they do not seem to adapt well to the patent domain and would need to be fine-tuned.
• Qualitative evaluation: we report a preliminary qualitative evaluation of a subset of candidate summaries, considering their fluency, consistency, and similarity to the Abstract.
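The protocol is not tied to a particular ROUGE implementation; as a minimal sketch (assuming Google's rouge-score package, which the paper does not name), the scoring step could be reproduced as follows. The reference and candidate strings are placeholders:

```python
# Minimal ROUGE scoring sketch; rouge-score is an assumption, as the paper
# does not specify the implementation used.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "A display device comprising multiple electro-optic units."    # placeholder
candidate = "The invention relates to a display with electro-optic units."  # placeholder

scores = scorer.score(reference, candidate)
for metric, result in scores.items():
    # Each Score holds precision, recall, and F-measure.
    print(f"{metric}: F1={result.fmeasure:.4f} recall={result.recall:.4f}")
```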
5. Extractive methods

5.1. Graph-based systems

The core idea of graph-based methods is to represent the original document as a graph with sentences as nodes and their similarity as edges, and then extract only the most central sentences.

5.1.1. TextRank [7]

TextRank uses the number of words shared by two sentences, normalized by the sentences' lengths, as its similarity metric. Edges in the complete graph are then pruned using a threshold, and the most central sentences according to PageRank [43] are extracted. We used the summa implementation (https://summanlp.github.io/textrank/). In this implementation, the user chooses the target summary length in terms of tokens, and the set of sentences that best approximates that number is extracted. We cross-validated the number of tokens and left all other parameters at their default values; a usage sketch is shown below. Some sample outputs are in Table ??.
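As a usage sketch (the input path is a placeholder, and the 150-token budget mirrors the best value found on validation):

```python
# TextRank extraction with the summa package.
from summa.summarizer import summarize

with open("patent_description.txt") as f:  # placeholder source document
    patent_description = f.read()

# summa picks the set of sentences whose total length best approximates
# the requested number of words.
print(summarize(patent_description, words=150))
```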
5.1.2. LexRank [6]

LexRank is similar in nature to TextRank, but it uses the cosine similarity between the sentences' Term Frequency–Inverse Document Frequency (TF-IDF) representations as its similarity metric. We used the sumy implementation (https://github.com/miso-belica/sumy), validated the number of extracted sentences per patent, and left all other parameters at their default values. Both algorithms are unsupervised and can easily be applied to very long documents with no modifications. We also tried to run experiments with PacSum [8] but found the algorithm extremely computationally demanding for our use case.
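A minimal sketch of the LexRank setup with sumy (the input path is a placeholder; the sentence budget of 4 mirrors the best validation value):

```python
# LexRank extraction with sumy; sumy relies on NLTK's punkt data for
# sentence tokenization.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

with open("patent_description.txt") as f:  # placeholder source document
    patent_description = f.read()

parser = PlaintextParser.from_string(patent_description, Tokenizer("english"))
summarizer = LexRankSummarizer()

# Returns the most central sentences according to LexRank.
for sentence in summarizer(parser.document, sentences_count=4):
    print(sentence)
```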
Set    #T     ROUGE-1   ROUGE-2   ROUGE-L
Val      50     28.20      8.52     18.08
Val     100     37.06     11.40     21.99
Val     150     38.60     12.33     22.33
Val     250     35.39     12.27     20.69
Val     500     25.74     10.37     16.11
Val    1000     16.22      7.65     11.00
Test    150     38.59     12.30     22.33
Table 2: Results using TextRank. We selected the number of extracted tokens (#T) on the validation set and ran the most promising model on the test set.

Set    #S     ROUGE-1   ROUGE-2   ROUGE-L
Val     1      26.03      8.12     17.40
Val     2      34.72     10.93     21.14
Val     3      37.48     12.02     21.89
Val     4      37.76     12.40     21.71
Val     5      36.92     12.46     21.16
Val     6      35.62     12.36     20.48
Test    4      37.76     12.46     21.76
Table 3: Results using LexRank. We selected the number of extracted sentences (#S) on the validation set and ran the most promising model on the test set.

Automatic evaluation: ROUGE scores are shown in Tables 2 and 3. As expected, performance is similar for the two systems, with TextRank being marginally superior. Unsurprisingly, the best-performing configurations are those that select a number of tokens or sentences similar to that of the gold standard.

Qualitative assessment: The outputs obtained using the two algorithms are relatively similar. We notice that the sentence tokenization is not always perfect: for example, the extracted summary of patent US-2005152022-A1 contains the sentence "The mixed color display [...] by the type of processes described in the aforementioned U.S. Pat. No.", where the patent number has been incorrectly treated as a stand-alone sentence. This is in accordance with previous work [44, 45], which showed that general-domain Natural Language Processing resources tend to have suboptimal performance in the patent domain and should be adapted.

Moreover, sentences naturally contain references to other parts of the original text, e.g., "as described below" in US-2005152011-A1 or "according to claim 1" in US-9478115-B2 (extracted sentences also tend to contain numerical references to the figures, which are lost).

We also notice that all the extracted sentences tend to be extremely long and naturally contain both core and peripheral information (e.g., included in parentheses).
These are known limitations of naive extractive models and are very common problems in our extracted summaries. The extracted sentences do not seem too similar to each other, which is sometimes described as a limitation of graph-based systems.

Even with their limitations, the algorithms seem to perform reasonable content selection (with TextRank being superior to LexRank also from a qualitative perspective); when compared to their references, the extracted summaries often contain most of the references' core elements and, in many cases, are very similar to the reference in terms of content. This is evident in some specific cases (e.g., patents US-9478115-B2 and US-2003016244-A1) and is interesting, considering that the algorithms are unsupervised.

If we assume that the final target of the extracted summaries is human readers, the lack of discourse structure and the length of the extracted sentences might make the outputs too hard to understand. It might, however, be possible to use the outputs in an ad hoc interface, e.g., one where core sentences are highlighted.

5.2. Latent Semantic Analysis

Latent Semantic Analysis [46] aims at exploiting the latent semantic structure of the document and extracts the sentences that best represent the most important latent topics. The algorithm decomposes the term-by-sentence matrix constructed from the source document using Singular Value Decomposition (SVD) [47]. The 𝑡 × 𝑠 terms-by-sentence matrix 𝐴 is thus decomposed as 𝐴 = 𝑈Σ𝑉ᵀ: the original matrix is factored into a matrix of term distributions over latent topics (𝑈), a diagonal matrix of topic importance (Σ, the singular values), and a matrix of topic distributions across sentences (𝑉ᵀ). For each of the 𝐾 most salient latent topics (i.e., those corresponding to the largest singular values), the sentence with the largest index value is included in the summary [48, 10]. We use the sumy implementation (a usage sketch is shown below), validate the number of sentences, and leave all other parameters at their default values. Some sample outputs are in Table ??.
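The sumy-based setup mirrors the LexRank sketch above; only the summarizer class changes (the sentence budget of 5 mirrors the best validation value):

```python
# LSA extraction with sumy: the library builds the term-by-sentence matrix,
# decomposes it with SVD, and selects the sentences scoring highest on the
# most salient latent topics.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

with open("patent_description.txt") as f:  # placeholder source document
    patent_description = f.read()

parser = PlaintextParser.from_string(patent_description, Tokenizer("english"))
summarizer = LsaSummarizer()

for sentence in summarizer(parser.document, sentences_count=5):
    print(sentence)
```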
Set    #S     ROUGE-1   ROUGE-2   ROUGE-L
Val     1      20.09      4.38     13.54
Val     2      28.51      6.48     17.15
Val     3      32.37      7.70     18.43
Val     4      33.93      8.38     18.80
Val     5      34.28      8.78     18.70
Val     6      34.00      9.02     18.43
Val     7      33.30      9.14     18.05
Val     8      32.44      9.20     17.63
Test    5      34.26      8.72     18.66
Table 4: Results using LSA. We selected the number of extracted sentences (#S) on the validation set and ran the most promising model on the test set.

Automatic evaluation: Table 4 shows the ROUGE scores. LSA tends to perform worse than the graph-based algorithms. In contrast to the graph-based methods, it tends to work best when extracting several short sentences.

Qualitative assessment: Even with the known limitations of extractive systems (references, structure, sentences needing compression, etc.), some reasonable content selection is performed. For example, LSA often extracts the sentence that describes the invention's nature, as in "The present invention is based on the object to provide an operator system for a machine, which is ergonomic with regard to the handling thereof and offers sufficient work protection." for US-9478115-B2, or "The present invention relates to computer security and, more particularly, to an efficient method of screening untrusted digital files." for US-9208317-B2. Sentences are generally shorter than those extracted by the graph-based systems.

[31] noticed that LSA showed better quality than TextRank in the generation of patent titles. Our results do not confirm this finding for Abstract generation from the Description, as measured automatically; qualitatively, the two outputs are relatively different and might be used for different purposes.

6. Abstractive methods

We use BART [18], a sequence-to-sequence system, as a baseline for abstractive summarization. We fine-tune a BART-base model (∼140 million parameters) on the BigPatent/G dataset. We train using the Hugging Face library with early stopping on the evaluation loss (patience: 5) and the following hyperparameters: max target length: 250; number of beams: 5; evaluation steps: 10k; max steps: 500M. We leave all other parameters at their default values; a configuration sketch is shown below. Some sample outputs are in Table ??.
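The following sketch shows how this configuration could map onto the Hugging Face Trainer API. It loads the public BigPatent/G release for illustration (our experiments use the re-processed version described in Section 3), and the tokenization details are assumptions rather than the paper's exact setup:

```python
# Fine-tuning sketch for BART-base on BigPatent/G with Hugging Face
# transformers/datasets; column names follow the public big_patent release.
from datasets import load_dataset
from transformers import (BartForConditionalGeneration, BartTokenizerFast,
                          DataCollatorForSeq2Seq, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

dataset = load_dataset("big_patent", "g")  # CPC section G (Physics)

def preprocess(batch):
    inputs = tokenizer(batch["description"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["abstract"], max_length=250, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-bigpatent-g",
    evaluation_strategy="steps",
    eval_steps=10_000,             # evaluation steps: 10k
    save_strategy="steps",
    save_steps=10_000,
    max_steps=500_000_000,         # effectively unbounded; early stopping ends training
    load_best_model_at_end=True,   # required by the early-stopping callback
    metric_for_best_model="loss",
    predict_with_generate=True,
    generation_max_length=250,     # max target length
    generation_num_beams=5,        # number of beams
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],  # patience: 5
)
trainer.train()
```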




Set          ROUGE-1   ROUGE-2   ROUGE-L
Validation     41.70     17.52     28.38
Test           41.53     17.25     28.18
Table 5: BART results on the validation and test sets.

Automatic evaluation: Table 5 shows the results in terms of ROUGE. As expected, the results improve over all extractive systems, with an increase of almost 5 ROUGE-2 points over the best extractive system.

Qualitative assessment: Qualitatively, we notice that the summaries are generally grammatical, with very rare local problems. The text is coherent and much easier to read and understand than summaries composed of extracted sentences. In all cases, the summaries seem adequate and convey the main points of their gold-standard counterparts.
However, we noticed that the generated summaries are largely extractive, with no or few modifications to the sentences in the source. In the following example, most of the summary generated for patent US-2005152022-A1 appears verbatim in its source (the Background of the Invention subsection):

"More specifically, in one aspect this invention relates to electro-optic displays with simplified backplanes, and methods for driving such displays. In another aspect, this invention relates to electro-optic displays in which multiple types of electro-optic units are used to improve the colors available from the displays. The present invention is especially, though not exclusively, intended for use in electrophoretic displays."

While some deletion is performed, most text is directly extracted from the source. To quantify how extractive the generated summaries are with respect to the source, we compute the coverage and the density of the generated summaries, following [49]; we report them in Table 6. The extractive fragment coverage measures the proportion of tokens in the summary that are part of an extractive fragment; it roughly measures how much of a summary's vocabulary is derivative of the text. The density also takes into account the length of the extractive fragments: the higher the density, the better a summary can be described as a series of extractions (a sketch of both metrics is given below). We notice that the generated summaries tend to have much longer extractive fragments than the gold standard.

                 BART    Hybrid   Gold standard
Coverage (avg)   95.75    96.12       90.68
Density (avg)    11.84     8.83        3.82
Table 6: Extractivity metrics on the summaries generated by the fine-tuned BART and the select-and-rephrase models. We also report the corresponding metrics on the gold-standard summaries for comparison. The metrics are computed per document and then averaged.
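Both metrics can be reimplemented directly from the definitions in [49]; the greedy longest-match search below is a sketch of the published fragment-extraction procedure, not the authors' exact code:

```python
# Extractive fragment coverage and density (Grusky et al. [49]). Tokens are
# whitespace-split for simplicity; multiply coverage by 100 to compare with
# the percentages in Table 6.
def extractive_fragments(article_tokens, summary_tokens):
    """Greedily collect maximal token spans shared by article and summary."""
    fragments, i = [], 0
    while i < len(summary_tokens):
        best_len = 0
        for j in range(len(article_tokens)):
            # Length of the shared span starting at summary[i] / article[j].
            k = 0
            while (i + k < len(summary_tokens) and j + k < len(article_tokens)
                   and summary_tokens[i + k] == article_tokens[j + k]):
                k += 1
            best_len = max(best_len, k)
        if best_len > 0:
            fragments.append(summary_tokens[i:i + best_len])
            i += best_len          # consume the matched fragment
        else:
            i += 1                 # token never copied from the article
    return fragments

def coverage_and_density(article, summary):
    article_tokens, summary_tokens = article.split(), summary.split()
    fragments = extractive_fragments(article_tokens, summary_tokens)
    coverage = sum(len(f) for f in fragments) / len(summary_tokens)
    density = sum(len(f) ** 2 for f in fragments) / len(summary_tokens)
    return coverage, density
```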
7. Hybrid methods

7.1. Extractive to abstractive: select and rephrase

The results in the previous sections show that graph-based extractive methods tend to select central content well but lack any discourse structure. Using BART solved some of these issues, but the model can only summarize the first part of the patent document, as its input length is limited to 1024 subtokens.

Thus, in this section, we explore a hybrid approach: we first select important sentences using an unsupervised graph-based algorithm and then rewrite the content using an abstractive system. Specifically, we use TextRank, as it performed best among the considered extractive models, and consider three extraction lengths: 1000, 500, and 250 tokens. We then train a BART system to rephrase the selected sentences into the target summary: we use the selected sentences as the input and the original gold standard as the target, and fine-tune the model (a sketch of the resulting pipeline is shown below). Some sample outputs are in Table ??.
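As a minimal sketch, the full select-and-rephrase pipeline chains the two components already introduced (the checkpoint path is a placeholder for the rephrasing model fine-tuned as described above):

```python
# Select-and-rephrase: TextRank extracts ~1000 tokens, then a fine-tuned
# BART rewrites them into the final abstract.
from summa.summarizer import summarize
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("path/to/rephrasing-bart")  # placeholder
model = BartForConditionalGeneration.from_pretrained("path/to/rephrasing-bart")

def hybrid_summary(description: str) -> str:
    # Step 1: unsupervised extraction (best validation budget: 1000 tokens).
    extracted = summarize(description, words=1000)
    # Step 2: abstractive rewriting of the selected sentences.
    inputs = tokenizer(extracted, max_length=1024, truncation=True,
                       return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=5, max_length=250)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```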
Set    #T      ROUGE-1   ROUGE-2   ROUGE-L
Val    1000     42.79     17.92     28.79
Val     500     41.54     16.74     27.88
Val     250     40.33     15.60     27.01
Test   1000     42.47     17.74     28.59
Table 7: Results using the previously described hybrid approach. We selected the number of extracted tokens (#T) on the validation set and ran the most promising model on the test set.

Automatic evaluation: Table 7 reports the ROUGE scores. Extracting 1000 tokens through TextRank and then rephrasing them using BART results in the highest ROUGE, surpassing the vanilla BART approach on all metrics. The obtained scores are the highest among all the extractive and abstractive models we considered.

Note that relatively good performance is obtained even when a smaller number of tokens is extracted. Extracting 500 tokens results in scores only marginally worse than those obtained by a BART model fed with the first 1024 subtokens. While the results obtained by extracting only 250 tokens are worse in terms of ROUGE, the rewriting component remains crucial: an improvement of 5 ROUGE-1, 3.3 ROUGE-2, and 5.3 ROUGE-L points is observed over the results obtained using TextRank alone.

Qualitative assessment: The outputs obtained with this approach are fluent and relatively similar to those obtained through the vanilla BART. The coverage and density (Table 6) also show a marginally lower extractivity of the generated summaries.

7.2. DANCER

An alternative approach to dealing with high document length is to exploit the document structure. To summarize scientific documents, for example, [5] proposed dealing with different sections independently; however, no experiments were performed in the patent domain. Here, we explore whether adapting this method to the patent domain can be useful. Specifically, we perform the steps described in the following; a sketch of the first two steps is given after the list.
                  #Tokens    % patents
Field                73.73      38.27%
Background          710.04      94.85%
Drawings            243.43      97.60%
Embodiments        3168.25      53.07%
References           92.10      28.18%
Related Art         644.27       4.12%
Objective           256.95       2.09%
Detailed Descr.    3404.91      55.23%
Table 8: Average length of each subsection type and percentage of patents that contain the subsection.

• Dividing and normalizing subsections: To divide the Description text into subsections, we use simple regular expressions, exploiting the fact that section header lines include fully cased tokens only. Patent headers can follow different conventions (for example, subsections with similar content can be named Fields, Field, Field Of The Invention, etc.). Thus, we normalize the headers through a simple keyword-matching algorithm into nine classes; the classes are shown in Table 8. Subsections that did not match any of the keywords were left in a default category and ignored.

• Alignment between abstract sentences and subsections: Following [5], we use ROUGE-L [39] to align sentences in the abstract with patent subsections. Specifically, for each sentence in the Abstract, we compute its ROUGE-L recall with all individual paragraphs in all subsections; we then align the sentence with the subsection containing the paragraph with the maximum score (we retrieve the subsection containing most of the sentence content, regardless of any additional text, which the summarization model will learn to filter out). Figure 1 shows the percentage of subsections that, when present, align with at least one sentence in the patent's Abstract.

• Using paired elements as training data: Following the previous steps, each Abstract sentence is aligned with a Description subsection. Thus, for each (Description, Abstract)ᵢ pair, we created 𝑁 (Subsection, Abstract sentence(s))ᵢₙ pairs, where 𝑁 is the number of unique subsections that are aligned with at least one sentence in the Abstract. If multiple sentences align with the same patent subsection, the target contains all the aligned sentences in their original order. We then trained a BART-base model [18] using the subsection as input and the aligned sentence(s) as target; we set the maximum generated length to 250 and the number of beams to 5, and left all other hyperparameters at their default values. We trained with early stopping on the validation set. Table 9 reports the metrics obtained by the model on the sentence generation step. We also experimented with prepending the subsection type (as a special token) to its text, but with no improvement.

• Inference: At inference time, we obtain the final summary by concatenating the sentences generated from the individual subsections. Patent structure is less coherent than that of papers; in fact, not all subsections appear in all patents. We thus consider several strategies for subsection selection:
   (i) Pre-selection: We heuristically pre-select subsections based on their role (we selected the subsections of type FIELD, BACKGROUND, EMBODIMENTS, OBJECTIVE, DESCRIPTION) and feed them to the trained model in their original order. We then concatenate the results.
   (ii) Generate from M subsections: We retrieve all subsections in the patent and sort them according to how likely they are to be aligned in the whole dataset (Figure 1). We generate from the first M most commonly aligned subsections, where M goes from 1 to the total number of subsections in the patent. The final summary is a concatenation of the generated sentences.
   (iii) Generate from all subsections in the patent: We use all subsections in their original order and concatenate the results.

• Second abstractive step: The final abstract obtained as a concatenation of sentences lacks any discourse structure and might not be coherent; in particular, we notice that it often contains repeated information. Thus, we explore whether performing a second abstractive step can improve performance. To this end, we train a second BART model that, given the output of the previous step (i.e., the summary as a concatenation of sentences), is trained to paraphrase it to be more similar to the target Abstract.

Some sample outputs (before performing the second abstractive step) are in Table ??.
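The sketch below illustrates the first two steps (header normalization and Abstract-to-subsection alignment). The keyword-to-class mapping is illustrative rather than our exact rule set, and rouge-score stands in for the ROUGE-L implementation:

```python
# Subsection normalization and ROUGE-L alignment sketch for the
# DANCER-style preprocessing; HEADER_CLASSES is an illustrative mapping,
# not the paper's exact keyword rules.
from rouge_score import rouge_scorer

HEADER_CLASSES = {
    "FIELD": ("FIELD",),
    "BACKGROUND": ("BACKGROUND",),
    "DRAWINGS": ("DRAWING",),
    "EMBODIMENTS": ("EMBODIMENT",),
    "REFERENCES": ("REFERENCE",),
    "RELATED ART": ("RELATED ART", "PRIOR ART"),
    "OBJECTIVE": ("OBJECT",),
    "DETAILED DESCRIPTION": ("DETAILED DESCRIPTION",),
}

def normalize_header(line):
    """Map a fully cased header line to a subsection class (or None)."""
    stripped = line.strip()
    if not stripped or stripped != stripped.upper():
        return None  # header lines contain fully cased tokens only
    for cls, keywords in HEADER_CLASSES.items():
        if any(keyword in stripped for keyword in keywords):
            return cls
    return None  # default category: ignored

scorer = rouge_scorer.RougeScorer(["rougeL"])

def align_abstract(abstract_sentences, subsections):
    """Align each Abstract sentence with the subsection whose paragraph
    maximizes ROUGE-L recall (the sentence acts as the target)."""
    alignment = []
    for sentence in abstract_sentences:
        best_cls, best_recall = None, -1.0
        for cls, paragraphs in subsections.items():
            for paragraph in paragraphs:
                recall = scorer.score(sentence, paragraph)["rougeL"].recall
                if recall > best_recall:
                    best_cls, best_recall = cls, recall
        alignment.append((sentence, best_cls))
    return alignment
```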
Model                  R1      R2      RL
BART                 35.00   15.74   26.63
BART (+ subs. type)  33.28   14.81   25.66
Table 9: Model trained on generating the Abstract sentence(s) given the subsection. We also experimented with prepending the subsection text with its type.

Automatic evaluation: Table 10 reports the results on the validation set. We report the results obtained by generating from the pre-selection, using the best-aligned section only (as a baseline), the best result with a varying number of sections (Figure 2 shows ROUGE-L as a function of the number of summarized subsections), and the result obtained by summarizing all sections. We also report the results after the second abstractive step. Note that none of the configurations surpasses the simple BART baseline.

Model                          R1      R2      RL
DANCER (preselection)        38.73   16.03   25.63
DANCER (best aligned, M=1)   27.39   10.64   19.83
DANCER (best M, M=3)         40.70   16.45   25.08
DANCER (all)                 40.68   16.38   25.90
DANCER + abstractive         38.88   15.89   26.99
Table 10: Results on the validation set.

[Figure 2: ROUGE-L results as a function of the number of subsections used for the generation. Line plot rising from 17.95 (one subsection) to 24.42 (three subsections) and plateauing around 24.4 thereafter.]
[Figure 1: Percentage of subsections that, when present, are aligned to at least one sentence in the Abstract in the train (left) and validation (right) sets. Bar charts over the eight subsection classes; DESCRIPTION (≈87%) and EMBODIMENTS (≈84%) are by far the most frequently aligned.]

[Figure 3: Number of unique subsection types to which the Abstract aligns. Histogram: 130,664 Abstracts align with a single subsection, 113,373 with two, and only a small tail (≈15,000) with three or more.]
Qualitative analysis: Inspecting the outputs, we noticed that many of the sentences generated from the various subsections are very similar and describe what the invention is and its goal. While the second abstractive step helps limit repetition, the resulting output is often short and contains too little information compared to the gold standard. We noticed a number of issues that could make the transfer from scientific publications to the patent domain unsuccessful:

• Less predictable structure and section headers: Scientific papers have a very coherent structure, as they tend to roughly follow a fixed schema (e.g., Introduction, Previous Work, Method, Conclusions), with each section having a clear, fixed role. While, on a superficial level, patent documents have a similar structure with sections and subsections, they are less coherent. As Table 8 shows, the subsections of the Description tend to vary. Moreover, the role of each subsection is less determined.

• Less compositional Abstracts: An analysis of the Abstracts' compositionality shows that many of the sentences in the Abstract align with the same patent subsections. Figure 3 represents the number of unique subsections to which each Abstract aligns. Note that most patent Abstracts align with only one or two different subsections. Moreover, a qualitative analysis of the Abstracts shows that, while paper abstracts tend to follow a fixed structure (first describing the background, then the goal and methods, then the results and conclusions), patent Abstracts seem to lack the compositional nature of scientific papers. The lack of a fixed flow in the Abstract might also explain the relatively low results obtained by the abstractive model when generating the Abstract sentence(s) from the original subsections. As the alignment is more random, finding a pattern and correctly generating the aligned sentences is more challenging.

8. Conclusions

In this paper, we have benchmarked several extractive, abstractive, and hybrid methods on the BigPatent/G dataset.
Among extractive systems, we found that graph-based ones seem appropriate for content selection and perform relatively well both in metrics and in outputs. However, the extracted outputs are subject to all the limitations of extractive summarization, with dangling references being particularly common. The length of the sentences, the dangling references, and the lack of discourse structure make the outputs challenging to process for humans and possibly for machines.

Among the abstractive approaches, we have analyzed BART and have found that it performs best in automatic metrics compared to the extractive algorithms. We have also found that the produced outputs are, in fact, not very abstractive with respect to the input, with long chunks of text identical to input passages; the model seems, however, very good at removing non-central content from the single sentences, which extractive systems are natively unable to do. In future work, we plan to explore more powerful abstractive models, including those in the GPT family [50, 51, 52].

We have considered a simple select-and-rephrase approach, which obtained the best automatic metrics. We have also tried to adapt DANCER, initially designed for scientific articles, to the patent domain. However, we have found that patents are more variable in the sections they contain and in the sections' content itself, and that their Abstracts tend to be less compositional than those of papers. Thus, the approach was not particularly successful when transferred to the patent domain.

Our setting, however, has several limitations. First, the BigPatent dataset has known issues, and the Abstract is not regarded as the best target for summarization in the patent community, as it contains superficial information rather than the core nature of the invention. Second, we did not have the opportunity to collaborate with legal and technical experts to evaluate our outputs.

We believe that future work on patent summarization should tackle a number of open problems. First, we hope that this work will motivate the creation of better benchmarks, which can be shared among researchers and practitioners interested in patent summarization. Second, we hope that the design of such a benchmark can be carried out in conjunction with patent experts and industrial practitioners to ensure that it is practically useful; while it is likely not practical to ask experts to write gold-standard summaries, there is space for improvement over the current setting. Third, the validity of the standard evaluation metrics in the patent domain should be measured based on experts' evaluation of the outputs. Finally, the factual accuracy of abstractive methods — which is particularly important in a legal and technical domain — should be better investigated.

Acknowledgments

We acknowledge the support of the PNRR project FAIR — Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU.

References

[1] I. Mani, D. House, G. Klein, L. Hirschman, T. Firmin, B. Sundheim, The TIPSTER SUMMAC text summarization evaluation, in: Ninth Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Bergen, Norway, 1999, pp. 77–85. URL: https://aclanthology.org/E99-1011.
[2] T. Sakai, K. Sparck-Jones, Generic summaries for indexing in information retrieval, in: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '01, Association for Computing Machinery, New York, NY, USA, 2001, pp. 190–198. URL: https://doi.org/10.1145/383952.383987. doi:10.1145/383952.383987.
[3] S. Casola, A. Lavelli, H. Saggion, What's in a (dataset's) name? The case of BigPatent, in: Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 2022, pp. 399–404. URL: https://aclanthology.org/2022.gem-1.34.
[4] E. Sharma, C. Li, L. Wang, BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 2204–2213. URL: https://www.aclweb.org/anthology/P19-1212. doi:10.18653/v1/P19-1212.
[5] A. Gidiotis, G. Tsoumakas, A divide-and-conquer approach to the summarization of long documents, IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020) 3029–3040. doi:10.1109/TASLP.2020.3037401.
[6] G. Erkan, D. R. Radev, LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization, J. Artif. Int. Res. 22 (2004) 457–479.
[7] R. Mihalcea, P. Tarau, TextRank: Bringing Order into Text, in: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 404–411. URL: https://www.aclweb.org/anthology/W04-3252.
[8] H. Zheng, M. Lapata, Sentence centrality revisited for unsupervised summarization, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 6236–6247. URL: https://aclanthology.org/P19-1628. doi:10.18653/v1/P19-1628.
[9] A. Nenkova, L. Vanderwende, The impact of frequency on summarization, Microsoft Research, Redmond, Washington, Tech. Rep. MSR-TR-2005-101 (2005).
[10] J. Steinberger, K. Jezek, et al., Using latent semantic analysis in text summarization and summary evaluation, Proc. ISIM 4 (2004) 8.
[11] Y. Liu, M. Lapata, Text summarization with pretrained encoders, ArXiv abs/1908.08345 (2019).
[12] M. Zhong, P. Liu, Y. Chen, D. Wang, X. Qiu, X. Huang, Extractive summarization as text matching, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 6197–6208. URL: https://aclanthology.org/2020.acl-main.552. doi:10.18653/v1/2020.acl-main.552.
[13] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to Sequence Learning with Neural Networks, in: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS'14, MIT Press, Cambridge, MA, USA, 2014, pp. 3104–3112.
[14] A. M. Rush, S. Chopra, J. Weston, A neural attention model for abstractive sentence summarization, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 379–389. URL: https://aclanthology.org/D15-1044. doi:10.18653/v1/D15-1044.
[15] S. Chopra, M. Auli, A. M. Rush, Abstractive sentence summarization with attentive recurrent neu-

Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 1073–1083. URL: https://www.aclweb.org/anthology/P17-1099. doi:10.18653/v1/P17-1099.
[18] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 7871–7880. URL: https://aclanthology.org/2020.acl-main.703. doi:10.18653/v1/2020.acl-main.703.
[19] J. Zhang, Y. Zhao, M. Saleh, P. Liu, PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, in: H. D. III, A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 11328–11339. URL: http://proceedings.mlr.press/v119/zhang20ae.html.
[20] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, ArXiv abs/2004.05150 (2020).
[21] M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, A. Ahmed, Big Bird: Transformers for Longer Sequences, in: H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 17283–17297. URL: https://proceedings.neurips.cc/paper/2020/file/c8512d142a2d849725f31a9a7a361ab9-Paper.pdf.
[22] S. Huang, R. Wang, Q. Xie, L. Li, Y. Liu, An extraction-abstraction hybrid approach for long
     ral networks, in: Proceedings of the 2016 Con-               document summarization, in: 2019 6th Interna-
     ference of the North American Chapter of the As-             tional Conference on Behavioral, Economic and
     sociation for Computational Linguistics: Human               Socio-Cultural Computing (BESC), 2019, pp. 1–6.
     Language Technologies, Association for Compu-                doi:10.1109/BESC48373.2019.8962979.
     tational Linguistics, San Diego, California, 2016,      [23] W. S. El-Kassas, C. R. Salama, A. A. Rafea,
     pp. 93–98. URL: https://aclanthology.org/N16-1012.           H. K. Mohamed,             Automatic text summa-
     doi:10.18653/v1/N16-1012.                                    rization: A comprehensive survey,             Expert
[16] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gulçehre,           Systems with Applications 165 (2021) 113679.
     B. Xiang, Abstractive text summarization using               URL:       https://www.sciencedirect.com/science/
     sequence-to-sequence RNNs and beyond, in: Pro-               article/pii/S0957417420305030.            doi:https:
     ceedings of The 20th SIGNLL Conference on Com-               //doi.org/10.1016/j.eswa.2020.113679.
     putational Natural Language Learning, Association       [24] J. Codina-Filbà, N. Bouayad-Agha, A. Burga,
     for Computational Linguistics, Berlin, Germany,              G. Casamayor, S. Mille, A. Müller, H. Sag-
     2016, pp. 280–290. URL: https://aclanthology.org/            gion, L. Wanner,          Using genre-specific fea-
     K16-1028. doi:10.18653/v1/K16-1028.                          tures for patent summaries, Information Pro-
[17] A. See, P. J. Liu, C. D. Manning, Get to the point:          cessing & Management 53 (2017) 151 – 174.
     Summarization with pointer-generator networks,               URL: http://www.sciencedirect.com/science/article/
     in: Proceedings of the 55th Annual Meeting of the            pii/S0306457316302825. doi:https://doi.org/




                                                         9
     10.1016/j.ipm.2016.07.002.                                    Methods in Natural Language Processing (EMNLP),
[25] A. Trappey, C. Trappey, B. H. Kao, Automated                  Association for Computational Linguistics, Online,
     Patent Document Summarization for R&D Intellec-               2020, pp. 9308–9319. URL: https://www.aclweb.org/
     tual Property Management, 2006 10th International             anthology/2020.emnlp-main.748. doi:10.18653/
     Conference on Computer Supported Cooperative                  v1/2020.emnlp-main.748.
     Work in Design (2006) 1–6.                               [34] J. He, W. Kryściński, B. McCann, N. Rajani, C. Xiong,
[26] A. J. C. Trappey, C. V. Trappey, C.-Y. Wu, A                  CTRLsum: Towards Generic Controllable Text Sum-
     Semantic Based Approach for Automatic Patent                  marization, arXiv preprint arXiv:2012.04281 (2020).
     Document Summarization, in: R. Curran, S.-Y.             [35] M. Guo, J. Ainslie, D. Uthus, S. Ontanon, J. Ni, Y.-
     Chou, A. Trappey (Eds.), Collaborative Product and            H. Sung, Y. Yang, LongT5: Efficient text-to-text
     Service Life Cycle Management for a Sustainable               transformer for long sequences, in: Findings of the
     World, Springer London, London, 2008, pp. 485–                Association for Computational Linguistics: NAACL
     494.                                                          2022, Association for Computational Linguistics,
[27] Y.-H. Tseng, C.-J. Lin, Y.-I. Lin, Text mining                Seattle, United States, 2022, pp. 724–736. URL: https:
     techniques for patent analysis, Information Pro-              //aclanthology.org/2022.findings-naacl.55.
     cessing & Management 43 (2007) 1216 – 1247.              [36] S. Casola, A. Lavelli, Summarization, simpli-
     URL: http://www.sciencedirect.com/science/article/            fication, and generation: The case of patents,
     pii/S0306457306002020. doi:https://doi.org/                   Expert Systems with Applications 205 (2022)
     10.1016/j.ipm.2006.11.011, patent Process-                    117627. URL: https://www.sciencedirect.com/
     ing.                                                          science/article/pii/S0957417422009356. doi:https:
[28] A. Trappey, C. Trappey, C.-Y. Wu, Automatic patent            //doi.org/10.1016/j.eswa.2022.117627.
     document summarization for collaborative knowl-          [37] A. Celikyilmaz, E. Clark, J. Gao, Evaluation of Text
     edge systems and services, Journal of Systems                 Generation: A Survey, 2020. arXiv:2006.14799.
     Science and Systems Engineering 18 (2009) 71–94.         [38] E. Lloret, L. Plaza, A. Aker, The Challenging
     doi:10.1007/s11518-009-5100-7.                                Task of Summary Evaluation: An Overview, Lang.
[29] K. Girthana, S. Swamynathan, Query Oriented                   Resour. Eval. 52 (2018) 101–148. URL: https://
     Extractive-Abstractive Summarization System (QE-              doi.org/10.1007/s10579-017-9399-2. doi:10.1007/
     ASS), in: Proceedings of the ACM India Joint Inter-           s10579-017-9399-2.
     national Conference on Data Science and Manage-          [39] C.-Y. Lin, ROUGE: a Package for Automatic Evalu-
     ment of Data, CoDS-COMAD ’19, Association for                 ation of Summaries, in: Workshop on Text Summa-
     Computing Machinery, New York, NY, USA, 2019,                 rization Branches Out, Post-Conference Workshop
     p. 301–305. URL: https://doi.org/10.1145/3297001.             of ACL 2004, Barcelona, Spain, 2004, pp. 74–81.
     3297046. doi:10.1145/3297001.3297046.                    [40] J.-S. Lee, Controlling Patent Text Generation by
[30] K. Girthana, S. Swamynathan, Query-Oriented                   Structural Metadata, Association for Computing
     Patent Document Summarization System (QPSS),                  Machinery, New York, NY, USA, 2020, p. 3241–3244.
     in: M. Pant, T. K. Sharma, O. P. Verma, R. Singla,            URL: https://doi.org/10.1145/3340531.3418503.
     A. Sikander (Eds.), Soft Computing: Theories and         [41] A. Wang, K. Cho, M. Lewis, Asking and Answer-
     Applications, Springer Singapore, Singapore, 2020,            ing Questions to Evaluate the Factual Consistency
     pp. 237–246.                                                  of Summaries, in: Proceedings of the 58th An-
[31] C. M. de Souza, M. E. Santos, M. R. G. Meireles,              nual Meeting of the Association for Computational
     P. E. M. Almeida, Using Summarization Techniques              Linguistics, Association for Computational Linguis-
     on Patent Database Through Computational Intelli-             tics, Online, 2020, pp. 5008–5020. URL: https://www.
     gence, in: P. Moura Oliveira, P. Novais, L. P. Reis           aclweb.org/anthology/2020.acl-main.450. doi:10.
     (Eds.), Progress in Artificial Intelligence, Springer         18653/v1/2020.acl-main.450.
     International Publishing, 2019, pp. 508–519.             [42] A. Pu, H. W. Chung, A. P. Parikh, S. Gehrmann,
[32] N. Bouayad-Agha, G. Casamayor, G. Ferraro,                    T. Sellam, Learning compact metrics for MT, in:
     S. Mille, V. Vidal, L. Wanner, Improving the compre-          Proceedings of EMNLP, 2021.
     hension of legal documentation: The case of patent       [43] L. Page, S. Brin, R. Motwani, T. Winograd, The
     claims, in: Proceedings of the International Con-             PageRank citation ranking : Bringing order to the
     ference on Artificial Intelligence and Law, 2009, pp.         web, in: WWW 1999, 1999.
     78–87. doi:10.1145/1568234.1568244.                      [44] A. Burga, J. Codina, G. Ferraro, H. Saggion, L. Wan-
[33] J. Pilault, R. Li, S. Subramanian, C. Pal, On Ex-             ner, The challenge of syntactic dependency parsing
     tractive and Abstractive Neural Document Summa-               adaptation for the patent domain, in: ESSLLI-13
     rization with Transformer Language Models, in:                workshop on extrinsic parse improvement., 2013.
     Proceedings of the 2020 Conference on Empirical          [45] L. Andersson, M. Lupu, A. Hanbury, Domain




                                                         10
     Adaptation of General Natural Language Process-
     ing Tools for a Patent Claim Visualization System,
     in: M. Lupu, E. Kanoulas, F. Loizides (Eds.), Multi-
     disciplinary Information Retrieval, Springer Berlin
     Heidelberg, Berlin, Heidelberg, 2013, pp. 70–82.
[46] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K.
     Landauer, R. A. Harshman, Indexing by latent se-
     mantic analysis, Journal of the Association for In-
     formation Science and Technology 41 (1990) 391–
     407. doi:10.1002/(SICI)1097-4571(199009)
     41:6<391::AID-ASI1>3.0.CO;2-9.
[47] V. Klema, A. Laub, The singular value decomposi-
     tion: Its computation and some applications, IEEE
     Transactions on Automatic Control 25 (1980) 164–
     176. doi:10.1109/TAC.1980.1102314.
[48] Y. Gong, X. Liu, Generic text summarization using
     relevance measure and latent semantic analysis, in:
     Annual International ACM SIGIR Conference on Re-
     search and Development in Information Retrieval,
     2001.
[49] M. Grusky, M. Naaman, Y. Artzi, Newsroom: A
     dataset of 1.3 million summaries with diverse ex-
     tractive strategies, in: Proceedings of the 2018
     Conference of the North American Chapter of the
     Association for Computational Linguistics: Hu-
     man Language Technologies, Volume 1 (Long Pa-
     pers), Association for Computational Linguistics,
     New Orleans, Louisiana, 2018, pp. 708–719. URL:
     https://aclanthology.org/N18-1065. doi:10.18653/
     v1/N18-1065.
[50] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever,
     Improving language understanding by generative
     pre-training, Technical Report, 2018.
[51] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei,
     I. Sutskever, Language Models are Unsupervised
     Multitask Learners, Technical Report, 2019.
[52] T. B. Brown, B. Mann, N. Ryder, M. Subbiah,
     J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
     G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss,
     G. Krueger, T. Henighan, R. Child, A. Ramesh,
     D. M. Ziegler, J. Wu, C. Winter, C. Hesse,
     M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess,
     J. Clark, C. Berner, S. McCandlish, A. Radford,
     I. Sutskever, D. Amodei,        Language Models
     are Few-Shot Learners,         in: H. Larochelle,
     M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.),
     Advances in Neural Information Processing
     Systems 33: Annual Conference on Neural
     Information Processing Systems 2020, NeurIPS
     2020, December 6-12, 2020, virtual, 2020. URL:
     https://proceedings.neurips.cc/paper/2020/hash/
     1457c0d6bfcb4967418bfb8ac142f64a-Abstract.
     html.



