<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Santiago de Compostela, August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A new approach for extracting the conceptual schema of texts based on the linguistic Thematic Progression theory</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena del Olmo</string-name>
          <email>elenadelolmo@ucm.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana María Fernández-Pampillón</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>29</volume>
      <issue>2020</issue>
      <fpage>23</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The purpose of this article is to present a new approach for the discovery and labelling of the implicit conceptual schema of texts through the application of the Thematic Progression theory. The underlying conceptual schema is the core component for the generation of summaries that are genuinely consistent with the semantics of the text. Automatic Summary Generation was first proposed in the late 1950s. Outstanding examples of this early stage are Luhn (1958), whose method is based on sentence extraction relying on word weightings inferred from TF-IDF metrics, and Edmundson (1969), who proposed novel sentence weighting metrics, such as the presence of words from a predefined list, the presence of words from the title of the document, or the position of the sentence at the beginning of documents and paragraphs. These are paradigmatic examples of the first extractive summarization techniques: techniques based on the verbatim extraction of the most relevant parts of a text. The generated summary was thus a collection of sentences considered relevant but often semantically inconsistent because of its overall weakness in coherence (the text does not make overall sense) and cohesion (the sentences are connected incorrectly). The summary was consequently a poorly connected text with no global meaning, presumably due to the assumption of independence of the extracted sentences (Lloret et al. 2012).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Currently, five main approaches to extractive techniques can be
distinguished: (i), statistical approaches
        <xref ref-type="bibr" rid="ref16 ref23 ref38">(Luhn 1958, McCargar
2004, Galley 2006)</xref>
        , based on different strategies for term counting,
(ii), topic-based approaches
        <xref ref-type="bibr" rid="ref15 ref36">(Edmundson 1969, Harabagiu et al.
2005)</xref>
        , which assume that several topics are implicit in a text and
attempt to formally represent those topics, (iii), graph-based
approaches
        <xref ref-type="bibr" rid="ref10 ref11">(Erkan et al. 2004, Giannakopoulos et al. 2008)</xref>
        , based
on the representation of the linguistic elements in texts judged to be
relevant as nodes connected by arcs, (iv), discourse-based
approaches
        <xref ref-type="bibr" rid="ref17 ref4 ref5">(Marcu 2000, Cristea et al. 2005, da Cunha et al. 2007)</xref>
        ,
whose target is to capture the discursive relations within texts, and,
(v), machine learning approaches
        <xref ref-type="bibr" rid="ref22 ref33">(Aliguliyev 2010, Hannah et al.
2014)</xref>
        , intended to reduce the text summarization task to a
classification task by assigning a relevance value to each sentence.
      </p>
      <p>
        Although historically less addressed in the literature, abstractive
models try to address the lack of coherence and cohesion in the
summaries produced, using some sort of internal semantic
representation of the text (which can be merely the output of an
extractive process) to generate the final summary, composed of
sentences not necessarily included in the original text. Although these
approaches theoretically mitigate the consistency issue, they
introduce a new layer of complexity: a natural language generation
module. Despite this greater complexity, text summarization
research is nowadays progressively shifting towards abstractive
approaches
        <xref ref-type="bibr" rid="ref14">(Lin et al. 2019)</xref>
        .
      </p>
      <p>
        Traditionally, abstractive summarization techniques have been
classified into structure-based, intended to populate predefined
information structures out of the relevant sentences of the texts, and
semantic-based, involving a wide variety of knowledge
representation techniques. Regarding the former, depending on the
structural schema chosen, it is possible to identify, (i), tree-based
models
        <xref ref-type="bibr" rid="ref41">(Kikuchi et al. 2014)</xref>
        , which perform different strategies for
syntactic parsing analysis in order to codify paraphrasing
information mainly by linking and reducing the syntactic sentence
trees of the text, (ii), template-oriented models
        <xref ref-type="bibr" rid="ref19 ref30">(Elhadad et al. 2015,
Wu et al. 2018, Wang et al. 2019)</xref>
        , which rely on extraction rules led
by linguistic patterns matching sequences of tokens to be mapped
into predefined templates, (iii), ontology-based models
        <xref ref-type="bibr" rid="ref3 ref7">(Nguyen
2009, Baralis et al. 2013)</xref>
        , which are highly domain-dependent and
include a hierarchical classifier mapping concepts into the nodes of
an ontology, and, (iv), rule-based models
        <xref ref-type="bibr" rid="ref29">(Genest et al. 2011)</xref>
        , based
on extraction rules operating on categories and features
representative of the content of the text. Regarding the latter,
semantic-based techniques for abstractive summarization, there are
interesting approaches based on the concept of information item
        <xref ref-type="bibr" rid="ref1">(Gatt et al. 2009)</xref>
        , the smallest units with internal coherence, in the
format of subject-verb-object triplets obtained through semantic role
labeling, disambiguation, coreference resolution and the
formalization of predication. Besides, there are approaches based on
discourse information
        <xref ref-type="bibr" rid="ref27 ref34">(Gerani et al. 2014, Goyal et al. 2016)</xref>
        ,
predicate-argument tuples
        <xref ref-type="bibr" rid="ref18 ref39">(Li 2015, Zhang et al. 2016)</xref>
        and
semantic graphs
        <xref ref-type="bibr" rid="ref42">(Liu et al. 2019)</xref>
        .
      </p>
      <p>The aforementioned tendency towards abstractive approaches in
recent years is framed at a stage when the new Deep Learning
models have proved to be particularly promising for using vector
spaces as a way to address the shortcomings of discrete symbols as
the input for Natural Language Processing tasks, such as tokens or
lemmas, which cannot represent the underlying semantics of the
concepts involved. This new paradigm has provided techniques for
both extractive and abstractive summarization, such as the clustering
of sentence and document embeddings, or the generation of correct
sentences given a sentence embedding and a language model.</p>
      <p>Remarkable examples are the contributions of Templeton et al.
(2018), who compare different methods of sentence embeddings by
computing their cosine similarity, or Miller et al. (2019), who
proposed k-means clustering to identify sentences closest to the
centroid for summary selection.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        In addition to the distinction between extractive and abstractive
approaches, there is a crucial challenge in automatic summarization
which affects them both: the subjectivity of the accuracy scoring of
summaries. This implies a new difficulty in the creation of objective
gold datasets composed of correct summaries. In this context,
unsupervised summarization models, such as the one proposed in this
paper, which do not require labelled training data, have become
particularly relevant. Among the unsupervised approaches we can
highlight, (i), approaches which are extensions of word embedding
techniques, such as n-gram embeddings
        <xref ref-type="bibr" rid="ref37">(Mikolov et al. 2013)</xref>
        ,
or doc2vec
        <xref ref-type="bibr" rid="ref31">(Le et al. 2014)</xref>
        , (ii), the skip-thought vectors
        <xref ref-type="bibr" rid="ref32">(Kiros et
al. 2015)</xref>
        , (iii), the Word Mover’s Distance model
        <xref ref-type="bibr" rid="ref24">(Kusner et al.
2015)</xref>
        , (iv), quick-thought vectors
        <xref ref-type="bibr" rid="ref20">(Logeswaran et al. 2018)</xref>
        , and, (v),
models based on contextual embeddings obtained from
transformers, such as SBERT
        <xref ref-type="bibr" rid="ref28">(Reimers et al. 2019)</xref>
        .
      </p>
      <p>This paper addresses one of the weaknesses of extractive models
discussed above, i.e. the lack of coherence in the
summaries produced, especially when there are insufficient
linguistic datasets in a language for applying machine learning or Deep
Learning methods. In this respect, the solution we propose identifies
implicit conceptual schemas from texts using the morpho-syntactic
knowledge currently provided by NL analyzers.</p>
      <p>The paper is organized as follows: in section 2 we define the
hypothesis and objectives of the research work. In section 3 we
present a review of the linguistic theories on which we base our
solution: the Thematic theory and Thematic Progression theory. In
section 4 we present our solution: the application of both theories
for the identification of the text conceptual schema. In section 5 we
study the feasibility of our solution for the automatic extraction of
thematic progression in Spanish, a language with few linguistic
datasets for text summarization. Finally, in section 6 we draw the
conclusions of this work and present our future research lines.</p>
    </sec>
    <sec id="sec-2">
      <title>HYPOTHESIS AND OBJECTIVES</title>
      <p>Our hypothesis is that applying the Thematic Theory and the
Thematic Progression Theory to annotate the discourse features
theme, rheme and their coreferences will allow us to extract thematic
progression schemas, which represent the implicit conceptual
schemas of texts.</p>
      <p>Therefore, our aim is to obtain an internal representation of the
informational structure of the text as a formal representation for text
summarization. The advantage of this solution is that it can be
applied to any language regardless of whether or not there are
enough training data for the implementation of machine learning and
Deep Learning techniques. In our work we will use Spanish as the
language to study the feasibility of the solution. We also hope to
contribute to the generation of summaries in Spanish, a task
currently performed with moderate efficiency due to the limited
availability of linguistic resources.</p>
    </sec>
    <sec id="sec-3">
      <title>REVIEW OF THEMATIC AND THEMATIC PROGRESSION</title>
    </sec>
    <sec id="sec-5">
      <title>Thematic theory</title>
      <p>
        The Thematic theory is framed within the perspective of linguistic analysis
corresponding to the informational layer. The uses and applications
that authors have given to terms such as theme, focus and topic,
and to notions such as new information or emphasis, have been
overwhelmingly numerous (Gutiérrez 2000). In accordance with the
Thematic theory, in descriptive and narrative texts, which are the
ones most typically summarized, known information, or theme, is
generally agreed to be positioned at the beginning of
sentences. By contrast, the phrases containing the informative
contribution of the sentence, also known as rheme, tend to be located
further to the right, later in the time of enunciation. This description
is consistent with how the acquisition of new knowledge is described
at the neurological level, through linking the known with the
novel or by altering pre-existing relationships
        <xref ref-type="bibr" rid="ref40">(McCulloch et al.
1943)</xref>
        .
      </p>
      <p>In order to clarify how we will use these concepts, we present here
a series of questions adapted from Gutiérrez (2000: 18) and their
corresponding answers:</p>
      <p>1. Who joined Luis this morning? Luis was joined this morning by Pedro.</p>
      <p>2. When did Pedro join Luis? Pedro joined Luis this morning.</p>
      <p>3. Who joined Pedro this morning? This morning Pedro joined Luis.</p>
      <p>That these statements are different is a standard judgment for any
native speaker. Although they share the same representative
function, i. e. their syntactic and semantic relations do not differ,
they show different informative functions. Therefore, in spite of
transmitting the same information about the world, they do not
inform in the same way. Accordingly, the underlying assumption of
our proposal is that the thematic status of a phrase (like who, when
and who in the examples above, respectively) is relevant in terms of
the prevalence of the concept involved for the summarization of a
document. Their clustering along a document, taking into account
the thematic progression patterns found, as further explained in the
next section, is expected to reveal the conceptual schema of the text.</p>
    </sec>
    <sec id="sec-6">
      <title>Thematic Progression theory</title>
      <p>Daneš (1974: 114) presents thematic progression as the choice
and arrangement of discourse themes, their concatenation, hierarchy
and relation with the topics of texts. Accordingly, he argues that it is
possible to infer the informational schema of a text from its
theme-rheme organization. Three main types of thematic progression
are usually distinguished: (i), linear progression, in which
the rheme of one sentence is the theme of the subsequent sentence,
(ii), constant progression, in which a theme is repeated over several
sentences, and, (iii), derived progression, in which several topics are
derived from the same inferred hypertheme. Apart from these three
basic types, Daneš (1974: 120) also proposed that their combination
can lead to thematic progressions of higher levels of
abstraction, such as, (iv), the split rheme progression, which consists
of a complex rheme whose hyponyms and
meronyms are the themes of the subsequent sentences. Finally, he
concludes (1974: 122) that the study of the thematic organization of
a text could be useful for numerous practical applications, among
which information retrieval stands out. This observation is all the
more relevant given the performance achieved nowadays by the tools
available for automatic text analysis.</p>
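      <p>As an illustration only, the two progression types that can be read directly off annotated themes and rhemes (constant and linear) could be detected as in the following Python sketch. It assumes that every sentence has already been reduced to one theme and one rheme concept identifier after coreference resolution. The Sentence class and the function names are hypothetical and are not part of the published analyzer. Derived and split rheme progressions, which require lexical knowledge such as hyponymy or meronymy, are simply labelled "other" here.</p>
      <preformat><![CDATA[
```python
from typing import List, NamedTuple


class Sentence(NamedTuple):
    """Hypothetical container: one concept identifier per discourse role."""
    theme: str
    rheme: str


def classify_progression(prev: Sentence, curr: Sentence) -> str:
    """Label the transition between two consecutive sentences."""
    if curr.theme == prev.theme:
        return "constant"  # the same theme is repeated
    if curr.theme == prev.rheme:
        return "linear"    # the previous rheme becomes the new theme
    return "other"         # derived or split rheme progressions need lexical knowledge


def progression_schema(sentences: List[Sentence]) -> List[str]:
    """Return one label per pair of consecutive sentences."""
    return [classify_progression(a, b) for a, b in zip(sentences, sentences[1:])]


if __name__ == "__main__":
    text = [
        Sentence("Pedro", "Luis"),
        Sentence("Luis", "Madrid"),
        Sentence("Luis", "teacher"),
    ]
    print(progression_schema(text))
```
]]></preformat>
      <p>In this toy text the first transition is linear (the rheme Luis becomes the next theme) and the second is constant (the theme Luis is repeated). Comparing identifiers by equality is a deliberate simplification of the coreference step.</p>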
    </sec>
    <sec id="sec-9">
      <title>THEMATIC PROGRESSION AS A MODEL FOR SEMANTIC TEXT REPRESENTATION</title>
      <p>The usefulness of the thematic or rhematic roles of concepts along
texts for automatic text summarization arises from two main facts.
On the one hand, the concept of thematic
progression enjoys consensus among researchers as a relevant
description of the semantic structure of texts. On the other hand,
although it has traditionally been examined from the perspective of
pragmatics, the thematic or rhematic status of a concept is
actually embodied in the surface syntactic layer, which is amenable to
being represented in an easy-to-compute form.</p>
      <p>
        Concerning the correlation between the theme of a sentence and its
syntactic structure, which is crucial for its automatic annotation,
Halliday (1985) proposed an interesting categorization based on the
concept of linguistic markedness. Thus, in SVO languages, such as
English or Spanish, for declarative sentences there are unmarked
themes, prototypically the syntactic subjects preceding principal
verbs, and marked themes, such as circumstantial attachments,
complements, or sentences with predicate construction. Examples
of the former are the first and second sentences of the examples
provided above, with Luis and Pedro as unmarked themes
respectively, whilst the third sample sentence is an example of the
latter, with this morning as theme. Thematic equative sentences,
such as What I want is a proper cup of coffee, would be excluded
from this categorization. For interrogative sentences, the unmarked
themes are finite verbs for yes-no questions, such as did deliver in
Did the president deliver a speech?, and interrogative pronouns and
similar phrases for non-polar questions, such as where in Where is
Berlin located?, whilst marked themes are circumstantial adjuncts.
Besides, a constituent which is not the theme of a sentence may
occasionally appear in a prototypical theme position. This
phenomenon has been referred to by several names, such as
focussing
        <xref ref-type="bibr" rid="ref13">(Campos et al. 1990)</xref>
        , focus preposing
        <xref ref-type="bibr" rid="ref12">(Ward 1985)</xref>
        or
thematization
        <xref ref-type="bibr" rid="ref26">(Chomsky 1972)</xref>
        . An example of this type of
informational structure is It was Pedro who lied to me. A number
of authors (e.g. Gutiérrez 2000: 34) have argued that the intent of
these particular informational schemas is to gain the attention of the
interlocutor in order to overcome their presumed predisposition to receive
information that is in some respect contrary to that which is intended
to be communicated, or simply to emphasize the importance of a
certain aspect in the informational process. This nuance of
enunciative modality would undoubtedly be applicable to the
weighting of the relevant concepts for a proper summary, especially
since the syntactic structures involved are relatively easy to match
with rules based only on token positions, dependency relation
tags and dependency heads.
3 http://clic.ub.edu/corpus/es/ancora-descarregues
4 https://github.com/eelenadelolmo/HI4NLP/tree/master
5 http://nlp.lsi.upc.edu/freeling/demo/demo.php
      </p>
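      <p>The syntactic cues described above for declarative sentences can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It treats the leftmost preverbal dependent of the root as the theme, which is labelled unmarked when that dependent is the subject and marked otherwise. The Token class, the find_theme function and the labels OBJ, DET and ADJ are hypothetical; SUB and ROOT follow the labels used later in the paper.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Token:
    id: int      # 1-based position in the sentence
    form: str    # surface form
    head: int    # id of the syntactic head (0 for the root)
    deprel: str  # dependency relation to the head


def subtree_ids(tokens: List[Token], top_id: int) -> Set[int]:
    """Return the id of a token together with the ids of all its descendants."""
    ids = {top_id}
    grew = True
    while grew:
        grew = False
        for tok in tokens:
            if tok.head in ids and tok.id not in ids:
                ids.add(tok.id)
                grew = True
    return ids


def find_theme(tokens: List[Token], subject_label: str = "SUB",
               root_label: str = "ROOT") -> Optional[Tuple[str, str]]:
    """Return (markedness, theme text) for a declarative sentence, or None."""
    root = next((t for t in tokens if t.deprel == root_label), None)
    if root is None:
        return None
    # Dependents of the root whose head token precedes the root.
    preverbal = [t for t in tokens if t.head == root.id and t.id < root.id]
    if not preverbal:
        return None
    # The theme is the preverbal constituent that starts furthest to the left.
    first = min(preverbal, key=lambda t: min(subtree_ids(tokens, t.id)))
    by_id = {t.id: t for t in tokens}
    span = sorted(subtree_ids(tokens, first.id))
    text = " ".join(by_id[i].form for i in span)
    kind = "unmarked" if first.deprel == subject_label else "marked"
    return kind, text


if __name__ == "__main__":
    sentence = [
        Token(1, "This", 2, "DET"),
        Token(2, "morning", 4, "ADJ"),
        Token(3, "Pedro", 4, "SUB"),
        Token(4, "joined", 0, "ROOT"),
        Token(5, "Luis", 4, "OBJ"),
    ]
    print(find_theme(sentence))
```
]]></preformat>
      <p>For the third sample sentence this sketch selects This morning as a marked theme, while Pedro joined Luis this morning yields Pedro as an unmarked theme. Interrogative sentences and thematization structures would need additional rules.</p>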
      <p>In short, it is possible to locate the discourse elements theme and
rheme using syntactic knowledge. To the extent that syntactic
analysis is a task that can be considered well solved in NLP, it seems
feasible to automatically locate the theme and rheme in
every sentence of a text. The next natural step in order to obtain the
thematic progression schema, i.e. the conceptual schema of a text,
is to connect the themes and rhemes of the sentences of the text into
the ultimate thematic path.</p>
    </sec>
    <sec id="sec-12">
      <title>A FIRST STUDY OF THE FEASIBILITY OF THEMATIC PROGRESSION THEORY IN SPANISH TEXTS</title>
      <p>As a first study to verify the applicability of the
Thematic Progression theory for the extraction of the underlying
conceptual schema of a text, we carried out an exploratory corpus
survey with Spanish descriptive texts. We analyzed the mean ratio
and the ratio per text of preceding subjects, i.e. subjects placed
before the main verb, since they are the
prototypically unmarked themes. The examined corpus is AnCora
Surface Syntax Dependencies<sup>3</sup> (AnCora GLiCom-UPF 1.1),
published in 2014 at Pompeu Fabra University, which contains
17,376 sentences manually annotated with the lemmas, the PoS tags
plus other morphological features and both dependency heads and
relations for every token. The analysis was based on a symbolic
rule-based grammar expressed as sub-tree extraction operations from the
dependency tree of the sentences. In order to ensure the generality
of the grammar, the elements obtained through rules for sampling
preceding subjects were compared with the corresponding results in
a second version of the corpus, resulting from its automatic
annotation with the Freeling analyzer. The thematic progression
analyzer scripts used to process the grammar and to generate the
outputs of the corpus rendering are publicly accessible on GitHub<sup>4</sup>.</p>
      <p>The grammar and the analysis algorithm to extract the
thematic progression schema of a text were designed in three
consecutive steps: (i), the automatic identification and labelling
of the theme of every sentence; (ii), the subsequent
identification of its rheme; and, (iii), the identification of
concepts corresponding to the same theme or rheme in a text. Each
step was carried out using a symbolic syntactic-semantic rule-based
grammar expressed as sub-tree extraction operations. For the
grammar definition, an approach based on the transformation of
dependency trees was applied. Thus, for the simplest scenario
of finding unmarked subjects in SVO languages, such as Spanish,
two categories of rules were defined. The first category, (i), comprises
rules matching child dependencies from a selected head token. Such a
rule is written as a relation of arity two whose name is the
dependency relation to be matched. Its first argument is a key and the
value expected for the selected head, and its second argument is the
matching option: ALL if all the nodes under the matched child
should be matched, or ONE if only the immediate child should be
matched. For example, the SUB(deprel:ROOT, ALL) rule would
match subtrees consisting of all child nodes of the SUB children of
a token tagged with a ROOT dependency relation (as shown in figure
1, obtained from the Freeling 4.1 demo<sup>5</sup>). The second category, (ii),
comprises rules matching head dependencies from a selected child token.
These rules are also written as a relation of arity two named after a
dependency relation, with the same arguments as the child
dependency rules but in the opposite order. The second type of rule
can be applied, for example, to sentence compression when several
propositions are involved.</p>
      <p>The analyzer accepts various corpus formats as input and converts
them into the CoNLL-U format, in which additional
theme and rheme features are added to every token in the main
proposition of each sentence. We found that, with our first version of the
rules, theme and rheme annotation was correct in roughly half of the
cases, as shown in table 1. A careful manual review of the
data allowed us to confirm a significant
correlation between the syntactic-semantic and discourse layers
outlined by the Thematic theory and, consequently, the feasibility
of automating the identification of at least half of the themes.</p>
      <p>With the syntactically annotated GLiCom-UPF 1.1 version of the
AnCora corpus we seek to provide objective metrics that are not prone to
major annotation errors, as the corpus annotation has been manually
reviewed. The qualitative analysis of this data is intended to assess
whether or not the preceding subjects match the main themes of
sentences, in order to ultimately detect the underlying thematic
progression template of texts. By contrast, with the Freeling version
of the corpus we aim to assess the accuracy of applying the rule for
the extraction of unmarked themes with no dependence on manually
annotated data, because Freeling is the best option for syntax
annotation in Spanish at this stage. To this end, we
have generated two different files, one for each version of the
corpus, with the suspected overmatched and undermatched
sentences, as shown in table 2.</p>
      <p>As the figures suggest, a pervasive tendency of Freeling to undermatch
sentences with preceding subjects has been confirmed
through careful examination. We found that the vast majority of
mismatched annotations involve some type of coordinated or
juxtaposed clauses. These syntactic structures are analyzed by
Freeling with a highly fluctuating dependency structure, which is
quite different from the analysis in GLiCom-UPF 1.1. This high
variability in the syntactic tree accounts for the vast majority of both
the undermatched and overmatched sentence ratios.</p>
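      <p>To make the rule notation more concrete, the following Python sketch shows one possible reading of the child dependency rules and of the preceding subject ratio used in the corpus survey. It is an illustration under stated assumptions rather than the analyzer published by the authors. Tokens are plain dictionaries with CoNLL-U-like fields (id, form, head, deprel), and all function names, the regular expression and the SPEC and DOBJ labels are hypothetical.</p>
      <preformat><![CDATA[
```python
import re
from typing import Dict, List

Token = Dict[str, object]

# Assumed surface syntax of a rule, e.g. "SUB(deprel:ROOT, ALL)".
RULE_RE = re.compile(r"^(\w+)\((\w+):(\w+),\s*(ALL|ONE)\)$")


def parse_rule(rule: str) -> Dict[str, str]:
    """Split a rule into relation name, head key, head value and scope."""
    m = RULE_RE.match(rule.strip())
    if m is None:
        raise ValueError(f"malformed rule: {rule!r}")
    relation, key, value, scope = m.groups()
    return {"relation": relation, "key": key, "value": value, "scope": scope}


def descendants(tokens: List[Token], head_id: int) -> List[int]:
    """Collect the ids of every token below head_id (assumes a well-formed tree)."""
    found: List[int] = []
    for tok in tokens:
        if tok["head"] == head_id:
            found.append(tok["id"])
            found.extend(descendants(tokens, tok["id"]))
    return found


def apply_child_rule(rule: str, tokens: List[Token]) -> List[List[int]]:
    """Return one sorted list of token ids per matched child of a selected head."""
    r = parse_rule(rule)
    matches: List[List[int]] = []
    for head in tokens:
        if str(head.get(r["key"])) != r["value"]:
            continue
        for child in tokens:
            if child["head"] == head["id"] and child["deprel"] == r["relation"]:
                ids = [child["id"]]
                if r["scope"] == "ALL":
                    ids += descendants(tokens, child["id"])
                matches.append(sorted(ids))
    return matches


def has_preceding_subject(tokens: List[Token]) -> bool:
    """True if a SUB child of the ROOT token is placed before it."""
    root_ids = [t["id"] for t in tokens if t["deprel"] == "ROOT"]
    if not root_ids:
        return False
    subjects = apply_child_rule("SUB(deprel:ROOT, ONE)", tokens)
    return any(ids[0] < root_ids[0] for ids in subjects)


def preceding_subject_ratio(sentences: List[List[Token]]) -> float:
    """Share of sentences with a preverbal subject (0.0 for an empty corpus)."""
    if not sentences:
        return 0.0
    return sum(has_preceding_subject(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sentence = [
        {"id": 1, "form": "El", "head": 2, "deprel": "SPEC"},
        {"id": 2, "form": "presidente", "head": 3, "deprel": "SUB"},
        {"id": 3, "form": "pronunció", "head": 0, "deprel": "ROOT"},
        {"id": 4, "form": "un", "head": 5, "deprel": "SPEC"},
        {"id": 5, "form": "discurso", "head": 3, "deprel": "DOBJ"},
    ]
    print(apply_child_rule("SUB(deprel:ROOT, ALL)", sentence))
    print(preceding_subject_ratio([sentence]))
```
]]></preformat>
      <p>With the ALL option the rule returns the whole subject subtree (El presidente), whereas ONE returns only the subject head (presidente). Head dependency rules could be sketched analogously by swapping the roles of head and child.</p>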
      <p>A qualitative analysis of unmatched sentences has also been
conducted, revealing a strong presence of thematization as the most
relevant finding. This sentence pattern has been referred to by the
Thematic theory as a relevant discourse feature, indicating a break
in the information flow of the text, as discussed above. A
promising conclusion of the analysis of the patterns found is their
suitability for being implemented in the rule formalism designed.
Besides, as a synthesis of the findings obtained from the qualitative
analysis of the manually annotated version, there is a strong presence
of subordinate clauses in the corpus, which implies the necessity of
more complex rules to select the most informative proposition in the
sentence. The observed patterns have been categorized into three
main categories (the most informative clauses appear in bold and the
selected theme is underlined):</p>
      <p>Sentences whose root clause is the most relevant (e.g. Since
pharmacists work with a high profit margin, the business
opportunity is huge).</p>
      <p>Sentences whose root clause is not the most relevant (e.g. The
main factor is that electricity consumption during the summer
is now not much lower than it used to be).</p>
      <p>Sentences whose root clause is not the most relevant but
provides a crucial modality feature for information retrieval (e.g.
Investigators are convinced that someone deliberately cut
that rubber).</p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>As observed in the study conducted, this first approach to rule-based
theme annotation seems to support the theoretically hypothesized
correlation between the syntactic-semantic and discourse layers
required by our proposal. However, the qualitative analysis of both
matched and unmatched sentences revealed the need for more
complex tree rewriting rules to achieve a more accurate theme
selection in order to obtain thematic progression schemas from texts.
Regarding subordination, i.e. sentences with several propositions
with different syntactic status, we are working on two feasible
options for sentence compression: (i), the choice of the most relevant
proposition for every sentence, and, (ii), the choice of the ordered
subset of its n most relevant clauses. In addition, this study shows
the necessity to implement an algorithm to infer the modality from
the main verb. We also found that the rules should be refined in order
to capture the various ways in which coordinated and juxtaposed
clauses could be analyzed, given the high variability observed in the
automatic syntactic annotation. Finally, the study revealed that it
will be necessary to design lexicon-based rules to capture lexical
semantic generalizations. With all this, in principle, it seems
possible to apply this new linguistic approach for the extraction of
implicit conceptual schemas from texts.</p>
    </sec>
    <sec id="sec-14">
      <title>REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gatt</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Reiter</surname>
          </string-name>
          . '
          <article-title>SimpleNLG: A realisation engine for practical applications'</article-title>
          ,
          <source>Proceedings of the 12th European Workshop on Natural Language Generation</source>
          ,
          <fpage>90</fpage>
          -
          <lpage>93</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Templeton</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalita</surname>
          </string-name>
          . '
          <article-title>Exploring Sentence Vector Spaces through Automatic Summarization'</article-title>
          ,
          <source>Proceedings of the 17th IEEE International Conference on Machine Learning and Applications</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.Q.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.T.</given-names>
            <surname>Phan</surname>
          </string-name>
          . '
          <article-title>An ontology-based approach for key phrase extraction'</article-title>
          ,
          <source>Proceedings of the ACL-IJCNLP</source>
          ,
          <fpage>181</fpage>
          -
          <lpage>184</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Cristea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Postolache</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Pistol</surname>
          </string-name>
          . '
          <article-title>Summarisation through discourse structure'</article-title>
          ,
          <source>Lecture Notes in Computer Science</source>
          ,
          <volume>3406</volume>
          ,
          <fpage>632</fpage>
          -
          <lpage>644</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcu</surname>
          </string-name>
          .
          <source>The Theory and Practice of Discourse Parsing and Summarization</source>
          , The MIT Press, Cambridge,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Miller</surname>
          </string-name>
          . '
          <article-title>Leveraging BERT for Extractive Text Summarization on Lectures'</article-title>
          ,
          <source>arXiv</source>
          ,
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>E.</given-names>
            <surname>Baralis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jabeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiori</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          . '
          <article-title>Multidocument summarization based on the Yago ontology'</article-title>
          ,
          <source>Expert Systems with Applications</source>
          ,
          <volume>40</volume>
          ,
          <fpage>6976</fpage>
          -
          <lpage>6984</lpage>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>E.</given-names>
            <surname>Lloret</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Palomar</surname>
          </string-name>
          . '
          <article-title>Text summarization in progress: a literature review'</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          ,
          <volume>37</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>F.</given-names>
            <surname>Daneš</surname>
          </string-name>
          ,
          <source>Papers on Functional Sentence Perspective</source>
          , Mouton, The Hague,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>G.</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.R.</given-names>
            <surname>Radev</surname>
          </string-name>
          . '
          <article-title>LexRank: Graph-based lexical centrality as salience in text summarization'</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>22</volume>
          ,
          <fpage>457</fpage>
          -
          <lpage>479</lpage>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannakopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karkaletsis</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Vouros</surname>
          </string-name>
          . '
          <article-title>Testing the Use of N-gram Graphs in Summarization Sub-tasks'</article-title>
          ,
          <source>Proceedings of the Text Analytics Conference</source>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <source>The Semantics and Pragmatics of Preposing</source>
          , University of Pennsylvania,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>H.</given-names>
            <surname>Campos</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampini</surname>
          </string-name>
          . '
          <article-title>Focalization Strategies in Spanish'</article-title>
          ,
          <source>Probus</source>
          ,
          <volume>2</volume>
          ,
          <fpage>47</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          . '
          <article-title>Abstractive Summarization: A Survey of the State of the Art'</article-title>
          ,
          <source>Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence</source>
          ,
          <fpage>9816</fpage>
          -
          <lpage>9822</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>H.P.</given-names>
            <surname>Edmundson</surname>
          </string-name>
          . '
          <article-title>New methods in automatic extracting'</article-title>
          ,
          <source>Journal of the ACM</source>
          ,
          <volume>16</volume>
          (
          <issue>2</issue>
          ),
          <fpage>264</fpage>
          -
          <lpage>285</lpage>
          , (
          <year>1969</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>H.P.</given-names>
            <surname>Luhn</surname>
          </string-name>
          . '
          <article-title>The Automatic Creation of Literature Abstracts'</article-title>
          ,
          <source>IBM Journal of Research and Development</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>159</fpage>
          -
          <lpage>165</lpage>
          , (
          <year>1958</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
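          17.
          <string-name>
            <given-names>I.</given-names>
            <surname>da Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,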
          <string-name>
            <given-names>P.V.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>San Juan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Torres-Moreno</surname>
          </string-name>
          . '
          <article-title>A new hybrid summarizer based on vector space model, statistical physics and linguistics'</article-title>
          ,
          <source>Lecture Notes in Computer Science</source>
          ,
          <volume>4827</volume>
          ,
          <fpage>872</fpage>
          -
          <lpage>882</lpage>
          , (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zong</surname>
          </string-name>
          . '
          <article-title>Abstractive Cross-Language Summarization via Translation Model Enhanced Predicate Argument Structure Fusing'</article-title>
          ,
          <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>
          ,
          <volume>24</volume>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Quan</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          . '
          <article-title>BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization'</article-title>
          ,
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>L.</given-names>
            <surname>Logeswaran</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          . '
          <article-title>An efficient framework for learning sentence representations'</article-title>
          ,
          <source>Proceedings of the 6th International Conference on Learning Representations</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>M.A.K.</given-names>
            <surname>Halliday</surname>
          </string-name>
          ,
          <source>An Introduction to Functional Grammar</source>
          , Arnold, London,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Hannah</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          . '
          <article-title>A classification-based summarisation model for summarising text documents'</article-title>
          ,
          <source>International Journal of Information and Communication Technology</source>
          ,
          <volume>6</volume>
          ,
          <fpage>292</fpage>
          -
          <lpage>308</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>M.</given-names>
            <surname>Galley</surname>
          </string-name>
          . '
          <article-title>Skip-chain Conditional Random Field for ranking meeting utterances by importance'</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Kusner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.I.</given-names>
            <surname>Kolkin</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          . '
          <article-title>From Word Embeddings To Document Distances'</article-title>
          ,
          <source>Proceedings of the 32nd International Conference on Machine Learning</source>
          ,
          <volume>37</volume>
          ,
          <fpage>957</fpage>
          -
          <lpage>966</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhadad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jordan</surname>
          </string-name>
          , '
          <article-title>Facilitating physicians' access to information via tailored text summarization'</article-title>
          ,
          <source>Proceedings of the AMIA Annual Symposium</source>
          ,
          <fpage>226</fpage>
          -
          <lpage>230</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>N.</given-names>
            <surname>Chomsky</surname>
          </string-name>
          ,
          <source>Studies on Semantics in Generative Grammar</source>
          , Mouton, The Hague,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          . '
          <article-title>A Joint Model of Rhetorical Discourse Structure and Summarization'</article-title>
          ,
          <source>Proceedings of the Workshop on Structured Prediction for NLP</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          . '
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks'</article-title>
          ,
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing</source>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>P.E.</given-names>
            <surname>Genest</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lapalme</surname>
          </string-name>
          . '
          <article-title>Framework for Abstractive Summarization using Text-to-Text Generation'</article-title>
          ,
          <source>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>64</fpage>
          -
          <lpage>73</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>P.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Qiu</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , '
          <article-title>Template Oriented Text Summarization via Knowledge Graph'</article-title>
          ,
          <source>International Conference on Audio, Language and Image Processing (ICALIP)</source>
          ,
          <fpage>79</fpage>
          -
          <lpage>83</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          . '
          <article-title>Distributed Representations of Sentences and Documents'</article-title>
          ,
          <source>Proceedings of the 31st International Conference on Machine Learning</source>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.S.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Urtasun</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          . '
          <article-title>Skip-Thought Vectors'</article-title>
          ,
          <source>Proceedings of the 28th International Conference on Neural Information Processing Systems</source>
          ,
          <volume>2</volume>
          ,
          <fpage>3294</fpage>
          -
          <lpage>3302</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>R.M.</given-names>
            <surname>Aliguliyev</surname>
          </string-name>
          . '
          <article-title>Clustering techniques and discrete particle swarm optimization algorithm for multi-document summarization'</article-title>
          ,
          <source>Computational Intelligence</source>
          ,
          <volume>26</volume>
          ,
          <fpage>420</fpage>
          -
          <lpage>448</lpage>
          , (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>S.</given-names>
            <surname>Gerani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mehdad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Carenini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.T.</given-names>
            <surname>Ng</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Nejat</surname>
          </string-name>
          . '
          <article-title>Abstractive summarization of product reviews using discourse structure'</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in NLP</source>
          ,
          <fpage>1602</fpage>
          -
          <lpage>1613</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>S.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <source>Temas, remas, focos, tópicos y comentarios</source>
          , Arco Libros, Madrid,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harabagiu</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Lacatusu</surname>
          </string-name>
          . '
          <article-title>Topic themes for multi-document summarization'</article-title>
          ,
          <source>Proceedings of the 28th Annual International ACM SIGIR</source>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . '
          <article-title>Distributed Representations of Words and Phrases and their Compositionality'</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          ,
          <volume>26</volume>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>V.</given-names>
            <surname>McCargar</surname>
          </string-name>
          . '
          <article-title>Statistical approaches to automatic text summarization'</article-title>
          ,
          <source>Bulletin of the American Society for Information Science and Technology</source>
          ,
          <volume>30</volume>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          . '
          <article-title>Abstractive multi-document summarization with semantic information extraction'</article-title>
          ,
          <source>Proceedings of the Conference on Empirical Methods in NLP</source>
          ,
          <fpage>1908</fpage>
          -
          <lpage>1913</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>W.S.</given-names>
            <surname>McCulloch</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Pitts</surname>
          </string-name>
          . '
          <article-title>A logical calculus of the ideas immanent in nervous activity'</article-title>
          ,
          <source>Bulletin of Mathematical Biophysics</source>
          ,
          <volume>5</volume>
          ,
          <fpage>115</fpage>
          -
          <lpage>133</lpage>
          , (
          <year>1943</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kikuchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hirao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okumura</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Nagata</surname>
          </string-name>
          . '
          <article-title>Single Document Summarization based on Nested Tree Structure'</article-title>
          ,
          <source>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>315</fpage>
          -
          <lpage>320</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Titov</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          . '
          <article-title>Single Document Summarization as Tree Induction'</article-title>
          ,
          <source>Proceedings of NAACL-HLT</source>
          ,
          <fpage>1745</fpage>
          -
          <lpage>1755</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>