On the Role of Communicative Structure in Read Aloud
                      Applications for the Elderly
Mónica Domínguez
University Pompeu Fabra, Barcelona, Spain
monica.dominguez@upf.edu

Alicia Burga
University Pompeu Fabra, Barcelona, Spain
alicia.burga@upf.edu

Mireia Farrús
University Pompeu Fabra, Barcelona, Spain
mireia.farrus@upf.edu

Leo Wanner
Catalan Institute for Research and Advanced Studies and University Pompeu Fabra, Barcelona, Spain
leo.wanner@upf.edu

ABSTRACT                                                                     used in computational approaches to achieve a more fine-grained
Conversational technologies that assist elderly people need to adapt         communicative interaction adapted to the elderly.
to common disabilities in old age. Visual, hearing and even more                Virtual agents with human interaction capabilities have a large
so cognitive impairments pose serious difficulties for our seniors           potential for the exploration of such user-oriented advanced
to handle a standard conversation with a human. Understanding                functionalities. We work with KRISTINA, a Knowledge-Based
a virtual agent may be even harder. In this case, communicative              Information Agent with Social Competence and Human
strategies are key to adapt the virtual agent to the needs of elderly        Interaction Capabilities [32]. KRISTINA interacts with the user
users. This paper addresses the role of the communicative struc-             in different scenarios. One of these scenarios consists in reading
ture for expressive speech prosody, which is known to be crucial             the newspaper to elderly people with eyesight impairments. This
for better speech comprehension. It reports on efforts to improve            target audience requires a varied range of expressiveness in the
prosody within a text-to-speech system based on one aspect of the            synthetic voice, which state-of-the-art text-to-speech (TTS) appli-
communicative structure, namely thematicity. The work has been               cations usually lack, especially when processing long monologue
implemented as an application in a social virtual agent, KRISTINA,           discourse.
which reads aloud news articles upon request for elderly users in               This paper discusses the role of the Information (or Communica-
German.                                                                      tive) Structure–prosody interface for reading aloud applications,
                                                                             and scratches the surface of the theoretical framework behind this
CCS CONCEPTS                                                                 interface. The discussion is based upon the authors’ implementation
                                                                             of a thematicity-based prosody module that enriches raw texts ex-
• Social and professional topics → Seniors;
                                                                             tracted from news with communicative information with the goal to
                                                                             achieve a more expressive reading for targeted elderly users.1 The
KEYWORDS
                                                                             aim is to analyze syntactic and Information Structure, and then use
intelligent conversational agents, geriatric applications, commu-            high-level linguistic features derived from the analysis to generate
nicative structure, thematicity, prosody, text-to-speech, human-             more expressive prosody in the synthesized speech. The proposed
machine interaction                                                          methodology encompasses a modular pipeline consisting of (1) a
                                                                             tokenizer, (2) a syntactic parser, (3) a theme/rheme parser, and (4) an
1   INTRODUCTION                                                             SSML prosody tag converter. The implementation has been tested
In the last decades, conversational interfaces involving text-to-            in an experimental setting for German, using web-retrieved news
speech (TTS) applications have improved expressiveness and over-             articles.
all naturalness to a reasonable extent. Conversational features, such           The rest of the paper is structured as follows. Section 2 intro-
as speech acts, affective states and Information Structure have been         duces the motivation and background of this work. In Section 3,
instrumental to derive more expressive prosodic contours. However,           we dive into the theoretical grounds that support the proposed
synthetic speech is still perceived as monotonous, when a text that          computational model from a linguistic perspective. Then, Section 4
lacks those conversational features is read aloud in the interface,          sketches how this model has been implemented within the context
i.e., when it is fed directly to the TTS application. If users of the        of KRISTINA. Finally, conclusions are drawn in Section 5.
conversational interface furthermore have some impairments, as
is usually the case with elderly people using assistive technologies,         2    MOTIVATION AND BACKGROUND
it is paramount to adapt the conversational agent’s speech to guar-          The way information is formally packaged in a sentence, known
antee the communication flow of the interaction, and thus improve            as “Information Structure”, has been a fruitful field of research
the acceptance of the agent by the user. This adaptation requires            in linguistic studies to better understand how communication is
advanced functionalities that usually involve several areas of ex-
pertise. In this paper, we present how theoretical linguistics can be         1 Such an application may also be handy for other users, not only elderly.


produced and perceived. Information Structure is a wide term and                             [21] distinguishes different levels of representation. These levels
its study usually involves various linguistic dimensions in connection                       are sequentially mapped from an unordered semantic representation
with how content is packaged, hence its interfaces with at least                             (SemR) through a dependency tree structure of the Syntactic
semantics, syntax and prosody.                                                               Representation (SyntR) and linearized chain of lexemes onto the
    Different linguistic schools have long stated that Information                           Morphological Representation (MorphR) to get to the ordered string
Structure, and, in particular, the dichotomy referred to as theme–                           of phonemes at the Phonetic Representation (PhonR). Starting from
rheme [17], given–new [28], or topic–focus [16] is related to intona-                        SyntR and until PhonR, there is a subdivision into deep and surface
tion.2 Moreover, prosody structure on the grounds of thematicity                             representations.
partitions plays a key role in the understanding of a message [8].                              The SemR includes four structures: (1) the Semantic Structure
Empirical studies in different languages provide evidence that when                          (SemS), which is a predicate-argument (meaning) structure of the
thematicity and prosody are appropriately put together, comprehen-                           message; (2) the Semantic Communicative Structure (SemCommS),
sion of the message is positively affected (cf., e.g., [24] for German                       which consists of a representation of the communicative intention
and [31] for Catalan). Several works also show that a correlation                            of the speaker; (3) the Rhetorical Structure (RhetS), which encodes
between thematicity and beat gestures, which are an important                                the artistic intentions and stylistic decisions of the speaker (irony,
non-verbal “prosodic” means to mark rhythm and to “accentuate                                humor, etc.); and (4) the Referential Structure (RefS), which
speech” [5], improves discourse recall and comprehension [18, 20].                           specifies real-world referents for semantic configurations. The Sem-
Therefore, there is reason to assume that a conversational appli-                            CommS superimposes on the SemS the communicative properties
cation considering the notions of content packaging by means of                              of the meaning of the sentence to be synthesized rather than the
the relation between thematicity and prosody will benefit from the                           communicative properties of the sentence itself.3 Consequently,
same advantages as in natural conversation environments. Most                                the functions of SemCommS are:
of all, conversational avatars in applications for children in edu-                               • organizing initial meaning into a message;
cational settings [26], applications for those with special needs                                 • ensuring coherence of the text of which the sentence under
[23] as well as for the elderly [25, 33] and, in particular, for those                              synthesis is supposed to be a part;
with cognitive impairments [34], would greatly benefit from such                                  • reducing periphrastic potential of the initial SemS, specifying
a communicatively-oriented improvement.                                                             more precisely the meaning.
    On the other hand, expressive speech that uses a varied range of
                                                                                                In other words, the same abstract Semantic Structure can be
prosodic cues (variation in fundamental frequency, speech rate and
                                                                                             shared by a given set of sentences, and it is by means of the Sem-
intensity) is often regarded as more understandable and commu-
                                                                                             CommS that these sentences are distinguished at subsequent levels
nicative. However, previous attempts to implement the concepts
                                                                                             (namely, SyntR, MorphR and PhonR). Figure 1 sketches the common
of Information Structure in text-to-speech (TTS) applications are
                                                                                             SemS of sentences from (1a) to (1d) taken from [22].
rather scarce [19, 27]. Moreover, it is usually a simple binary theme–
rheme structure that is being tested in short sentences. A more                                (1a) John met the doctor at the airport.
fine-grained analysis of thematicity structure, as defined by Mel’čuk                          (1b) The doctor was met at the airport by John.
[22], has been shown to yield better results in predicting a wider                           (1c) The airport was where John met the doctor.
variety of prosodic contours, which are furthermore perceived as more                        (1d) It was John who met the doctor at the airport.
natural when implemented in a TTS application; see, e.g. [11, 15].

3     COMMUNICATIVE STRUCTURE
Despite the great efforts along the years for defining communicative
notions, studies on Information Structure have remained within
the field of theoretical linguistics. These studies sometimes explore
different linguistic phenomena in relation to Information Structure
(e.g., discourse, dialog, anaphora, and co-reference). The Commu-
nicative Structure within the Meaning-Text Theory (MTT) comes
to cope with some of the limitations other theories on Information                               Figure 1: Shared SemS of examples (1a–1d) from [22].
Structure have, as this representation is devised in the context of a
theoretical production-oriented linguistic model, which is described                             The Deep Syntactic Structure (DSyntS), which may already re-
in what follows.                                                                             flect some of the SemCommS features, is the central component of
                                                                                             the Deep-Syntactic Representation (DSyntR).4 Consider, for illus-
3.1     A Theoretical Framework for                                                          tration, the DSyntS’s of sentences (1a) (Figure 2) and (1d) (Figure
        Computational Linguistics                                                            3). They show how SemCommS determines the different resulting
The Meaning-Text Theory proposes a framework for language anal-                              3 In general linguistics, the term ‘communicative’ is usually linked to the idea of
ysis and generation suitable for Natural Language Processing (NLP)                           ‘communicative competence’ and refers to concepts related to the study of pragmatics;
applications [4]. In particular, the Meaning-Text Theory Model                               see the definition of ‘linguistic competence’ and ‘performance’ by Chomsky [7].
                                                                                             4 Apart from DSyntS, DSyntR includes, in its turn, three further components: Deep-
2 In our work, we use the first denotation, i.e., theme–rheme or thematicity. ‘Theme’        Syntactic Communicative Structure, Deep-Syntactic Anaphoric Structure and Deep-
marks what a sentence is about, and ‘rheme’ what is said about the theme.                    Syntactic Prosodic Structure (which represents semantically conditioned prosodies).


dependency trees. The communicative subject (Theme) may coin-
cide or not with the semantic subject (Actor) and syntactic subject
(Synt-Subject), as represented in Table 1. This underlines the idea
that CommS is a distinct dimension.

Table 1: Communicative, semantic and syntactic subjects in
examples (1a) and (1d) from [22].
                                                                                        Figure 3: DSyntS from example (1d) taken from [22].
         (1a)     John           met       the doctor       at the airport
         SemS     Actor
         SyntS    Synt-Subject
         CommS    Theme
                                                                                  3.2     Thematicity
         (1d)     The doctor     was met   at the airport   by John               In contrast to Information Structure models that propose a partition
         SemS                                               Actor                 of sentences into a theme and a rheme, Mel’čuk [22] argues in the
         SyntS    Synt-Subject
         CommS    Theme
                                                                                  context of the Meaning–Text Theory for a tripartite hierarchical
                                                                                  division (‘theme’, ‘rheme’, and ‘specifier’ –the element which sets
                                                                                  the utterance’s context) within propositions that further permits
   In a nutshell, CommS is part of the SemR and DSyntR of indi-                   embeddedness of communicative spans; consider (1) for illustra-
vidual sentences. The communicative organization of text is not                   tion of hierarchical thematicity (annotated following the guidelines
covered by CommS; rather, CommS accounts for the structure of the                 established in [2]) of the sentence Ever since, the remaining
so-called propositional content. Going back to example (1) taken from             members have been desperate for the United States to rejoin this dreadful
[22], the set of sentences may seem fully synonymous, but only (1a)               group. A total of five partitions are identified, including three spans
is an appropriate reply to D1, whereas (1d) better suits D2:                      at level 1, a specifier (SP1), theme (T1) and rheme (R1), and two
   D1 - Nobody saw the doctor last night?                                         embedded spans at level 2 in the rheme, a theme (T1(R1)) and a
        - John met him at the airport.                                            rheme (R1(R1)).5
   D2 - Ask John.                                                                     (1) [Ever since,]SP1 [the remaining members]T1 [have been des-
        - Why John?                                                                       perate [for the United States]T1(R1) [to rejoin this dreadful
        - It was John who met the doctor at the airport.                                  group.]R1(R1)]R1
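The bracketed annotation used for the sentence Ever since, the remaining members have been desperate for the United States to rejoin this dreadful group can be read mechanically. The following minimal Python sketch is not part of the implementation discussed in this paper; it only illustrates how the notation maps to labeled spans, taking the nesting depth of the brackets as the thematicity level. The names `Span`, `parse_thematicity` and `EXAMPLE` are hypothetical and introduced here for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# A label such as SP1, T1 or R1(R1): letters, an index, optional parent in parentheses.
LABEL = re.compile(r"[A-Z]+\d+(?:\([A-Z]+\d+\))*")


@dataclass
class Span:
    label: str   # e.g. "T1(R1)"
    level: int   # 1 for top-level spans, 2 for spans embedded in a level-1 span
    start: int   # character offset in the plain (unannotated) sentence
    text: str    # span text without brackets or labels


def parse_thematicity(annotated: str) -> List[Span]:
    """Parse '[text]LABEL' annotations, possibly nested, into a list of spans."""
    spans: List[Span] = []
    open_starts: List[int] = []  # offsets of currently open brackets
    plain: List[str] = []        # characters of the sentence without markup
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            open_starts.append(len(plain))
            i += 1
        elif ch == "]":
            if not open_starts:
                raise ValueError("closing bracket without opening bracket")
            start = open_starts.pop()
            match = LABEL.match(annotated, i + 1)
            if match is None:
                raise ValueError("closing bracket without a span label")
            text = "".join(plain[start:]).strip()
            # Brackets still open after popping are the enclosing spans.
            spans.append(Span(match.group(), len(open_starts) + 1, start, text))
            i = match.end()
        else:
            plain.append(ch)
            i += 1
    if open_starts:
        raise ValueError("unclosed bracket")
    return sorted(spans, key=lambda s: (s.start, s.level))


EXAMPLE = (
    "[Ever since,]SP1 [the remaining members]T1 "
    "[have been desperate [for the United States]T1(R1) "
    "[to rejoin this dreadful group.]R1(R1)]R1"
)

for span in parse_thematicity(EXAMPLE):
    print(span.level, span.label, "|", span.text)
```

Run on the example, the sketch yields the five partitions named in the text: three spans at level 1 and two embedded spans at level 2 inside the rheme.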
                                                                                     A hierarchical thematicity structure of this kind has been shown
                                                                                  to correlate better with ToBI [1, 29] labels than binary flat thematic-
                                                                                  ity [10, 11]. Such a correlation still does not solve the problem of a
                                                                                  one–to-one mapping between a specific intonation label (e.g., H*)
                                                                                  to a static acoustic parameter (e.g., an increase of 50% in funda-
                                                                                  mental frequency). This is one of the reasons why we propose an
                                                                                  implementation using a more varied range of automatically derived
                                                                                  prosodic cues based on hierarchical thematicity spans, as described
       Figure 2: DSyntS from example (1a) from [22].                              in what follows.

    CommS is composed of eight distinct dimensions: ‘thematicity’,                4     AUTOMATIC GENERATION OF
‘givenness’, ‘focalization’, ‘perspective’, ‘emphasis’, ‘presupposed-                   THEMATICITY-BASED PROSODY IN
ness’, ‘unitariness’ and ‘locutionality’. As CommS characterizes
                                                                                        KRISTINA
the meaning of the sentence and not the sentence itself, it is,
consequently, modeled at the semantic level and then propagated to the            In the use case of KRISTINA as a social companion for the elderly, the
deep-syntactic and surface-syntactic levels of the linguistic descrip-            scenario of reading the newspaper involves a dialogue interaction
tion. Note that givenness, which is often treated as synonymous                   between the user (U) and KRISTINA (K). U requests K to read the
to thematicity, is in Mel’čuk’s communicative structure theory a                  newspaper and K prompts U to pick a piece of news. Upon
distinct dimension from thematicity. According to Mel’čuk [22],                   reading of the title, the system retrieves the selected text, which
the thematicity of the initial SemS has to do with psychologically                is sent to the pipeline sketched in Figure 4. The pipeline tests the
motivated choices of the speaker, who decides that he/she wants                   formal representation of the Communicative Structure, in particular
to communicate some specific information (i.e., the rheme) con-                   of thematicity, proposed by Mel’čuk [22]. In the context of the
cerning some specific item (i.e., the theme), and thereby makes the               conversational agent KRISTINA, text coming from a web-retrieved
addressee follow him. In Mel’čuk’s words: “The Sem-Thematicity                    service is processed in the pipeline before it arrives at the TTS
is thus a SPEAKER-ORIENTED Comm-category.”                                        engine.
    In the following section, we sketch Mel’čuk’s definition of the-                 The proposed pipeline in Figure 4 includes four modules:
maticity, which is the dimension considered in previous work when                 5 As more than one thematicity span may exist within the same proposition, abbrevia-
the correspondence of the Information Structure with prosody is                   tions include a number (e.g., ‘SP1’) that indicates the number of occurrences at each
discussed.                                                                        level (e.g., ‘SP2’ would be the second specifier in a specific thematicity level).


                                                                                         derives thematicity labels is introduced; and (iii) a platform for
                                                                                         prosody testing in TTS applications is demonstrated. Evaluation
                                                                                         shows that the thematicity-based prosody enrichment is perceived
                                                                                         as more expressive than the default TTS output. Expressiveness
                                                                                         was assessed by means of a perception test using a Mean Opinion
                                                                                         Score (MOS) with a 5-point Likert scale (LS): 1-bad, 2-poor, 3-fair,
                                                                                         4-good, and 5-excellent. Average results for the tested sentences
                                                                                         show that the automatic prosody modifications (LS = 3.30) score
                                                                                         significantly higher (p < 0.05) than the default TTS output (LS
                                                                                         = 3.01). All in all, this study marks the transition from theoretical
                                                                                         work on the IS–prosody interface to the integration of thematicity-
                                                                                         based prosody enrichment to achieve more expressive synthesized
            Figure 4: Communicative generation pipeline.                                 speech. Future work is aimed at exploring other dimensions of com-
                                                                                         municative structure like emphasis and foregroundedness within
                                                                                         the framework that has been discussed.
    (1) Tokenizer: Splits the text into sentences and words. Punc-                          Research carried out so far in this direction [9, 15] is a proof of
        tuation marks are also tokenized as the syntactic parser                         concept of the applicability of the Information Structure–prosody
        requires that.                                                                   interface in speech synthesis, but there are many issues that re-
    (2) Syntactic parser: An off-the-shelf parser [3], which is trained                  main unexplored. For now, only thematicity at the sentence level
        on the TIGER Treebank [6] and which outputs a                                    has been tested. Other dimensions of the communicative
        fourteen-columned CoNLL file.6                                                   structure (like givenness and focus, as defined by Mel’čuk [22]) may
    (3) Communicative parser: Uses rules to derive hierarchical                          also have a strong correspondence with prosody. Corpora need to
        thematicity labels from the syntactic structure. It outputs a CoNLL              be compiled in order to continue looking into this field from an
        file with an added column for communicative structure (i.e.,                     empirical perspective; see e.g. [14]. With respect to prosody, an
        the output CoNLL has fifteen columns).                                           implementation with SSML tags does not suffice to address the re-
    (4) SSML prosody converter: Converts the thematicity spans                           quirements for prosody modeling in a pre-processing stage for TTS
        derived by the communicative parser to SSML spans and                            applications. Therefore, closer insights into how to model prosody
        assigns a variety of prosody tags to each span. This module                      to reflect better the communicative structure of a text also need to
        is based on the tool presented in [13].                                          be investigated.
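As an illustration of the last module, the sketch below wraps level-1 thematicity spans in SSML `prosody` elements and inserts `break` elements after some of them. It is a minimal sketch under stated assumptions, not the converter of [13]: the pitch, rate and pause values in `PROSODY` are hypothetical placeholders, embedded (level-2) spans are ignored, and the names `span_type`, `to_ssml` and `LEVEL1_SPANS` are introduced here only for illustration.

```python
import re
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical settings per span type (SP = specifier, T = theme, R = rheme).
# The values are placeholders for illustration, not those of the tool in [13].
PROSODY = {
    "SP": {"pitch": "+5%", "rate": "95%", "pause_ms": 300},
    "T": {"pitch": "+10%", "rate": "100%", "pause_ms": 150},
    "R": {"pitch": "-5%", "rate": "90%", "pause_ms": 0},
}


def span_type(label):
    """Return the span type of a label, e.g. 'SP1' -> 'SP', 'T1(R1)' -> 'T'."""
    return re.match(r"[A-Z]+", label).group()


def to_ssml(spans, lang="en-US"):
    """Wrap (label, text) pairs in prosody elements, adding a break where configured."""
    speak = ET.Element(
        "speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang}
    )
    for label, text in spans:
        cfg = PROSODY[span_type(label)]
        prosody = ET.SubElement(
            speak, "prosody", {"pitch": cfg["pitch"], "rate": cfg["rate"]}
        )
        prosody.text = text  # ElementTree escapes special characters on output
        if cfg["pause_ms"] > 0:
            ET.SubElement(speak, "break", {"time": f"{cfg['pause_ms']}ms"})
    return ET.tostring(speak, encoding="unicode")


# Level-1 spans of the annotated example sentence from Section 3.2.
LEVEL1_SPANS = [
    ("SP1", "Ever since,"),
    ("T1", "the remaining members"),
    ("R1", "have been desperate for the United States to rejoin this dreadful group."),
]

print(to_ssml(LEVEL1_SPANS))
```

Building the document with an XML library rather than string concatenation keeps the output well-formed even when the news text contains characters such as `&` or `<`.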
   The correspondence between hierarchical thematicity and prosody                          Given the relevant role of the Information Structure–prosody
is presented in terms of variations of referent SSML7 [30] prosody                       interface in human communication, it seems reasonable that next
tag values involving fundamental frequency (F0), speech rate (SR)                        generation conversational agents face new challenges in adopting
and insertion of breaks.                                                                 communicatively-oriented models. In this paper, we have intro-
                                                                                         duced some basic concepts on the theoretical framework behind
5     CONCLUSIONS                                                                        an implementation of a hierarchical thematicity model as well as
Theoretical studies on the Information Structure-prosody interface                       an overview of the research carried out so far in this area in its
have stated for some time that there is a correspondence between                         correspondence to prosody.
how the linguistic content is structured communicatively and how
intonation is used in human speech to convey that content. In
previous work, this correspondence (in particular, the relation-
ship between hierarchical thematicity and prosodic variation) has
been brought to the foreground from an empirical perspective in
the context of expressive speech generation. Corpus-based experi-
ments and data-driven implementations [12–15] supported initial
expectations on the potential of the Information Structure–prosody
interface applied to speech technologies. Exploiting this potential is
an initial step forward in communicative approaches to prosody
generation within TTS/CTS applications, which is one of the key aspects
for a next generation of more expressive conversational virtual
agents.
   The implementation described above contributes in several as-
pects to the state of the art: (i) a formal description of hierarchical
thematicity is used; (ii) a communicative parser that automatically
6 Details     about      the     CoNLL       format     are      provided      in
http://universaldependencies.org/docs/format.html
7 SSML stands for Speech Synthesis Markup Language: details about this convention
can be found in https://www.w3.org/TR/speech-synthesis11/


REFERENCES                                                                                             PA, USA, 1–9.
 [1] M. E. Beckman, J. B. Hirschberg, and S. Shattuck-Hufnagel. 2004. The Original                [25] A. Ortiz, M. del Puy Carretero, D. Oyarzun, J. J. Yanguas, C. Buiza, M. F. Gonzalez,
     ToBI System and the Evolution of the ToBI Framework. In Prosodic Models and                       and I. Etxeberria. 2007. Elderly Users in Ambient Intelligence: Does an Avatar
     Transcription: Towards Prosodic Typology, S.A. Jun (Ed.). Oxford University Press,                Improve the Interaction? Springer Berlin Heidelberg, Berlin, Heidelberg, 99–114.
     9–54.                                                                                        [26] D. Pérez-Marín and I. Pascual-Nieto. 2013. An exploratory study on how chil-
 [2] B. Bohnet, A. Burga, and L. Wanner. 2013. Towards the Annotation of Penn
     TreeBank with Information Structure. In Proceedings of the Sixth International
     Joint Conference on Natural Language Processing. Nagoya, Japan, 1250–1256.
 [3] B. Bohnet and J. Nivre. 2012. A Transition-Based System for Joint Part-of-Speech
     Tagging and Labeled Non-Projective Dependency Parsing. In Proceedings of the
     2012 Joint Conference on Empirical Methods in Natural Language Processing and
     Computational Natural Language Learning (EMNLP-CoNLL ’12). Jeju Island, Korea,
     1455–1465.
 [4] B. Bohnet and L. Wanner. 2010. Open Source Graph Transducer Interpreter and
     Grammar Development Environment. In Proceedings of the Seventh Conference
     on International Language Resources and Evaluation (LREC). European Language
     Resources Association (ELRA), Valletta, Malta.
 [5] E. Bozkurt, Y. Yemez, and E. Erzin. 2016. Multimodal analysis of speech and arm
     motion for prosody-driven synthesis of beat gestures. Speech Communication 85
     (December 2016), 29–42. https://doi.org/10.1016/J.SPECOM.2016.10.004
 [6] S. Brants, S. Dipper, P. Eisenberg, S. Hansen, E. König, W. Lezius, C. Rohrer, G.
     Smith, and H. Uszkoreit. 2004. TIGER: Linguistic Interpretation of a German
     Corpus. Journal of Language and Computation 2 (2004), 597–620.
 [7] N. Chomsky. 1965. Aspects of the Theory of Syntax. The MIT Press, Cambridge.
 [8] H. H. Clark and S. E. Haviland. 1977. Comprehension and the given-new contract.
     Discourse production and comprehension. Discourse processes: Advances in research
     and theory 1 (1977), 1–40.
 [9] M. Domínguez. 2017. The Information Structure–Prosody Interface: On the Role of
     Hierarchical Thematicity in an Empirically-grounded Model. Ph.D. Dissertation.
     Universitat Pompeu Fabra.
[10] M. Domínguez, M. Farrús, A. Burga, and L. Wanner. 2014. The Information
     Structure–Prosody Language Interface Revisited. In Proceedings of the 7th International
     Conference on Speech Prosody. Dublin, Ireland, 539–543.
[11] M. Domínguez, M. Farrús, A. Burga, and L. Wanner. 2016. Using hierarchical
     information structure for prosody prediction in content-to-speech applications.
     In Proceedings of the 8th International Conference on Speech Prosody. Boston, USA,
     1019–1023.
     dren interact with pedagogic conversational agents. Behaviour & Information
     Technology 32, 9 (2013), 955–964.
[27] M. Schröder and J. Trouvain. 2003. The German Text-to-Speech Synthesis System
     MARY: A Tool for Research, Development and Teaching. International Journal of
     Speech Technology 6, 4 (2003), 365–377. https://doi.org/10.1023/A:1025708916924
[28] R. Schwarzschild. 1999. GIVENness, AvoidF and Other Constraints on the
     Placement of Accent. Natural Language Semantics 7, 1 (1999), 141–177.
[29] K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J.
     Pierrehumbert, and J. Hirschberg. 2010. ToBI: A standard for labeling English
     prosody. In Proceedings of Interspeech. Makuhari, Japan, 146–149.
[30] P. Taylor and A. Isard. 1997. SSML: A Speech Synthesis Markup Language. Speech
     Communication 21, 1-2 (February 1997), 123–133.
[31] M. Vanrell, I. Mascaró, F. Torres-Tamarit, and P. Prieto. 2013. Intonation as an
     Encoder of Speaker Certainty: Information and Confirmation Yes-No Questions
     in Catalan. Language and Speech 56, 2 (2013), 163–190.
     https://doi.org/10.1177/0023830912443942
[32] L. Wanner, E. André, J. Blat, S. Dasiopoulou, M. Farrús, T. Fraga, E. Kamateri, F.
     Lingenfelser, G. Llorach, O. Martínez, G. Meditskos, S. Mille, W. Minker, L. Pragst,
     D. Schiller, A. Stam, L. Stellingwerff, F. Sukno, B. Vieru, and S. Vrochidis. 2017.
     KRISTINA: A Knowledge-Based Virtual Conversation Agent. In Proceedings of the
     15th International Conference on Practical Applications of Agents and Multi-Agent
     Systems (PAAMS). Oporto, Portugal.
[33] L. Wanner, J. Blat, S. Dasiopoulou, M. Domínguez, G. Llorach, S. Mille, F. Sukno,
     E. Kamateri, S. Vrochidis, I. Kompatsiaris, et al. 2016. Towards a multimedia
     knowledge-based agent with social competence and human interaction
     capabilities. In Proceedings of the 1st International Workshop on Multimedia Analysis and
     Retrieval for Multimodal Interaction. ACM Digital Library, 21–26.
[34] P. Wargnier, G. Carletti, Y. Laurent-Corniquet, S. Benveniste, P. Jouvelot, and
     A. S. Rigaud. 2016. Field evaluation with cognitively-impaired older adults of
     attention management in the Embodied Conversational Agent Louise. In 2016
     IEEE International Conference on Serious Games and Applications for Health, SeGAH
     2016, Orlando, FL, USA, May 11-13, 2016. 1–8.
[12] M. Domínguez, M. Farrús, and L. Wanner. 2016. Combining acoustic and lin-
     guistic features in phrase-oriented prosody prediction. In Proceedings of the 8th
     International Conference on Speech Prosody. Boston, USA, 796–800.
[13] M. Domínguez, M. Farrús, and L. Wanner. 2017. A Thematicity-based Prosody
     Enrichment Tool for CTS. In Proceedings of the 18th Annual Conference of the
     International Speech Communication Association (INTERSPEECH 2017). Stockholm,
     Sweden, 3421–3422.
[14] M. Domínguez, M. Farrús, and L. Wanner. 2018. Compilation of Corpora to Study
     the Information Structure–Prosody Interface. In 11th edition of the Language
     Resources and Evaluation Conference (LREC2018). Miyazaki, Japan.
[15] M. Domínguez, M. Farrús, and L. Wanner. 2018. Thematicity-based Prosody
     Enrichment for Text-to-Speech Applications. In 9th International Conference on
     Speech Prosody 2018 (SP2018). Poznan, Poland.
[16] E. Hajičová, B. Partee, and P. Sgall. 1998. Topic-Focus Articulation, Tripartite Struc-
     tures, and Semantic Content. Kluwer Academic Publishers, Dordrecht.
[17] M.A.K. Halliday. 1967. Notes on Transitivity and Theme in English, Parts 1-3.
     Journal of Linguistics 3, 1 (1967), 37–81.
[18] A. Igualada, N. Esteve-Gibert, and P. Prieto. 2017. Beat gestures
     improve word recall in 3- to 5-year-old children. Journal of Experimental Child
     Psychology 156 (2017), 99–112.
[19] F. Kügler, B. Smolibocki, and M. Stede. 2012. Evaluation of
     Information Structure in Speech Synthesis: The Case of Product Recommender
     Systems Perception. In ITG Conference on Speech Communication, IEEE. 26–29.
[20] J. Llanes-Coromina, I. Vilà-Giménez, O. Kushch, J. Borràs-Comes, and P. Prieto.
     2018. Beat gestures help preschoolers recall and comprehend discourse informa-
     tion. Journal of Experimental Child Psychology 172 (2018), 168–188.
[21] I. A. Mel’čuk. 1988. Dependency Syntax: Theory and Practice. SUNY Press, Albany,
     NY. 400 pages.
[22] I. A. Mel’čuk. 2001. Communicative Organization in Natural Language: The
     semantic-communicative structure of sentences. Benjamins, Amsterdam,
     Philadelphia. 393 pages.
[23] B. Mencía-López, D. Pardo, A. Trapote-Hernández, and L. A. Gómez-Hernández.
     2013. Embodied Conversational Agents in Interactive Applications for Children
     with Special Educational Needs. In Technologies for Inclusive Education: Beyond
     Traditional Integration Approaches, David Griol Barres, Zoraida Callejas Carrión,
     and Ramón López-Cózar Delgado (Eds.). IGI Global, Hershey, USA, 59–88.
[24] D. Meurers, R. Ziai, N. Ott, and J. Kopp. 2011. Evaluating Answers to Reading
     Comprehension Questions in Context: Results for German and the Role of In-
     formation Structure. In Proceedings of the TextInfer 2011 Workshop on Textual
     Entailment (TIWTE ’11). Association for Computational Linguistics, Stroudsburg,

