Natural Language Generation in Dialogue Systems for Customer Care
Mirko Di Lascio♥, Manuela Sanguinetti♥♦, Luca Anselma♥, Dario Mana♣, Alessandro Mazzei♥, Viviana Patti♥, Rossana Simeoni♣
♥ Dipartimento di Informatica, Università degli Studi di Torino, Italy
♦ Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Italy
♣ TIM, Torino, Italy
♥ {first.last}@unito.it, ♦ {first.last}@unica.it, ♣ {first.last}@telecomitalia.it


Abstract

English. In this paper we discuss the role of natural language generation (NLG) in modern dialogue systems (DSs). In particular, we study the role that a linguistically sound NLG architecture can have in a DS. Using real examples from a new corpus of dialogues in the customer-care domain, we study how non-linguistic contextual data can be exploited by using NLG.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction

In this paper we present the first results of an ongoing project on the design of a dialogue system for customer care in the telco field. In most dialogue systems (DSs), the generation side of the communication is largely limited to the use of templates (Van Deemter et al., 2005). Templates are pre-compiled sentences with empty slots that can be filled with appropriate fillers. Most commercial DSs, following the classical cascade architecture NLUnderstanding ↔ DialogueManager ↔ NLGeneration (McTear et al., 2016), use machine learning-based Natural Language Understanding (NLU) techniques to identify important concepts (e.g., intent and entities in (Google, 2020)) that will be used by the dialogue manager (i) to update the state of the system and (ii) to produce the next dialogue act (Bobrow et al., 1977; Traum and Larsson, 2003), possibly filling the slots in the generation templates.

This classical, and quite common, information flow/architecture for dialogue processing has, as a working hypothesis, the assumption that most of the necessary information is provided by the user's utterance: we call this source of information the linguistic channel (L-channel). However, especially in the customer-care domain, this assumption is only partially true. For instance, in the sentence "Scusami ma vorrei sapere come mai mi vengono fatti certi addebiti?" ("Excuse me, I'd like to know why I'm charged certain fees?"), even a very advanced NLU module can pass only vague information about the user's request to the DialogueManager. Indeed, in order to provide good enough responses, the DialogueManager resorts to two other sources of information: the domain context channel (DC-channel) and the user model channel (UM-channel). The DC-channel is fundamental to produce the content of the answer, while the UM-channel is necessary to give the answer the correct form as well.

It is worth noting that both channels, which are often neglected in the design of commercial DSs for the customer-care domain, have central roles in the design of (linguistically sound) natural language generation (NLG) systems (Reiter and Dale, 2000). In particular, considering the standard architecture for data-to-text NLG systems (Reiter, 2007; Gatt and Krahmer, 2018), the analysis of the DC-channel exactly corresponds to the content selection task and the UM-channel influences both the sentence planning and sentence realization phases. In other words, the central claims of this paper are that in commercial DSs for customer care: (1) the L-channel is often not informative enough and one needs to use the DC-channel and the UM-channel for producing a sufficiently good answer, (2) the DC-channel and the UM-channel can be exploited by using standard symbolic¹ NLG techniques and methods. The remainder of the paper supports both of these claims while presenting our ongoing project on the development of a rule-based NLG prototype to be used in a customer-care domain. Section 2 presents the corpus developed in the first stage of this project, consisting of real dialogues containing explanation requests in the telco customer-care domain. Section 3 presents an NLG architecture for managing the L-DC-UM channels that can be adopted in a DS for customer care. Finally, Section 4 concludes the paper with a few remarks on the current state of the project and on future work.

¹ The well-known problem of hallucinations in neural networks deters their use in real-world NLG (Rohrbach et al., 2018).

2   A Dialogue Corpus for Customer-care Domain

This study builds upon the analysis of a corpus of dialogues between customers and a DS for customer service developed by an Italian telecommunications company. The dialogues, which take place by means of a textual chat, mainly deal with requests for commercial assistance, both on landline and mobile phones. For the purpose of this study, the corpus was extracted by selecting, from a sample of dialogues held over 24 hours, a reduced subset that included requests for explanations from customers. The selection criteria were conceived so as to include all the dialogues where at least one message from the user contained a clearly stated request for explanation. The kind of requests identified in this collection basically reflects the problems typically encountered with a telecom service provider, such as undue or unfamiliar charges in the bill or in the phone credit (about 52% of the overall number of requests in this dataset).

The resulting corpus consists of 142 dialogues, with an average of 11 turns per dialogue, and an average length of 9 tokens in customer messages and 38 tokens in the bot messages. This difference in message length is due to the way the assistant's responses are currently structured, in that they usually include detailed information on invoice items or options available, while customers' messages are most often quite concise. Also, the relatively high number of turns per dialogue might be explained by the high occurrence in the corpus of repeated or rephrased messages, both by the chatbot and by the customer, due to recurring misunderstandings on both sides.

As a matter of fact, the presence of such phenomena in the corpus, along with the overall goals set forth for the development of the NLG module in this project, led us to the design of an annotation process that involved different dimensions, such as errors in conversation and emotions. By error, in this context, we mean any event that might have a negative impact on the flow of the interaction, and more generally on its quality, potentially resulting in breakdowns (i.e. whenever one party leaves the conversation without completing the given task (Martinovsky and Traum, 2003)). The error tagset used in this corpus is partially inspired by three of the popular Gricean maxims, i.e. those of Quantity, Relation and Manner (Grice, 1989) (each one including further sub-types, not described here), and it has been conceived so as to include error categories that may apply to both conversation parties. The second dimension, instead, is meant to include, on the one hand, customers' emotions (as perceived in their messages), and, on the other hand, the chatbot's empathic responses (if any). In particular, as regards customers' emotions, besides two generic labels for neutral and positive messages, we mostly focused on negative emotions, especially anger and frustration, also introducing for these two finer-grained labels that define their lower or higher intensity. While a full description of the annotation scheme is beyond the scope of this paper, Figure 1 shows two brief examples of how we applied this scheme to the sample dataset². An overview of the scheme with a discussion on the main findings and annotation issues can be found in Sanguinetti et al. (2020).

² For further details on the scheme and the definition of all tags, the annotation guidelines are available in this document: https://cutt.ly/cdMcnyM

(a)
Chatbot:  Vuoi vedere il dettaglio della fattura?
          (Do you want to see the invoice details?)
Customer: La fattura di gennaio 2020 non è arrivata
          (I haven't received the invoice of January 2020 yet)
          [Ignoring question/feedback - Neutral]
Chatbot:  Ecco il dettaglio delle voci di spesa presenti nella fattura InvoiceNumber del mese di gennaio per la linea PhoneNumber:InvoiceDetails. Vuoi avere il dettaglio di una specifica voce di spesa presente in fattura?
          (Here is the detail of the items on the invoice InvoiceNumber of January for the PhoneNumber:InvoiceDetails. Do you want the detail of a specific item in the invoice?)
          [Straight wrong response]
Customer: Non mi interessa questa fattura, mi serve gennaio 2020
          (I don't need this invoice, I need the one of January 2020)
          [Repetition - Somewhat frustrated]
Chatbot:  Vuoi cambiare argomento?
          (Do you want to change topic?)
          [Topic change]
Customer: Ciao
          (Whatever)
          [Non-cooperativity - Somewhat frustrated]

(b)
Chatbot:  Scusami, non ho capito bene la tua richiesta... Puoi ripeterla in modo più semplice?
          (I'm sorry, I didn't get your question... Could you rephrase it in a simpler way?)
Customer: No non hai capito niente
          (No you got it all wrong)
          [Non-cooperativity - Somewhat angry]
Chatbot:  Mi spiace non aver compreso correttamente. Ho la possibilità di farti chattare con un operatore umano più esperto. Vuoi?
          (I'm sorry I haven't understood that correctly. I can put you in contact with a human representative. Is this what you want?)
          [Empathy]

Figure 1: Excerpts from the annotated dataset. Annotation of errors is highlighted in red, that of customer's emotions in blue and the agent's empathy in orange (annotations appear here in square brackets).

Due to privacy concerns and the related anonymization issues that may arise (as further discussed in Section 4), the corpus cannot yet be publicly released. However, in an attempt to provide a qualitative analysis of the annotated data, we collected some basic statistics on the distribution of errors and emotions labeled in this sample set. Overall, we report a total of 326 errors (about 21% of the total number of turns) from both parties; among them, the error class that includes violations of the maxim of Relation (i.e. relevance) is by far the most frequent one (65% of the errors). Such violations may take different forms, also depending on whether they come from the customer or the chatbot. As regards the customer, errors of this kind typically take place when the user does not take into account the previous message from the chatbot, thus providing irrelevant responses that do not allow the conversation to move forward and make any progress; these cases cover approximately 21% of customers' errors. On the chatbot side, the most frequent error type is represented by those cases in which the agent misinterprets a previous customer's message and proposes to move on to another topic rather than providing a proper response (30% of cases). As for the second annotation dimension, i.e. the one regarding customers' emotions, most of the messages have a neutral tone (about 86% of user turns), but, among non-neutral messages, the two main negative emotions defined in this scheme, namely anger and frustration, are the ones most frequently encountered in user messages (both with a frequency of 41%), while the cases of messages with a positive emotion constitute less than 1%, and usually translate into some form of gratitude, appreciation, or simple politeness.

All these dimensions are functional to a further development of the NLG module, in that they provide, through different perspectives, useful signals of how, and at which point in the conversation, the template response currently used by the chatbot might be improved using the NLG module. Broadly speaking, framing the error taxonomy within Grice's cooperative principle provides useful support for the generation module to understand, in case an error is reported, how to structure the chatbot response so as to improve the interaction quality in terms of informativeness and relevance (as also discussed in Section 3).

3   Balancing information sources in NLG for DS

In this section, we illustrate a DS architecture that explicitly accounts for the L-DC-UM information channels. In particular, we point out that the DC and UM channels can be managed by using standard NLG methods.

A commonly adopted architecture for NLG in data-to-text systems is a pipeline composed of four modules: data analyzer, text planner, sentence planner and surface realizer (Reiter, 2007; Pauws et al., 2019). Each module tackles a specific issue: (1) the data analyzer determines what can be said, i.e. a domain-specific analysis of input data; (2) the text planner determines what to say, i.e. which information will be communicated; (3) the sentence planner determines how to communicate, with particular attention to the design of the features related to the given content and language (e.g. lexical choices, verb tense, etc.); (4)
the surface realizer produces the sentences by using the results of the previous modules and considering language-specific constraints as well. Note that by definition NLG does not account for linguistic input (that is, the L-channel); instead, all the modules account for the context of the communication. In other words, data analysis and text planning explicitly process the information about the input data (the DC-channel), and text planning and sentence planning process the information about the audience (the UM-channel). Moreover, by using the nomenclature defined in (Reiter and Dale, 2000), the specific task of content selection decides what to say, that is the atomic nucleus of information that will be communicated.

In our project, we adopt a complete NLG architecture in the design of the DS (Figure 2). In Figure 2, we show the contributions of the L-DC-UM channels in the interaction flow. It is worth noting that we assigned the content selection task to the DM module rather than to the text planning of the NLG module. Indeed, the content selection task is crucially the point where all three information channels need to be merged in order to decide the content of the DS answer to the user question.

[Figure 2 diagram: a DS composed of NLU, DM (including Content Selection) and NLG (Text Planning, Sentence Planning, Realization), interacting with the USER and drawing on the L-channel, the DC-channel and the UM-channel.]

Figure 2: A dialogue system architecture accounting for L-DC-UM channels.

In order to understand the contribution of the three information channels to the final message construction, we describe below the main steps of the module design using the following customer's message, retrieved from the corpus, as an example:

Scusami ma vorrei sapere come mai mi vengono fatti alcuni addebiti? ("Excuse me, I'd like to know why I'm charged certain fees?")

Here, the customer requests an explanation about some (unspecified) charges on her/his bill, making the whole message not informative enough. In this case, the DS can deduce from the L-channel only a generic request for information on transactions. However, using the architecture shown in Figure 2, a more informative answer can be produced considering the UM-channel and the DC-channel.

As a working hypothesis, we assume that the user model consists solely of the age of the user. By assuming that the user is 18 years old, we can say that the DS should use an informal register, i.e. the Italian second person singular (tu) rather than the more formal third person singular (lei). It is worth noting that the current account of the user model is too simple and there is room for improvement both in the formalization of the model, and in the effect of the user model on the generated text. Taking into account the classification of user model acquisition given by (Reiter et al., 2003), it is interesting to note that the dialogic nature of the system allows for the possibility to explicitly ask users about their knowledge and preferences on the specific domain.

Moreover, we assume that the DC-channel consists of all the transactions of the last 7 months, for example: T1, with an amount of €9.99 (M1-M7); T2, with an amount of €2 (M5-M7, appearing twice in M7); and T3, with an amount of €1.59 (M7) (see Table 1).

        M1     M2     M3     M4     M5     M6     M7
T1     9.99   9.99   9.99   9.99   9.99   9.99   9.99
T2      0      0      0      0      2      2     2, 2
T3      0      0      0      0      0      0     1.59

Table 1: A possible transactions history (amounts in €).

Looking at the data in Table 1, different forms of automatic reasoning could be applied in order to evaluate the relevance of each single transaction of the user. At this stage of the project, we aim to adapt the theory of importance-effect from (Biran and McKeown, 2017) to our specific domain, where the relevant information is in the form of relational database entries. The idea is to
consider the time evolution of a specific transaction category, giving more emphasis to information contents that can be classified as exceptional evidence. Informally, we can say that the transactions T2 and T3 have a more irregular evolution in time with respect to T1, therefore they should be mentioned with more emphasis in the final message.

The current implementation of the DS is based on a trivial NLU (regular expressions) and on a symbolic sentence planner and realizer for Italian (Anselma and Mazzei, 2018; Mazzei et al., 2016). By considering all three L-UM-DC channels, the answer generated by the DS is:

Il totale degli addebiti è €15,58. Hai pagato €4,00 (2×€2,00) per l'Offerta Base Mobile e €1,59 per l'Opzione ChiChiama e RiChiama. Infine, hai pagato il rinnovo dell'offerta 20 GB mobile. ("The total charge is €15.58. You have been charged €4.00 (2×€2.00) for the Mobile Base Offer and €1.59 for the Who'sCalling and CallNow options. Finally, you have been charged for the renewal of the 20 GB mobile offer.")

4   Conclusion and Future Work

In this paper we have discussed the main features of the design of a DS for telco customer care. In particular, we outlined the peculiarities of this domain, describing the construction of a specifically designed dialogue corpus and discussing a possible integration of standard DS and NLG architectures in order to manage these peculiarities. This is an ongoing project and we are considering various enhancements: (1) we will integrate emoji prediction capabilities into the proposed architecture in order to allow the DS to automatically attach an appropriate emoji at the end of the generated response, relying on previous work for Italian (Ronzano et al., 2018); we would also take into account the current user emotions while generating an appropriate emoji, since an emoji that is adequate when the conversation is characterized by a neutral tone may suddenly become inappropriate if the user is frustrated or angry (Pamungkas, 2019; Cercas Curry and Rieser, 2019); (2) we would like to enhance the system so as to adapt the generated responses to other aspects of the users, such as their mental models, levels of domain expertise, and personality traits; (3) we want to evaluate the DS following the user-based comparative schema adopted in (Demberg et al., 2011).

Finally, we add some closing remarks on the corpus availability and its anonymization. The publication of a dataset of conversations between customers and a company virtual assistant is a great opportunity for the company and for its surrounding communities of academics, designers, and developers. However, it entails a number of obstacles to overcome. Rules and laws by regulating bodies must be strictly followed (see, for example, the GDPR regulation³). This means, first of all, including within the to-be-published dataset only those conversations made by customers who have given their consent to this type of treatment of their data.

³ https://eur-lex.europa.eu/eli/reg/2016/679/oj

Moreover, it is mandatory to obscure both personal and sensitive customer data. Such obfuscation activities are particularly difficult in the world of chatbots, where customers are free to input unrestricted text in the conversations. Regular expressions can be used in order to recognize the pieces of data to be obscured, such as email addresses, telephone numbers, social security numbers, bank account identifiers, dates of birth, etc. More sophisticated techniques need to be adopted to identify and obscure, within the text entered by customers, names, surnames, home and work addresses. Even more complex and open is the problem of anonymizing sensitive customer data. For example, consider the case of a disabled customer who reveals his/her health condition to the virtual assistant, in order to obtain a legitimate better treatment from the company: the text revealing the health condition of the customer must be obscured. Other relevant sensitive data include racial or ethnic origins, religious or philosophical beliefs, political opinions, etc.

Some of the techniques used for identifying certain types of data to be obscured reach a level of precision that, given the current state of the art, may still be far from what a trained human analyst could achieve. Therefore, it is also necessary to consider the need for the dataset being published to be reviewed and edited by specialized personnel before the actual publication. With this in mind, the techniques of data recognition mentioned above (regular expressions, Named Entity Recognition, etc.) could also be exploited to develop tools that can speed up the task of completing and verifying the accurate anonymization of the dataset.
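As an illustration of the regular-expression step mentioned in the closing remarks, the following is a minimal sketch of how some pieces of data could be recognized and obscured. The patterns, the placeholder tags and the sample message are assumptions made here for the example (they are not taken from the project), and they are deliberately simple: names, surnames and addresses would not be caught, which is why the text calls for more sophisticated techniques and for human review.

```python
import re

# Hypothetical placeholder tags and deliberately simple patterns, chosen
# only for illustration; real data would need validated, locale-aware ones.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    # 9 to 11 consecutive digits, optionally preceded by the +39 prefix.
    ("[PHONE]", re.compile(r"(?<!\w)(?:\+39[ ]?)?\d{9,11}(?!\w)")),
]


def obscure(text: str) -> str:
    """Replace every match of each pattern with its placeholder tag."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(tag, text)
    return text


# Invented customer message, not taken from the corpus.
message = "Scrivetemi a mario.rossi@example.com o chiamate il 3331234567"
print(obscure(message))
```

A tool of this kind could serve as a first pass that flags candidate spans for the specialized personnel mentioned above, rather than as a complete anonymization procedure.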
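Returning to the running example of Section 3, the sketch below shows one way the DC-channel and the UM-channel could be combined. It rests on several assumptions that are not in the paper: irregularity is approximated by a crude test (a category is flagged when its monthly total is not constant), which is only a stand-in for the importance-effect theory; the age threshold for the informal register is hypothetical, since the paper only states that an 18-year-old user is addressed with tu; and plain string formatting replaces the symbolic sentence planner and realizer. Amounts come from Table 1, expressed in euro cents.

```python
from dataclasses import dataclass

# Charges in euro cents for months M1..M7, copied from Table 1.
HISTORY = {
    "T1": [[999], [999], [999], [999], [999], [999], [999]],
    "T2": [[], [], [], [], [200], [200], [200, 200]],
    "T3": [[], [], [], [], [], [], [159]],
}


@dataclass
class UserModel:
    age: int  # the only user feature in the paper's working hypothesis


def is_irregular(months):
    """Crude stand-in for importance-effect: flag a category whose
    monthly total changes at least once over the observed months."""
    return len({sum(charges) for charges in months}) > 1


def select_content(history):
    """DC-channel: keep last-month charges, irregular categories first."""
    items = [
        (name, months[-1], is_irregular(months))
        for name, months in history.items()
        if months[-1]
    ]
    return sorted(items, key=lambda item: not item[2])  # stable sort


def euro(cents):
    """Format an amount in cents with the Italian decimal comma."""
    return "€" + f"{cents / 100:.2f}".replace(".", ",")


def realize(items, user, informal_below=30):
    """UM-channel: pick the tu or lei verb form. informal_below is a
    hypothetical threshold, not a value given in the paper."""
    verb = "Hai pagato" if user.age < informal_below else "Ha pagato"
    total = sum(sum(charges) for _, charges, _ in items)
    sentences = [f"Il totale degli addebiti è {euro(total)}."]
    for name, charges, _ in items:
        detail = ""
        if len(charges) > 1:
            detail = " (" + " + ".join(euro(c) for c in charges) + ")"
        sentences.append(f"{verb} {euro(sum(charges))}{detail} per {name}.")
    return " ".join(sentences)


print(realize(select_content(HISTORY), UserModel(age=18)))
```

With the data of Table 1, T2 and T3 are flagged as irregular and mentioned before T1, and the total of the last month is €15,58, in line with the answer reported in Section 3. Passing an older user switches the verb to the formal "Ha pagato".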
Acknowledgements

The work of Mirko Di Lascio, Alessandro Mazzei, Manuela Sanguinetti and Viviana Patti has been partially funded by TIM s.p.a. (Studi e Ricerche su Sistemi Conversazionali Intelligenti, CENF CT RIC 19 01).

References

Luca Anselma and Alessandro Mazzei. 2018. Designing and testing the messages produced by a virtual dietitian. In Proceedings of the 11th International Conference on Natural Language Generation, Tilburg University, The Netherlands, November 5-8, 2018, pages 244–253.

Or Biran and Kathleen McKeown. 2017. Human-centric justification of machine learning predictions. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1461–1467.

Daniel G. Bobrow, Ronald M. Kaplan, Martin Kay, Donald A. Norman, Henry Thompson, and Terry Winograd. 1977. Gus, a frame-driven dialog system. Artif. Intell., 8(2):155–173, April.

Amanda Cercas Curry and Verena Rieser. 2019. A crowd-based evaluation of abuse response strategies in conversational agents. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 361–366, Stockholm, Sweden, September. Association for Computational Linguistics.

Vera Demberg, Andi Winterboer, and Johanna D. Moore. 2011. A strategy for information presentation in spoken dialog systems. Computational Linguistics, 37(3):489–539.

Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. J. Artif. Intell. Res., 61:65–170.

Google. 2020. Dialogflow documentation. https://dialogflow.com. Online; accessed 2020-08-10 11:24:07 +0200.

Paul Grice. 1989. Studies in the Way of Words. Harvard University Press, Cambridge, Massachusetts.

Bilyana Martinovsky and David Traum. 2003. The error is the clue: Breakdown in human-machine interaction. In Proceedings of the ISCA Workshop on Error Handling in Dialogue Systems.

Alessandro Mazzei, Cristina Battaglino, and Cristina Bosco. 2016. SimpleNLG-IT: adapting SimpleNLG to Italian. In Proceedings of the 9th International Natural Language Generation conference, pages 184–192, Edinburgh, UK, September 5-8. Association for Computational Linguistics.

Michael McTear, Zoraida Callejas, and David Griol. 2016. The Conversational Interface: Talking to Smart Devices. Springer Publishing Company, Incorporated, 1st edition.

Endang Wahyu Pamungkas. 2019. Emotionally-aware chatbots: A survey. CoRR, abs/1906.09774.

Steffen Pauws, Albert Gatt, Emiel Krahmer, and Ehud Reiter. 2019. Making effective use of healthcare data using data-to-text technology. In Data Science for Healthcare, pages 119–145. Springer.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA.

Ehud Reiter, Somayajulu Sripada, and Sandra Williams. 2003. Acquiring and using limited user models in NLG. In Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003.

Ehud Reiter. 2007. An architecture for data-to-text systems. In Proc. of the 11th European Workshop on Natural Language Generation, ENLG '07, pages 97–104, Stroudsburg, PA, USA. Association for Computational Linguistics.

Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, and Kate Saenko. 2018. Object hallucination in image captioning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4035–4045, Brussels, Belgium, Nov. Association for Computational Linguistics.

Francesco Ronzano, Francesco Barbieri, Endang Wahyu Pamungkas, Viviana Patti, and Francesca Chiusaroli. 2018. Overview of the EVALITA 2018 Italian Emoji Prediction (ITAMoji) Task. In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), volume 2263 of CEUR Workshop Proceedings. CEUR-WS.org.

Manuela Sanguinetti, Alessandro Mazzei, Viviana Patti, Marco Scalerandi, Dario Mana, and Rossana Simeoni. 2020. Annotating Errors and Emotions in Human-Chatbot Interactions in Italian. In Proceedings of the 14th Linguistic Annotation Workshop (LAW@COLING 2020). Association for Computational Linguistics.

David Traum and Staffan Larsson. 2003. The Information State Approach to Dialogue Management. In Current and New Directions in Discourse and Dialogue, pages 325–353. Springer.

Kees Van Deemter, Emiel Krahmer, and Mariët Theune. 2005. Real versus template-based natural language generation: A false opposition? Comput. Linguist., 31(1):15–24, March.