<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multifunctional ISO standard Dialogue Act tagging in Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriel Roccabruna</string-name>
          <email>gabriel.roccabruna@studenti.unitn.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Cervone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Riccardi</string-name>
          <email>giuseppe.riccardi@unitn.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amazon Alexa AI</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Signals and Interactive Systems Lab, University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <fpage>34</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>English. The task of Dialogue Act (DA) tagging, a crucial component in many conversational agents, is often addressed assuming a single DA per speaker turn in the conversation. However, speakers' turns are often multifunctional, that is, they can contain more than one DA (e.g. “I'm Alex. Have we met before?” contains a 'statement', followed by a 'question'). This work focuses on multifunctional DA tagging in Italian. First, we present iLISTEN2ISO, a novel resource with multifunctional DA annotation in Italian, created by annotating the iLISTEN corpus with the ISO standard. We provide an analysis of the corpus showing the importance of multifunctionality for DA tagging. Additionally, we train DA taggers for Italian on iLISTEN (achieving state-of-the-art results) and iLISTEN2ISO. Our findings indicate the importance of using a multifunctional approach for DA tagging.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Dialogue Acts (DAs), a linguistically motivated
model of speakers’ intentions in a conversation,
play a crucial role in several conversational AI
tasks. DAs have been successfully used in
components of conversational agents, for example
for Spoken Language Understanding
        <xref ref-type="bibr" rid="ref17 ref2 ref22 ref7 ref8">(Zhao and
Feng, 2018)</xref>
        or Natural Language Generation, and
for response generation
        <xref ref-type="bibr" rid="ref12">(Hedayatnia et al., 2020)</xref>
        .
Moreover, DAs have been shown to be important
features to learn the intentional structure of
conversations
        <xref ref-type="bibr" rid="ref1 ref16 ref6 ref7">(Allen and Perrault, 1980; Cervone and
Riccardi, 2020; Cervone et al., 2018)</xref>
        .
      </p>
      <p>
        One of the bottlenecks for current research on
DAs is the lack of publicly available resources
with DA annotation. While this also holds for
English, the problem is more acute for languages with
fewer resources, such as Italian. For Italian, the
only publicly available resource with DA
annotation is currently the iLISTEN corpus
        <xref ref-type="bibr" rid="ref17 ref2 ref22 ref7">(Basile and
Novielli, 2018)</xref>
        , released for EVALITA in 2018.
      </p>
      <p>
        While useful, this resource relies on an
annotation scheme that assumes a single DA
per conversational turn (see Figure 1). However,
ISO 24617-2
        <xref ref-type="bibr" rid="ref3 ref4">(Bunt et al., 2010; Bunt et al., 2020)</xref>
        ,
the latest accepted standard for DA annotation,
posits that conversational turns can be
multifunctional in a sequential way, i.e. speakers’ turns can
be composed of multiple DAs in sequence
        <xref ref-type="bibr" rid="ref13">(Huang,
2017)</xref>
        .
      </p>
      <p>In this work, we investigate the task of
multifunctional DA tagging in Italian. The
contributions of this paper are: (1) we create
iLISTEN2ISO, to the best of our knowledge the first
publicly available resource with DA annotation in
Italian which uses a multifunctional approach and
is ISO-standard compliant; (2) we present an
analysis of iLISTEN2ISO showing the importance of
multifunctional DA annotation; (3) we propose
baseline DA tagging models for Italian trained on
iLISTEN (achieving, to the best of our knowledge,
SOTA results) and iLISTEN2ISO.<sup>1</sup></p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        <bold>Dialogue act corpora.</bold> Most publicly available
resources with DA annotation are hardly
compatible with one another, given that each resource is typically tagged
with its own scheme tailored to a given
domain
        <xref ref-type="bibr" rid="ref5">(Carletta et al., 1997)</xref>
        . This prevents
both meaningful comparisons among different
resources, and the possibility of experimenting with
cross-corpora training of DA taggers. ISO 24617-2
        <xref ref-type="bibr" rid="ref3">(Bunt et al., 2010)</xref>
        , the latest universally accepted
standard for DA annotation, represents an
attempt to overcome this fragmentation by
providing a domain- and task-independent taxonomy,
useful for both task- and non-task-oriented
dialogue. Compared to previous schemes, the ISO
standard is multifunctional, both from a sequential
perspective (the same turn can contain multiple
DAs in sequence) and from a simultaneous
perspective (a text span can have multiple DA tags at
once). Moreover, the ISO standard is a
hierarchical taxonomy, rather than a flat one, which enables
it to capture similarities among different tags.
Sequential multifunctionality is also present in the
DAMSL schema
        <xref ref-type="bibr" rid="ref11">(Core and Allen, 1997)</xref>
        , although
this definition is not commonly applied to corpora
that adopted DAMSL
        <xref ref-type="bibr" rid="ref9">(Chowdhury et al., 2016)</xref>
        ,
with the consequent possibility of introducing
ambiguities and a lack of precision in understanding
the communicative functions of text spans.
While for English there have recently been
successful attempts to create publicly available
resources mapped to ISO 24617-2
        <xref ref-type="bibr" rid="ref16">(Mezza et al.,
2018)</xref>
        , datasets mapped to ISO are scarce
for other languages; see for example
        <xref ref-type="bibr" rid="ref17">(Ngo
et al., 2018)</xref>
        for Vietnamese and
        <xref ref-type="bibr" rid="ref21">(Yoshino et al.,
2018)</xref>
        for Japanese. For the Italian language, the
only corpus with a subset of dialogues tagged with
ISO in a multifunctional way is LUNA
        <xref ref-type="bibr" rid="ref9">(Chowdhury et al., 2016)</xref>
        , which is currently not publicly
available.
      </p>
      <p><sup>1</sup> iLISTEN2ISO and the code of our experiments are
available at: https://github.com/BrownFortress/Multifunctional-Dialogue-Act-tagging-inItalian.</p>
      <p>
        <bold>Dialogue Act tagging.</bold> DA tagging is the task of
assigning a DA tag to a given utterance in a
dialogue. The definition of utterance depends on the
schema used: in some schemes (Dinarelli et al.,
2009), the utterance corresponds to a turn, while
in others
        <xref ref-type="bibr" rid="ref14">(Jurafsky, 1997)</xref>
        to segments of a turn.
DA tagging is usually framed as text classification
        <xref ref-type="bibr" rid="ref15 ref16 ref9">(Lee and Dernoncourt, 2016; Mezza et al., 2018)</xref>
        or as a sequence tagging problem
        <xref ref-type="bibr" rid="ref10 ref19 ref8">(Quarteroni et
al., 2011; Chen et al., 2018; Colombo et al., 2020)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 iLISTEN2ISO: Mapping iLISTEN to ISO standard</title>
      <p>
        The iLISTEN corpus
        <xref ref-type="bibr" rid="ref17 ref2 ref22 ref7">(Basile and Novielli, 2018)</xref>
        is a dataset of dyadic dialogues about food and
dietary issues in Italian annotated with DAs, used
during the 2018 EVALITA competition for a DA
classification task. The corpus consists of 60
dialogues, with 1576 user turns and 1611 system
turns. Dialogues were collected with a Wizard
of Oz procedure using either written (30) or
spoken (30) interactions. The system side mimics a
diet therapist, asking questions about users’ diets
or answering users’ questions. The DA schema
adopted is a refined version of DAMSL
        <xref ref-type="bibr" rid="ref11">(Core and
Allen, 1997)</xref>
        . As reported in Table 1, the schema
contains 15 DAs: 7 are reserved
for users, 6 for the system, and the
remaining 2 are shared by both.
      </p>
      <p>In iLISTEN the turn-level DA annotation is not
multifunctional, i.e. each turn is assigned a single
DA. However, without a
multifunctional approach information can be lost,
since the DA tag captures only the
most dominant function of a turn. In Figure 1,
for example, tagging the entire turn with one DA
would prevent the system from understanding that
two different questions are asked.</p>
      <p>
        In order to create the iLISTEN2ISO annotation,
each turn from iLISTEN was annotated with a
multifunctional approach following the ISO
standard. This process involved first segmenting turns
into functional units (FUs), defined as minimal
stretches of communicative behaviour that have a
communicative function
        <xref ref-type="bibr" rid="ref3">(Bunt et al., 2010)</xref>
        ; and
then annotating each FU with a DA tag. The
subset of the ISO schema used for mapping iLISTEN to
ISO was built incrementally, since an a priori
definition was impossible: many
communicative functions were hidden by the lack
of segmentation. The annotation process covered both
user and system turns, since system turns are used
as context in the prediction phase. Given the
repetitiveness of system turns (only 430
of 1611 are unique), only unique system turns were annotated.
      </p>
      <p>Because of limited resources, the
segmentation and mapping of iLISTEN were
carried out by a single annotator, under the
supervision of a second annotator with previous
training in ISO standard annotation. To
ensure a reliable annotation process, after the
creation of the guidelines, the second annotator
repeatedly assessed a sample (100 utterances) of the
annotated data. This sample was built through
stratified random sampling, where for each DA tag,
20% of the examples of that class were randomly
sampled. This evaluation and reassessment was
performed twice. In the first round, performed after
the first annotation of the data, some issues
regarding the usage of some DAs arose and were
discussed; in the second round, performed
after the second phase of annotation of the data,
no problems were found.</p>
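      <p>The per-tag sampling used for this quality check can be sketched as follows. This is a minimal illustration rather than the script used for the corpus: the function name stratified_sample, the rounding rule and the minimum of one example per tag are assumptions introduced here.</p>
      <preformat>
```python
import random
from collections import defaultdict


def stratified_sample(examples, fraction=0.2, seed=0):
    """Draw `fraction` of the examples of each DA tag, at least one per tag.

    `examples` is a list of (utterance, da_tag) pairs. The rounding rule and
    the one-example minimum are assumptions of this sketch.
    """
    rng = random.Random(seed)
    by_tag = defaultdict(list)
    for utterance, tag in examples:
        by_tag[tag].append((utterance, tag))

    sample = []
    for tag in sorted(by_tag):
        items = by_tag[tag]
        k = max(1, round(fraction * len(items)))
        sample.extend(rng.sample(items, k))
    return sample


# Toy data: 10 "inform" and 5 "question" utterances.
toy = [(f"u{i}", "inform") for i in range(10)] + [(f"q{i}", "question") for i in range(5)]
picked = stratified_sample(toy)
```
      </preformat>
      <p>On the toy data the sample contains two inform and one question utterance, so each tag is represented in proportion to its frequency.</p>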
    </sec>
    <sec id="sec-5">
      <title>4 Analysis of iLISTEN2ISO</title>
      <p>The annotation layer of iLISTEN2ISO changes
not only the legacy iLISTEN schema, but also
the internal structure of turns, due to the
segmentation process. On average, iLISTEN2ISO
has 1.61 FUs per turn: 1.81 on the
system side and 1.5 on the user side when the two are considered
separately (this difference is explained by the
fact that system turns are on average longer
than user turns). In Figure 2, we report, for each
legacy DA (user side), the average number of segments per
turn. Furthermore, inside the bars we
list the 3 most common sequences of ISO DAs to
which each legacy DA is mapped. In Table 1, we
compare the number of DA tags in iLISTEN
and iLISTEN2ISO. We notice that iLISTEN2ISO
has a larger total number of DAs than
iLISTEN. Additionally, while in iLISTEN the
number of DAs in common (2) is much lower
than the number of either user or system DAs, in
iLISTEN2ISO the DAs shared by system and
user outnumber the side-specific ones, with the
advantage of potentially better generalization across
the two. Looking at the distribution of the ISO
DAs in user turns, the
four most common DAs are: inform 24.5%,
question 21.3%, answer 15.3% and auto-positive 7%.
Moreover, the DA distribution has a tail composed
of 19 DAs with a frequency below 5%.
However, this is not a drawback of the scheme, since
it gives us a fine-grained representation of the
actions performed by the user. Additionally,
iLISTEN2ISO can be used in conjunction with any
other corpus annotated with the ISO standard, thus
making it possible to augment the samples
for a specific under-represented class.</p>
    </sec>
    <sec id="sec-6">
      <title>5 Models</title>
      <p>
        In this section, we describe the two baseline
models for Dialogue Act (DA) classification used in
our experiments. The first model is a Support
Vector Machine (SVM)
        <xref ref-type="bibr" rid="ref20">(Vapnik, 1995)</xref>
        with a
linear kernel and a one-versus-one strategy. The
features used are: FastText word embeddings,
Part-Of-Speech (POS) and dependency parsing tags
(DEP) (retrieved using spaCy), and the previous
DA tag. For word embeddings, the utterance
representation is computed as the average of the
embeddings of its words. The model was
implemented using scikit-learn
        <xref ref-type="bibr" rid="ref18">(Pedregosa et al., 2011)</xref>
        .
Hyperparameter and feature selection were
performed using 3-fold cross-validation. The
feature vector that gave the best results for iLISTEN
is the concatenation of word embeddings, POS
tags, DEP tags and the previous DA. For
iLISTEN2ISO, the feature vector that gave the best
performance is the concatenation of word
embeddings and the previous DA.
      </p>
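      <p>As an illustration of the feature vector that worked best on iLISTEN2ISO (averaged word embeddings concatenated with the previous DA), a minimal sketch in plain Python follows. The helper names, the one-hot encoding of the previous DA and the toy 2-dimensional vectors are assumptions made for this example; how the previous DA is encoded in the released code is not stated here.</p>
      <preformat>
```python
def average_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens found in `embeddings` (zeros if none)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def one_hot(tag, tagset):
    """One-hot encode the previous DA tag over a fixed, ordered tag set (assumed encoding)."""
    return [1.0 if tag == candidate else 0.0 for candidate in tagset]


def build_features(tokens, previous_da, embeddings, dim, tagset):
    """Concatenate the averaged word embeddings with the previous-DA encoding."""
    return average_embedding(tokens, embeddings, dim) + one_hot(previous_da, tagset)


# Toy 2-dimensional vectors stand in for the 300-dimensional FastText embeddings.
toy_embeddings = {"grazie": [1.0, 0.0], "mille": [0.0, 1.0]}
toy_tagset = ["inform", "question", "answer"]
features = build_features(["grazie", "mille"], "question", toy_embeddings, 2, toy_tagset)
```
      </preformat>
      <p>The resulting vector has one block per feature type, so POS and DEP features could be appended in the same way for the iLISTEN configuration.</p>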
      <p>
        Our second model is a Convolutional Neural
Network (CNN), following
        <xref ref-type="bibr" rid="ref15 ref9">(Lee and Dernoncourt,
2016)</xref>
        . The utterance representation is computed
using a CNN taking as input FastText word
embeddings. This representation is then concatenated
with the previous DA and passed through a linear
and a softmax output layer. We use cross-entropy
loss optimized with Adam, with early stopping
according to the best Macro F1 on a randomly
generated development set (6 dialogues), chosen for the
lowest difference in tag distribution compared to the
full dataset. The learning rate is set to 10<sup>-3</sup> and
the batch size to 128. The number of filters is 200
and the filter sizes are 1, 2, 3 and 4 times the word
embedding dimension (300).
      </p>
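      <p>The selection of the development set can be pictured with the following sketch, which samples random candidate sets of dialogues and keeps the one whose tag distribution is closest to that of the full data. The distance measure (L1), the number of trials and all function names are assumptions of this example; the text above only states that the set with the lowest difference in tag distribution was chosen.</p>
      <preformat>
```python
import random
from collections import Counter


def tag_distribution(dialogues):
    """Relative frequency of each DA tag over a collection of dialogues."""
    counts = Counter(tag for dialogue in dialogues for tag in dialogue)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def l1_distance(p, q):
    """Sum of absolute differences between two tag distributions (assumed measure)."""
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))


def pick_dev_set(dialogues, size=6, trials=1000, seed=0):
    """Sample `trials` random subsets and keep the one closest to the full data."""
    rng = random.Random(seed)
    reference = tag_distribution(dialogues)
    best_ids, best_dist = None, float("inf")
    for _ in range(trials):
        ids = rng.sample(range(len(dialogues)), size)
        dist = l1_distance(tag_distribution([dialogues[i] for i in ids]), reference)
        if best_dist > dist:
            best_ids, best_dist = sorted(ids), dist
    return best_ids, best_dist


# Each toy dialogue is represented only by the DA tags of its turns.
toy_dialogues = [
    ["inform", "question"],
    ["inform", "inform"],
    ["answer", "question"],
    ["inform", "answer"],
] * 3
dev_ids, distance = pick_dev_set(toy_dialogues, size=4, trials=200)
```
      </preformat>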
    </sec>
    <sec id="sec-7">
      <title>6 Experiments</title>
      <p>In this section, we report the results of Dialogue
Act (DA) tagging experiments, using our proposed
baselines on both legacy (using iLISTEN) and
multifunctional ISO standard DA schemes (using
iLISTEN2ISO).</p>
      <p>
        <bold>Experimental setup.</bold> For comparison with
previous work, we follow the competition rules and
report results considering only user DAs, using the
official splits. Additionally, we do not assume
gold DAs for the context at test time (they might
not be available at inference time); rather, we use
predicted ones. To do this, we train
a separate model for tagging system DAs, used
only during inference. The performance of the
system models is: Micro F1 96.1% and Macro F1
96.6% on iLISTEN; Micro F1 97.5% and Macro
F1 96.3% on iLISTEN2ISO. For iLISTEN, the
obtained classification results are compared with
Unitor, the winner of the EVALITA competition
        <xref ref-type="bibr" rid="ref17 ref2 ref22 ref7">(Basile and Novielli, 2018)</xref>
        and to the best of our
knowledge the SOTA on iLISTEN (we could not
perform comparisons for iLISTEN2ISO, as the
code is not publicly available). Given the larger
number of DA tags with few examples in
iLISTEN2ISO, for comparison with the legacy scheme
we group the least frequent DA tags under the label
“Other”. The final DA scheme for iLISTEN2ISO
consists of 7 DAs. In iLISTEN the number of
examples in training and testing is 1097 and 479
respectively; in iLISTEN2ISO we have 1609 and
777 respectively.</p>
      <p>[Table 2. Columns: Dataset, Model. Rows: iLISTEN (Unitor, SVM, CNN); iLISTEN2ISO (SVM, CNN).]</p>
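      <p>Grouping the least frequent tags under “Other” can be sketched as below. The function name group_rare_tags and the choice of keeping a fixed number of most frequent tags are assumptions of this example; the exact cut-off used for iLISTEN2ISO is not specified above.</p>
      <preformat>
```python
from collections import Counter


def group_rare_tags(tags, keep, other_label="Other"):
    """Keep the `keep` most frequent tags and map every other tag to `other_label`."""
    frequent = {tag for tag, _ in Counter(tags).most_common(keep)}
    return [tag if tag in frequent else other_label for tag in tags]


# Toy tag sequence; the two rarest tags are merged into "Other".
toy_tags = ["inform"] * 5 + ["question"] * 4 + ["thanking"] * 2 + ["apology"]
grouped = group_rare_tags(toy_tags, keep=2)
```
      </preformat>
      <p>Here the toy sequence ends up with three labels: inform, question and Other.</p>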
      <p><bold>Results.</bold> As shown in Table 2, our proposed
models yield comparable results on both
non-multifunctional (iLISTEN) and multifunctional
(iLISTEN2ISO) DA tagging. On iLISTEN, our
models even outperform the previous SOTA
(Unitor) on both Micro and Macro F1. We
observe that, while in terms of Micro F1 our
models achieve very similar results on both corpora, in
terms of Macro F1 they perform better on
multifunctional DA tagging.</p>
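      <p>To make the difference between the two metrics concrete, the following sketch computes Micro and Macro F1 for single-label predictions in plain Python. It is an illustration only: the function name f1_scores and the toy labels are assumptions, and averaging over the union of gold and predicted labels is a choice made for this example.</p>
      <preformat>
```python
def f1_scores(gold, predicted):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(gold) | set(predicted))
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def f1(true_pos, false_pos, false_neg):
        denominator = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denominator if denominator else 0.0

    # Micro F1 pools the counts of all classes; Macro F1 averages per-class F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


gold = ["inform", "inform", "inform", "question", "reject"]
pred = ["inform", "inform", "inform", "question", "inform"]
micro, macro = f1_scores(gold, pred)
```
      </preformat>
      <p>In this toy case a single missed reject leaves Micro F1 at 0.8 but pulls Macro F1 down to about 0.62, which is why Macro F1 is the more sensitive measure when rare classes are poorly predicted.</p>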
      <p><bold>Error analysis.</bold> To better understand the
performance of our models on iLISTEN and
iLISTEN2ISO, we look at the confusion matrices
depicted in Figure 3 ((a) iLISTEN, (b) iLISTEN2ISO) and at Tables 3 and 4, which report the
performance computed for each DA.
Considering the CNN performance, looking at
the confusion matrices in Figure 3, we notice that on
iLISTEN the worst class is reject, where 48.7%
of examples are predicted as statement. This is
probably due to the structural similarity between reject
utterances and statement ones: the discriminant
is the semantic content, which the model fails to detect.
This problem can also be seen in Table 3, where
the reject DA is predicted with the worst
performance among all tags. An example of
error is given by the following interaction: the
system says “Mangiare ad orari fissi e’ un modo per
evitare di saltare i pasti e di trascurare sostanze che
spesso non vengono compensate nei pasti
successivi.” (“Eating at fixed times is a way to avoid skipping meals and neglecting substances that are often not compensated for in later meals.”) and the user responds “purtroppo spesso il
lavoro limita la possibilità di fare una dieta sana
e regolare.” (“unfortunately work often limits the possibility of following a healthy and regular diet.”). This user’s turn is tagged with reject
but it is predicted by the model as statement. As
can be seen, the structure of the user’s turn is
similar to a statement because the user expresses her
or his opinion, in this case regarding the difficulty
of following a healthy diet.</p>
      <p>Another interesting mismatch in iLISTEN
concerns info-request, 11.6% of which are predicted
as statement. This is interesting because the class
info-request is usually composed of questions;
however, analyzing the examples heuristically, we
notice that some of them contain other tags, such
as answers or statements, which are hidden in the
legacy annotation. In this regard, another potential
source of error is the lack of punctuation, as
can be seen in the utterance “è necessario fare sport
per mantenersi in forma” (“it is necessary to do sport to keep fit”). This utterance can be
interpreted as a statement, but if a question mark
is added at the end of the utterance it can be
interpreted as a question. This also highlights the
importance of punctuation or prosodic features for
detecting the right DA.</p>
      <p>Another problem that can be identified by
looking at the iLISTEN confusion matrix in Figure
3 is that the kind-attitude-smalltalk DA is
confused with many other DAs. This is
due to the lack of segmentation: analysing the
ISO DA distribution of the turns tagged with this
tag shows that there is no predominant DA. In
fact, the four most common ISO DAs are: inform
21.3%, question 20.9%, thanking 13.5% and
auto-positive 10.8%.</p>
      <p>Regarding the iLISTEN2ISO confusion matrix,
it can be seen that request is the most confused
class. Indeed, 48.8% of examples are predicted
as question, 16.3% as other and only 20.9% are
predicted correctly. The reason behind this
performance is that the model fails to distinguish a
request from a question, since both are phrased
as questions.</p>
      <p>Another frequently mispredicted DA in
iLISTEN2ISO is answer, often confused with inform.
This is because the model has
difficulty representing, and therefore distinguishing, the
semantic content. Moreover, as can be seen
in Table 4, this problem is more pronounced for the
CNN than for the SVM.</p>
      <p>Finally, comparing the iLISTEN2ISO results
presented in Table 4 with the iLISTEN results
presented in Table 3, it can be seen that the
question DA is better predicted than info-request. In
this case, only 4.4% of question examples are
confused with inform. The reason for this
improvement is probably the segmentation process,
which exposes the multifunctionality of the
utterances and increases the specificity of the classes.</p>
      <p>Interestingly, if we compare the confusion matrices
for SVM (which we decided not to include in the
paper for lack of space) and CNN (shown in Figure
3), we notice that the most confused classes are the
same for both models across both datasets.</p>
    </sec>
    <sec id="sec-8">
      <title>7 Conclusions</title>
      <p>We presented iLISTEN2ISO, a resource for
Italian multifunctional DA tagging using ISO
24617-2. We argued for the importance of considering turns as a
composition of multiple communicative functions,
in order to preserve important semantic
information. Moreover, we presented different baseline
DA tagging models on both iLISTEN and
iLISTEN2ISO.</p>
      <p>We believe the presented resource could be
useful to the research community for experimenting
with multifunctional DA tagging in Italian, as well
as cross-corpora DA tagging. As future work, we
plan to explore joint DA segmentation and
classification in Italian, for example taking inspiration
from the work presented by Zhao and Kawahara
(2019).</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>The research leading to these results has received
funding from the European Union – H2020
Programme under grant agreement 826266:
COADAPT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>James F</given-names>
            <surname>Allen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C Raymond</given-names>
            <surname>Perrault</surname>
          </string-name>
          .
          <year>1980</year>
          .
          <article-title>Analyzing intention in utterances</article-title>
          .
          <source>Artificial intelligence</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ):
          <fpage>143</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nicole</given-names>
            <surname>Novielli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Italian Speech Act Labeling (iLISTEN) task</article-title>
          .
          <source>EVALITA Evaluation of NLP and Speech Tools for Italian</source>
          ,
          <volume>12</volume>
          :
          <fpage>44</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Harry</given-names>
            <surname>Bunt</surname>
          </string-name>
          , Jan Alexandersson, Jean Carletta, JaeWoong Choe, Alex Chengyu Fang, Koiti Hasida,
          <string-name>
            <given-names>Kiyong</given-names>
            <surname>Lee</surname>
          </string-name>
          , Volha Petukhova,
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          , Laurent Romary, Claudia Soria, and
          <string-name>
            <given-names>David</given-names>
            <surname>Traum</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Towards an ISO standard for dialogue act annotation</article-title>
          .
          <source>Seventh conference on International Language Resources and Evaluation (LREC'10).</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Harry</given-names>
            <surname>Bunt</surname>
          </string-name>
          , Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Fang, Simon Keizer, and Laurent Prévot.
          <year>2020</year>
          .
          <article-title>The ISO standard for dialogue act annotation</article-title>
          .
          <source>In Proceedings of The 12th Language Resources and Evaluation Conference</source>
          , pages
          <fpage>549</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jean</given-names>
            <surname>Carletta</surname>
          </string-name>
          , Stephen Isard,
          <string-name>
            <given-names>Gwyneth</given-names>
            <surname>Doherty-Sneddon</surname>
          </string-name>
          , Amy Isard, Jacqueline C Kowtko, and
          <string-name>
            <given-names>Anne H</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>The reliability of a dialogue structure coding scheme</article-title>
          .
          <source>Computational linguistics</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra</given-names>
            <surname>Cervone</surname>
          </string-name>
          and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Is this Dialogue Coherent? Learning from Dialogue Acts and Entities</article-title>
          .
          <source>In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          , pages
          <fpage>162</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra</given-names>
            <surname>Cervone</surname>
          </string-name>
          , Evgeny Stepanov, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Coherence models for dialogue</article-title>
          .
          <source>In Proc. Interspeech</source>
          , pages
          <fpage>1011</fpage>
          -
          <lpage>1015</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Zheqian</given-names>
            <surname>Chen</surname>
          </string-name>
          , Rongqin Yang,
          <string-name>
            <given-names>Zhou</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Deng</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaofei</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Dialogue act recognition via CRF-attentive structured network</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          , pages
          <fpage>225</fpage>
          -
          <lpage>234</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Shammur Absar</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Evgeny A.</given-names>
            <surname>Stepanov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Transfer of corpus-specific dialogue act annotation to ISO standard: Is it worth it?</article-title>
          <source>In LREC</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Colombo</surname>
          </string-name>
          , Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, and
          <string-name>
            <given-names>Chloe</given-names>
            <surname>Clavel</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Guiding attention in sequence-to-sequence models for dialogue act prediction</article-title>
          .
          <source>In AAAI</source>
          , pages
          <fpage>7594</fpage>
          -
          <lpage>7601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Mark G.</given-names>
            <surname>Core</surname>
          </string-name>
          and
          <string-name>
            <given-names>James F.</given-names>
            <surname>Allen</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Coding dialogs with the DAMSL annotation scheme</article-title>
          .
          <source>In Proceedings of AAAI Fall Symposium on Communicative Action in Humans and Machines.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Behnam</given-names>
            <surname>Hedayatnia</surname>
          </string-name>
          , Seokhwan Kim, Yang Liu, Karthik Gopalakrishnan, Mihail Eric, and
          <string-name>
            <given-names>Dilek</given-names>
            <surname>Hakkani-Tur</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Policy-driven neural response generation for knowledge-grounded dialogue systems</article-title>
          . arXiv preprint arXiv:2005.12529.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Yan</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Oxford handbook of pragmatics</article-title>
          . Oxford University Press.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Switchboard SWBD-DAMSL shallow-discourse-function annotation</article-title>
          .
          <source>Technical Report</source>
          , 97-02, University of Colorado, CO, USA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ji Young</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Franck</given-names>
            <surname>Dernoncourt</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Sequential short-text classification with recurrent and convolutional neural networks</article-title>
          .
          <source>In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>515</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Mezza</surname>
          </string-name>
          , Alessandra Cervone, Evgeny Stepanov, Giuliano Tortoreto, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pages
          <fpage>3539</fpage>
          -
          <lpage>3551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Thi-Lan</given-names>
            <surname>Ngo</surname>
          </string-name>
          , Pham Khac Linh, and
          <string-name>
            <given-names>Hideaki</given-names>
            <surname>Takeda</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Vietnamese dialog act corpus based on ISO 24617-2 standard</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Silvia</given-names>
            <surname>Quarteroni</surname>
          </string-name>
          , Alexei V Ivanov, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Simultaneous dialog act segmentation and classification from human-human spoken conversations</article-title>
          .
          <source>In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , pages
          <fpage>5596</fpage>
          -
          <lpage>5599</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>V.N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>The Nature of Statistical Learning Theory</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Koichiro</given-names>
            <surname>Yoshino</surname>
          </string-name>
          , Hiroki Tanaka, Kyoshiro Sugiyama, Makoto Kondo, and
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Nakamura</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Japanese dialogue corpus of information navigation and attentive listening annotated with extended ISO 24617-2 dialogue act tags</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Lin</given-names>
            <surname>Zhao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Feng</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Improving slot filling in spoken language understanding with joint pointer and attention</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , pages
          <fpage>426</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Tianyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tatsuya</given-names>
            <surname>Kawahara</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Joint dialog act segmentation and recognition in human conversations using attention to dialog context</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>57</volume>
          :
          <fpage>108</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>