                        Overview of the Evalita 2018
                itaLIan Speech acT labEliNg (iLISTEN) Task
                      Pierpaolo Basile and Nicole Novielli
                     Università degli Studi di Bari Aldo Moro
                           Dipartimento di Informatica
                     Via E. Orabona, 4 - 70125 Bari (ITALY)
             {pierpaolo.basile|nicole.novielli}@uniba.it


Abstract

English. We describe the first edition of the "itaLIan Speech acT labEliNg" (iLISTEN) task at the EVALITA 2018 campaign (Caselli et al., 2018). The task consists in automatically annotating dialogue turns with speech act labels, i.e. with the communicative intention of the speaker, such as statement, request for information, agreement, opinion expression, or general answer. The task is justified by the large number of applications that could benefit from automatic speech act annotation of natural language interactions, such as tools for intelligent information access based on natural dialogue. We received two runs from two teams, one from academia and the other one from industry. In spite of the inherent complexity of the task, both systems largely outperformed the baseline.

Italiano. We describe the first edition of the "itaLIan Speech acT labEliNg" (iLISTEN) task, organized within the EVALITA 2018 evaluation campaign. The task consists in the automatic annotation of dialogue turns with the corresponding speech act label. Each speech act category denotes the communicative intention of the speaker, such as making an objective statement, expressing an opinion, requesting information, answering, or expressing agreement. We believe the task is relevant to computational linguistics and beyond, in light of the recent interest of the scientific community in dialogue-based paradigms for interaction and intelligent access to information. Two teams participated in the task, one from academia and one from industry. Despite the complexity of the proposed task, both teams largely outperformed the baseline.

1   Introduction

Speech acts have long been investigated in linguistics (Austin, 1962; Searle, 1969) and computational linguistics (Traum, 2000; Stolcke et al., 2000). Specifically, the task of automatic speech act recognition has been addressed leveraging both supervised (Stolcke et al., 2000; Vosoughi and Roy, 2016) and unsupervised approaches (Novielli and Strapparava, 2011). This interest is justified by the large number of applications that could benefit from automatic speech act annotation of natural language interactions.

In particular, a recent research trend has emerged to investigate methodologies for enabling intelligent access to information by relying on natural dialogue as an interaction metaphor. In this perspective, chat-oriented dialogue systems are attracting increasing attention from both researchers and practitioners interested in the simulation of natural dialogues with embodied conversational agents (Klüwer, 2011), conversational interfaces for smart devices (McTear et al., 2016), and the Internet of Things (Kar and Haldar, 2016). As a consequence, we are witnessing the flourishing of dedicated research venues on chat-oriented interaction, such as WOCHAT1 (the Special Session on Chatbots and Conversational Agents, now at its second edition) and the Natural Language Generation for Dialogue Systems special session2, both co-located with the Annual SIGdial Meeting on Discourse and Dialogue.

1 http://workshop.colips.org/wochat/@sigdial2017/
2 https://sites.google.com/view/nlg4ds2017

While not representing any deep understanding of the interaction dynamics, speech acts can be successfully employed as a coding standard for natural dialogue tasks. In this report, we describe the first edition of the "itaLIan Speech acT labEliNg" (iLISTEN) task at the EVALITA 2018 campaign (Caselli et al., 2018). Among the various challenges posed by the problem of enabling conversational access to information, this shared task tackles the recognition of the illocutionary force, i.e. the speech act, of a dialogue turn, that is the communicative goal of the speaker.

The remainder of the paper is organized as follows. We start by explaining the task in Section 2. In Section 3, we provide a detailed description of the dataset of dialogues, the annotation schema, and the data format and distribution protocol. Then, we report on the evaluation methodology (Section 4) and describe the participating systems and their performance (Section 5). We provide final remarks in Section 6.

2   Task Description

The task consists in automatically annotating dialogue turns with speech act labels, i.e. with the communicative intention of the speaker, such as statement, request for information, agreement, opinion expression, general answer, etc. Table 1 reports the full set of speech act labels used for the classification task, with definitions, examples, and distribution in our corpus. Regarding the evaluation procedure, we assess the ability of each system to issue the correct speech act label among those included in the taxonomy used for annotation, described in Section 3. Please note that the participating systems are requested to issue speech act labels only for the user's dialogue turns, as further detailed in the following.

3   Development and Test Data

3.1   A Dataset of Dialogues

We leverage the corpus of natural language dialogues collected in the scope of previous research on interaction with Embodied Conversational Agents (ECAs) (Clarizio et al., 2006), in order to speed up the process of building a gold standard. The corpus contains, overall, the transcripts of 60 dialogues, with 1,576 user dialogue turns, 1,611 system turns, and about 22,000 words.

The dialogues were collected using a Wizard of Oz tool as dialogue manager. Sixty subjects (aged between 21 and 28) were involved in the study, in two interaction mode conditions: thirty of them interacted with the system in a written-input setting, using keyboard and mouse; the remaining thirty dialogues were collected with users interacting with the ECA in a spoken-input condition. The dialogues collected in the spoken interaction mode were manually transcribed from the audio recordings of the dialogue sessions.

During the interaction, the ECA played the role of an artificial therapist and the users were free to interact with it in natural language, without any particular constraint: they could simply answer the questions of the agent or take the initiative and ask questions in their turn, make comments about the agent's behavior or competence, or argue in favor of or against the agent's suggestions or persuasion attempts. The Wizard, for their part, had to choose among a set of about 80 predefined possible system moves. As such, the system moves (see Table 2) are provided only as context information; they are not subject to evaluation and do not contribute to the final ranking of the participating systems. Conversely, the participating systems are evaluated on the basis of the performance observed on the user dialogue turns (see Table 1).

3.2   Annotation Schema

Speech acts can be identified with the communicative goal of a given utterance, i.e. its meaning at the level of its illocutionary force (Austin, 1962). In defining dialogue act taxonomies, researchers have been trying to solve the trade-off between the need for formal semantics and the need for computational feasibility, also taking into account the specificity of the many application domains that have been investigated (see (Traum, 2000) for an exhaustive overview). The Dialogue Act Markup in Several Layers (DAMSL) represents an attempt by (Core and Allen, 1997) to define a domain-independent framework for speech act annotation.

Defining a speech act markup language is out of the scope of the present study. Therefore, we adopt the original annotation of the Italian advice-giving dialogues. Table 1 shows the set of nine labels employed for the purpose of this study, with definitions and examples. These labels are used for the annotation of the users' dialogue turns and are the object of classification for this task.
Table 1: The set of user speech act labels employed in our annotation schema. The participating systems
are required to issue a label for the user moves only.

Speech Act           | Description                                                                              | Example                                                                | Freq.
OPENING              | Dialogue opening or self-introduction                                                    | 'Ciao, io sono Antonella'                                              | 2%
CLOSING              | Dialogue closing, e.g. farewell, wishes, intention to close the conversation             | 'Va bene, ci vediamo prossimamente'                                    | 2%
INFO-REQUEST         | Utterances that are pragmatically, semantically, and syntactically questions             | 'E cosa mi dici delle vitamine?'                                       | 25%
SOLICIT-REQ-CLARIF   | Request for clarification (please explain) or solicitation of system reaction            | 'Mmm, si ma in che senso?'                                             | 7%
STATEMENT            | Descriptive, narrative, personal statements                                              | 'Penso che dovrei controllare maggiormente il consumo di dolciumi.'    | 33%
GENERIC-ANSWER       | Generic answer                                                                           | 'Si', 'No', 'Non so.'                                                  | 10%
AGREE-ACCEPT         | Expression of agreement, e.g. acceptance of a proposal, plan or opinion                  | 'Si, so che è importante.'                                             | 5%
REJECT               | Expression of disagreement, e.g. rejection of a proposal, plan, or opinion               | 'Ho sentito tesi contrastanti al proposito.'                           | 5%
KIND-ATT-SMALLTALK   | Expression of kind attitude through politeness, e.g. thanking, apologizing or smalltalk  | 'Thank you.', 'Sei per caso offesa per qualcosa che ho detto?'         | 11%


In addition, in Table 2 we report the speech act labels used for the dialogue moves of the system, i.e. the conversational agent playing the role of the artificial therapist. This speech act taxonomy refines the DAMSL categories to allow appropriate tagging of the communicative intention with respect to the application domain, i.e. persuasion dialogues in the healthy eating domain.

In Table 3 we provide an excerpt from a dialogue from our gold standard. The system moves (dialogue moves and corresponding speech act labels) are chosen from a set of predefined dialogue moves that can be played by the ECA. As such, they are not relevant to the evaluation and ranking of participating systems and are provided only as contextual information. Conversely, the final ranking of the participating systems is based on the performance observed only on the prediction of speech acts for the users' moves, with respect to the set of labels provided in Table 1. Please note that the two sets of speech act labels, for the user and the system moves (in Table 1 and Table 2, respectively), only partially overlap. This is due to the fact that the set of agent moves also includes speech acts (such as persuasion attempts) that are observed only for the agent, given its caregiver role in the dialogue system. Vice versa, some speech act labels (such as clarification questions) are relevant only for the user moves.

3.3   Data Format and Distribution

We provide both the training and testing dialogues in XML format, following the structure proposed in Figure 1. Each participating team initially had access to the training data only. Later, the unlabeled test data were released during the evaluation period. The development and test data sets contain 40 and 20 dialogues, respectively, equally distributed with respect to the interaction mode (text- vs. speech-based interaction).

4   Evaluation

Regarding the evaluation procedure, we assess the ability of each system to issue the correct speech act label for the user moves. The speech act labels used for the annotation of the user moves are reported in Table 1.

Specifically, we compute precision, recall, and F1-score (macro-averaged) with respect to our gold standard. This approach, while more verbose than a simple accuracy measure, arises from the need to correctly address the unbalanced distribution of labels in the dataset. Furthermore, by providing detailed performance metrics, we intend to foster an interesting discussion on the nature of the problem and the data, as it might emerge from the participants' final reports. As a baseline, we use the most frequent label for the user speech acts (i.e., STATEMENT).
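As a concrete reference for the scoring, the snippet below sketches this evaluation with scikit-learn. One detail worth flagging: the macro-F figures in Table 4 are consistent with the harmonic mean of macro-averaged precision and recall (e.g. for UNITOR, 2·0.6810·0.6274/(0.6810+0.6274) ≈ 0.6531), rather than with the mean of the per-class F-scores of Table 5; the sketch follows that reading. The gold and predicted label lists are toy examples, not official data.

    # A minimal sketch of the task scoring, assuming (consistently with
    # Tables 4 and 5) that macro-F is the harmonic mean of macro-averaged
    # precision and recall. Gold/predicted labels are toy examples.
    from sklearn.metrics import precision_score, recall_score

    def macro_f(gold, pred):
        p = precision_score(gold, pred, average="macro", zero_division=0)
        r = recall_score(gold, pred, average="macro", zero_division=0)
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

    gold = ["STATEMENT", "INFO-REQUEST", "STATEMENT", "REJECT"]
    pred = ["STATEMENT"] * len(gold)  # the majority-class baseline

    print(macro_f(gold, pred))

On the full test set, this majority-class baseline reproduces the last row of Table 4: with nine classes and STATEMENT covering 34.03% of the user turns, macro-P = 0.3403/9 ≈ 0.0378, macro-R = 1/9 ≈ 0.1111, and their harmonic mean is ≈ 0.0564.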
Table 2: The set of system speech act labels in our annotation schema. These labels are provided as
context information, i.e. the participating systems are not required to issue a label for the system moves.

Speech Act           | Description                                                                      | Example                                                                                                                                                                                                                                       | Freq.
OPENING              | Initial self-introduction by the ECA                                             | 'Ciao, il mio nome è Valentina e sono qui per darti suggerimenti su come migliorare la tua dieta.'                                                                                                                                            | 4%
CLOSING              | Dialogue closing, e.g. farewell, wishes, intention to close the conversation     | 'Grazie e arrivederci!'                                                                                                                                                                                                                       | 4%
QUESTION             | Question about the user eating habits or information interests                   | '...'                                                                                                                                                                                                                                         | 32%
TALK-ABOUT-SELF      | Statement describing own abilities, role and skills                              | 'Non sono in grado di mangiare, e quindi non seguo diete particolari.'                                                                                                                                                                        | 3%
ANSWER               | Provision of generic information after a user request                            | 'Purtroppo sono poco esperta in questo campo.'                                                                                                                                                                                                | 17%
OFFER-GIVE-INFO      | Generic offer of help or specific information about the diet domain              | 'Idealmente un pasto dovrebbe contenere da 75 a 100 grammi di carne, un frutto di media grandezza, un piccolo piatto di pasta o 50 grammi di gelato. I cibi pronti permettono di controllare le porzioni riportando sulla confezione il valore energetico in calorie.' | 8%
PERSUASION-SUGGEST   | Persuasion attempt performed by the conversational agent about dieting           | 'Lo so: mangiare, e forse vivere, in modo disordinato, sembra a molti piacevole, e forse hanno ragione. Ma a lungo andare gli effetti negativi si fanno sentire.'                                                                             | 24%
ENCOURAGE-SORRY      | Statements aimed at enhancing the user motivation or at apologizing              | 'Questa e' una buona idea!'                                                                                                                                                                                                                   | 9%


                       Table 3: An excerpt from one of the dialogues in our dataset.

Actor  | ID   | Speech Act     | Dialogue Move
SYSTEM | 5-S1 | OPENING        | Ciao, il mio nome è Valentina. Sono qui per darti dei suggerimenti su come migliorare la tua dieta
USER   | 5-U1 | OPENING        | Ciao, il mio nome è Oriana.
SYSTEM | 5-S2 | INFO-REQUEST   | Quali sono le tue abitudini alimentari?
USER   | 5-U2 | STATEMENT      | Ho delle abitudini disordinate, mangio anche fuori orario.
SYSTEM | 5-S3 | INFO-REQUEST   | Ti piace mangiare?
USER   | 5-U3 | STATEMENT      | Si, ma dipende dal periodo: se sono sotto stress mangio meno, quando sono più rilassata mangio molto
SYSTEM | 5-S4 | INFO-REQUEST   | Segui una dieta variata?
USER   | 5-U4 | GENERIC-ANSWER | No.




                                              Figure 1: Data format
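Since Figure 1 is not reproduced in this text-only version, the snippet below sketches how the distributed XML dialogues could be read. The element and attribute names (dialogue, turn, speaker, speechAct) are illustrative assumptions; the authoritative structure is the one proposed in Figure 1.

    # A minimal sketch for loading the iLISTEN XML dialogues.
    # Element/attribute names below are hypothetical placeholders.
    import xml.etree.ElementTree as ET

    def load_dialogues(path):
        """Yield (dialogue_id, speaker, speech_act, text) for every turn."""
        root = ET.parse(path).getroot()
        for dialogue in root.iter("dialogue"):   # hypothetical element name
            for turn in dialogue.iter("turn"):   # hypothetical element name
                yield (dialogue.get("id"),
                       turn.get("speaker"),      # e.g. USER or SYSTEM
                       turn.get("speechAct"),    # gold label; hidden in test data
                       (turn.text or "").strip())

    # Only USER turns are evaluated; SYSTEM turns serve as context:
    # user_turns = [t for t in load_dialogues("train.xml") if t[1] == "USER"]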
Table 4: Overall micro- and macro-averaged Precision, Recall, and F-score for the participating systems
System      | Micro Prec | Micro Rec | Micro F | Macro Prec | Macro Rec | Macro F
UNITOR.kelp | 0.7328     | 0.7328    | 0.7328  | 0.6810     | 0.6274    | 0.6531
X2Check.c2c | 0.6848     | 0.6848    | 0.6848  | 0.6076     | 0.5844    | 0.5957
Baseline    | 0.3403     | 0.3403    | 0.3403  | 0.0378     | 0.1111    | 0.0564


Table 5: Precision, Recall, and F-score values by speech act label

Class                | Unitor Prec | Unitor Rec | Unitor F | X2Check Prec | X2Check Rec | X2Check F
OPENING              | 1.0000      | 1.0000     | 1.0000   | 1.0000       | 0.7273      | 0.8421
CLOSING              | 0.7778      | 0.7000     | 0.7368   | 0.8182       | 0.9000      | 0.8571
INFO-REQUEST         | 0.7750      | 0.8304     | 0.8017   | 0.7355       | 0.7946      | 0.7639
SOLICIT-REQ-CLARIF   | 0.4000      | 0.3333     | 0.3636   | 0.4444       | 0.3333      | 0.3810
STATEMENT            | 0.7500      | 0.9444     | 0.8361   | 0.6667       | 0.8957      | 0.7644
GENERIC-ANSWER       | 0.8571      | 0.9231     | 0.8889   | 0.7581       | 0.9038      | 0.8246
AGREE-ACCEPT         | 0.6471      | 0.4583     | 0.5366   | 0.5714       | 0.5000      | 0.5333
REJECT               | 0.4286      | 0.0769     | 0.1304   | 0.0000       | 0.0000      | 0.0000
KIND-ATT-SMALLTALK   | 0.5000      | 0.3864     | 0.4359   | 0.4737       | 0.2045      | 0.2857


5   Participants and Results

The task was open to everyone from industry and academia. Sixteen participants registered, but only two teams actually submitted results for the evaluation. A short description of each system follows:

UNITOR - The system described in (Croce and Basili, 2018) is a supervised system which relies on a Structured Kernel-based Support Vector Machine to make the classification of the dialogue turns sensitive to the syntactic and semantic information of each utterance. The Structured Kernel is a Smoothed Partial Tree Kernel (Croce et al., 2011) that exploits both the parse tree and the cosine similarity between the word vectors in a distributional semantics model (the two information sources combined by this kernel are sketched in the code example after this section). The authors use the tree parser provided by SpaCy3 and the KeLP framework4 for the SVM.

X2Check - The team did not submit a report.

The performance of the participating systems is evaluated based on the macro- (and micro-) averaged precision and recall (Sebastiani, 2002). However, the official task measure used to rank the systems is the macro-F. Results are reported in Table 4.

The best performance (0.6531) is achieved by the UNITOR system. Both systems are also able to overcome the baseline in terms of micro-F. The baseline has a low macro-F since it always predicts the same class (STATEMENT), so the F-measure for all other classes is zero. As expected, the micro-F exceeds the macro-F, since some classes are hard to predict due to the low number of examples in the training data, such as AGREE-ACCEPT, SOLICIT-REQ-CLARIF, and REJECT. Precision, Recall, and F-score values by speech act label are shown in Table 5.

We also provide the confusion matrix for each system, in Table 6 for UNITOR and in Table 7 for X2Check. We observe that, for both systems, the class REJECT is the most difficult to classify. This evidence is consistent with the findings from previous research on the same corpus of dialogues (Novielli and Strapparava, 2011). In particular, we observe that dialogue moves belonging to the REJECT class are often misclassified as STATEMENT. More in general, the main cause of error is misclassification as STATEMENT. One possible reason is that statements represent the majority class, thus inducing a bias in the classifiers. Another possible explanation is that dialogue moves that appear to be linguistically consistent with the typical structure of statements have been annotated differently, according to the actual communicative role they play.

3 https://spacy.io/
4 KeLP is a Java Kernel-based Learning Platform: http://www.kelp-ml.org/
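As a rough illustration of the two information sources combined by UNITOR's smoothed kernel, the sketch below produces a dependency parse and a word-vector similarity with spaCy. It is not the Smoothed Partial Tree Kernel itself (which is implemented in KeLP), and the Italian model name is an assumption.

    # A sketch of the ingredients a smoothed tree kernel combines:
    # dependency parse trees plus distributional word vectors.
    # This is NOT the UNITOR system; it only illustrates its inputs.
    import spacy

    nlp = spacy.load("it_core_news_md")  # assumed Italian model with vectors

    doc = nlp("E cosa mi dici delle vitamine?")  # an INFO-REQUEST user turn
    for token in doc:
        # Edges of the dependency tree that a structured kernel traverses.
        print(f"{token.text:12} <-{token.dep_}- {token.head.text}")

    # Lexical smoothing: cosine similarity between word vectors lets the
    # kernel partially match tree nodes carrying related, non-identical words.
    print(nlp.vocab["vitamine"].similarity(nlp.vocab["proteine"]))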
Table 6: Confusion matrix of the UNITOR system w.r.t. the gold standard. Columns correspond to the gold standard classes, while rows report the system decisions; correct classifications lie on the diagonal (in bold).
                 STATEMENT   KIND-ATT.   GEN.-ANSW.   REJECT   CLOSING   SOL.-CLAR.   OPENING   AGREE   INFO-REQ.
    STATEMENT       153          6            3         24        0           3           0        2        13
    KIND-ATT.        4           17           0          5        1           2          0         3         2
    GEN.-ANSW.       1           0           48          0        0           1           0       6          0
    REJECT           0           3            0          3        0           0           0        0         1
    CLOSING          0           0            0          0        7           1          0         1         0
    SOL.-CLAR.       0           6            0          2        1           8           0       1          2
    OPENING          0           0            0          0        0           0          11        0         0
    AGREE            0           3            1          1        0           0           0       11         1
    INFO-REQ.        4           9            0          4        1           9           0       0         93



Table 7: Confusion matrix of the X2Check system w.r.t. the gold standard. Columns correspond to the gold standard classes, while rows report the system decisions; correct classifications lie on the diagonal (in bold).
                 STATEMENT   KIND-ATT.   GEN.-ANSW.   REJECT   CLOSING   SOL.-CLAR.   OPENING   AGREE   INFO-REQ.
    STATEMENT       146          15           3         30        1           2          1         2        19
    KIND-ATT.        2           9            0          0        0           1          0         5         2
    GEN.-ANSW.       5           3           47          2        0           3          0        2          0
    REJECT           0           0            0          0        0           0          0        0          0
    CLOSING          0           0            0          1        9           0          0         1         0
    SOL.-CLAR.       1           4            0          2        0           8          1        0          2
    OPENING          0           0            0          0        0           0          8         0        0
    AGREE            2           5            1          0        0           1          0        12         0
    INFO-REQ.        7           8            1          4        0           9          1        2         89
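Matrices in the style of Tables 6 and 7 can be derived from a system's output as sketched below; note that scikit-learn places gold classes on the rows and predictions on the columns, i.e. the transpose of the layout used in the two tables. The gold and predicted labels are toy examples.

    # A minimal sketch for building a confusion matrix over the iLISTEN
    # label set. scikit-learn puts gold classes on rows and predictions
    # on columns (the transpose of Tables 6 and 7).
    from sklearn.metrics import confusion_matrix

    LABELS = ["STATEMENT", "KIND-ATT-SMALLTALK", "GENERIC-ANSWER", "REJECT",
              "CLOSING", "SOLICIT-REQ-CLARIF", "OPENING", "AGREE-ACCEPT",
              "INFO-REQUEST"]

    gold = ["REJECT", "STATEMENT", "INFO-REQUEST"]    # toy examples
    pred = ["STATEMENT", "STATEMENT", "INFO-REQUEST"]

    print(confusion_matrix(gold, pred, labels=LABELS))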



6   Final Remarks and Conclusions

We presented the first edition of the new shared task on itaLIan Speech acT labEliNg (iLISTEN) at EVALITA 2018. The task fits in the fast-growing research trend focusing on conversational access to information, e.g. using chatbots or conversational agents. The task consists in automatically annotating dialogue turns with speech act labels representing the communicative intention of the speaker. The corpus of dialogues was collected in the scope of previous research on natural language interaction with embodied conversational agents. Specifically, the participating systems had to annotate the speech acts associated with the user dialogue moves, while the agent's dialogue turns were provided as context.

We received two runs from two teams, one from academia and the other one from industry. In spite of the inherent complexity of the task, both systems largely outperformed the baseline, represented by the trivial classifier always predicting the majority class for users' moves. The best performing system leverages syntactic features and relies on a Structured Kernel-based Support Vector Machine. Follow-up editions might involve extending the benchmark with dialogues from different domains. Similarly, dialogues in different languages might also be included in the gold standard, as done for the Automatic Misogyny Identification task at EVALITA 2018 (Fersini et al., 2018). This would enable assessing to what extent the task is inherently dependent on the language and how well the proposed approaches are able to generalize.

References

John L. Austin. 1962. How to do things with words. William James Lectures. Oxford University Press.

Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso. 2018. EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Giuseppe Clarizio, Irene Mazzotta, Nicole Novielli, and Fiorella De Rosis. 2006. Social attitude towards a conversational character. pages 2–7.

Mark G. Core and James F. Allen. 1997. Coding Dialogs with the DAMSL Annotation Scheme.

Danilo Croce and Roberto Basili. 2018. A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Danilo Croce, Alessandro Moschitti, and Roberto Basili. 2011. Structured lexical similarity via convolution kernels on dependency trees. In Proceedings of EMNLP.

Elisabetta Fersini, Debora Nozza, and Paolo Rosso. 2018. Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI). In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.
Rohan Kar and Rishin Haldar. 2016. Applying Chatbots to the Internet of Things: Opportunities and Architectural Elements. CoRR, abs/1611.03799.

Tina Klüwer. 2011. "I Like Your Shirt" - Dialogue Acts for Enabling Social Talk in Conversational Agents. In Intelligent Virtual Agents, pages 14–27.

Michael McTear, Zoraida Callejas, and David Griol Barres. 2016. The Conversational Interface: Talking to Smart Devices. Springer International Publishing.

Nicole Novielli and Carlo Strapparava. 2011. Dialogue act classification exploiting lexical semantics. In Conversational Agents and Natural Language Interaction: Techniques and Effective Practices, chapter 4, pages 80–106. IGI Global.

John R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.

Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47.

Andreas Stolcke, Noah Coccaro, Rebecca Bates, Paul Taylor, Carol Van Ess-Dykema, Klaus Ries, Elizabeth Shriberg, Daniel Jurafsky, Rachel Martin, and Marie Meteer. 2000. Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics, 26(3):339–373, September.

David R. Traum. 2000. 20 Questions for Dialogue Act Taxonomies. Journal of Semantics, 17(1):7–30.

Soroush Vosoughi and Deb Roy. 2016. A Semi-automatic Method for Efficient Detection of Stories on Social Media. In Proceedings of the 10th AAAI Conference on Weblogs and Social Media, ICWSM 2016, pages 711–714.