<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Dialog Acts Classification for Question-Answer Corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Saurabh Chakravarty</string-name>
          <email>saurabc@vt.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raja Venkata Satya Phanindra Chava</string-name>
          <email>chrvsp96@vt.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward A. Fox</string-name>
          <email>fox@vt.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Virginia Tech</institution>, <addr-line>Blacksburg, VA</addr-line>, <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>21</volume>
      <issue>2019</issue>
      <abstract>
        <p>Many documents are constituted by a sequence of question-answer (QA) pairs. Applying existing natural language processing (NLP) methods such as automatic summarization to such documents leads to poor results. Accordingly, we have developed classification methods based on dialog acts to facilitate subsequent application of NLP techniques. This paper describes the ontology of dialog acts we have devised through a case study of a corpus of legal depositions that are made of QA pairs, as well as our development of machine/deep learning classifiers to identify dialog acts in such corpora. We have adapted state-of-the-art text classification methods based on a convolutional neural network (CNN) and long short term memory (LSTM) to classify the questions and answers into their respective dialog acts. We have also used pre-trained BERT embeddings for one of our classifiers. Experimentation showed we could achieve an F1 score of 0.84 on dialog act classification involving 20 classes. Given such promising techniques to classify questions and answers into dialog acts, we plan to develop custom methods for each dialog act, to transform each QA pair into a form that would allow for the application of NLP or deep learning techniques for other downstream tasks, such as summarization.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Documents such as legal depositions contain conversations between
a set of two or more people, aimed at identifying observations and
the facts of a case. The conversational actors are aware of the
current context, so need not include important contextual clues
during their communication. Further, because of that awareness,
their conversations may exhibit frequent context shifts.</p>
      <p>These conversations are in the form of rapid fire question-answer
(QA) pairs. Like many conversations, these documents are noisy,
only loosely following grammatical rules. Often, people don’t speak
using complete or well-formed sentences that can be comprehended
in isolation. There are instances where a legal document is
transcribed by a court reporter and the conversation contains words
like “um” or “uh” that signify that the speaker is thinking. In many
of the instances, there is an interruption that leads to incomplete
sentences being captured or a sentence getting abandoned
altogether.</p>
      <p>These characteristics of QA conversations make it dificult to
apply popular NLP processing methods, including co-reference
resolution and summarization techniques. For example, there is the
challenge of identifying key concepts using NLP based rules. In
many corpora, the root words that are most prevalent in sentences
help identify the core concepts present in a document. These core
concepts help text processing systems capture information with
high precision. However, traditional NLP techniques like syntax
parsing or dependency trees sometimes struggle to find the root of
conversational sentences because of their form.</p>
      <p>Humans, on the other hand, readily understand such documents
since the number of types of questions and answers is limited, and
these types provide strong semantic clues that aid comprehension.
Accordingly, we seek to leverage the types found, to aid textual
analysis.</p>
      <p>Defining and identifying each QA pair type would ease the
processing of the text, which in turn would facilitate downstream tasks
like question answering, summarization, information retrieval, and
knowledge graph generation. This is because special rules could be
applied to each type of question and answer, allowing conversion
oriented to supporting existing NLP tools. This would facilitate text
parsing techniques like constituency and dependency parsing and
also enable us to break the text into different chunks based on part
of speech (POS) tags.</p>
      <p>
        Dialog Acts (DA) [
        <xref ref-type="bibr" rid="ref19 ref41">19, 41</xref>
        ] represent the communicative intention
behind a speaker’s utterance in a conversation. Identifying the DA
of each speaker utterance in a conversation thus is a key first step
in automatically determining intent and meaning. Specific rules can
be developed for each DA type to process a conversation QA pair
and transform it into a suitable form for subsequent analysis.
Developing methods to classify the DAs in a conversation thus would
help us delegate the transformation task to the right transformer
method.
      </p>
      <p>
        Text classification using deep learning techniques has rapidly
improved in recent years. Deep neural network based architectures
like Recurrent Neural Network (RNN) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Long Short Term
Memory (LSTM) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and Convolutional Neural Network (CNN) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
now outperform traditional machine learning based text
classification systems. For example, LSTM and CNN networks help capture
the semantic and syntactic context of a word. This enables the
systems based on LSTM and CNN to model word sequences better.
There have been various architectures in the area of text
classification which use an encoder-decoder [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] based model for learning.
Systems using CNNs [
        <xref ref-type="bibr" rid="ref10 ref2 ref22 ref34 ref7">2, 7, 10, 22, 34</xref>
        ] or LSTMs [
        <xref ref-type="bibr" rid="ref39 ref7">7, 39</xref>
        ] have had
significant performance improvements over the previously established
baselines in text classification tasks like sentiment classification,
machine translation, information retrieval, and polarity detection.
Accordingly, we focus on deep learning based text classification
techniques and fine-tune them for our task of DA classification.
      </p>
      <p>The core contributions of this paper are as follows.
(1) A Dialog Act ontology that pertains to the conversations in
the legal domain.
(2) An annotated dataset that will be available for the research
community.
(3) Classification methods that use state-of-the-art techniques
to classify Dialog Acts, and which have been fine-tuned for
this specific task.
</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Early work on Dialog Act Classification [
        <xref ref-type="bibr" rid="ref1 ref14 ref18 ref23 ref25 ref28 ref38 ref40">1, 14, 18, 23, 25, 28, 38, 40</xref>
        ]
used machine learning techniques such as Support Vector
Machines (SVM), Deep Belief Network (DBN), Hidden Markov Model
(HMM), and Conditional Random Field (CRF). They used features
like speaker interaction and prosodic cues, as well as lexical,
syntactic, and semantic features, for their models. Some of the works
also included context features that were sourced from the previous
sentences. Work in [
        <xref ref-type="bibr" rid="ref36 ref38">36, 38</xref>
        ] used HMM for modeling the dialog act
probabilities with words as observations, where the context was
defined using the probabilities of the previous utterance dialog acts.
Work in [
        <xref ref-type="bibr" rid="ref12 ref18">12, 18</xref>
        ] used DBN for decoding the DA sequences and used
both the generative and the conditional modeling approaches to
label the dialog acts. Work in [
        <xref ref-type="bibr" rid="ref12 ref21 ref32 ref6">6, 12, 21, 32</xref>
        ] used CRF to label the
sequences of dialog acts.
      </p>
      <p>
        The sentences in the QA pairs need to be modeled into a vector
representation so that we can use them as features for text
classification. The availability of rich word embeddings like word2vec [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
and GloVe [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] has been effective in text classification tasks. These
embeddings are learned from large text corpora like Google News
or Wikipedia. They are generated by training a neural network on
the text, where the objective is to maximize the probability of a
word given its context, or vice-versa. This objective helps the
neural network to group words that are similar in a high-dimensional
vector space. Work based on averaging of the word vectors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in a
sentence has given good performance in text classification.
      </p>
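As an illustration of the averaging approach, here is a toy sketch with hypothetical four-dimensional vectors (real word2vec or GloVe embeddings are typically 100–300 dimensional and learned from large corpora):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings, for illustration only.
embeddings = {
    "where": np.array([0.9, 0.1, 0.0, 0.2]),
    "do":    np.array([0.1, 0.8, 0.1, 0.0]),
    "you":   np.array([0.2, 0.7, 0.1, 0.1]),
    "live":  np.array([0.0, 0.2, 0.9, 0.3]),
}

def sentence_vector(tokens, emb):
    """Average the word vectors of the in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(len(next(iter(emb.values()))))
    return np.mean(vecs, axis=0)

v = sentence_vector(["where", "do", "you", "live"], embeddings)
print(v.shape)  # (4,)
```

The resulting fixed-size vector can then be fed to any standard classifier, which is what makes the averaging baseline so simple to apply.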
      <p>
        In late 2018, Google developed BERT (Bidirectional Encoder
Representations from Transformers) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a powerful method for
sentence embeddings. It was pre-trained on a massive corpus of
unlabeled data to build a neural network based language model. This
allows BERT to achieve significantly higher performance for
classification tasks which have a small task-specific dataset. The authors
argued that the current deep learning based language models to
generate embeddings are unidirectional and there are challenges
when we need to model sentences. Tasks such as attention based
question answering require the architecture to attend to tokens
before and after, during the self-attention stage. The core
contribution was the generation of pre-trained sentence embeddings that
were learned using the left and right context of each token in the
sentence. The authors also proposed that these pre-trained
embeddings can be used to model any custom NLP task by adding a final
fully connected neural network layer and modeling the network
output to the task at hand. There is no need to create a complex
network architecture. BERT internally uses the multi-layer
network or “transformer” presented in [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] to model the input text
and the output embedding. The transformer encoder consists of six
layers, each with a self-attention sub-layer followed by normalization
and a feed-forward sub-layer; the decoder has the same structure plus
an added masked attention layer. The attention layer in the encoder and decoder
builds self-attention on the input and output words, respectively,
to learn what words are important. The masked attention layer in
the decoder learns the attention only until the token in the output
that has already been generated by the decoder so far. To train
the model, the work involved learning on two tasks. The first task
was to guess a masked word in a sentence, where each sentence
was from a large corpus. The authors removed a word randomly
from a sentence and trained the model to predict the right word.
The second task was next-sentence prediction: given a pair of sentences,
predict whether the second sentence actually follows the first. The training was
performed using the BooksCorpus (with 800M words) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and
English Wikipedia (with 2,500M words) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The work obtained new
state-of-the-art results on 11 NLP tasks as part of General Language
Understanding Evaluation (GLUE), and was very competitive in
other tasks.
      </p>
      <p>
        Recent works like [
        <xref ref-type="bibr" rid="ref20 ref26 ref33">20, 26, 33</xref>
        ] use deep neural networks to
classify the dialog acts. These works used models like CNN and LSTM to
model the context for a sentence. Work in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] used a CNN+LSTM
model for the DA classification and slot-filling task using two
different datasets. They obtained a negligible improvement for one of the
datasets and a significant improvement for the other. Work in [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]
used a recurrent CNN based model to classify the DAs, and obtained
a 2.9% improvement over the LM-HMM baseline. Work in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] used
RNN and CNN based models for DA classification along with the
DA labels of the previous utterances to achieve state-of-the-art
results in the DA classification task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>METHODS</title>
      <p>As part of our methods, we defined an ontology of dialog acts for
the legal domain. Each sentence in the conversation was classified
into one of the classes. The following sections describe the ontology
and classification methods in more detail.
</p>
    </sec>
    <sec id="sec-4">
      <title>Dialog Act Ontology</title>
      <p>
        After a thorough analysis of the conversation QA pairs in our
dataset of depositions, two researchers refined a subset of the dialog
acts found in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. These researchers also added additional dialog
acts to our ontology for the questions and answers, again based on
their analysis of the depositions. The following sections present
more details.
3.1.1 Question-specific dialog acts. Table 1 shows the different
dialog acts that we have defined for the questions in the depositions.
      </p>
      <p>We expanded the “wh” category, which covers many of the DAs
in a deposition, into sub-categories. This would enable specific
comprehension techniques to be used on each sub-category, since the
sentences vary across the sub-categories. Table 2 lists and
describes each sub-category for the “wh” parent category.
3.1.2 Answer-specific dialog acts. Table 3 shows the different dialog
acts that we have defined for the answers in the depositions.</p>
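For intuition about the question categories, the following toy rule-based tagger sketches how a few of them could be distinguished heuristically. It is purely illustrative (the cue words and patterns are our assumptions) and is not the classifier used in this paper:

```python
import re

WH_WORDS = {"who", "what", "where", "when", "why", "how", "which"}

def question_da(text: str) -> str:
    """Toy heuristic tagger for question dialog acts (wh, wh-d, bin, bin-d, or)."""
    t = text.strip().lower()
    tokens = re.findall(r"[a-z']+", t)
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", t) if s]
    if tokens and tokens[0] in WH_WORDS:
        return "wh"                      # starts with a question word
    if len(sentences) > 1 and any(w in WH_WORDS for w in tokens):
        return "wh-d"                    # declarative context before a wh-question
    if t.rstrip("?").endswith(", right"):
        return "bin-d"                   # asks for verification of a known answer
    if "or" in tokens:
        return "or"                      # choice question joined by "or"
    return "bin"                         # default: answerable with yes/no

print(question_da("What time did you wake up?"))      # wh
print(question_da("That is where you live, right?"))  # bin-d
print(question_da("Is that where you live?"))         # bin
```

Such surface rules break down quickly on noisy deposition language, which is exactly why the learned classifiers of Section 3.2 are needed.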
    </sec>
    <sec id="sec-5">
      <title>Dialog Act Classification</title>
      <p>We used different classifiers based on deep learning that have
achieved state-of-the-art results in multiple other tasks. We also
used simple classifiers that used sentence embeddings followed by
a fully connected neural network, to check the efficacy of sentence
embeddings like BERT in dialog act classification. The following
sections describe the different classification methods we used to
classify the dialog acts.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Question-specific dialog acts.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Category</th><th>Description</th><th>Example</th></tr>
          </thead>
          <tbody>
            <tr><td>wh</td><td>This is a wh-* kind of question. These questions generally start with question words like who, what, where, when, why, how, etc.</td><td>What time did you wake up on the morning the incident took place?</td></tr>
            <tr><td>wh-d</td><td>This is also a wh-* kind of question. But if there is more than one statement in a what question, it is a what-declarative question. These questions have some information prior to the actual question which relates to the question.</td><td>You said generally wake up at 7:00 am in the morning. But what time did you wake up on the morning the incident took place?</td></tr>
            <tr><td>bin</td><td>This is a binary question. These are questions that can be answered with a simple “yes” or “no”.</td><td>Is that where you live?</td></tr>
            <tr><td>bin-d</td><td>This is a binary-declarative question which can also be answered with a “yes” or a “no”. But, in a binary-declarative question, the person who asks the question knows the answer but asks for verification. In contrast, a binary question indicates the examiner seeks to know which is the actual answer.</td><td>That is where you live, right?</td></tr>
            <tr><td>qo</td><td>This is an open question. These questions are general questions which are not specific to any context. These questions are asked to know the opinions of the person who is answering.</td><td>Do you think Mr. Pace made a good decision?</td></tr>
            <tr><td>or</td><td>This is a choice question. Choice questions are questions that offer a choice of several options as an answer. They are made up of two parts, which are connected by the conjunction “or”.</td><td>Were you working out for fun or were you into body building?</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption>
          <p>Answer-specific dialog acts.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Category</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>y</td><td>The person answering the question means yes. The answer sentence can take various forms and the answer need not be exactly “yes”.</td></tr>
            <tr><td>y-d</td><td>The person answering the binary question not only says yes but also gives an explanation for this answer.</td></tr>
            <tr><td>y-followup</td><td>The answer is yes, but the answer contains another question which pertains to the question asked.</td></tr>
            <tr><td>n</td><td>The person answering the question means no. Again, the answer need not be exactly “no”.</td></tr>
            <tr><td>n-d</td><td>The person answering the binary question not only says no but also gives an explanation for this answer.</td></tr>
            <tr><td>n-followup</td><td>The answer is no, but the answer contains another question which pertains to the question asked.</td></tr>
            <tr><td>sno</td><td>A statement which is a non-opinion; an informative statement made by the person answering the question.</td></tr>
            <tr><td>so</td><td>A statement which is an opinion of the person answering rather than a general statement.</td></tr>
            <tr><td>ack</td><td>A response which indicates acknowledgment.</td></tr>
            <tr><td>dno</td><td>A response given when the person doesn’t know, or doesn’t recall, or is unsure about the answer to the question asked.</td></tr>
            <tr><td>confront</td><td>The answer contains no information; it is a confrontation by the deponent to the question asked.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
3.2.1 Classification using CNN. Work in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] used CNN to capture
the n-gram representation of a sentence using convolution. A
window size provided as a parameter was used to define the number
of words to be included in the convolution filter. Figure 1 shows
the convolution operation capturing a bi-gram representation. We
used the architecture from the original work in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] for learning
the sentence representation using a CNN. We added a feed-forward
neural network layer in front of the representation layer to finally
classify the dialog act for a given sentence. Tokens from a sentence
are transformed into word vectors using word2vec, and fed into
the network. This is followed by the convolution and max-pooling
operations. The final sentence representation has a fixed size,
irrespective of sentence length. As the system trains, the network
is able to learn a sentence embedding as part of this layer. This
representation is rich since it captures the semantic and syntactic
relations between the words. Figure 2 shows a reference architecture
of the whole CNN based approach for two classes.
3.2.2 Classification using LSTM with attention. Work in [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] used
a bi-directional LSTM with an attention mechanism to capture the
most important information contained in a sentence. It did not
use any classical NLP system based features. Even though CNN
can capture some semantic and syntactic dependencies between
words using a larger feature map, it struggles to capture the long
term dependencies between words if the sentences are long. LSTM
based network architectures are better equipped to capture these
long term dependencies since they employ a recurrent model. The
context of the initial words can make their way down the recurrent
chain based on the activation of the initial words and their gradients,
during the back propagation phase.
      </p>
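The convolution and max-over-time pooling of Section 3.2.1 can be sketched in numpy as follows; the sizes are toy values and the filter weights are random stand-ins for parameters that would be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, n_filters, window = 8, 4, 2   # toy sizes; real systems use larger values

# Random stand-in for a learned bi-gram (window=2) convolution filter bank.
filters = rng.normal(size=(n_filters, window * emb_dim))

def sentence_representation(word_vectors):
    """Convolve the filter bank over the sentence, then max-pool over time.
    The output size (n_filters,) is independent of sentence length."""
    n = len(word_vectors)
    feature_maps = np.empty((n - window + 1, n_filters))
    for i in range(n - window + 1):
        ngram = np.concatenate(word_vectors[i:i + window])   # stack window words
        feature_maps[i] = np.maximum(filters @ ngram, 0.0)   # ReLU activation
    return feature_maps.max(axis=0)                          # max-over-time pooling

short = [rng.normal(size=emb_dim) for _ in range(5)]
long = [rng.normal(size=emb_dim) for _ in range(30)]
print(sentence_representation(short).shape, sentence_representation(long).shape)
# both (4,) -- fixed size irrespective of sentence length
```

The max-over-time pooling is what yields the fixed-size sentence representation described above, regardless of how many words the utterance contains.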
      <p>
        Figure 3 shows the network architecture of the system. The
words are fed into the network using their vector representation.
The network processes the words in both directions. This helps
the network learn the semantic information not only from the
words in the past, but also from the words in the future. The output
layers of both the directional LSTMs are combined as one, using
an element-wise sum. An attention layer is added to this combined
output, with coefficients for each output unit. These coefficients act
as the attention mechanism; attention priorities are learned by the
system during the training phase. These coefficients capture the
relative importance of the terms in the input sentence. The word
embeddings were also learned as part of the training. Dropout [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]
was applied to the embedding, LSTM, and penultimate layers.
L2-norm based penalties were also applied as part of the regularization.
3.2.3 Classification using BERT. In this method, we generate the
sentence embeddings of the questions and answers via the BERT
pre-trained model. BERT can be fine-tuned to any NLP task by
adding a layer on the top of this architecture which makes it suitable
for the task. Figure 4 shows the high-level architecture consisting
of various components like embeddings and transformers.
      </p>
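The attention mechanism of Section 3.2.2 (element-wise sum of the two directional outputs, then softmax-normalized coefficients) can be sketched as follows, with random stand-ins for the LSTM outputs and the learned attention vector:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, hidden = 6, 8               # toy sizes

# Random stand-ins for the forward/backward LSTM outputs and the
# learned attention parameter vector.
h_fwd = rng.normal(size=(seq_len, hidden))
h_bwd = rng.normal(size=(seq_len, hidden))
w_att = rng.normal(size=hidden)

h = h_fwd + h_bwd                    # element-wise sum of both directions
scores = np.tanh(h) @ w_att          # one coefficient per time step
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax: relative importance
sentence_vec = alpha @ h             # attention-weighted sum

print(sentence_vec.shape)  # (8,)
```

The coefficients `alpha` sum to one and play the role of the learned attention priorities: terms with larger coefficients contribute more to the final sentence vector.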
      <p>In our system implementation, we used the BERT reference
architecture and added a feed-forward neural network layer on top of
BERT sentence embeddings. We want to classify text with length
that varies from roughly a portion of one sentence to a large
paragraph. Further, we are performing a single sentence classification
and not a sentence pair classification, as was mentioned in the
BERT paper. We use the BERT-Base, Cased pre-trained model for
our classification experiments. Figure 5 shows the architecture for
our classifier.</p>
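The added feed-forward classification layer can be sketched as follows; the 768-dimensional input mimics a BERT-Base sentence embedding, and the weights here are random stand-ins for parameters that would be learned during fine-tuning:

```python
import numpy as np

rng = np.random.default_rng(2)
emb_dim, n_classes = 768, 20         # BERT-Base embedding size; 20 dialog acts

# Random stand-ins for the fine-tuned classification layer parameters.
W = rng.normal(scale=0.02, size=(n_classes, emb_dim))
b = np.zeros(n_classes)

def classify(sentence_embedding):
    """Single feed-forward layer plus softmax over the dialog act classes."""
    logits = W @ sentence_embedding + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

embedding = rng.normal(size=emb_dim)   # stand-in for a BERT sentence embedding
label, probs = classify(embedding)
print(label, probs.shape)
```

In the real system the embedding comes from the pre-trained BERT model and the layer weights are trained jointly during fine-tuning; this sketch only shows the shape of the added head.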
      <p>In our experiment section, we will refer to the introduced
classification methods as CNN, Bi-LSTM, and BERT, respectively.</p>
    </sec>
    <sec id="sec-6">
      <title>DATASET</title>
      <p>Legal depositions and trial testimonies represent a type of
conversation with a specific format, where the attorney asks
questions and the deponent or witness answers those questions.
Figure 6 shows an example of a page in a legal deposition. Proper
parsing of legal depositions is necessary to perform analysis for
downstream tasks like summarization.
</p>
    </sec>
    <sec id="sec-7">
      <title>Proprietary Dataset</title>
      <p>For our dialog acts classification experiments, we performed all our
work on a proprietary dataset, provided by Mayfair Group LLC.
This dataset was made available to us as a courtesy by several law
firms. Our classification experiments were performed on this dataset,
and the results of this paper reflect the same. This dataset consists of
around 350 depositions. The format of these documents follows
conventional legal deposition standards.
</p>
    </sec>
    <sec id="sec-8">
      <title>Tobacco Dataset</title>
      <p>
        The roughly 14 million Truth Tobacco Industry Documents
constitute a public dataset of legal documents related
to the settlement of court cases between US states and the seven
major tobacco industry organizations, concerning willful actions of tobacco
companies to sell tobacco products despite their knowledge of the
harmful effects. It was created in 2002 by the UCSF Library and
Center for Knowledge Management to provide public access to the
many legal documents related to that settlement. This dataset
includes around 12,300 publicly available legal deposition documents
which can be accessed from the website maintained by UCSF [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
Our analysis and results can also be reproduced on this publicly
available dataset.
      </p>
      <p>Due to client privacy and confidentiality concerns, we are unable
to share the proprietary dataset. The annotated tobacco dataset is
available publicly1 for the research community to use.
</p>
    </sec>
    <sec id="sec-9">
      <title>Data pre-processing</title>
      <p>
        Legal depositions can be in a wide variety of formats like .pdf, .docx,
.rtf, .txt, etc. Implementing separate functionality for parsing
different formats can be difficult and time-consuming. So, a common
platform which can be used to parse deposition transcripts across
all the formats in a generalized way is needed. Apache Tika [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ],
developed by the Apache Software Foundation, can be used to
extract metadata and content from across hundreds of file types
through a single interface. Apache Tika has Python support through
a library called tika.
      </p>
      <p>Though there is a standard format for deposition documents,
different challenges were encountered while parsing the documents.
Challenges faced in legal deposition document parsing include:
(1) Varying number of columns per page,
(2) Header and footer elimination, and
(3) Determining the starting and ending points of the actual
deposition conversation within the entire document.
1The dataset can be downloaded from https://github.com/saurabhc123/asail_dataset</p>
      <p>Generally, the PDF versions of legal depositions have multiple
columns per page. Apache Tika reads multiple columns in a page
separately by recognizing column separations which are encoded
as extended ASCII codes. Hence, text from separate columns is
parsed in the correct sequence.</p>
      <p>Headers and footers in legal depositions contain several items,
such as the name of the person being deposed, name of the attorney,
name of the law firm, e-mail IDs, phone numbers, page numbers,
etc. Figure 6 shows an example of a page in a legal deposition with
header and footer. We read the content parsed by Apache Tika line
by line and use regular expressions (regex) in Python to search
for a pattern within each line of the text. Using regex in Python,
we convert every line to a string which contains only alphabetic characters,
periods, and question marks. Then, we use a dictionary in Python
to store all the patterns and the list of indices of the lines in which
those patterns have appeared. Finally, we check for the patterns which
satisfy the below constraints and remove those lines from the text.
(1) The number of times these patterns appear must be greater
than or equal to the number of pages of the document.
(2) Those lines must not begin with the answer or question tags
(‘A.’ and ‘Q.’) and must not end with a question mark.
For example, in the document which is represented by Figure 6,
patterns “sourcehttpswww.industrydocuments.ucsf.edudocspsmw”,
“january”, “jamesfiglar”, “u.s.legalsupport” satisfy all of the above
constraints, and hence the lines containing these patterns are
removed from the entire text with the help of their indices which are
stored in the dictionary.</p>
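The filtering procedure above can be sketched as follows; the sample lines and the exact normalization details are our illustration of the approach, not the production code:

```python
import re
from collections import defaultdict

def remove_headers_footers(lines, n_pages):
    """Drop lines whose normalized pattern repeats at least once per page,
    unless they look like QA content ('Q.'/'A.' prefix or trailing '?')."""
    pattern_lines = defaultdict(list)
    for i, line in enumerate(lines):
        # keep only letters, periods, and question marks
        pattern = re.sub(r"[^A-Za-z.?]", "", line).lower()
        pattern_lines[pattern].append(i)
    drop = set()
    for pattern, idxs in pattern_lines.items():
        if len(idxs) >= n_pages:                 # appears on (at least) every page
            for i in idxs:
                line = lines[i].strip()
                if not line.startswith(("Q.", "A.")) and not line.endswith("?"):
                    drop.add(i)
    return [l for i, l in enumerate(lines) if i not in drop]

pages = ["U.S. Legal Support - Page 1", "Q. Where do you live?", "A. Blacksburg.",
         "U.S. Legal Support - Page 2", "Q. Since when?", "A. Since 2010."]
cleaned = remove_headers_footers(pages, n_pages=2)
print(cleaned)
```

Normalizing away digits and punctuation is what lets the two header lines, which differ only in their page numbers, collapse into a single repeated pattern.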
      <p>After cleaning the text, pre-processing of data had to be done to
extract the needed data in the required format. A deposition
transcript can contain multiple segments within it (like “INDEX”,
“EXHIBITS”, “APPEARANCES”, “EXAMINATION”, “STIPULATIONS”,
“CERTIFICATIONS”, etc). For our work, we only needed the
“EXAMINATION” segment where the actual conversation between
attorney(s) and deponent takes place. Figures 7 and 8 represent
the starting and ending of the “EXAMINATION” segment. We only
extract the “EXAMINATION” segment based on the observed
patterns that represent beginning and ending of this segment that hold
across our various depositions.</p>
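A sketch of the segment extraction follows; the marker strings are illustrative stand-ins for the observed patterns, which vary across depositions:

```python
def extract_examination(lines, start_pat="EXAMINATION", end_pat="CERTIFICATE"):
    """Keep only the lines between the observed start and end markers.
    The marker strings here are illustrative; real transcripts vary."""
    inside, segment = False, []
    for line in lines:
        if start_pat in line.upper():
            inside = True          # segment begins after the start marker
            continue
        if end_pat in line.upper():
            break                  # stop at the end marker
        if inside:
            segment.append(line)
    return segment

transcript = ["INDEX", "APPEARANCES", "EXAMINATION BY MR. SMITH:",
              "Q. Please state your name.", "A. James Figlar.",
              "CERTIFICATE OF REPORTER", "I certify..."]
print(extract_examination(transcript))
# ['Q. Please state your name.', 'A. James Figlar.']
```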
      <p>Finally, our pre-processing methods removed the noise from the
text and only extracted the conversation part of the deposition.</p>
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENTAL SETUP AND RESULTS</title>
    </sec>
    <sec id="sec-11">
      <title>Experimental Setup</title>
      <p>The overall size of the derived dataset developed from the public
dataset for dialog acts classification was a total of about 2500
questions and answers. This entire dataset was manually annotated, to
provide a ground truth for evaluation. The dataset then was
randomly divided into train, validation, and test datasets in the ratio
70:20:10, respectively, to be studied using each of the three
classifiers. Table 4 shows the distribution of the classes for the whole
dataset.
5.1.1 Environment setup. All the classification experiments were
run on a Dell server running Ubuntu 16.04, with 32 GB RAM and
two Tesla P40 NVIDIA GPUs.
5.1.2 CNN classifier. Parameters that were fine-tuned for the CNN
with word2vec embeddings classifier are:
(1) hidden layer size: This was varied from 100 to 500 in steps
of 100.
(2) dropout: This was varied from 0.1 to 0.5 in steps of 0.1.
(3) output layer activation function: sigmoid, tanh, and relu.
(4) n-gram: window size based on unigram, bi-gram, and tri-gram
groupings.
(5) max-sequence length: It was kept constant at 32.
(6) batch-size: It was kept constant at 100.
(7) number of epochs: It was varied from 10 to 50 until the
validation accuracy stopped improving any further.
5.1.3 LSTM classifier. Parameters that were fine-tuned for the
Bidirectional LSTM with attention classifier are:
(1) hidden layer size: This was varied between the values 32, 64,
128, and 256.
(2) embedding size: This was varied between the values 32, 64,
128, and 256.
(3) learning rate: This was varied between the values 0.0001,
0.001, 0.01, and 0.1.
(4) max-sequence length: It was kept constant at 32.
(5) batch-size: It was kept constant at 100.
(6) number of epochs: It was varied from 10 to 50 until the
validation accuracy stopped improving any further.
5.1.4 BERT classifier. Parameters that were fine-tuned for the
BERT single sentence classifier are:
(1) learning rate: This was varied between the values 0.00005,
0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, and 0.1.
(2) max-sequence length: It was kept constant at 32.
(3) batch-size: It was kept constant at 100.
(4) number of epochs: It was varied from 10 to 50 until the
validation accuracy stopped improving any further.
</p>
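The random 70:20:10 split can be sketched with the standard library; the seed and the item count of 2500 are illustrative:

```python
import random

def split_dataset(examples, seed=42):
    """Randomly split into train/validation/test in a 70:20:10 ratio."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(2500))   # ~2500 annotated QA items
print(len(train), len(val), len(test))  # 1750 500 250
```

Fixing the shuffle seed makes the split reproducible across the three classifiers, so they are trained and evaluated on identical partitions.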
    </sec>
    <sec id="sec-12">
      <title>Results</title>
      <p>5.2.1 System Comparisons. Table 5 lists each of the three classifiers
and their corresponding best test F1 score. BERT outperformed the
other methods by a significant margin and achieved an F1 score of
0.84.</p>
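For reference, the F1 score is the harmonic mean of precision and recall; the example values below are hypothetical, chosen only to land near the reported 0.84:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical precision/recall values close to the reported F1 of 0.84.
print(round(f1_score(0.85, 0.83), 3))  # 0.84
```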
      <p>We observe from Figures 9, 10, and 11 that after 15 epochs, the
training accuracy is still increasing but the validation accuracy
remains almost constant. This indicates that after 15 epochs, the
models begin to overfit the training data.</p>
    </sec>
    <sec id="sec-13">
      <title>Error Analysis</title>
      <p>We chose the best performing classification results and performed a
detailed error analysis on the misclassifications. Table 12 discusses
the errors associated with each dialog act. We have not included the
dialog acts that had fewer than 3 test samples or misclassifications.</p>
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>We parsed legal depositions in a wide variety of formats and
extracted the necessary conversation information, also removing
much of the noise, allowing for natural language processing (NLP)
and deep learning techniques to be employed for further processing.</p>
      <p>State-of-the-art summarization methods and NLP techniques
are difficult to apply to question-answer pairs. Our preliminary
testing with summarization methods applied to QA pairs led to
poor results. Hence we desire a semantically equivalent,
grammatically correct, and linguistically fluent representation to replace
each QA pair. This should retain key information from the QA
pair so that summaries generated from that representation do not
lose any important information from the actual conversation. To
achieve this, we carefully defined and developed a dialog act
ontology which contains 20 dialog acts to capture the intention of the
speaker behind the utterance. The quality of the set of dialog acts
is also enriched based on our study of the legal deposition domain.
Classification of each question and answer into these dialog acts
should aid in developing specific NLP rules or techniques to
convert each question-answer pair into an appropriate representation.
For classification purposes, we have created our own dataset by
manually annotating around 2500 questions and answers into their
corresponding dialog acts. This dataset helped us in training the
classifiers and also in evaluating the performance of the classifiers.</p>
      <p>We have developed three deep learning based classification
methods for dialog acts classification:
• Convolutional Neural Network (CNN) with word2vec
embeddings,
• Bi-directional Long Short Term Memory (LSTM) with
attention mechanism, and
• Bidirectional Encoder Representations from Transformers
(BERT).</p>
      <p>
        We experimented with these three classifiers and fine-tuned their
various parameters. We performed training, validation, and testing
with each of the three classifiers. We achieved F1 scores of 0.57 and
0.71 using the CNN and the LSTM based classifiers, respectively.
The highest F1 score of 0.84 was achieved using the BERT sentence
embeddings based classifier on the dialog act classification task.
We plan to extend this work in the following ways.
(1) Use context information for dialog act classification, such as
using the dialog acts from previous utterances [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to classify
the current dialog act, to improve the classification accuracy.
(2) Develop NLP and deep learning techniques to convert a
question-answer pair to a semantically equivalent
representation, to which it will be easy to apply a variety of NLP
tools.
(3) Use state-of-the-art deep learning based abstractive
summarization methods to generate summaries from those
representations.
(4) Develop explainable AI methods so it will be clear how
summaries were generated.
      </p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was made possible by the Virginia Tech’s Digital Library
Research Laboratory (DLRL). We would also like to thank Ashin
Marin Thomas for her help with data annotation and running the
experiments. Data in the form of legal depositions was provided
by Mayfair Group LLC. In accordance with Virginia Tech policies
and procedures and our ethical obligation as researchers, we are
reporting that Dr. Edward Fox has an equity interest in Mayfair
Group, LLC, whose data was used in this research. Dr. Fox has
disclosed those interests fully to Virginia Tech, and has in place an
approved plan for managing any potential conflicts arising from
this relationship.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Ang</surname>
          </string-name>
          , Yang Liu, and
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Shriberg</surname>
          </string-name>
          .
          <article-title>Automatic dialog act segmentation and classification in multiparty meetings</article-title>
          .
          <source>In Proceedings.(ICASSP'05)</source>
          .
          <source>IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          ,
          <year>2005</year>
          ., volume
          <volume>1</volume>
          ,
          pages I-
          <fpage>1061</fpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          , Edward Grefenstette, and
          <string-name>
            <given-names>Nal</given-names>
            <surname>Kalchbrenner</surname>
          </string-name>
          .
          <article-title>A convolutional neural network for modelling sentences</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. ACL</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Chandrakant</given-names>
            <surname>Bothe</surname>
          </string-name>
          , Cornelius Weber,
          <string-name>
            <given-names>Sven</given-names>
            <surname>Magg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Wermter</surname>
          </string-name>
          .
          <article-title>A contextbased approach for dialogue act recognition using simple recurrent neural networks</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Denny</given-names>
            <surname>Britz</surname>
          </string-name>
          .
          <article-title>Understanding convolutional neural networks for NLP</article-title>
          . URL: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ (visited on 11/07/2015),
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Eduardo PS</given-names>
            <surname>Castro</surname>
          </string-name>
          , Saurabh Chakravarty, Eric Williamson, Denilson Alves Pereira, and Edward A Fox.
          <article-title>Classifying short unstructured data using the Apache Spark platform</article-title>
          .
          <source>In Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>138</lpage>
          . IEEE Press,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Lin</given-names>
            <surname>Chen</surname>
          </string-name>
          and Barbara Di Eugenio.
          <article-title>Multimodality and dialogue act classification in the RoboHelper project</article-title>
          .
          <source>In Proceedings of the SIGDIAL 2013 Conference</source>
          , pages
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>In EMNLP</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merrienboer, Dzmitry Bahdanau, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          .
          <source>In Proceedings of SSST-8</source>
          , Eighth Workshop on Syntax,
          <source>Semantics and Structure in Statistical Translation</source>
          , pages
          <fpage>103</fpage>
          -
          <lpage>111</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>William</given-names>
            <surname>Coster</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Kauchak</surname>
          </string-name>
          .
          <article-title>Simple English Wikipedia: a new text simplification task</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume</source>
          <volume>2</volume>
          , pages
          <fpage>665</fpage>
          -
          <lpage>669</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Misha</given-names>
            <surname>Denil</surname>
          </string-name>
          , Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas.
          <article-title>Modelling, visualising and summarising documents with a single convolutional neural network</article-title>
          .
          <source>arXiv preprint arXiv:1406.3830</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>CoRR</source>
          , abs/1810.04805,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Alfred</given-names>
            <surname>Dielmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Steve</given-names>
            <surname>Renals</surname>
          </string-name>
          .
          <article-title>Recognition of dialogue acts in multiparty meetings using a switching DBN</article-title>
          .
          <source>IEEE transactions on audio, speech, and language processing</source>
          ,
          <volume>16</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1303</fpage>
          -
          <lpage>1314</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jeffrey L</given-names>
            <surname>Elman</surname>
          </string-name>
          .
          <article-title>Finding structure in time</article-title>
          .
          <source>Cognitive science</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ):
          <fpage>179</fpage>
          -
          <lpage>211</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Raul</given-names>
            <surname>Fernandez</surname>
          </string-name>
          and Rosalind W Picard.
          <article-title>Dialog act classification from prosodic features using support vector machines</article-title>
          .
          <source>In Speech Prosody</source>
          <year>2002</year>
          , International Conference,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <article-title>Neural network methods for natural language processing</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>309</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Simon</given-names>
            <surname>Haykin</surname>
          </string-name>
          .
          <article-title>Neural networks</article-title>
          , volume
          <volume>2</volume>
          . Prentice Hall New York,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Gang</given-names>
            <surname>Ji</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Bilmes</surname>
          </string-name>
          .
          <article-title>Dialog act tagging using graphical models</article-title>
          .
          <source>In Proceedings.(ICASSP'05)</source>
          .
          <source>IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          ,
          <year>2005</year>
          ., volume
          <volume>1</volume>
          ,
          pages I-
          <fpage>33</fpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          , Elizabeth Shriberg, Barbara Fox, and
          <string-name>
            <given-names>Traci</given-names>
            <surname>Curl</surname>
          </string-name>
          .
          <article-title>Lexical, prosodic, and syntactic cues for dialog acts</article-title>
          .
          <source>Journal on Discourse Relations and Discourse Markers</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Nal</given-names>
            <surname>Kalchbrenner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Phil</given-names>
            <surname>Blunsom</surname>
          </string-name>
          .
          <article-title>Recurrent convolutional neural networks for discourse compositionality</article-title>
          .
          <source>In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality</source>
          , pages
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Su Nam</given-names>
            <surname>Kim</surname>
          </string-name>
          , Lawrence Cavedon, and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <article-title>Classifying dialogue acts in one-on-one live chats</article-title>
          .
          <source>In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>862</fpage>
          -
          <lpage>871</lpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . ACL,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Král</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Cerisara</surname>
          </string-name>
          .
          <article-title>Automatic dialogue act recognition with syntactic features</article-title>
          .
          <source>Language resources and evaluation</source>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ):
          <fpage>419</fpage>
          -
          <lpage>441</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          UCSF Library and Center for Knowledge Management.
          <source>Truth Tobacco Industry Documents</source>
          ,
          <year>2002</year>
          . https://www.industrydocuments.ucsf.edu/tobacco.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Using SVM and error-correcting codes for multiclass dialog act classification in meeting corpus</article-title>
          .
          <source>In Ninth International Conference on Spoken Language Processing</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Liu</surname>
          </string-name>
          , Kun Han,
          <string-name>
            <given-names>Zhao</given-names>
            <surname>Tan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yun</given-names>
            <surname>Lei</surname>
          </string-name>
          .
          <article-title>Using context information for dialog act classification in DNN framework</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>2170</fpage>
          -
          <lpage>2178</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Davies</surname>
          </string-name>
          .
          <source>Google books corpora</source>
          ,
          <year>2011</year>
          . [Online; accessed 28-April-2019].
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Marion</given-names>
            <surname>Mast</surname>
          </string-name>
          , Ralf Kompe, Stefan Harbeck, Andreas Kießling, Heinrich Niemann, Elmar Noth, Ernst Günter Schukat-Talamazzini, and
          <string-name>
            <given-names>Volker</given-names>
            <surname>Warnke</surname>
          </string-name>
          .
          <article-title>Dialog act classification with the help of prosody</article-title>
          .
          <source>In Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP'96</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>1732</fpage>
          -
          <lpage>1735</lpage>
          . IEEE,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Chris</given-names>
            <surname>Mattmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jukka</given-names>
            <surname>Zitting</surname>
          </string-name>
          . Tika in Action. Manning Publications Co., Greenwich, CT, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Silvia</given-names>
            <surname>Quarteroni</surname>
          </string-name>
          , Alexei V Ivanov, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Riccardi</surname>
          </string-name>
          .
          <article-title>Simultaneous dialog act segmentation and classification from human-human spoken conversations</article-title>
          .
          <source>In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , pages
          <fpage>5596</fpage>
          -
          <lpage>5599</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>LM</given-names>
            <surname>Rojas-Barahona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Gašić</surname>
          </string-name>
          , N Mrkšić, PH Su, S Ultes, TH Wen, and
          <string-name>
            <given-names>S</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <article-title>Exploiting sentence and context representations in deep neural models for spoken language understanding</article-title>
          .
          <source>In COLING 2016-26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers</source>
          , pages
          <fpage>258</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Yelong</given-names>
            <surname>Shen</surname>
          </string-name>
          , Xiaodong He, Jianfeng Gao, Li Deng, and
          <string-name>
            <given-names>Grégoire</given-names>
            <surname>Mesnil</surname>
          </string-name>
          .
          <article-title>A latent semantic model with convolutional-pooling structure for information retrieval</article-title>
          .
          <source>In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Stolcke</surname>
          </string-name>
          , Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and
          <string-name>
            <given-names>Marie</given-names>
            <surname>Meteer</surname>
          </string-name>
          .
          <article-title>Dialogue act modeling for automatic tagging and recognition of conversational speech</article-title>
          .
          <source>Computational linguistics</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>339</fpage>
          -
          <lpage>373</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin.
          <article-title>Attention is All you Need</article-title>
          . In I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pages
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          . Curran Associates, Inc.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Anand</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          , Luciana Ferrer, Andreas Stolcke, and
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Shriberg</surname>
          </string-name>
          .
          <article-title>Training a prosody-based dialog act tagger from unlabeled data</article-title>
          . In
          <source>2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03)</source>
          , volume
          <volume>1</volume>
          , pages I-I. IEEE,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yuanchao Liu, Chengjie Sun,
          <string-name>
            <given-names>Baoxun</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaolong</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Predicting polarities of tweets by composing word embeddings with long short-term memory</article-title>
          . In
          <source>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>1343</fpage>
          -
          <lpage>1353</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>N</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Hepple</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y</given-names>
            <surname>Wilks</surname>
          </string-name>
          .
          <article-title>Dialog act classification based on intra-utterance features</article-title>
          .
          <source>CS-05-01</source>
          . Dept. of Computer Science, University of Sheffield, UK,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>A belief tracking challenge task for spoken dialog systems</article-title>
          . In
          <source>NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Peng</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Wei Shi, Jun Tian, Zhenyu Qi,
          <string-name>
            <given-names>Bingchen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongwei</given-names>
            <surname>Hao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bo</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Attention-based bidirectional long short-term memory networks for relation classification</article-title>
          . In
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>207</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>