Extracting Dialog Structure and Latent Beliefs from Dialog Corpus

Aishwarya Chhabra, Pratik Saini and C. Anantaram
TCS Research, Tata Consultancy Services Ltd, Gwal Pahari, Gurgaon, India
{aishwarya.chhabra, pratik.saini, c.anantaram}@tcs.com

Abstract

A dialog corpus captures various real-world human interactions in a particular domain. However, to build a task-based chat-bot for carrying out human-machine interactions in that domain, it is essential to extract the dialog structure and the latent beliefs in that corpus. We examine this problem and propose a machine-learning based solution. Our method categorizes the utterances into corresponding dialog states (sub-tasks) using a domain ontology, extracts the required information using a machine-learning based approach, maps it to the appropriate state, and automatically builds a finite-state-machine based dialog model. Further, since each human utterance occurs in a context, a set of utterances carries latent beliefs that the human uses while conversing on a topic. Our method identifies the latent beliefs in conversations and uses them to appropriately tailor the chat-bot's responses based on the extracted finite state machine. We show how our method can lead to a better conversational experience with a chat-bot.

1 Introduction

Customer support systems and planning systems in domains such as product support, travel planning and student advising have transcribed dialog corpora capturing human-human conversations that are largely task-oriented. To implement a chatbot service in such domains, it is essential to extract the dialog model that captures information about the states of the dialog. Most task-oriented systems still use significant engineering and expert knowledge to implement the backbone of the dialog manager that carries out the dialogues. Usually, dialog systems are either trained on a huge general corpus or driven through a rule base. For this reason, they tend to behave in a restricted way and fail to capture beliefs and emotional state. It is observed that chatbots often behave mechanically and do not take customer beliefs into account while conversing. As shown in Figure 1, the user repeatedly tells the bot that the dates and budget are not flexible, but the bot keeps asking for a change of dates or budget instead of suggesting a new place. This behaviour leads to a significant downturn in customer satisfaction.

In this work, the focus is on taking a dialog corpus captured from human-human interactions and using it to learn the underlying dialog model for conversations in a domain. In addition to extracting the dialog model, our method identifies and learns the latent beliefs of the user to drive the conversation in a meaningful direction.

User: I'm looking for a trip to Gotham City leaving from Kakariko Village on Saturday, August 13, 2016. 3 adults for no more than $2400.
Bot: I have a trip available to Vancouver for these dates within your price range. Would you like to increase your budget or adjust the dates?
User: I'd like to adjust the departure city to Caprica but I cannot adjust the dates or budget.
Bot: Still no availability for this destination. Would you like to increase your budget or adjust the dates?
User: The dates cannot be changed. How about going to Theed with 2 adults, leaving from Kakariko Village, on a budget of $2400?
Bot: Still no availability.
User: Then I will bring my business elsewhere. Thank you.

Figure 1: A sample conversation without beliefs

In a way, dialog transcript data-sets encode the domain structure information. Our framework automatically learns this domain structure using deep learning models, and we make use of a domain ontology to enhance the accuracy of the learned dialog model. There have been a number of attempts to build end-to-end dialog systems. However, such systems have not focused on extracting the latent beliefs in the conversations that are required to tailor the chatbot interaction for each user. Our framework also learns the latent beliefs of the customer from these transcripts and effectively incorporates these beliefs to tailor its dialog suitably.

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the proposed architecture. Section 4 gives details on extracting latent beliefs. Section 5 evaluates our models qualitatively and quantitatively, and Section 6 concludes.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Related Work

Automatic extraction of dialog structure and latent beliefs from a given corpus is a relatively unexplored area. Previously, most of the work has been done using supervised learning [Feng et al., 2005]. [Bangalore et al., 2008] use a classification-based approach to automatically create task structures for task-oriented dialogues; they use a dialog modeling approach that tightly couples dialog acts and task/sub-task information. There has also been work in the direction of unsupervised learning for discovering the dialog model.
[Zhai and Williams, 2014] proposed three models to discover the structure of a dialogue. They synthesize hidden Markov models and topic models to extract the underlying structure in dialogues, and their models achieve superior performance on held-out log-likelihood evaluation and an ordering task. [Negi et al., 2009] presented a method to build a task-oriented conversational system from call transcript data in an unsupervised manner. The work in [Shi et al., 2019] focuses on task-oriented dialog and uses a Variational Recurrent Neural Network (VRNN) to extract the structure and dynamics of a dialog.

Neural dialogue generation has also shown promising results recently. [Serban et al., 2015; Henderson et al., 2014] use generative neural models to produce system responses that are autonomously generated word-by-word. [Liu et al., 2018] combined a knowledge base with neural dialogue generation to produce meaningful, diverse and natural responses for both factoid questions and knowledge-grounded chit-chat. [Wu et al., 2018] showed a method to represent a conversation session as memories upon which an attention-based memory-reading mechanism can be applied multiple times to generate optimal responses step-by-step. [Bordes and Weston, 2016] use Memory Networks to build a dialog system on the DSTC2 dataset [Williams et al., 2016]. Although quite a number of attempts have been made to build dialogue systems [Weston, 2016], the use of epistemic rules to drive the dialogue in a way consistent with the user's beliefs has not yet been tackled. Various approaches to dialog management and to discovering dialog structure have been proposed, but these approaches fail to take the user's beliefs into account to tailor the dialogues. [Prabhakaran et al., 2018] analyse how author commitment in text reveals underlying power relations and how to incorporate this information to detect the power direction in actual conversation. [Kawabata and Matsuka, 2018] focus on the construction of mutual belief in spoken task-oriented dialogues.

For the extraction of latent beliefs, [Chhabra et al., 2018] and [Sangroya et al., 2018] have shown how beliefs can be used to design a more meaningful conversation. However, no work seems to have been done on extracting dialog structure together with latent beliefs from a dialog corpus.

3 Architecture

The problem of automatically discovering the dialog model can be viewed as extracting all the relevant sub-tasks and their valid orderings. For example, the task of hotel booking can have the following sub-tasks: destination city, budget, hotel rating, location preference, number of people, dates, amenities, confirm booking. In this example, the first seven sub-tasks are independent of each other and can be performed in any order, while the sub-task 'Confirm booking' will always be the last one performed to complete the task successfully. In our work, we focus on finding all the valid orderings of the sub-tasks; this is in contrast to previous work, where only a fixed ordering of sub-tasks is considered. Our approach consists of several steps. We initially split the utterances into agent utterances and user utterances and then analyze these separately.

Figure 2: Architecture

3.1 Cleaning and tagging

We remove stop words and then identify domain-specific and general-purpose tags in the agent utterances. For example, in the agent utterance 'I can also offer you 5 days at Scarlet Palms Resort, a 3.5 star rating hotel, for 1358.78 USD', we identify tags like person, location, etc., and the utterance is changed to 'I can also offer you n days at location, a n rating hotel, for price'. We use Stanford CoreNLP [Manning et al., 2014] to identify the general-purpose tags. For domain-specific tags, we make use of a domain ontology: domain-specific terms found in the utterance are replaced with the domain tags corresponding to those terms. The tagged data helps us achieve clean clusters.

3.2 Clustering

We have observed that agent utterances follow a standard sequence to help users accomplish a task, whereas user utterances show a lot of variation. Hence, for clustering purposes, we consider the agent utterances exclusively. The idea behind clustering agent utterances is to identify the states (sub-tasks) in a dialog. For example, the task of booking a hotel may consist of several sub-tasks, and clustering groups together all the agent utterances that fall into one common state: 'Which place are you planning to go?' and 'Where would you like to go?' will be clustered together.

We use K-means clustering, where the value of k is determined by the elbow method. For example, in the hotel booking domain we create 8 clusters. For each agent utterance, we generate a feature vector, where word n-grams (n ≤ 2) are used as features. The clusters are used together with the extracted information to determine the dialog states.
3.3 Deep Learning based Information Extraction

After clustering the agent utterances, the user utterances are considered. An information extraction model is trained through supervised learning to provide all the tags for a given user utterance. To extract tags efficiently, a deep neural network based sequence tagging model is used: our architecture consists of a bi-directional LSTM network along with a CRF (conditional random field) output layer. For a given sentence (x_1, x_2, ..., x_n) containing n words, each represented as a d-dimensional vector, the bi-LSTM computes a word representation by concatenating the left and right context representations, h_t = [→h_t ; ←h_t]. We use ELMo embeddings, computed on top of a two-layer bidirectional language model with character convolutions, as a linear function of the internal network states [Peters et al., 2018]. Next, we map the extracted tags to the states (sub-tasks) obtained from the clusters identified in the previous step. For example, from the utterance 'I want to visit Denver for 4 days.', "Denver" and "4 days" will be extracted and mapped to the destination and the number of days of travel. The annotation of this example can be seen in Figure 3.

Figure 3: Example of Input Output Sequence
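The input/output contract of this step, tags extracted from a user utterance and then mapped onto sub-task states, can be sketched as follows. The pattern-based tagger is only a hypothetical stand-in for the bi-LSTM-CRF model described above; the patterns and the tag-to-state table are illustrative assumptions.

```python
import re

# Hypothetical patterns standing in for the learned bi-LSTM-CRF tagger; they
# only illustrate the contract: utterance in, (tag -> value) pairs out.
TAG_PATTERNS = {
    "destination": re.compile(r"\bvisit\s+([A-Z][a-z]+)"),
    "n_days": re.compile(r"\b(\d+\s+days?)\b"),
}

# Illustrative mapping from extracted tags to the sub-task states
# discovered by the clustering step.
TAG_TO_STATE = {
    "destination": "destination_city",
    "n_days": "dates",
}

def extract_and_map(utterance):
    """Return {dialog_state: extracted_value} for one user utterance."""
    slots = {}
    for tag, pattern in TAG_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[TAG_TO_STATE[tag]] = match.group(1)
    return slots

slots = extract_and_map("I want to visit Denver for 4 days.")
# slots == {'destination_city': 'Denver', 'dates': '4 days'}
```

The learned tagger replaces the regular expressions, but the downstream mapping of tags to dialog states is the same.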
3.4 Find Valid Ordering of States

The next step is to find the valid order of the states for both the agent and the user. For this, we use a revised version of the Apriori algorithm. Using the agent and user state information, the algorithm finds implication rules, such as that the sub-task 'Confirm booking' always comes as the last state or that 'Booking' is always preceded by 'dates', by determining the support in the corpus for such transitions. This provides us with an appropriate ordering of the sub-tasks and tells us what to do when the user provides a response in a particular state.

3.5 Intent Classification

Intent classification needs to be done when a new user utterance has to be processed. The intent classifier was implemented as a bi-directional LSTM classification model trained through supervised learning. Intent classification determines the state in the dialog model from which the new user utterance starts.

We merge all these components, namely the extracted states, their valid order, and the intent classification model, to build the finite state machine. This finite state machine is then able to tailor the dialog for the learnt task-oriented dialog model.

4 Identifying Latent Beliefs

Understanding a user's opinion is extremely important to initiate and maintain a meaningful conversation. Every user can have a different sentiment; such dissimilar users need different conversational flows, and the dialog policy needs to be tailored for each case. It is challenging to build an automatic system that can understand the latent beliefs of users. While it is possible to handcraft belief rules, such an approach has several flaws. An alternative is to learn them automatically from a large annotated corpus of utterances and corresponding labeled beliefs [Chhabra et al., 2018; Sangroya et al., 2018].

The latent belief extraction module is implemented and evaluated in two domains: the Student-Advisor domain [Chulaka Gunasekara and Lasecki, 2019] and the Frames data [El Asri et al., 2017], which are also used to extract the dialogue model. Better performance and a reduced number of turns were recorded when the dialog model was tailored in this way. In the Student-Advisor domain, three belief classes were identified: curious, neutral, confused. For example, a confused student wrote 'I have no sense of where I want my life to go and am unable to determine what classes to take. Can you help me decide what to do?'. Such a student can be significantly disoriented and may require a one-on-one counseling session with an expert; this category of students needs serious attention and a specific flow of questions to help them make a precise selection. An illustrative conversation is shown in Figure 4.

Student: Hi, my class selections for next semester are under consideration. What are some suggestions that can be given by you?
Advisor: As for requirements, do you have any left?
Student: Not to my knowledge.
Advisor: Do you have a precise preference as to course selection?
Student: I do prefer classes with a lighter work load.
Advisor: What do you think about EECS183, Elementary Programming Concepts? The class is entry level.

Figure 4: A sample conversation from the Student Advisor domain

We identified that the sentiments of users in the hotel booking domain can be broadly classified into 5 categories: Flexible, Satisfied, Neutral, Disappointed, Inflexible. User responses are classified into one of these categories to tailor the dialog with the user. For example, 'All I have left in this life is my burgeoning bank account. So no budget, just get me something I'll like.' is categorized as disappointed, and there is no budget constraint for the booking, so the states asking for hotel rating and amenities can be skipped and the user should be suggested options with high ratings and luxurious amenities.

If an utterance belongs to a more critical category such as disappointed, it is assigned a higher weight than one from, say, a flexible user. Intuitively, a user who is disappointed needs a different response and dialog policy. We used an LSTM based classification model over the five categories. A high-level architecture is illustrated in Figure 5.

Figure 5: Extracting Latent Beliefs

4.1 Epistemic Reasoning over Latent Beliefs

The extracted latent beliefs and the domain knowledge trigger epistemic rules. For example, "Belief(disappointed) and Budget(high) => Knows-Agent(user to be given luxury suite with all amenities), Knows-Agent(skip-state(ask hotel rating))" asserts facts about the current epistemic state of the agent. In the working system, the epistemic logic is written in Prolog. The beliefs and epistemic rules helped tailor the dialog to the customer's expectations.
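The example rule above can be read as a forward-chaining step over a set of facts. The working system encodes such rules in Prolog; the sketch below is only an illustrative Python rendering of that one rule, and the fact and action names are assumptions made for the example.

```python
def apply_epistemic_rules(facts):
    """Derive agent-knowledge facts from belief/slot facts (example rule only)."""
    derived = set()
    # Belief(disappointed) and Budget(high) =>
    #   Knows-Agent(user to be given luxury suite with all amenities),
    #   Knows-Agent(skip-state(ask hotel rating))
    if ("belief", "disappointed") in facts and ("budget", "high") in facts:
        derived.add(("knows_agent", "offer_luxury_suite_with_all_amenities"))
        derived.add(("knows_agent", ("skip_state", "ask_hotel_rating")))
    return derived

# A disappointed user with a high budget: the hotel-rating question is skipped.
facts = {("belief", "disappointed"), ("budget", "high")}
actions = apply_epistemic_rules(facts)
```

The derived `knows_agent` facts are what the finite state machine consults to skip or reorder states for a given user.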
5 Experiments and Results

5.1 Dataset

We used the Frames data-set [El Asri et al., 2017], which consists of conversations about finding an appropriate vacation package. The corpus has 1369 human-human dialogs with an average of 15 turns per dialog, for a total of 19986 turns. We used human-human task-oriented conversations as they include real-world context and are rich in terms of user beliefs. A sample conversation from this dataset is shown in Figure 6.

User: Hi there, I am from Vitoria and I want to go on a vacation.
Wizard: Where would you like to go?
User: I would like to go to Santo Domingo.
Wizard: Would a 7 day trip work for you?
User: Yes that sounds fine, looking to leave on the 19th.
Wizard: Great, I have a flight departing on the 19th and returning on the 25th of August.
User: What is the hotel like?
Wizard: It is called the Rose Sierra Hotel and it is a 3-star hotel that includes free breakfast, wifi and parking. The total cost would be 2170.90 USD.
User: What type of flight is that going to be?
Wizard: It is an economy class flight.
User: Let's book it please.
Wizard: Perfect. Have a great trip.
User: Thank you.

Figure 6: A sample conversation from Frames data

5.2 Information Extraction

We took 10400 utterances from around 1400 dialogs and annotated them with around 50 tags like budget, str_date, n_adults, price, etc. An accuracy of 97.23% was achieved in training and 93.54% in the testing phase.

5.3 Clustering

For clustering, we evaluated the results against a manually tagged data-set with 7 clusters. We achieved an accuracy of 89%.

5.4 Latent Belief Extraction

For the Student-Advisor domain, a total of 3500 utterances, including paraphrases of those utterances, labelled across 3 categories, were used to train an LSTM based classifier. The model achieved an accuracy of 84%. For the Frames dataset, a similar classifier was used as in the Student-Advisor domain; an accuracy of 87% was achieved over the 5 classes.

6 Conclusion

This paper presents a framework to automatically extract a dialog model and latent beliefs from a transcribed dialog corpus, with good results at each component level. Our approach takes the latent beliefs of customers into account to tailor the finite state machine and give a better, more personalized experience. Our experimental evaluation demonstrates the efficacy of the proposed methods.

References

[Bangalore et al., 2008] S. Bangalore, G. Di Fabbrizio, and A. Stent. Learning the structure of task-driven human-human dialogs. IEEE Transactions on Audio, Speech, and Language Processing, 16(7):1249-1259, Sep. 2008.

[Bordes and Weston, 2016] Antoine Bordes and Jason Weston. Learning end-to-end goal-oriented dialog. CoRR, abs/1605.07683, 2016.

[Chhabra et al., 2018] Aishwarya Chhabra, Pratik Saini, Amit Sangroya, and C. Anantaram. Learning latent beliefs and performing epistemic reasoning for efficient and meaningful dialog management. CoRR, abs/1811.10238, 2018.

[Chulaka Gunasekara and Lasecki, 2019] Chulaka Gunasekara, Jonathan K. Kummerfeld, Lazaros Polymenakos, and Walter S. Lasecki. DSTC7 task 1: Noetic end-to-end response selection. In 7th Edition of the Dialog System Technology Challenges at AAAI 2019, January 2019.

[El Asri et al., 2017] Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, and Kaheer Suleman. Frames: a corpus for adding memory to goal-oriented dialogue systems. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 207-219. Association for Computational Linguistics, 2017.

[Feng et al., 2005] Junlan Feng, Patrick Haffner, and Mazin Gilbert. A learning approach to discovering web page semantic structures. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, ICDAR '05, pages 1055-1059, Washington, DC, USA, 2005. IEEE Computer Society.

[Henderson et al., 2014] Matthew Henderson, Blaise Thomson, and Steve Young. Word-based dialog state tracking with recurrent neural networks. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 292-299. Association for Computational Linguistics, 2014.

[Kawabata and Matsuka, 2018] Y. Kawabata and T. Matsuka. How do people construct mutual beliefs in task-oriented dialogues? In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1299-1304, Nov 2018.

[Liu et al., 2018] Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu, and Dawei Yin. Knowledge diffusion for neural dialogue generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489-1498. Association for Computational Linguistics, 2018.

[Manning et al., 2014] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55-60, 2014.

[Negi et al., 2009] S. Negi, S. Joshi, A. K. Chalamalla, and L. V. Subramaniam. Automatically extracting dialog models from conversation transcripts. In 2009 Ninth IEEE International Conference on Data Mining, pages 890-895, Dec 2009.

[Peters et al., 2018] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proc. of NAACL, 2018.

[Prabhakaran et al., 2018] Vinodkumar Prabhakaran, Premkumar Ganeshkumar, and Owen Rambow. Author commitment and social power: Automatic belief tagging to infer the social context of interactions. CoRR, abs/1805.06016, 2018.

[Sangroya et al., 2018] Amit Sangroya, C. Anantaram, Pratik Saini, and Mrinal Rawat. Extracting latent beliefs and using epistemic reasoning to tailor a chatbot. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 5853-5855. International Joint Conferences on Artificial Intelligence Organization, 7 2018.

[Serban et al., 2015] Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. Hierarchical neural network generative models for movie dialogues. CoRR, abs/1507.04808, 2015.

[Shi et al., 2019] Weiyan Shi, Tiancheng Zhao, and Zhou Yu. Unsupervised dialog structure learning. CoRR, abs/1904.03736, 2019.

[Weston, 2016] Jason Weston. Dialog-based language learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 829-837, USA, 2016. Curran Associates Inc.

[Williams et al., 2016] Jason D. Williams, Antoine Raux, and Matthew Henderson. The dialog state tracking challenge series: A review. D&D, 7(3):4-33, 2016.

[Wu et al., 2018] Xianchao Wu, Ander Martinez, and Momo Klyen. Dialog generation using multi-turn reasoning neural networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2049-2059. Association for Computational Linguistics, 2018.

[Zhai and Williams, 2014] Ke Zhai and Jason D Williams. Discovering latent structure in task-oriented dialogues. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36-46, Baltimore, Maryland, June 2014. Association for Computational Linguistics.