<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cluster-Based Graphs for Conceiving Dialog Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean-Leon Bouraoui</string-name>
          <email>jeanleon.bouraoui@orange.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Lemaire</string-name>
          <email>vincent.lemaire@orange.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Orange Labs</institution>
          ,
          <addr-line>2, avenue Pierre Marzin 22300 Lannion</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>17</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>We describe an unsupervised modeling of the structure of task-oriented dialogs. The dialogs are related to a given domain (for instance, train booking). This modeling is used as a basis for conceiving a dialog system architecture. The modeling is represented by a graph. It displays the main stages of the dialogs and the transitions between them. Our approach consists in applying coclustering to a representative dialog corpus. Thus we obtain main topics that appear in the corpus. We then compute the topics transitions within each dialog. The resulting graph describes the main topics in the corpus, and their overall sequential organization.</p>
      </abstract>
      <kwd-group>
        <kwd>Co-clustering</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Dialog Systems</kwd>
        <kwd>Graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In artificial intelligence field, dialog systems are gaining popularity with the
general public; especially as they benefit from advances in understanding of the
contents and contexts. Mobile applications such as Siri (Apple), Google Now
(Google), Cortana (Microsoft) or Alexa (Amazon) are the most popular. To
quantify this growing interest in the technology of dialog interfaces, and dialog
systems in particular, let us cite the recent study by the analyst firm Gartner
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It places dialog systems among the 10 strategic technologies for 2017.
      </p>
      <p>One of the current trends is to propose software devices to assist the design
of dialog systems. These devices are customizable according to the conceiver
needs, and the field of application (for example, reservation of trips, ordering of
products or services, etc.). One of the matters of these devices is that they can
hardly be set up quickly. Indeed, there is currently no generic system, and the
adaptation of a dialog system to a given application field takes time.</p>
      <p>In this context, we present a methodology to set up a semi-automatic
assistance solution for the creation or adaptation of a dialog system for any
application field. We describe the details of the process that we set up, and our position
in regard to the related works.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of the Problem and Related Works</title>
      <p>In this article, we will denominate a dialog as an exchange of information between
two interlocutors (knowing that a dialog can involve more than two
interlocutors). An interlocutor can be a human or a machine (in a broad sense: an artificial
system, software or hardware). We are interested in the finalized dialogs, which
aim to achieve a goal: the interlocutors will collaborate to achieve this goal.</p>
      <p>We call “dialog corpus” a set of n dialogs relating to a particular domain. Such
a set can be composed, for instance, of transcripts of train reservation dialogs, or
of interaction chats between a phone provider advisor and a client. Each dialog
is composed of t speech turns; a speech turn corresponds to what is said by one
of the interlocutors without any interruption (usually one or more sentences).</p>
      <p>In a first step, we try to associate each speech turn with a given “class”
denoted Lc. A class corresponds to the intention which the interlocutor expresses
with his speech turn; let us take as an example a dialog between a customer and
the customer service of a phone provider: the speech turn where the agent asks
the customer to identify himself belongs to a specific class (named for example
AskIdent ); the one where the customer identifies himself is relative to another
class (named for example AnswerIdentCustomer ).</p>
      <p>Next, the topics Tt, which group a set of classes relative to a common subject,
are determined. Let us use again our example of the customer service: the topics
can be the identification of the client (IdentCustomer, the corresponding classes
being the request by the adviser, and the response of the client), the discussion
of the problem (ExpProb, the corresponding classes being the presentation by
the customer, and the request for precision by the advisor), etc.</p>
      <p>The association of the topics to the speech turns of the different dialogs can
be represented as in Fig. 1.</p>
      <p>The speech turns are grouped in classes, classes in topics, and topics in dialog
as represented, in a simplified way, in Fig. 2.</p>
      <p>Our first goal is to automatically determine: (i) classes; (ii) topics; (iii) the
transitions between topics within each dialog. Thus we could obtain a
representation of the typical patterns of the dialogs from the corpus. The desired
representation is a directed graph showing the main transitions between topics
among all dialogs. Such a (simplified) representation is illustrated in Fig. 3.</p>
      <p>This representation presents multiple interests. The main one is the
initialization of the design of the dialog system. It can be used as a basis for modeling
the architecture of a specialized dialog agent on the target domain. Its execution
is thus facilitated and accelerated. Usually, this task is mostly done manually.
Either from an a priori representation of the designer of the possible dialogs
related to a task and a given domain. Or a posteriori, from the consultation of
existing corpus; in both cases, the process is costly in time.</p>
      <p>Moreover, the graph, as well as the steps taken to obtain it, corresponds to
the most relevant information about the dialogs. They will enable the designer,
without prior knowledge of the field of application, to have a first understanding
of the thematic content of the dialogs, their structuring, for the realization of
the dialog system.</p>
      <p>Related Works It is possible to group in three categories the different approaches
used in the literature to tackle this problem. The grouping of the works is based
on whether they identify the topics and their sequentiality, or they use Deep
Learning, or they adopt an ad hoc approach.</p>
      <p>
        – Identification of the topics and sequences of topics: the works belonging to
this category differ mainly according to the method used to identify the
topics of the dialogs. For example, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use clusters for this purpose,
whereas [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] use the Latent Dirichlet Allocation (LDA) to identify
the topics in a document or set of textual documents. In all these studies,
the authors use the Hidden Markov Models (HMM) to model the transitions
between topics;
– Deep Learning: to our knowledge, the first article that proposed the use of
techniques belonging to Deep Learning for modeling dialog systems is [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
The authors use the seq2seq model to model in a recurrent neural network
the sequence of speech turns between interlocutors. Two fields of
application are described, one of which pertaining to finalized dialogs (computer
troubleshooting);
– Other approaches: we group in this category the works that use ad hoc
methods instead of, or in addition to, data modeling. Thus, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] apply software
heuristics to constitute a model of dialogs from an application database.
Heuristics are also used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to move from a cluster representation
to a dialog model. We also cite [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] which integrates Information Retrieval
techniques into the architecture of a dialog system.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Methodology Employed</title>
      <p>To summarize, our methodology consists, firstly to identify the main topics that
appear in the set of dialogs. In a second time we model the relations between
these topics among the different dialogs. Thus, we obtain a graph which represent
the prototypical dialog continuity in the application field to which the dialog
corpus belongs. This section describes the details of this methodology. But first,
we address the matter of the relevance of modeling a dialog as a graph.
3.1</p>
      <sec id="sec-3-1">
        <title>Preamble: Hypothesis of a Correspondence between Dialog and</title>
      </sec>
      <sec id="sec-3-2">
        <title>Graph</title>
        <p>
          Our approach relates to the first category of the state of the art mentioned above:
the identification of the topics and sequences of topics. In our perspective, each
speech turn from the dialogs corresponds to a text document, being composed
of words. They can be represented by a bipartite graph; in this type of graphs,
one set of nodes corresponds to the source of the edges, and the other set to
the target of the edges ([
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], p. 37-41). In the context of dialog, one set of the
bipartite graph would correspond to the set of speech turns, and the other to the
words set. This bipartite graph can reciprocally be represented as a texts/words
adjacency matrix, where the lines correspond to the speech turn set from the
dialog corpus, and the columns to the word set from the same corpus. Figure 4
illustrates this twofold perspective.
        </p>
        <p>
          It is also well-known that the co-clustering performs well on bipartite graph
problems. It has for instance been demonstrated on the Nova, Gina and Hiva
datasets ([
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], p. 31-32).
        </p>
        <p>
          Consequently, our assumption is that a co-clustering method would be
interesting to tackle the problem described above. We expect the turn of speech to
be distributed in different classes which differ in the vocabulary they use. We
also postulate that a classic clustering approach would only work to a single
dimension of the graph (either the axis of the words or of the speech turns),
and consequently, it does not take into account the underlying structure of the
texts/words graph. This hypothesis is supported by the superior predictive
performance of a co-clustering approach on the Nova text datasets ([
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). The works
of [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] also confirm the interest of applying co-clustering to Natural
Language Processing tasks. Besides, the uses of co-clustering as an unsupervised
classifier, or for data interpretation in bio-informatics, have also been validated
(respectively [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]).
3.2
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Determining Classes and Topics</title>
        <p>We use here a technique of coclustering to obtain a “copartition” of the matrix
words x speech turns. The aim of co-clustering is to discover the “best
reordering and grouping” of lines and columns: grouping turns of speech and words in
clusters. The notion of “best reordering and grouping” is a notion of “contrast”:
(i) the “best reordering and grouping” has a maximum contrast with respect to
what would occur if counts were distributed at random keeping the marginals
of each cluster fixed; (ii) “best reordering and grouping” maximizes the mutual
information between the two clusterings. In the dialog context, one of the
partitions correspond to the group of speech turns, and the other to the group of
words.</p>
        <p>
          Given these two categorical variables, their simultaneous partitioning is
achieved: the values of categorical variables are grouped into clusters - which
amounts to a coclustering problem. The product of partitions forms a
multivariate partition of the representation space, i.e., a grid or a matrix of cells, as
represented in Fig.4. It also represents a joint density estimator of the variables.
In order to choose the “best” grid (knowing the data) of the model space, a
Bayesian approach called Maximum A Posteriori (MAP) is used. The method
used is based on the MODL approach described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The analysis
of the observations linking both entities brings naturally mutually consistent
similarity notions on both entities.
        </p>
        <p>The MODL approach allows to obtain the best quality co-partition of the
speech turns/words matrix. The method finds by itself the right number (K⇤ )
of clusters: the number of coclusters is therefore an output; the method does
not require any parameter, which is useful for non-expert people. We discuss
in section 4.1 the notion of the quality of a co-cluster. As with any automatic
learning method, this approach requires a quantity of data: a minimum volume
is required for clustering to be relevant. We observed some relevant results on a
relatively small corpus of 73 dialogs (5.010 speech turns)1.</p>
        <p>
          A tool for the visualization of the generated coclusters is then used [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It
allows a fine analysis and a “profiling” of the clusters obtained. We do not describe
here all the details, but the user may navigate using a hierarchical ascending
clustering from the best fine grid (using K⇤ clusters) to a granularity of the grid
needed for his analysis (using K0 coclusters, K0 &lt; K⇤ ) while controlling either
the number of parts or the rate of information. The tool also allows to tag the
obtained clusters (for example using keywords instead of numerical IDs).
3.3
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Determining Transitions between Topics</title>
        <p>In our approach of the problem, a topic corresponds to a cluster of “classes”,
itself constituted of speech turns (cf. section 2). A topic defined in this way can
be linked to one or more other topics, depending on the observed frequency of
their successions in the dialogs of the corpus.</p>
        <p>From there, each dialog can be browsed as a sequence of clusters/topics.
Therefore, it is possible to calculate, on the whole set of dialogs, the frequencies
of transition between each cluster. The beginning and the end of each dialog are
taken into account so as to avoid erroneous transitions.
1 The “Air France corpus”, available at the url: http://www.loria.fr/projets/
asila/corpus_en_ligne.html</p>
        <p>We assume that, depending on its position (its moment of occurrence) in the
dialog, a given turn of speech is more likely to belong to a given topic (i.e. a
cluster) than another; the position information is therefore taken into account
during the process. One of the objectives of our work is to verify the validity of
this assumption.</p>
        <p>The representation obtained is a directed graph, whose nodes are the clusters,
and the arcs are the successions between clusters. The co-clustering, applied to
the dialog corpus, removes any information about the sequential order of topics.
Therefore, the generation of the graph from the clusters requires several stages
to retrieve these information. They are described below:
1. The first phase consists in obtaining a standardized corpus. This is done
using a pre-processing tool, internal to Orange Labs and not described here.
This tool is parameterized with the list of stopwords used for the French
stemming by the NLTK library 2. This list of stopwords makes it possible to
delete words a priori useless for learning (e.g. articles, prepositions, etc.);
2. The representation obtained is then used as an input for the aforementioned
coclustering approach. It generates the clusters to which each speech turn of
the corpus belongs.</p>
        <p>Depending on the maximization of the contrast between the distribution of
the co-parts (here: the word coclusters), the utterances are associated to
the classes (here: clusters of speech turns) and the distribution expected
under the independence hypothesis (knowing the marginals). The partitions
induced by a coclustering on the two entities are clusterings. The notion of
associated similarity is related to how individuals in a cluster of one entity
distribute themselves on clusters of the other entity.</p>
        <p>Thus, a set of clusters of speech turns is obtained. Each one corresponds to a
given topic, and is labeled by a unique identifier (ID). The ID is automatically
attributed during the coclustering;
3. The cluster IDs are then projected onto the initial corpus, in order to retrieve
the sequentiality of the clusters. In other words, every speech turn in the
initial corpus is thus associated with the identifier of the cluster to which
the turn belongs.</p>
        <p>The transition frequencies between clusters are stored in a matrix. It is used
to generate the corresponding graph. It is possible to choose a minimum
threshold of transition frequencies, from which the transitions are displayed. This makes
it possible to generate graphs more or less complex to read: only the clusters
linked to others with a transition frequency equal to or above the threshold
are displayed. On the other hand, the possibility of displaying only transitions
greater than a threshold makes it possible to limit the possibility to take into
account some irrelevant transitions. Figure 5 displays the whole process chain of
our system.</p>
        <sec id="sec-3-4-1">
          <title>2 http://www.nltk.org/nltk_data</title>
          <p>Fig. 5. Processing phases
4.1</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <sec id="sec-4-1">
        <title>Overview of Existing Evaluation Protocols</title>
        <p>
          Our approach, described above, relies heavily on the quality of the co-clusters
obtained. It is consequently mandatory to evaluate the quality of the clusters
to assess the quality of the whole generated dialog graph. There is hardly some
precise method to directly evaluate the quality of a text clustering, contrary
to supervised methods. Indeed, supervised methods rely on labeled data and
the corresponding set of labels. By definition, it is not the case of unsupervised
approaches such as co-clustering (cf. notably [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], p. 574, section 1 and [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], p.
244, section 6.3).
        </p>
        <p>The simplest method would be to use an already annotated corpus as a
baseline (such as the Dialog State Tracking Challenge (DSTC) corpus3). We
would perform a co-clustering on this corpus. Then, we would use the Adjusted
Rand Index (ARI) to assess the validity of the clusters thus obtained with regard
to the handcrafted annotations.</p>
        <p>
          Another possible but indirect way would be to carry out a clustering of the
same data by the k-means method. In that case, the Davies-Bouldin Index ([
          <xref ref-type="bibr" rid="ref10">10</xref>
          ])
would be used to select the number of clusters. The result would then be used
as a baseline: a domain expert would then determine if the baseline clusters
obtained by k-means are more or less interpretable than the co-clusters obtained
with the MODL approach.
        </p>
        <p>
          Another way to assess the quality of the coclusters is defined with respect
to the partition of the clusters, in regards to the independence hypothesis. The
most useful indicator in this context is the Mutual Information ([
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). If
the documents (here, the dialogs) and the words were distributed randomly, the
mutual information would be equal to 0% (independence between all the words).
On the contrary, the Mutual Information would be equal to 100% if the clustering
would result in one cluster for each word. But then the results would not allow
any interpretation. The MODL approach remedies to this issue by the regulation
of the Mutual Information. It consists in partitioning the clusters in such a way
that the Mutual Information is maximized, whatever the number of clusters is.
This allows to select the best trade-off between the value of Mutual Information
and a human interpretable number of clusters. This number is “regularized” since
it is controlled (for more details, see [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]).
        </p>
        <p>
          This matter also relates to the issue of the error analysis of the model. There
can be some errors of construction of the cluster (average distance in regard to
a k-means). For the clustering by k-means of data such as text, the approach
Silhouette ([
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]) is often used. It consists in determining if a value in a given
cluster is adapted to this cluster, by swapping this value among all others
clusters. For co-clustering, this corresponds to the typicity of a value; more precisely,
there is one typicity for each variable of the co-clustering (for instance, one for
        </p>
        <sec id="sec-4-1-1">
          <title>3 https://www.microsoft.com/en-us/research/event/</title>
          <p>dialog-state-tracking-challenge
the speech turns and one for the words); usually only one of these typicity is
used (in our example, the typicity of the speech turns).
4.2</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Use Case</title>
        <p>The data are a corpus of chat dialogs of online assistance (phone company
customers speaking with human advisors). This corpus contains 3.012 dialogs. The
average length of the dialogs is about 15 speech turns, for a total of 49.004
different speech turns and 21.700 different words. We applied to this corpus
the pre-processing and the coclustering mentioned in the previous section. We
obtained 24 different clusters regrouping the speech turns of the corpus.</p>
        <p>After the co-clustering phase evaluated in section 3.2, we carried out the
generation of the graphs (phase 3 described in section 3.3).</p>
        <p>By applying different transitions threshold values to generate the graphs,
we obtained globally relevant graphical visualizations. We analyzed the contents
of the clusters obtained with a high threshold. We observed that most of the
selected clusters correspond to very specific phases of this category of dialogs:
phases of greetings, identification of the client, description of the problem
encountered, elaboration of a solution by the agent, thanks and end of the dialog.</p>
        <p>The scheduling of these clusters/topics in terms of dialog architecture also
displays a typical sequential continuity of these phases, notwithstanding some
clustering errors. These errors belongs to two main categories: cluster
heterogeneity (one cluster mixes several topics instead of a single one) or cluster redundancy
(several clusters correspond to the same topic).</p>
        <p>As an illustration, Fig.6 presents one of the graphs obtained. For the
purposes of readability, this simplified graph displays only 17 clusters: those which
are connected with more than 10% of the total number of observed transitions
between clusters in the initial graph. The numbers appearing next to each arc
correspond to the transition percentage. Each cluster is labeled by a set of 5 of
the words which participate the most to the cluster (in terms of Mutual
Information).</p>
        <p>We used a specific tool (internal to Orange Labs) for a better handling of these
graphs. This tool allowed us to rename the clusters, to merge some of them, and
to make the presentation even more readable. More generally, this tool allowed
us to carry out a fine-grained analysis of the contents of the clusters. An instance
of the most user-friendly visualization is displayed in Fig. 7; it is generated from
the same clusters and data described above.</p>
        <p>From a qualitative perspective, we evaluate the results obtained according to
two criteria. The first is the homogeneity of the clusters obtained. The second is
the quality and regularity of transitions between clusters; this second criterion
strongly depends on the first. Evaluation is underway by the authors.</p>
        <p>The results thus obtained were used by an ergonomist to bootstrap a first
architecture of a dialog system corresponding to this application field, without
any prior knowledge of the domain.</p>
        <p>Some of the clusters were also used to represent the intents of the users. In
dialog analysis, an intent corresponds to one of the tasks that the user wants to
BEGIN
57.18%
17.42%
17.28%
service__welcome__call__commercial__chat
thanks__waiting__line__having__glad
17.03%</p>
        <p>END
total: 40.21</p>
        <p>Fig. 6. Instance of an initial graph
perform (for instance, knowing the price of a phone call)4. Here, the
correspondence between an intent and a given cluster allows to consider the speech turns
of this cluster as some different formulations of the intent. From this point of
view also, our approach allows a fast bootstrap of the dialog system.</p>
        <p>As a side note, we can also mention the use of our approach for another
application field. This field is related to a personal home assistant conceived in
Orange. A first prototype has been used by several families. The collected data
notably include the vocal interactions between the human users and the assistant.
These interactions aim to perform tasks such as setting up alarms, giving the
weather, etc. We applied our approach on the interactions (3.322 speech turns).
We showed that most of the speech turns clusters are specific to the different
possible commands, and to the vocabulary associated. Further investigations are
still in progress.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Note on the results reproducibility: the actual implementation of the</title>
        <p>MODL approach is the Khiops suite. It is accessible through Internet (5). Thus,
any researcher who would want to reproduce our results could access to the same
implementation that we used.
4.3</p>
      </sec>
      <sec id="sec-4-4">
        <title>Note on Clustering</title>
        <p>
          We do not have enough room to detail the results but we also tried to use
a k-means as a baseline to benchmark the quality of the co-clusters obtained
with MODL. We used an usual k-means clustering approach on the same data;
center-reduction of features, Kmeans++ for initialization and 20 replicates. We
analyzed the results in regards to the Davies-Bouldin Index; the value of this
Index has to be the lowest possible to be optimal (cf. Fig. 8).
4 For more details on this topic, cf. [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], chap. 4.
5 https://khiops.predicsis.com
        </p>
        <p>The k-means does not clearly indicates a clear number of clusters since many
values of k, for k 2 { 3, 60} produce a close value for this index. Besides, the
kmeans seems to indicate that this corpus is only constituted of 2 “classes”6. That
is clearly not true for our dialog corpus: our analysis shows that it correspond to
a larger number of topics. Consequently, the k-means seems not be be the right
algorithm for this kind of purpose.</p>
        <p>
          The complexity of the algorithm must also be taken into account, notably
in terms of memory and time complexities. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] demonstrated that the MODL
co-clustering algorithm displays high performances in regard to the main other
discretization methods.
        </p>
        <p>Besides, k-means clustering needs, as an input parameter, the a priori desired
number of clusters. The user has to determine empirically this number; this
induces the necessity to possibly carry out several times the clustering, until
observation of relevant results. On the contrary, the MODL approach computes
itself the optimal number of cluster, and produces it as an output.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We presented an unsupervised method to obtain a first version of an architecture
of a dialog system, irrespective of its application field. The goal is to avoid a “cold
start” conception of the system. The method consists of two main steps: firstly,
the application of a coclustering algorithm on a corpus of dialogs belonging
to the concerned application field; on the other hand, taking into account the
sequentiality of the clusters obtained to represent the prototypical process of
a dialog as a graphical representation; finally, many graphs with more or less
fine grained level can be generated and used directly for the conception of the
system.</p>
      <p>We showed that our co-clustering approach of the issue shows better results
than a classical clustering method. We also observed that the sequential property
of the dialogue architecture is well captured by our approach.</p>
      <p>There are still many scientific problems to tackle. Several are related to the
clustering. We refer notably to the problematic of cluster homogeneity: how to
determine the most representative words or rounds of words of the cluster? The
matter of the selection of the most relevant clusters, (the most homogeneous
ones with regard to a given topic), also arises. It is correlated with the optimal
granularity/size of the clusters. It would also be interesting to study the efficiency
of linguistic optimizations of the corpus: for example, the lemmatization of words
or the neutralization of Named Entities. We have not used them in our work, but
they could be of interest to reduce the number of parameters used for clustering.
In a next version, we would also like to evaluate more thoroughly the quality
of the clusters, notably with the use of the DSTC corpus as a baseline, such as
mentioned in section 4.1.
6 Since it is with this number of classes that the Davies-Bouldin Index has its lowest
value (1).</p>
      <p>Concerning the modeling of the succession of clusters, many problems also
arise. As seen in the literature, it is often an HMM that is used for this purpose.</p>
      <p>As seen in section 4.2, the graphs obtained are always very complex initially.
To overcome this difficulty, we set up some transitions threshold to limit the
complexity of the displayed graphs. But there are more systematic and robust
techniques to reduce the initial complexity of the graphs. For instance, we
consider using the spectral clustering technique.</p>
      <p>Finally, the question arises of the reusability of the information obtained (in
particular the classes and topics) from one domain to another. A deployment of
this information to domains close to the initial domain is feasible and provided
by the Khiops suite. For more remote domains, we envision a transfer learning
approach.</p>
      <p>Acknowledgements We thanks Nicolas Voisine, who provided us the pre
proccessing tool for Khiops, Fabien Dupont and Laurent Roussarie for their essential
participation to the clustering and graph visualization process chain, and Marc
Boulle for his advices. We also thanks Nathalie Legay for her participation to
the graph analysis. Thanks also to Romain Laroche, who contributed to the
introduction of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdun</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>M. J.: Heart</given-names>
          </string-name>
          <string-name>
            <surname>Disease Diagnosis Using</surname>
          </string-name>
          Co-clustering.
          <source>Scalable Information Systems: 5th International Conference, INFOSCALE 2014</source>
          , Seoul, South Korea,
          <source>September 25-26</source>
          ,
          <year>2014</year>
          , Revised Selected Papers,
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alexandersson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reithinger</surname>
          </string-name>
          , N. :
          <article-title>Learning Dialogue Structures From A Corpus</article-title>
          .
          <source>Eurospeech</source>
          <year>1997</year>
          ,
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bangalore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , DiFabbrizio, G.,
          <string-name>
            <surname>Stent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning the Structure of Task-driven Human-human Dialogs</article-title>
          .
          <source>Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL</source>
          ,
          <fpage>201</fpage>
          -
          <lpage>208</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Boulle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guigoures</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Analyse exploratoire par k-Coclustering avec Khiops CoViz</article-title>
          .
          <source>Advances in Knowledge Discovery and Management</source>
          , volume
          <volume>527</volume>
          ,
          <fpage>15</fpage>
          -
          <lpage>35</lpage>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Boullé</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data grid models for preparation and modeling in supervised learning</article-title>
          .
          <source>Hands-On Pattern Recognition: Challenges in Machine Learning</source>
          , volume
          <volume>1</volume>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cawley</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dror</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Saffari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <volume>99</volume>
          -
          <fpage>130</fpage>
          ,
          <string-name>
            <surname>Microtome</surname>
            <given-names>Publishing</given-names>
          </string-name>
          , (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Boullé</surname>
            ,
            <given-names>M. :</given-names>
          </string-name>
          <article-title>MODL: a Bayes optimal discretization method for continuous attributes</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>65</volume>
          (
          <issue>1</issue>
          ):
          <fpage>131</fpage>
          -
          <lpage>165</lpage>
          , (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Candillier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tellier</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torre</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bousquet</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Cascade evaluation of clustering algorithms</article-title>
          .
          <source>17th European Conference on Machine Learning (ECML</source>
          <year>2006</year>
          ), Berlin (Germany),
          <source>LNCS 4212</source>
          ,
          <fpage>574</fpage>
          -
          <lpage>581</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chalamalla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negi</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramaniam</surname>
            ,
            <given-names>L. V.</given-names>
          </string-name>
          :
          <article-title>Identification of Class Specific Discourse Patterns</article-title>
          . CIKM '
          <volume>08</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chotimongkol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning the Structure of Task-Oriented Conversations from the Corpus of In-Domain Dialogs</article-title>
          . Carnegie Melon University,
          <source>PhD Thesis</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouldin</surname>
            ,
            <given-names>D. W.:</given-names>
          </string-name>
          <article-title>A Cluster Separation Measure</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          . PAMI-
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>224</fpage>
          -
          <lpage>227</lpage>
          , (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Dhillon</surname>
            <given-names>I. S.:</given-names>
          </string-name>
          <article-title>Co-clustering documents and words using bipartite spectral graph partitioning</article-title>
          .
          <source>KDD '01 Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <fpage>269</fpage>
          -
          <lpage>274</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Forni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Gartner Identifies the Top 10 Strategic Technology Trends for 2017</article-title>
          . Article on Gartner Website http://www.gartner.com/newsroom/id/3482617 (last consulted in June 2017)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gabor</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zargayouna</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tellier</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buscaldi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charnois</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining</article-title>
          .
          <source>XVIth Symposium on Intelligent Data Analysis, Oct</source>
          <year>2016</year>
          , Stockholm, Sweden.
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Guerraz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gay</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemaire</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clerot</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Analyse exploratoire par k-Coclustering avec Khiops CoViz Atelier CluCo</article-title>
          ,
          <source>Extraction et Gestion des Connaissances (EGC)</source>
          ,(
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Guigourès</surname>
            , R. and
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Boullé</surname>
            and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Rossi</surname>
          </string-name>
          <article-title>: Discovering patterns in time-varying graphs: a triclustering approach</article-title>
          .
          <source>Advances in Data Analysis and Classification</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          ,(
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Guigourès</surname>
          </string-name>
          , R.:
          <article-title>Utilisation des modeles de co-clustering pour l'analyse exploratoire des données</article-title>
          . Universite
          <string-name>
            <surname>Pantheon-Sorbonne - Paris</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <source>PhD Thesis</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>D'Haro</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cordoba</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barra-Chicote</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , San-Segundo,
          <string-name>
            <surname>R.</surname>
          </string-name>
          :
          <article-title>Speeding Up the Design of Dialogue Applications by Using Database Contents and Structure Information</article-title>
          .
          <source>SIGDIAL</source>
          <year>2009</year>
          ,
          <volume>160</volume>
          -
          <fpage>169</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Klema</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malinka</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zelezny</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantic Biclustering: a New Way to Analyze and Interpret Gene Expression Data</article-title>
          .
          <source>Proceedings of The 12th International Symposium on Bioinformatics Research and Applications (ISBRA)</source>
          , Springer, LNBI
          <volume>9683</volume>
          ,
          <fpage>332</fpage>
          -
          <lpage>3</lpage>
          , (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Laroche</surname>
          </string-name>
          , R.:
          <article-title>Speeding Up the Design of Dialogue Applications by Using Database Contents and Structure Information</article-title>
          .
          <source>18th International Conference on Intelligence in Next Generation Networks</source>
          ,
          <fpage>231</fpage>
          -
          <lpage>238</lpage>
          ,(
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Negi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chalamallay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramaniam</surname>
            ,
            <given-names>L. V.</given-names>
          </string-name>
          :
          <source>Automatically Extracting Dialog Models from Conversation Transcripts</source>
          .
          <source>2009 Ninth IEEE International Conference on Data Mining</source>
          , (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Paul</surname>
          </string-name>
          , M.:
          <article-title>Mixed membership Markov models for unsupervised conversation modeling</article-title>
          .
          <source>EMNLP-CoNLL '12</source>
          ,
          <fpage>231</fpage>
          -
          <lpage>238</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P. J.:</given-names>
          </string-name>
          <article-title>Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis</article-title>
          .
          <source>Computational and Applied Mathematics</source>
          .
          <volume>20</volume>
          :
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          , (
          <year>1987</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Takeuchi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takahashi</surname>
            ,
            <given-names>H. :</given-names>
          </string-name>
          <article-title>Co-clustering with Recursive Elimination for Verb Synonym Extraction from Large Text Corpus</article-title>
          .
          <source>IEICE Transactions on Information and Systems</source>
          Vol. E92.D,
          <fpage>2334</fpage>
          -
          <lpage>2340</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Tur</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mori</surname>
          </string-name>
          , R. D., Eds.:
          <source>Spoken Language Understanding: Systems for Extracting Semantic Information from Speech</source>
          . New York, NY: JohnWiley and Sons (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
          </string-name>
          Quoc V.
          <article-title>: A neural conversational model</article-title>
          .
          <source>International Conference on Machine Learning</source>
          ,
          <fpage>231</fpage>
          -
          <lpage>238</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Discovering Latent Structure in Task-Oriented Dialogues</article-title>
          .
          <source>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>36</fpage>
          -
          <lpage>46</lpage>
          , (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>