<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a structured evaluation of improv-bots: Improvisational theatre as a non-goal-driven dialogue system</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maria Skeppstedt</string-name>
          <email>maria.skeppstedt@lnu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magnus Ahltorp</string-name>
          <email>magnus@ahltorpdata.se</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Applied Computational Linguistics, University of Potsdam</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science Department, Linnaeus University</institution>
          ,
          <addr-line>Växjö</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Magnus Ahltorp Datakonsult</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <fpage>37</fpage>
      <lpage>43</lpage>
      <abstract>
        <p>We have here suggested a structured procedure for evaluating artificially produced improvisational theatre dialogue. We have, in addition, provided some examples of dialogues generated within the evaluation framework suggested. Although the end goal of a bot that produces improvisational theatre should be to perform against human actors, we consider the task of having two improv-bots perform against each other as a setting for which it is easier to carry out a reliable evaluation. To better approximate the end goal of having two independent entities that act against each other, we suggest that these two bots should not be allowed to be trained on the same training data. In addition, we suggest using the two initial dialogue lines from human-written dialogues as input for the artificially generated scenes, as well as using the same human-written dialogues in the evaluation procedure for the artificially generated theatre dialogues.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Improvisational theatre (or impro/improv) is an art form in
which unscripted theatre is performed. Dialogue,
characters and actions are typically created spontaneously. Through
collaboratively creating a story, the actors can make a new
scene evolve in front of the audience [Wikipedia contributors,
2018].</p>
      <p>Seen from the perspective of artificial intelligence research,
improvisational theatre is a sub-genre of human interaction
that is more forgiving than interaction in general. Errors made
in general interaction are typically seen as a failure, and in
the case of a dialogue system, errors might lead to a dialogue
breakdown. In contrast, errors made within an
improvisational theatre scene are encouraged, and can form an input to
how the scene evolves. It might, therefore, be interesting to
find out how artificially constructed improvisational theatre
bots, which are likely to make errors to a larger extent than a
human, are perceived in this special setting.</p>
      <p>Although there is previous work on the construction of
artificially generated improvisational theatre, there are, to the
best of our knowledge, no descriptions of structured
methods for the evaluation of the dialogues created, and thereby
no method for comparing different approaches for dialogue
generation. According to Serban et al. [2016], even the more
general question of which evaluation method to use for
non-goal-driven dialogue systems (of which improvisational
theatre could be claimed to be a sub-category) is an open one.</p>
      <p>The aim of this paper is therefore to i) provide a suggestion
for a structured procedure for evaluating artificially produced
improvisational theatre dialogue, and ii) give some examples
of dialogues generated within the evaluation framework
suggested.</p>
    </sec>
    <sec id="sec-2">
      <title>Previous work</title>
      <p>Creating artificially generated human dialogue is a classical
task within the research field of artificial intelligence, with
the ultimate aim of a bot being able to pass the Turing test.
Dialogue could either be created in the form of a goal-driven
dialogue system, i.e., a system that is meant to be used to
perform a specific task, such as booking a ticket, or in the form
of a non-goal-driven system, for which no such task is given.</p>
      <sec id="sec-2-1">
        <title>Conversational modelling and dialogue systems</title>
        <p>One implementation method for the task of generating
dialogue is to use actual lines (possibly slightly modified) from
an existing dialogue corpus. This approach was, for instance,
applied by Banchs and Li [2012]. They constructed a vector
space model representation of the previous lines in the dialogue,
i.e., lines either automatically generated or provided by the
human dialogue participant, and measured its distance to
vectors constructed in the same fashion from the dialogues in the
dialogue corpus. The corpus dialogue which had the
closest vector representation was then retrieved, and the dialogue
line from the corpus, which was given in response to the ones
retrieved, was returned as the next line in the dialogue.</p>
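        <p>The retrieval idea can be sketched as follows (a toy bag-of-words version with hypothetical helper names, not the actual IRIS implementation, which used weighted vector space models):</p>

```python
import numpy as np

def build_vocab(corpus_pairs):
    """Token-index vocabulary over the stored dialogue contexts."""
    vocab = {}
    for context, _response in corpus_pairs:
        for token in context.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_vector(text, vocab):
    """Bag-of-words vector for a dialogue context."""
    vec = np.zeros(len(vocab))
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec

def next_line(history, corpus_pairs, vocab):
    """Return the stored response whose context is closest to `history`."""
    h = to_vector(history, vocab)
    _, response = min(
        corpus_pairs,
        key=lambda pair: np.linalg.norm(to_vector(pair[0], vocab) - h))
    return response

corpus = [("hello how are you", "fine thanks and you"),
          ("where are we going", "to the theatre of course")]
vocab = build_vocab(corpus)
print(next_line("hello how are you doing", corpus, vocab))
```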
        <p>
Another solution is to generate new sentences that need not
have been present in the corpus used
for training. For this task, neural network techniques are
typically applied [Vinyals and Le, 2015; Li et al., 2016;
Serban et al., 2016]. For instance, the seq2seq architecture
          <xref ref-type="bibr" rid="ref16 ref8">(perhaps best known for its ability to carry out machine
translation [Sutskever et al., 2014; Luong et al., 2017])</xref>
          , has been
applied for conversational modelling/dialogue generation.
        </p>
        <p>The second approach is intuitively more appealing, since it
gives more flexibility in what kinds of lines can be
generated. Previous studies have, however, shown examples of
the generative approach resulting in utterances that are fairly
general, as well as examples of the same utterances being
repeated often. That is, the content that occurs most commonly
in the training corpus is also that which is most typically
generated, and the potential for flexibility does not
automatically lead to greater creativity. Dialogue lines
that are generated mainly on the basis of what is very
representative of the corpus might thus be boring in the context of
improvisational theatre (and possibly also in most other
applications of non-goal-driven dialogue systems). The problem of
repeated lines has been addressed through the
application of reinforcement learning that rewards diversity,
but the examples provided in the paper describing this
approach still include dialogue lines that are rather generic [Li
et al., 2016].</p>
        <p>In addition, we suspect that the generative approach might
require larger dialogue corpora to give usable results,
even though out-of-domain resources, such as large external
monologue corpora for initialising word embeddings, have been
shown to be useful [Serban et al., 2016]. Li et al., for instance, used
the OpenSubtitles parallel corpus, which consists of around
80 million source-target pairs, for their generative approach.</p>
        <p>
          Since it is relevant to be able to provide automatically
generated improvisational dialogues also for languages for which
there does not exist a large dialogue corpus and possibly not
even a large out-of-domain corpus, or for sub-genres within
a language
          <xref ref-type="bibr" rid="ref1 ref15 ref17">(e.g., improvised Shakespeare [The Improvised
Shakespeare Company, 2018] or Strindberg [Strindbergs
intima teater, 2012])</xref>
          , it is also important to explore the
performance of methods that are less resource demanding.
Therefore, along with exploring generative approaches, it might
also be relevant to compare these (for different in-domain or
out-of-domain training data sizes) to methods that create
dialogues through the use of existing dialogue lines.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Artificial improvisational theatre</title>
        <p>The use of artificial intelligence as a part of
improvisational theatre has recently been explored by Mathewson and
Mirowski [2017]. Their work included the creation of a
dialogue system that allowed a human improvisation actor to
communicate with a robot that produced lines in response to
lines uttered by the human actor. Two versions of the robot
dialogue were constructed, one version that selected existing
lines in the training corpus, and one version that relied on text
generation techniques.</p>
        <p>The ambitious approach by Mathewson and Mirowski thus
included the use of speech recognition and a text-to-speech
system, which functioned in real-time in front of an
audience. We believe that this set-up is an appropriate goal for
artificial intelligence-powered improvisational theatre, in
particular their choice of including a human actor as one of the
participants in the dialogue. We suspect, albeit without
being able to provide any substantial basis for this suspicion,
that watching a human produce lines in real time is one of the
main fascinations of improvisational theatre, and that many
audience members would quickly lose interest in a play if
they were aware that it only included artificial actors and
artificially generated dialogue.</p>
        <p>We do not, however, consider this ambitious approach to be
appropriate for the goal of objectively evaluating, and thereby
in the long run improving, the generation of improvised
dialogue. The main reason for this is that the competence of
the human actor impacts the quality of the resulting dialogue,
since skilful improvisers have a larger ability to fit strange
utterances from a co-actor into an improvised scene. There
is, for instance, an improvisational theatre game [improwiki,
2018b], where the actors are given a set of pre-written,
out-of-context lines, which they are to incorporate in a natural
way into the scene. A human actor in an improvisational
theatre dialogue is thus very different from a human
interacting with a standard, task-oriented dialogue system. In
addition, the quality of the text-to-speech system and the speech
recognition might influence the audience’s perception of the
dialogue, and thereby their evaluation of the quality of the
dialogue content.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation procedure suggested</title>
      <p>Given the problems of including a human actor in a more
structured evaluation, we suggest the following procedure for
evaluating automatically generated improvisational theatre,
in which the task is narrowed down to the generation of
dialogue and in which the dialogue is initialised in a manner
that increases the possibility of carrying out a reliable
evaluation.</p>
      <sec id="sec-3-1">
        <title>Interaction between two bots</title>
        <p>A more reliable evaluation method needs to remove the
human influence, and the easiest approach for achieving that
would be to replace the human actor with another
improvisation bot, i.e., the set-up would be two improvisation bots
talking to each other. However, since the end goal is to
construct a bot that is able to act against a human actor, the
functionality of the bots should not be allowed to be dependent on
any one of the bots having full knowledge of the other bot.
Instead, the shared knowledge between the two bots should
aim to approximate the shared knowledge between two
human improvisational actors.</p>
        <p>To approximate that level of shared knowledge, we
suggest that the two bots that are to be evaluated should not be
allowed to be trained on the same training data. The data
could be taken from the same text genre, but it should not be
the exact same data. This mirrors how two humans who learn
the same language are exposed to text from the same genre,
i.e., the very wide genre of utterances from many different
registers in the language, but are not exposed to the exact
same utterances.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Starting the improvised scene</title>
        <p>Improvisational theatre is often carried out with the use of a
set of constraints, typically in the form of an input that the
actors can use as a starting point for their scene. For instance,
the audience could provide an input in the form of a
suggestion for a location at which the scene is to take place. Another
example is input in form of body postures that the actors use
as the starting point for a scene [Johnstone, 1999, pp. 186–
187].</p>
        <p>The evaluation method we suggest is to use the two initial
dialogue lines from human-written dialogues as input for the
scene. This is a form of input that can be easily automated
on a larger scale (as opposed to using non-textual input such
as body postures). In addition, the two initial lines provide
background data that the dialogue bots can use for generating
new lines, which simplifies the task somewhat.</p>
      <p>Most importantly, however, using the first two lines of
human-written dialogue as input means that each
artificially generated dialogue has a comparable human-written
dialogue against which it can be evaluated. To make them
as comparable as possible, the improvisation bots could be
instructed to generate approximately the same number of
dialogue lines as the number of lines included in the human
dialogue.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Evaluating the scene from the perspective of its likelihood of having been produced by humans</title>
        <p>With the use of this set-up for dialogue generation, for which
there will be comparable human-written and automatically
generated texts available, the evaluation can be carried out
as follows:</p>
        <p>The two initial dialogue lines are randomly sampled from
a set of (preferably short) human-written dialogues, and one
or several pairs of bot systems use these two initial lines to
produce a generated dialogue.</p>
        <p>A human evaluator is then presented with a set of short
dialogue texts, of which some (e.g., half of them) have been
selected from the human-written dialogues from which the two
initial starter-lines were sampled, and some from the
automatically generated dialogues. The task of the human
evaluator is then to, for each text, decide whether the dialogue has
been generated by a machine or produced by humans. The
same human evaluator should not be presented with a
human-written dialogue and an automatically generated one that
begin with the same two initial lines. This restriction avoids the
situation in which the evaluator carries out a direct comparison
between the two texts. An evaluation through
comparison would be a less realistic task, since the final aim is to
produce a dialogue that could pass as human-produced, not a
dialogue that is more human-like than a text that has actually
been produced by a human. Employing at least three human
evaluators would ensure that all automatically
generated texts are shown to a human, and that enough texts
are annotated to allow for inter-annotator agreement
calculations.</p>
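        <p>One simple scheme for distributing the texts that satisfies this constraint (the round-robin rule and all names are our illustration, not part of the procedure as specified) could look like:</p>

```python
import itertools

def assign_to_evaluators(dialogue_pairs, n_evaluators=3):
    """Distribute (human_dialogue, generated_dialogue) pairs so that the
    two versions sharing the same two starter lines always go to
    different evaluators (requires at least two evaluators)."""
    assignments = {e: [] for e in range(n_evaluators)}
    evaluator_cycle = itertools.cycle(range(n_evaluators))
    for human_text, generated_text in dialogue_pairs:
        e1 = next(evaluator_cycle)
        e2 = (e1 + 1) % n_evaluators  # guaranteed different from e1
        assignments[e1].append(human_text)
        assignments[e2].append(generated_text)
    return assignments
```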
        <p>Naturally, the set of dialogues from which the two initial
dialogue lines are sampled to use as evaluation data cannot
be allowed to be included in the data sets used for training the
improv-bots.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Evaluating the scene from other perspectives</title>
      <p>There are, of course, aspects other than resemblance
to a human-produced dialogue for which the generated dialogues
should be evaluated. Two parameters, mentioned in the
background, are the level of diversity among the lines
generated and how general the lines produced are. Repetitive
and generic dialogue lines are both examples of phenomena
that might produce a boring scene, and these two parameters
might therefore be combined into a metric in the form of the
entertainment value of the dialogues. The evaluator should,
therefore, when estimating whether a dialogue has been
produced by a human, also assess how entertaining the dialogue
is. This is likely to be a more subjective measure.
However, in a hypothetical situation in which the artificially
generated dialogues are often perceived as having been generated by
a human, but are consistently given a
lower entertainment value score than the human-written
dialogues, this would indicate that there is
something important missing in the dialogues generated. The
easiest solution is, probably, to use a binary score, e.g., to let
the evaluator determine whether the dialogue was boring or
not.</p>
        <p>There are also other types of measures that could be
applied for evaluating generated dialogues, e.g., measures that
are related to techniques taught within improvisational
theatre. An actor should, for instance, aim to be collaborative,
e.g., give offers to and accept offers from the co-actors
[Johnstone, 1987, pp. 94–108]. To help the audience follow a
scene, what roles the actors play, what their relationship is,
where the scene is played and what the objectives of the
characters are, should also be established early on in a scene
[improwiki, 2018a]. It would be a very interesting task to
construct an improvisational theatre bot that could achieve such
improv-theatre tasks. With these more specific tasks,
however, the system is perhaps no longer a non-goal-driven
dialogue system, but starts to resemble a goal-driven system.
Creating such a system is thus a separate task, for which a
separate framework for evaluation should be developed.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Implementation</title>
      <p>In the long run, we aim to implement and evaluate a
resource-intensive method as well, e.g., a method that uses seq2seq
to generate new text. However, to illustrate the evaluation
method, we here implemented a dialogue creation strategy
built on selecting the most appropriate line from a dialogue
corpus. This method uses i) a moderate-size dialogue corpus,
and ii) a distributional semantics space that is constructed
from a very large out-of-domain corpus. We apply a dialogue
generation method that is built on several different sub-ideas,
which we hope might serve as inspiration for future work, but
an evaluation of the contribution of each idea is not within the
scope of this paper.</p>
      <p>As corpus, we used the Cornell movie-dialogues corpus
[Danescu-Niculescu-Mizil and Lee, 2011], and as
distributional semantics space we used the word2vec space that has
been pre-trained on a very large corpus of Google News
and which has been made available by Mikolov et al. [2013;
2013].</p>
      <p>Due to the spontaneous and collaborative nature of
improvisational theatre, we believe that each dialogue line in this
genre is, on average, likely to be shorter than lines in scripted
theatre. We, therefore, extracted a subset of dialogue line
triplets from the Cornell movie-dialogues corpus, where each
of the lines had to conform to the following set of length
criteria: A line was allowed to contain a maximum of two
sentences, and in case it contained two sentences, the first of
these two sentences was allowed to contain a maximum of
two tokens. The last sentence (that is, the only sentence for
one-sentence lines and the second for two-sentence lines) was
allowed to contain a maximum of twelve tokens. Sentence
splitting and tokenisation were carried out with NLTK [Bird,
2002].</p>
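      <p>A sketch of this length filter (naive splitting on punctuation and whitespace stands in for the NLTK sentence splitting and tokenisation used in the paper):</p>

```python
def conforms(line):
    """The paper's length criteria: at most two sentences; if there are
    two, the first may contain at most two tokens; the last sentence may
    contain at most twelve tokens."""
    sentences = [s for s in line.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    if not sentences or len(sentences) > 2:
        return False
    token_lists = [s.split() for s in sentences]
    if len(sentences) == 2 and len(token_lists[0]) > 2:
        return False
    return len(token_lists[-1]) <= 12
```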
      <p>In the Cornell movie-dialogues corpus, there were only 262
dialogues that contained at least six dialogue lines and for
which all of the lines conformed to the length criteria we had
established for the experiment. These 262 dialogues were,
therefore, saved to use as the set of evaluation data, i.e., data
which could be used in the evaluation of the automatically
generated dialogues. Line triplets from the rest of the corpus
were divided into two groups, one group to use as training
data for Actor A and another group to use as training data for
Actor B. We divided the triplets film-wise, so that all triplets
from the same film were assigned either as training data to
Actor A or to Actor B. In addition, 100 of the dialogues were
not added to the training data set, but were used for an
informal evaluation during the development, i.e., used as the two
first input lines to run the dialogue generation during
development. A total of 10,322 line triplets were used to train the
functionality for Actor A and a total of 10,884 line triplets for
the functionality of Actor B.</p>
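      <p>The film-wise division might be sketched as follows (alternating sorted film identifiers between the two actors is an illustrative assignment rule; the paper does not specify its exact scheme):</p>

```python
def split_by_film(triplets):
    """Split (film_id, line_triplet) pairs so that all triplets from one
    film end up in the same actor's training data."""
    films = sorted({film_id for film_id, _ in triplets})
    films_for_a = {f for i, f in enumerate(films) if i % 2 == 0}
    train_a = [t for f, t in triplets if f in films_for_a]
    train_b = [t for f, t in triplets if f not in films_for_a]
    return train_a, train_b
```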
      <p>A context in the form of the line most recently uttered in
the dialogue and the line before that was used as input data
for predicting the next line in the dialogue. The first two lines
of each training data triplet were used to represent these two
most recent lines, and the third line to represent the line to be
predicted. The core of the method for prediction was thus to
retrieve the training data triplet for which the two first lines
were most similar to the two most recent lines in the
generated dialogue, and to use the third line in the triplet as the
next line in the generated dialogue. Similarity of dialogue line
pairs was determined through converting the two lines into a
semantic vector representation, and using the Euclidean
distance between the vectors as the similarity measure.</p>
      <p>The vector representation for the previous, and the most
recently uttered line in the generated dialogue (as well as for
the first and second lines in the training data triplets), were
constructed as follows: For the previous line, the average of
the word2vec vectors representing the tokens in the line were
used as the line representation. Tokens present in a standard
English stop word list were removed before creating the
average vector. For the most recently uttered line, the same
representation was used, except that stop words were retained.
We believe that also words that are normally considered as
stop words are important when interpreting the exact content
of the most recently uttered dialogue line, while they might
be less important for the content of an earlier line which we
included to provide a topical context.</p>
      <p>In addition to the averaged vectors, we used the word2vec
representation of the three first tokens in the most recently
uttered line, as well as the three last tokens in the line, as we
believe that these might be more important than the other words
for capturing the surface form of the conversation. All of
these six vector representations were then concatenated into
one long vector. The averaged vectors were slightly
downweighted, to give more importance to the vector
representations for the three initial and ending tokens of the most recent
line (the weights were determined by inspecting the output
of the algorithm on the development data). Vector elements
were also added to indicate whether a line contained any of
the question words who, where, when, why, what, which, how
or a question mark.</p>
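      <p>A simplified sketch of this feature construction (toy embedding size, a stand-in stop word list, and an illustrative down-weighting factor; the paper tuned its weights on development data and does not specify exactly how the question indicators were encoded):</p>

```python
import numpy as np

DIM = 4  # toy embedding size; the pre-trained word2vec vectors have 300
STOP_WORDS = {"the", "a", "is", "to", "you"}      # stand-in stop word list
QUESTION_WORDS = {"who", "where", "when", "why", "what", "which", "how"}

def embed(token, table):
    """Look up a token vector, falling back to zeros for unknown words."""
    return table.get(token, np.zeros(DIM))

def line_features(prev_line, last_line, table, avg_weight=0.5):
    """Concatenate down-weighted average vectors for both lines (stop
    words dropped only from the earlier line), the vectors of the first
    and last three tokens of the most recent line, and a question
    indicator element."""
    prev_toks = [t for t in prev_line.lower().split() if t not in STOP_WORDS]
    last_toks = last_line.lower().split()
    prev_avg = (np.mean([embed(t, table) for t in prev_toks], axis=0)
                if prev_toks else np.zeros(DIM))
    last_avg = np.mean([embed(t, table) for t in last_toks], axis=0)
    first3 = np.concatenate([embed(t, table)
                             for t in (last_toks + [""] * 3)[:3]])
    last3 = np.concatenate([embed(t, table)
                            for t in ([""] * 3 + last_toks)[-3:]])
    is_question = bool(QUESTION_WORDS & set(last_toks)) or last_line.endswith("?")
    return np.concatenate([avg_weight * prev_avg, avg_weight * last_avg,
                           first3, last3, [1.0 if is_question else 0.0]])
```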
      <p>When there were several dialogue line pairs in the training
data that matched the lines in the generated dialogue equally
well (allowing for a maximum Euclidean distance difference
of 0.08 between candidates), resulting in
many candidates for the next line, we applied unsupervised
outlier detection to this set of candidates, using scikit-learn’s
OneClassSVM [Pedregosa et al., 2011]. The outliers
were then removed from the candidate list.</p>
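      <p>The outlier-removal step might look like the following (the OneClassSVM hyperparameters and the small-set guard are illustrative; the paper does not report its settings):</p>

```python
import numpy as np
from sklearn.svm import OneClassSVM

def drop_outliers(candidate_vectors, candidate_lines):
    """Remove candidate next-lines whose vectors are flagged as outliers
    relative to the rest of the candidate set."""
    if len(candidate_lines) < 3:   # too few points to fit a boundary
        return list(candidate_lines)
    labels = OneClassSVM(nu=0.2, gamma="scale").fit_predict(
        np.asarray(candidate_vectors))
    kept = [line for line, label in zip(candidate_lines, labels) if label == 1]
    return kept or list(candidate_lines)  # never empty the list entirely
```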
      <p>For the candidates that were still present in the
candidate list after outliers had been removed, we tried to
incorporate the co-operative spirit of improvisational theatre
when selecting which of them to use. This was accomplished by
choosing the candidate line for which, when this line (together
with its preceding line) was submitted as input to the algorithm,
the closest neighbour was found. The motivation for this was
that when a line was selected to which the co-actor would be
more likely to find a good answer, the dialogue would run
more smoothly, i.e., just as in real improvisational theatre.</p>
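      <p>This co-operative selection criterion can be sketched as follows (the `featurize` callable and the matrix layout are hypothetical stand-ins for the vector construction described above):</p>

```python
import numpy as np

def most_cooperative(candidates, preceding_line, featurize, train_contexts):
    """Pick the candidate line that would itself be easiest to answer:
    the one whose own (preceding line, candidate) context lies closest
    to some training triplet."""
    def closest_distance(candidate):
        query = featurize(preceding_line, candidate)
        return float(np.min(np.linalg.norm(train_contexts - query, axis=1)))
    return min(candidates, key=closest_distance)
```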
      <p>We also applied two simple rules to improve the dialogues,
i) to avoid ending a dialogue with a line ending in a question
mark, and ii) to avoid repeating a line in the dialogue. These
rules were, however, not strictly enforced, and when there
were no other candidates of approximately the same quality
as a line ending with a question mark or as a repeated line,
these were still used.</p>
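      <p>One reading of these soft rules as code (re-using the 0.08 margin as the threshold for "approximately the same quality" is our assumption; the paper does not state this):</p>

```python
def apply_soft_rules(ranked_candidates, dialogue_so_far, is_final_line,
                     margin=0.08):
    """Among (distance, line) candidates within `margin` of the best
    distance, prefer one that is not a repeat and, for the dialogue's
    final line, does not end with a question mark; otherwise fall back
    to the best candidate."""
    best_distance = ranked_candidates[0][0]
    for distance, line in ranked_candidates:
        if distance - best_distance > margin:
            break
        if line in dialogue_so_far:
            continue
        if is_final_line and line.rstrip().endswith("?"):
            continue
        return line
    return ranked_candidates[0][1]  # rules are soft: use best line anyway
```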
      <p>Word2vec vectors were accessed through the Gensim
library [Řehůřek and Sojka, 2010]. The search for dialogue
line pairs in the training data, i.e., the dialogue line pairs that
were closest to the data given when constructing new
dialogues, was sped up by training a scikit-learn
NearestNeighbors model [Pedregosa et al., 2011].</p>
    </sec>
    <sec id="sec-5">
      <title>Example output</title>
      <p>In Table 1, we present 6 generated dialogues, which were
randomly sampled from the set of 262 dialogues that had been
set aside as evaluation data. The first two lines are given from
the corpus dialogue, and the left-hand column presents the
generated version while the right-hand column presents the
human-written corpus version. The last two examples show
the output of our algorithm and the output presented by Li et
al. [2016]. As when generating lines starting from
human-written dialogue, we provided the first two lines in the
dialogues published by Li et al. as input to our system.</p>
      <p>Our suggested formal evaluation of these dialogues would
thus be to present half of the dialogues in Table 1 to
Evaluator 1 and the other half to Evaluator 2, who are to determine
i) whether the dialogue is produced by a human or not, and
ii) whether the dialogue is boring. When informally
evaluating these dialogues, we would say that most dialogues in
the right-hand column would pass as human made, except the
strange dialogue 2, while hardly any of the dialogues in the
left-hand column would be classified as produced by humans.
None of the dialogues would, however, be classified as
boring, except maybe the first of the two dialogues provided by
Li et al. [2016], as it starts to generate very generic lines
towards the end of the dialogue.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and outlook</title>
      <p>The generated dialogues presented here portray a collection
of somewhat strange exchanges, and would not be useful in
the context of simulating a real conversation. They might,
however, function as absurd dialogues that, for instance,
could be used as improvised scene starters. We believe,
however, that the more structured form of evaluating a
non-goal-driven dialogue system that we present and exemplify could
be generally useful. The evaluation structure might be
possible to apply in the setting of a shared task, in which the
participants not only produce dialogues of this type, but also
participate in the evaluation by classifying the dialogues
produced by other participating groups.</p>
      <p>The next step is to implement a more resource-intensive
method, e.g., a method built on seq2seq or some other neural
network-based technique. We also intend to extend our initial
attempts at dialogue generation with the help of
a moderately sized dialogue corpus. We have, for instance,
not yet attempted any post-processing of the selected lines
to make them fit better into the dialogue, e.g., to make the
pronoun gender and number agree between the lines, or to
match the use of helper verbs.</p>
      <p>Although the ultimate goal would be to achieve an
improv-bot that could act seamlessly with a human actor, it would
also be interesting to explore the suspicion we introduced in
the background, i.e., that an audience would quickly lose
interest in a play if they were aware that it consisted solely
of artificially generated dialogue. For instance, if two
puppets were given two starting lines by the audience, and from
these starting lines played a scene with automatically
generated human-like dialogues, would the audience still find it
interesting?</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We would like to thank Jonas Sjöbergh, as well as the
anonymous reviewers, for valuable input to the content of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Banchs and Li</source>
          , 2012]
          <string-name>
            <given-names>Rafael E.</given-names>
            <surname>Banchs</surname>
          </string-name>
          and
          <string-name>
            <given-names>Haizhou</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Iris: a chat-oriented dialogue system based on the vector space model</article-title>
          .
          <source>In Proceedings of the Association for Computational Linguistics, System Demonstrations</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Bird</source>
          , 2002]
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bird</surname>
          </string-name>
          .
          <article-title>Nltk: The natural language toolkit</article-title>
          .
          <source>In Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics</source>
          , Stroudsburg, PA, USA,
          <year>2002</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Danescu-Niculescu-Mizil and Lee</source>
          , 2011]
          <string-name>
            <given-names>Cristian</given-names>
            <surname>Danescu-Niculescu-Mizil</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs</article-title>
          .
          <source>In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011</source>
          ,
          <year>2011</year>
          . [improwiki, 2018a] improwiki. Crow. https://improwiki.com/en/wiki/improv/crow,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [improwiki, 2018b] improwiki. Drop a line. https://improwiki.com/en/wiki/improv/drop_a_line,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Johnstone, 1987]
          <string-name>
            <given-names>Keith</given-names>
            <surname>Johnstone</surname>
          </string-name>
          .
          <source>Impro: improvisation and the theatre</source>
          . Routledge, New York,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Johnstone, 1999]
          <string-name>
            <given-names>Keith</given-names>
            <surname>Johnstone</surname>
          </string-name>
          .
          <source>Impro for storytellers: theatresports and the art of making things happen</source>
          . Faber, London, new edition,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Li et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Jiwei</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Will</given-names>
            <surname>Monroe</surname>
          </string-name>
          , Alan Ritter, Dan Jurafsky, Michel Galley, and
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Gao</surname>
          </string-name>
          .
          <article-title>Deep reinforcement learning for dialogue generation</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1192</fpage>
          -
          <lpage>1202</lpage>
          , Austin, Texas,
          <year>November 2016</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Luong et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Minh-Thang</given-names>
            <surname>Luong</surname>
          </string-name>
          , Eugene Brevdo, and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Neural machine translation (seq2seq) tutorial</article-title>
          . https://github.com/tensorflow/nmt,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mathewson and Mirowski, 2017] Kory W. Mathewson and
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Mirowski</surname>
          </string-name>
          .
          <article-title>Improvised theatre alongside artificial intelligences</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Mikolov et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR, abs/1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Mikolov, 2013]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <article-title>word2vec on Google Code</article-title>
          . https://code.google.com/archive/p/word2vec/,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Pedregosa et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          , Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          , Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Brucher</surname>
          </string-name>
          , Matthieu Perrot, and
          <string-name>
            <given-names>Edouard</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Řehůřek and Sojka, 2010]
          <string-name>
            <given-names>Radim</given-names>
            <surname>Řehůřek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Petr</given-names>
            <surname>Sojka</surname>
          </string-name>
          .
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          .
          <source>In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          , Paris, France, May
          <year>2010</year>
          .
          European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Serban et al.,
          <year>2016</year>
          ] Iulian V Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and
          <string-name>
            <given-names>Joelle</given-names>
            <surname>Pineau</surname>
          </string-name>
          .
          <article-title>Building end-to-end dialogue systems using generative hierarchical neural network models</article-title>
          .
          <source>Proceedings of AAAI</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Strindbergs intima teater, 2012] Strindbergs intima teater. http://strindbergsintimateater.se/festival-i-maj-2012/,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Sutskever et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and Quoc V Le.
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [The Improvised Shakespeare Company, 2018] The Improvised Shakespeare Company. http://improvisedshakespeare.com,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Vinyals and Le, 2015]
          <string-name>
            <given-names>Oriol</given-names>
            <surname>Vinyals</surname>
          </string-name>
          and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <article-title>A neural conversational model</article-title>
          .
          <source>CoRR, abs/1506.05869</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Wikipedia contributors, 2018] Wikipedia contributors.
          <source>Improvisational theatre - Wikipedia, the free encyclopedia</source>
          ,
          <year>2018</year>
          . [Online; accessed 27-June-2018].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>