<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linguistic Profiling and Behavioral Drift in Chat Bots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nawaf Ali</string-name>
          <email>ntali001@louisville.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Derek Schaeffer</string-name>
          <email>dwscha02@louisville.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman V. Yampolskiy</string-name>
          <email>roman.yampolskiy@louisville.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Engineering and Computer Science Department, J. B. Speed School of Engineering, University of Louisville</institution>
          ,
          <addr-line>Louisville, KY.</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>When trying to identify the author of a book, a paper, or a letter, the objective is to detect a style that distinguishes one author from another. With recent developments in artificial intelligence, chat bots sometimes play the role of text authors. This study investigates the change in chat bot linguistic style over time and its effect on authorship attribution. The study shows that chat bots did exhibit behavioral drift in their style. Results from this study imply that any non-zero change in linguistic style makes our chat bot identification process more difficult.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. Introduction</title>
      <p>
        Biometric identification technologies are not limited
to fingerprints. Behavioral traits associated with each
human provide a way to identify the person by a biometric
profile. Behavioral biometrics have an advantage over
traditional biometrics in that they can be collected
without the knowledge of the user under investigation
        <xref ref-type="bibr" rid="ref22">(Yampolskiy &amp; Govindaraju, 2008)</xref>
        . Characteristics pertaining to
language, composition, and writing style, such as
particular syntactic and structural layout traits, vocabulary
usage and richness, unusual language usage, and stylistic
traits, remain relatively constant. Identifying and learning
these characteristics is the primary focus of authorship
authentication
        <xref ref-type="bibr" rid="ref15">(Orebaugh, 2006)</xref>
        .
      </p>
      <p>
        Authorship identification is a research field concerned
with finding traits that can identify the original author of
a document. Its two main subfields are: (a) authorship
recognition, where more than one author claims a document
and the task is to identify the correct author based on style
and other author-specific features; and (b) authorship
verification, where the task is to verify that the claimed
author of a document is the correct one based on that author’s
profile and the study of the document
        <xref ref-type="bibr" rid="ref2">(Ali, Hindi &amp; Yampolskiy, 2011)</xref>
        . The twelve Federalist papers claimed
by both Alexander Hamilton and James Madison are an
example of authorship recognition
        <xref ref-type="bibr" rid="ref6">(Holmes &amp; Forsyth, 1995)</xref>
        . Detecting plagiarism is a good example of the
second type. Authorship verification is mostly used in
forensic investigation.
      </p>
      <p>
        When the authors are people, a major challenge is that a
writer’s style may evolve and develop over
time, a concept known as behavioral drift
        <xref ref-type="bibr" rid="ref13">(Malyutov, 2005)</xref>
        . Chat bots, which are built algorithmically, have
never been analyzed from this perspective. A study on
identifying chat bots using the Java Graphical Authorship
Attribution Program (JGAAP) has shown that it is possible
to identify chat bots by analyzing their chat logs for
linguistic features
        <xref ref-type="bibr" rid="ref2">(Ali, Hindi &amp; Yampolskiy, 2011)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>A. Chat Bots</title>
      <p>
        Chat bots are computer programs mainly used in
applications such as online help, e-commerce, customer
service, call centers, and Internet gaming
        <xref ref-type="bibr" rid="ref20">(Webopedia,
2011)</xref>
        .
      </p>
      <p>
        Chat bots are typically perceived as engaging software
entities with which humans can communicate and which attempt
to fool the human into thinking that he or she is talking to
another human. Some chat bots use Natural Language
Processing Systems (NLPS) when replying to a statement,
while the majority of other bots scan the input for keywords
and pull the reply with the most matching
keywords
        <xref ref-type="bibr" rid="ref21">(Wikipedia, 2011)</xref>
        .
      </p>
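      <p>The keyword-matching strategy described above can be illustrated with a minimal Python sketch. It is not taken from any of the bots discussed in this paper: the rule table, the function name pick_reply, and the fallback reply are all hypothetical, and real bots use far larger rule sets.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical rule table: each entry pairs a set of trigger keywords
# with a canned reply. These rules are invented for illustration only.
RULES = [
    ({"hello", "hi", "hey"}, "Hello! How are you today?"),
    ({"weather", "rain", "sunny"}, "I hear the weather is lovely."),
    ({"name", "who", "you"}, "I am just a simple demo bot."),
]
FALLBACK = "Tell me more."


def pick_reply(user_input, rules=RULES, fallback=FALLBACK):
    """Return the reply whose keyword set shares the most words with the input."""
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    best_reply, best_score = fallback, 0
    for keywords, reply in rules:
        score = len(keywords & words)
        if score > best_score:  # strict comparison: ties keep the earlier rule
            best_reply, best_score = reply, score
    return best_reply
```
]]></preformat>
      <p>Because such a bot can only return replies from a fixed table, its vocabulary is bounded by that table, which is one plausible reason a keyword-based bot could show a recognizable style.</p>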
    </sec>
    <sec id="sec-3">
      <title>B. Motivations</title>
      <p>Threats from criminals have migrated
from physical threats and violence to another
dimension, the cyber world. Criminals try to steal others’
information and identities by any means. Researchers are
responding with more work aimed at preventing such
criminal activities, whether identity theft or
terrorist threats.</p>
      <sec id="sec-3-1">
        <title>II. Application and Data Collection</title>
        <p>
          Data was downloaded from the Loebner Prize website
          <xref ref-type="bibr" rid="ref12">(Loebner, 2012)</xref>
          . In this contest, a group of human judges from
different disciplines and of different ages talk with the chat
bots, and each chat bot earns points depending on the quality
of the conversation it produces. A study
of chat bot authorship was made with data collected in
2011
          <xref ref-type="bibr" rid="ref2">(Ali, Hindi &amp; Yampolskiy, 2011)</xref>
          ; the study
demonstrated the feasibility of using authorship
identification techniques on chat bots. The data in the
current study was collected over a period of years. Our
data pertained only to the chat bots that were studied in
          <xref ref-type="bibr" rid="ref2">(Ali, Hindi &amp; Yampolskiy, 2011)</xref>
          , which is why this study
does not cover every year of the Loebner contest, which
started in 1996. Only the years in which the chat bots
under study appear were used in this research.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>III. Data Preparation</title>
        <p>The collected data had to be preprocessed by deleting
unnecessary labels such as the chat bot name and the date and time of
the conversation (Fig. 1). A Perl script was used to clean the
files and split each chat into two text files: one for the chat
bot under study and one for the human judge. The judge
part was ignored, and only the chat bot text was analyzed.</p>
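        <p>The cleaning and splitting step can be sketched as follows. This is a Python illustration, not the Perl script used in the study, which is not reproduced here. The transcript layout (an optional timestamp, a speaker label, a colon, then the utterance) is an assumption made for this example, as are the names LINE_RE and split_transcript.</p>
        <preformat><![CDATA[
```python
import re

# Assumed (hypothetical) line layout: "<date> <time> <speaker>: <utterance>",
# where the timestamp is optional. The real Loebner transcripts may differ.
LINE_RE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+)?"
    r"(?P<speaker>[^:]+):\s*(?P<text>.*)$"
)


def split_transcript(lines, bot_name):
    """Strip timestamps and speaker labels; return (bot_lines, judge_lines)."""
    bot_lines, judge_lines = [], []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or unparseable lines
        text = match.group("text").strip()
        if not text:
            continue  # skip lines that carry a label but no utterance
        if match.group("speaker").strip().lower() == bot_name.lower():
            bot_lines.append(text)
        else:
            judge_lines.append(text)
    return bot_lines, judge_lines
```
]]></preformat>
        <p>Only the first returned list would be written out and analyzed; the judge's lines are separated so that they can be discarded, as described above.</p>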
      </sec>
      <sec id="sec-3-3">
        <title>IV. Chat Bots Used</title>
        <p>
          Eleven chat bots were used in the initial experiments:
Alice
          <xref ref-type="bibr" rid="ref3">(ALICE, 2011)</xref>
          , CleverBot
          <xref ref-type="bibr" rid="ref4">(CleverBot, 2011)</xref>
          , Hal
          <xref ref-type="bibr" rid="ref5">(HAL, 2011)</xref>
          , Jeeney
          <xref ref-type="bibr" rid="ref11">(Jeeney, 2011)</xref>
          , SkyNet
          <xref ref-type="bibr" rid="ref17">(SkyNet,
2011)</xref>
          , TalkBot
          <xref ref-type="bibr" rid="ref19">(TalkBot, 2011)</xref>
          , Alan
          <xref ref-type="bibr" rid="ref1">(Alan, 2011)</xref>
          ,
MyBot
          <xref ref-type="bibr" rid="ref14">(MyBot, 2011)</xref>
          , Jabberwock
          <xref ref-type="bibr" rid="ref8">(Jabberwock, 2011)</xref>
          ,
Jabberwacky
          <xref ref-type="bibr" rid="ref7">(Jabberwacky, 2011)</xref>
          , and Suzette
          <xref ref-type="bibr" rid="ref18">(Suzette,
2011)</xref>
          . These formed the baseline against which we
compare the chat bots under study: Alice,
Jabberwacky, and Jabberwock.
The experiments were conducted using RapidMiner
          <xref ref-type="bibr" rid="ref16">(RapidMiner, 2011)</xref>
          . A model was built for authorship
identification that accepts the training text and creates a
word list and a model using a Support Vector Machine
(SVM) (Fig. 2). This word list and model are then
applied to the test text, which in our case is data
from the Loebner Prize site
          <xref ref-type="bibr" rid="ref12">(Loebner, 2012)</xref>
          .
        </p>
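        <p>To make the word list step concrete, the following Python sketch builds a vocabulary from training texts and maps a document to a normalized term-frequency vector over it. It covers only the word list and normalization, not the SVM, and it is not the RapidMiner process used in the study. The tokenization rule, the unit-length normalization, and the function names are assumptions chosen for illustration.</p>
        <preformat><![CDATA[
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_list(training_texts):
    """Collect the sorted vocabulary of all training documents."""
    vocab = set()
    for text in training_texts:
        vocab.update(tokenize(text))
    return sorted(vocab)


def vectorize(text, word_list):
    """Map a document to a unit-length term-frequency vector over word_list.

    Words that are not in word_list are ignored, which mirrors applying a
    stored word list to unseen test text.
    """
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in word_list]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```
]]></preformat>
        <p>Storing the word list alongside the model matters because test documents must be vectorized over exactly the same vocabulary, in the same order, as the training documents.</p>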
        <p>[Fig. 2. Training process diagram; operator labels: Process Document, Normalize, Validation, Store Word list, Store Model, Apply Model.]</p>
        <p>Fig. 3 shows the testing stage: the saved word list and model are used as
input, and the output gives the
percentage prediction for the tested files.</p>
        <p>[Fig. 3. Testing process diagram; operator labels: Get Word list, Process Document, Normalize, Get Model.]</p>
        <p>The data was tested using two different saved models:
one trained on the complete set of eleven chat bots,
and a second trained
on only the three chat bots under study.</p>
        <p>When performing the experiments, the model outputs
confidence values, which reflect how
confident we are that a chat bot is identified correctly.
The chat bot with the highest confidence value (printed in
boldface in all tables) is the predicted bot according to the
model. Table 1 shows the confidence values for
Alice’s text files in different years
when using eleven chat bots for training.</p>
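        <p>The decision rule described above, where the bot with the highest confidence value is the predicted bot, can be sketched in Python as follows. The function names are hypothetical, and the reading of the identification percentage as the share of test files whose predicted bot matches the true bot is an assumption, not a definition taken from the study.</p>
        <preformat><![CDATA[
```python
def predict_bot(confidences):
    """Return (bot, confidence) for the highest confidence value.

    confidences maps a bot name to the model's confidence for one test file.
    """
    if not confidences:
        raise ValueError("no confidence values given")
    bot = max(confidences, key=confidences.get)
    return bot, confidences[bot]


def identification_rate(predictions, true_bot):
    """Percentage of test files whose predicted bot equals true_bot."""
    if not predictions:
        return 0.0
    hits = sum(1 for predicted in predictions if predicted == true_bot)
    return 100.0 * hits / len(predictions)
```
]]></preformat>
        <p>For example, with made-up confidence values of 0.62 for Alice, 0.25 for Jabberwacky, and 0.13 for Jabberwock, the predicted bot for that file would be Alice.</p>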
        <p>Fig 5. Identification percentage over different years using only the three
chat bots under study for training.</p>
      </sec>
      <sec id="sec-3-4">
        <title>VI. Conclusions and Future Work</title>
        <p>The initial experiments conducted on the collected data
did show a variation between chat bots, which is expected.
It is not expected that all chat bots will act the same way,
since they have different creators and different algorithms.</p>
        <p>Some chat bots are more intelligent than others; the
Loebner contest aims to contrast such differences. The Alice
bot showed some consistency over the years under study,
but in 2005 Alice’s style was not as recognizable as in
other years. Jabberwacky performed well for all
years when training with just three bots; when the training set contained all
eleven chat bots, it was not
identified in 2001 and gave a 40%
correct prediction in 2005. Jabberwock, the third chat bot
under study here, was the least consistent of all
the bots: it gave 0% correct prediction in 2001 and
2004 and 91% in 2011, which may indicate that
Jabberwock’s vocabulary improved in a way that gave
it its own style.</p>
        <p>With the model trained on only the three chat bots under study, Jabberwacky
was identified 100% correctly over all years. Alice did
well for all years except 2005, and Jabberwock was
not identified at all in 2001 and 2004.</p>
        <p>With these initial experiments, we can state that some
chat bots do change their style, most probably depending
on the intelligent algorithms used in initializing
conversations. Other chat bots do have a steady style and
do not change over time.</p>
        <p>More data is required to get reliable results; we only
managed to obtain data from the Loebner prize
competition, which in some cases was just one 4KB text
file. With sufficient data, results should be more
representative and accurate.</p>
        <p>Additional research on these chat bots will be
conducted, and work on finding specific
features that identify the chat bots will continue. This
is a burgeoning research area, and much work still needs to
be done.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Alan.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>AI Research</source>
          . Retrieved June 10,
          <year>2011</year>
          , from http://www.ai.com/show_tree.asp?id=59&amp;level=2&amp;root=115
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hindi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yampolskiy</surname>
            ,
            <given-names>R. V.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Evaluation of authorship attribution software on a Chat bot corpus</article-title>
          .
          <source>XXIII International Symposium on Information, Communication and Automation Technologies (ICAT)</source>
          , Sarajevo, Bosnia and Herzegovina,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>ALICE.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>ALICE</source>
          . Retrieved June 12,
          <year>2011</year>
          , from http://alicebot.blogspot.com/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>CleverBot.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>CleverBot</source>
          . Retrieved July 5,
          <year>2011</year>
          , from http://cleverbot.com/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>HAL.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>AI Research</source>
          . Retrieved June 16,
          <year>2011</year>
          , from http://www.ai.com/show_tree.asp?id=97&amp;level=2&amp;root=115
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>D. I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Forsyth</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>The Federalist Revisited: New Directions in Authorship Attribution</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ),
          <fpage>111</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Jabberwacky.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Jabberwacky-live chat bot-AI Artificial Intelligence chatbot</article-title>
          .
          Retrieved June 10,
          <year>2011</year>
          , from http://www.jabberwacky.com/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Jabberwock.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Jabberwock Chat</source>
          . Retrieved June 12,
          <year>2011</year>
          , from http://www.abenteuermedien.de/jabberwock/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Biometric Identification</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>43</volume>
          (
          <issue>2</issue>
          ),
          <fpage>91</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ross</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Nandakumar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Introduction to Biometrics</source>
          . Springer-Verlag New York, LLC.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Jeeney.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Artificial Intelligence Online</source>
          . Retrieved March 11,
          <year>2011</year>
          , from http://www.jeeney.com/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Loebner</surname>
            ,
            <given-names>H. G.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Home Page of The Loebner Prize</article-title>
          .
          Retrieved Jan 3,
          <year>2012</year>
          , from http://loebner.net/Prizef/loebner-prize.html
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Malyutov</surname>
            ,
            <given-names>M. B.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Authorship attribution of texts: a review</article-title>
          .
          <source>Electronic Notes in Discrete Mathematics</source>
          ,
          <volume>21</volume>
          ,
          <fpage>353</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>MyBot.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Chatbot Mybot, Artificial Intelligence</source>
          . Retrieved Jan 8,
          <year>2011</year>
          , from http://www.chatbots.org/chatbot/mybot/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Orebaugh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>An Instant Messaging Intrusion Detection System Framework: Using character frequency analysis for authorship identification and validation</article-title>
          .
          <source>40th Annual IEEE International Carnahan Conference Security Technology, Lexington, KY.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>RapidMiner.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Rapid-I</source>
          . Retrieved Dec 20,
          <year>2011</year>
          , from http://rapid-i.com/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>SkyNet.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>SkyNet - AI</article-title>
          .
          Retrieved April 20,
          <year>2011</year>
          , from http://home.comcast.net/~chatterbot/bots/AI/Skynet/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Suzette.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>SourceForge ChatScript Project</article-title>
          .
          Retrieved Feb 7,
          <year>2011</year>
          , from http://chatscript.sourceforge.net/
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>TalkBot.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>TalkBot- A simple talk bot</article-title>
          .
          Retrieved April 14,
          <year>2011</year>
          , from http://code.google.com/p/talkbot/
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Webopedia.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>What is chat bot? A Word Definition from the Webopedia Computer Dictionary</article-title>
          . Retrieved June 20, from www.webopedia.com/TERM/C/chat_bot.html
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Wikipedia.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Chatterbot- Wikipedia, the free encyclopedia</article-title>
          .
          Retrieved June 22,
          <year>2011</year>
          , from www.en.wikipedia.org/wiki/Chatterbot
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Yampolskiy</surname>
            ,
            <given-names>R. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Govindaraju</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Behavioral Biometrics: a Survey and Classification</article-title>
          .
          <source>International Journal of Biometrics (IJBM)</source>
          .
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>