<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Linguistic Profiling and Behavioral Drift in Chat Bots</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nawaf</forename><surname>Ali</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Computer Engineering and Computer Science Department, J. B. Speed School of Engineering</orgName>
								<orgName type="institution">University of Louisville</orgName>
								<address>
									<settlement>Louisville</settlement>
									<region>KY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Derek</forename><surname>Schaeffer</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Computer Engineering and Computer Science Department, J. B. Speed School of Engineering</orgName>
								<orgName type="institution">University of Louisville</orgName>
								<address>
									<settlement>Louisville</settlement>
									<region>KY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Roman</forename><forename type="middle">V</forename><surname>Yampolskiy</surname></persName>
							<email>roman.yampolskiy@louisville.edu</email>
							<affiliation key="aff2">
								<orgName type="department">Computer Engineering and Computer Science Department, J. B. Speed School of Engineering</orgName>
								<orgName type="institution">University of Louisville</orgName>
								<address>
									<settlement>Louisville</settlement>
									<region>KY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Linguistic Profiling and Behavioral Drift in Chat Bots</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5810B8904BAF137010A38DEEEC3FA37D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T17:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>When trying to identify the author of a book, a paper, or a letter, the objective is to detect a style that distinguishes one author from another. With recent developments in artificial intelligence, chat bots sometimes act as the authors of text. This study investigates how chat bot linguistic style changes over time and how that change affects authorship attribution. The study shows that chat bots do exhibit behavioral drift in their style. The results imply that any non-zero change in linguistic style makes our chat bot identification process more difficult.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. Introduction</head><p>Biometric identification is a way to discover or verify a person's claimed identity using physiological and behavioral traits <ref type="bibr">(Jain, 2000)</ref>. To serve as an identifier, a biometric should have the following properties: (a) universality, meaning that the characteristic applies to everybody; (b) uniqueness, meaning that the characteristic is unique to each individual being studied; (c) permanence, meaning that the characteristic does not change over time in a way that obscures a person's identity; and (d) collectability, meaning that the characteristic can be measured <ref type="bibr" target="#b7">(Jain, Ross &amp; Nandakumar, 2011)</ref>. Biometric identification technologies are not limited to fingerprints: the behavioral traits associated with each person also make it possible to identify that person by a biometric profile. Behavioral biometrics have an advantage over traditional biometrics in that they can be collected without the knowledge of the user under investigation <ref type="bibr">(Yampolskiy &amp; Govindaraju, 2008)</ref>. Characteristics pertaining to language, composition, and writing style, such as particular syntactic and structural layout traits, vocabulary usage and richness, unusual language usage, and stylistic traits, remain relatively constant. 
Identifying and learning these characteristics is the primary focus of authorship authentication <ref type="bibr" target="#b10">(Orebaugh, 2006)</ref>.</p><p>Authorship identification is a research field concerned with finding traits that can identify the original author of a document.</p><p>Two main subfields of authorship identification are: (a) Authorship recognition, where more than one author claims a document and the task is to identify the correct author based on the study of style and other author-specific features.</p><p>(b) Authorship verification, where the task is to verify that the claimed author of a document is its true author, based on that author's profile and the study of the document <ref type="bibr" target="#b1">(Ali, Hindi &amp; Yampolskiy, 2011)</ref>. The twelve Federalist papers claimed by both Alexander Hamilton and James Madison are an example of authorship recognition <ref type="bibr" target="#b5">(Holmes &amp; Forsyth, 1995)</ref>. Detecting plagiarism is a good example of the second type. Authorship verification is mostly used in forensic investigation.</p><p>When examining human writers, a major challenge is that a writer's style may evolve and develop over time, a phenomenon known as behavioral drift <ref type="bibr" target="#b9">(Malyutov, 2005)</ref>. Chat bots, which are built algorithmically, have never been analyzed from this perspective. A study on identifying chat bots using the Java Graphical Authorship Attribution Program (JGAAP) has shown that chat bots can be identified by analyzing their chat logs for linguistic features <ref type="bibr" target="#b1">(Ali, Hindi &amp; Yampolskiy, 2011)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Chat bots</head><p>Chat bots are computer programs used mainly in applications such as online help, e-commerce, customer service, call centers, and internet gaming <ref type="bibr">(Webopedia, 2011)</ref>.</p><p>Chat bots are typically perceived as engaging software entities that humans can converse with and that attempt to fool the human into thinking that he or she is talking to another human. Some chat bots use Natural Language Processing Systems (NLPS) when replying to a statement, while the majority scan the input for keywords and return the reply with the most matching keywords <ref type="bibr">(Wikipedia, 2011)</ref>.</p></div>
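<div xmlns="http://www.tei-c.org/ns/1.0"><p>The keyword-scanning strategy described above can be illustrated with a minimal Python sketch. It assumes a hypothetical rule table that maps a set of keywords to one canned reply and scores each rule by how many of its keywords occur in the input. The names tokenize and keyword_reply, the rule table, and the tie-breaking rule are illustrative assumptions and are not taken from any of the chat bots studied here.</p><p><![CDATA[
```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_reply(user_input: str, rules: dict[str, str], fallback: str) -> str:
    """Return the canned reply whose keywords best overlap the input.

    `rules` maps a space-separated keyword string to a reply. The rule with
    the most keywords present in the input wins; ties go to the rule listed
    first, and an input that matches no keyword gets `fallback`.
    """
    words = tokenize(user_input)
    best_reply, best_score = fallback, 0
    for keywords, reply in rules.items():
        score = len(words.intersection(keywords.split()))
        if score > best_score:  # strictly better, so earlier rules win ties
            best_reply, best_score = reply, score
    return best_reply


if __name__ == "__main__":
    # Hypothetical rule table for illustration only.
    rules = {
        "hello hi hey": "Hello! How are you today?",
        "name called": "My name is Bot. What is yours?",
        "weather rain sunny": "I never go outside, so every day is the same to me.",
    }
    print(keyword_reply("What is your name?", rules, "Tell me more."))
```
]]></p><p>A bot built this way has no model of the conversation: its replies depend only on which keywords appear in each input, which is one reason such bots can differ sharply in style from bots that use natural language processing.</p></div>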
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Motivations</head><p>Threats posed by criminals have migrated from physical threats and violence to another dimension, the cyber world, where criminals try by any means to steal other people's information and identities. Researchers are responding with work aimed at preventing such criminal activity, whether identity theft or even terrorist threats.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. Application and Data Collection</head><p>Data were downloaded from the Loebner Prize website <ref type="bibr" target="#b9">(Loebner, 2012)</ref>. In the Loebner contest, a group of human judges of different disciplines and ages converse with the chat bots, and each chat bot is scored on the quality of the conversation it produces. An earlier study of chat bot authorship, using data collected in 2011 <ref type="bibr" target="#b1">(Ali, Hindi &amp; Yampolskiy, 2011)</ref>, demonstrated the feasibility of applying authorship identification techniques to chat bots. The data in the current study span several years. Because our data pertain only to the chat bots studied in <ref type="bibr" target="#b1">(Ali, Hindi &amp; Yampolskiy, 2011)</ref>, this study does not cover every year of the Loebner contest, which started in 1996; only the years in which the chat bots under study took part were used in this research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. Data Preparation</head><p>The collected data were preprocessed by deleting unnecessary labels, such as the chat bot name and the time and date of the conversation (Fig. <ref type="figure" target="#fig_0">1</ref>). A Perl script was used to clean the files and split each chat into two text files, one for the chat bot under study and the other for the human judge. The judge's part was ignored, and only the chat bot's text was analyzed.</p></div>
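<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the cleaning and splitting step (the original was a Perl script), the following Python sketch assumes a hypothetical transcript layout in which each line carries an optional bracketed time-date stamp followed by a speaker label and a colon. The pattern LINE_RE, the functions split_transcript and preprocess_file, and the output file naming are illustrative assumptions, not the authors' code; real Loebner transcripts vary in layout between years.</p><p><![CDATA[
```python
import re
from pathlib import Path

# Hypothetical line layout, e.g. "[2005-09-18 14:02:11] Alice: Hello there."
# Loebner transcripts differ between years, so LINE_RE is an assumption to adapt.
LINE_RE = re.compile(r"^(?:\[[^\]]*\]\s*)?(?P<speaker>[^:]+):\s*(?P<text>.*)$")


def split_transcript(lines, bot_name):
    """Split one chat log into (bot_text, judge_text).

    Time-date stamps and speaker labels are stripped. Every speaker other
    than `bot_name` is treated as the judge; lines that do not fit the
    assumed layout are skipped.
    """
    bot, judge = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        text = match.group("text").strip()
        if match.group("speaker").strip().lower() == bot_name.lower():
            bot.append(text)
        else:
            judge.append(text)
    return "\n".join(bot), "\n".join(judge)


def preprocess_file(log_path, bot_name, out_dir):
    """Write the bot and judge halves of one log to two separate text files."""
    lines = Path(log_path).read_text(encoding="utf-8", errors="replace").splitlines()
    bot_text, judge_text = split_transcript(lines, bot_name)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(log_path).stem
    (out / f"{stem}_{bot_name}.txt").write_text(bot_text, encoding="utf-8")
    (out / f"{stem}_judge.txt").write_text(judge_text, encoding="utf-8")
    return bot_text
```
]]></p><p>Only the bot half would be passed on to the experiments; the judge file is written solely so that each chat ends up as two text files, as described above.</p></div>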
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. Chat Bots Used</head><p>Eleven chat bots were used in the initial experiments: Alice (ALICE, 2011), CleverBot (CleverBot, 2011), Hal (HAL, 2011), Jeeney (Jeeney, 2011), SkyNet (SkyNet, 2011), TalkBot (TalkBot, 2011), Alan (Alan, 2011), MyBot (MyBot, 2011), Jabberwock (Jabberwock, 2011), Jabberwacky <ref type="bibr" target="#b6">(Jabberwacky, 2011)</ref>, and Suzette (Suzette, 2011). These eleven bots served as the baseline against which we compared the three chat bots under study: Alice, Jabberwacky, and Jabberwock.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. Experiments</head><p>The experiments were conducted using RapidMiner <ref type="bibr" target="#b11">(RapidMiner, 2011)</ref>. An authorship identification model was built that accepts the training text and creates a word list and a Support Vector Machine (SVM) model (Fig. <ref type="figure" target="#fig_1">2</ref>); the word list and model are then applied to the test text, which in our case is data from the Loebner Prize site <ref type="bibr" target="#b9">(Loebner, 2012)</ref>. As shown in Fig. <ref type="figure" target="#fig_2">3</ref>, the saved word list and model serve as input to the testing stage, whose output gives the percentage prediction for each tested file. The data were tested with two different saved models: one trained on the complete set of eleven chat bots, and one trained on only the three chat bots under study.</p><p>The model outputs confidence values, which reflect how confident we are that a given chat bot is identified correctly. The chat bot with the highest confidence value (printed in boldface in all tables) is the bot predicted by the model. Table <ref type="table" target="#tab_0">1</ref> shows the confidence values for Alice's text files in different years when all eleven chat bots are used for training. Table <ref type="table" target="#tab_2">2</ref> shows the confidence values for Alice's files when only the three chat bots under study are used for training. Fig. <ref type="figure" target="#fig_3">4</ref> shows the results of testing the three chat bots over different years when the model is trained on all eleven chat bots.</p><p>The results in Fig. <ref type="figure">5</ref> come from the experiments that use a training set based on the three chat bots under study: Alice, Jabberwacky, and Jabberwock. Jabberwock did not take part in the 2005 contest. 
Table <ref type="table" target="#tab_3">3</ref> shows the confidence values for Jabberwacky's files when the complete set of eleven chat bots is used for training. Table <ref type="table" target="#tab_4">4</ref> shows the confidence values for Jabberwock's files when all eleven chat bots are used for training.</p></div>
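<div xmlns="http://www.tei-c.org/ns/1.0"><p>The training and testing stages of Figs. 2 and 3 can be approximated outside RapidMiner. The Python sketch below is an assumption-laden stand-in, not the RapidMiner process used in the experiments: it tokenizes with a simple regular expression, builds the word list from the training texts, represents each text by L2-normalized term frequencies (the normalization step), trains one linear SVM per chat bot in a one-vs-rest scheme with the Pegasos sub-gradient method, and turns the SVM decision values into confidences with a softmax. The function names, the hyperparameters lam and epochs, and the softmax mapping are hypothetical; RapidMiner computes its own confidence values.</p><p><![CDATA[
```python
import math
import random
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens (a simple stand-in for RapidMiner's tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_list(train_docs):
    """Sorted vocabulary of the training documents: the stored 'word list'."""
    return sorted({w for doc in train_docs for w in tokens(doc)})


def vectorize(text, word_list):
    """L2-normalized term frequencies over the word list, plus a constant bias feature.

    Words missing from the training word list are ignored, as in the test stage.
    """
    counts = Counter(tokens(text))
    vec = [float(counts[w]) for w in word_list]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec] + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Binary linear SVM (labels +1/-1) fitted with Pegasos sub-gradient steps."""
    rng = random.Random(seed)
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train(train_docs, labels, **svm_args):
    """Training stage: one shared word list and one-vs-rest SVMs, one per bot."""
    word_list = build_word_list(train_docs)
    X = [vectorize(doc, word_list) for doc in train_docs]
    models = {}
    for bot in sorted(set(labels)):
        y = [1 if label == bot else -1 for label in labels]
        models[bot] = train_linear_svm(X, y, **svm_args)
    return word_list, models


def predict(text, word_list, models):
    """Testing stage: confidence per bot, and the bot with the highest confidence."""
    x = vectorize(text, word_list)
    scores = {bot: sum(wj * xj for wj, xj in zip(w, x)) for bot, w in models.items()}
    top = max(scores.values())
    exps = {bot: math.exp(s - top) for bot, s in scores.items()}  # softmax (assumed)
    total = sum(exps.values())
    confidence = {bot: e / total for bot, e in exps.items()}
    return max(confidence, key=confidence.get), confidence
```
]]></p><p>Given a few labelled texts per bot, train returns the word list and the per-bot models that the testing stage reuses, and predict returns the top-confidence bot together with the full confidence dictionary, mirroring the rule that the bot with the highest confidence is the predicted one. The one-vs-rest design keeps each binary SVM simple; RapidMiner's SVM operators may handle multiple classes differently.</p></div>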
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. Conclusions and Future Work</head><p>The initial experiments on the collected data showed variation between chat bots, which is expected: chat bots have different creators and different algorithms, so they should not all behave the same way.</p><p>Some chat bots are more intelligent than others, and the Loebner contest is designed to bring out such differences. Alice showed some consistency over the years under study, but in 2005 Alice's style was not as recognizable as in other years. Jabberwacky performed well in all years when the model was trained on just three bots; when the training set contained all eleven chat bots, however, Jabberwacky was not identified in 2001 and gave only a 40% correct prediction in 2005. Jabberwock, the third chat bot under study, was the least consistent of all the bots: it gave 0% correct prediction in 2001 and 2004 but 91% in 2011, which may indicate that Jabberwock's vocabulary improved in a way that gave it its own style.</p><p>With the three-chat-bot training model, Jabberwacky was identified correctly (100%) in all years. Alice was identified well in all years except 2005, and Jabberwock was not identified at all in 2001 and 2004.</p><p>Based on these initial experiments, we can state that some chat bots do change their style, most probably depending on the intelligent algorithms they use to initiate conversations, while other chat bots keep a steady style that does not change over time.</p><p>More data are required to obtain reliable results; we were only able to obtain data from the Loebner Prize competition, which in some cases amounted to a single 4 KB text file. With sufficient data, the results should be more representative and accurate.</p><p>Further research on these chat bots will be conducted, including continued work on finding specific features that identify individual chat bots. 
This is a burgeoning research area, and much work remains to be done.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Sample conversation between a chat bot and a judge.</figDesc><graphic coords="2,45.00,418.60,234.00,153.70" type="bitmap" /></figure>
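<div xmlns="http://www.tei-c.org/ns/1.0"><p>Assuming that a year's percentage of correct prediction is the share of that year's test files whose highest-confidence bot is the true bot, the figure can be computed as in the following Python sketch. The helper identification_rate and its input layout are hypothetical, and the example confidences are invented for illustration; they are not values from Tables 1 to 4.</p><p><![CDATA[
```python
from collections import defaultdict


def identification_rate(results, true_bot):
    """Percentage of a year's test files whose top-confidence bot is `true_bot`.

    `results` is a list of (year, confidence) pairs; each `confidence` dict
    maps every candidate bot to the model's confidence for one test file.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for year, confidence in results:
        predicted = max(confidence, key=confidence.get)  # highest confidence wins
        totals[year] += 1
        hits[year] += int(predicted == true_bot)
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Made-up confidences for illustration, not values from Tables 1-4.
    results = [
        (2001, {"bot_a": 0.7, "bot_b": 0.3}),
        (2001, {"bot_a": 0.2, "bot_b": 0.8}),
        (2004, {"bot_a": 0.6, "bot_b": 0.4}),
        (2004, {"bot_a": 0.9, "bot_b": 0.1}),
    ]
    print(identification_rate(results, "bot_a"))  # {2001: 50.0, 2004: 100.0}
```
]]></p><p>Under this reading, a year with a single short test file can only score 0% or 100%, which is consistent with the concern above that some years provide too little data for reliable results.</p></div>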
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Training model using RapidMiner.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Testing stage using RapidMiner.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Identification percentage over different years using all eleven chat bots for training.</figDesc><graphic coords="3,45.00,292.25,247.65,159.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Confidence level of Alice's files when tested with all eleven chat bots used in training.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Figs. 2 and 3 (RapidMiner operators)</head><label></label><figDesc>Training process (Fig. 2): Process Document, Normalize, Validation, Store Word list, Store Model. Testing process (Fig. 3): Get Word list, Process Document, Normalize, Get Model, Apply Model.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Confidence level of Alice's files when tested with only three chat bots used in training.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>Confidence level of Jabberwacky's files when tested with all eleven chat bots used in training.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 .</head><label>4</label><figDesc>Confidence level of Jabberwock's files when tested with all eleven chat bots used in training.</figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">AI Research</title>
		<author>
			<persName><forename type="first">Alan</forename></persName>
		</author>
		<ptr target="http://www.a-i.com/show_tree.asp?id=59&amp;level=2&amp;root=115" />
		<imprint>
			<date type="published" when="2011-06-10">June 10, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Evaluation of authorship attribution software on a Chat bot corpus</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hindi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Yampolskiy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">XXIII International Symposium on Information, Communication and Automation Technologies (ICAT)</title>
		<meeting><address><addrLine>Sarajevo, Bosnia and Herzegovina</addrLine></address></meeting>
		<imprint>
			<biblScope unit="page" from="1" to="6" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">ALICE</title>
		<author>
			<persName><forename type="first">Alice</forename></persName>
		</author>
		<ptr target="http://alicebot.blogspot.com/" />
		<imprint>
			<date type="published" when="2011-06-12">June 12, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">CleverBot</title>
		<author>
			<persName><surname>Cleverbot</surname></persName>
		</author>
		<ptr target="http://www.a-i.com/show_tree.asp?id=97&amp;level=2&amp;root=115" />
		<imprint>
			<date type="published" when="2011-06-16">2011. July 5, 2011. 2011. June 16, 2011</date>
		</imprint>
		<respStmt>
			<orgName>AI Research</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Federalist Revisited: New Directions in Authorship Attribution</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">I</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Forsyth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Literary and Linguistic Computing</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="111" to="127" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

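<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Jabberwacky - live chat bot - AI Artificial Intelligence chatbot</title>
		<author>
			<persName><surname>Jabberwacky</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011-06-10">June 10, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6a">
	<monogr>
		<title/>
		<author>
			<persName><surname>Jabberwock</surname></persName>
		</author>
		<ptr target="http://www.abenteuermedien.de/jabberwock/" />
		<imprint>
			<date type="published" when="2011-06-12">June 12, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6b">
	<analytic>
		<title level="a" type="main">Biometric Identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="91" to="98" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>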

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Introduction to Biometrics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nandakumar</surname></persName>
		</author>
		<imprint>
			<publisher>Springer-Verlag New York, LLC</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Artificial Intelligence Online</title>
		<author>
			<persName><surname>Jeeney</surname></persName>
		</author>
		<ptr target="http://www.jeeney.com/" />
		<imprint>
			<date type="published" when="2011-03-11">March 11, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Authorship attribution of texts: a review</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">G</forename><surname>Loebner</surname></persName>
		</author>
		<author>
			<persName><surname>Malyutov</surname></persName>
		</author>
		<ptr target="http://loebner.net/Prizef/loebner-prize" />
	</analytic>
	<monogr>
		<title level="m">Electronic Notes in Discrete Mathematics</title>
				<imprint>
			<date type="published" when="2005">2012. Jan 3, 2012. 2005</date>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="353" to="357" />
		</imprint>
	</monogr>
	<note>Home Page of The Loebner Prize</note>
</biblStruct>

<biblStruct xml:id="b10a">
	<monogr>
		<title level="m" type="main">Chatbot Mybot, Artificial Intelligence</title>
		<author>
			<persName><surname>Mybot</surname></persName>
		</author>
		<ptr target="http://www.chatbots.org/chatbot/mybot/" />
		<imprint>
			<date type="published" when="2011-01-08">Jan 8, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An Instant Messaging Intrusion Detection System Framework: Using character frequency analysis for authorship identification and validation</title>
		<author>
			<persName><surname>Orebaugh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">40th Annual IEEE International Carnahan Conference Security Technology</title>
		<meeting><address><addrLine>Lexington, KY</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Rapid-I</title>
		<author>
			<persName><surname>Rapidminer</surname></persName>
		</author>
		<ptr target="http://rapid-i.com/" />
		<imprint>
			<date type="published" when="2011-12-20">Dec 20, 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SkyNet - AI</title>
		<author>
			<persName><surname>Skynet</surname></persName>
		</author>
		<ptr target="www.en.wikipedia.org/wiki/Chatterbot" />
	</analytic>
	<monogr>
		<title level="m">Chatterbot - Wikipedia, the free encyclopedia</title>
		<imprint>
			<date type="published" when="2011">2011. April 20, 2011. 2011. Feb 7, 2011. 2011. April 14, 2011. 2011. June 20. 2011. June 22, 2011</date>
		</imprint>
	</monogr>
	<note>TalkBot - A simple talk bot. What is chat bot? A Word Definition from the Webopedia Computer Dictionary</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Behavioral Biometrics: a Survey and Classification</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Yampolskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Govindaraju</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Biometrics (IJBM)</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="81" to="113" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
