<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Style Breach Detection: An Unsupervised Detection Model Notebook for PAN at CLEF 2017</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Jamal</forename><forename type="middle">Ahmad</forename><surname>Khan</surname></persName>
							<email>j_ahmadkhan@yahoo.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Software Engineering</orgName>
								<orgName type="institution">International Islamic University</orgName>
								<address>
									<settlement>Islamabad</settlement>
									<country key="PK">Pakistan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Style Breach Detection: An Unsupervised Detection Model Notebook for PAN at CLEF 2017</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0A84888B4B8CB10FAEE5180055320C30</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:30+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper deals with the sub-task of PAN 2017 Author Identification, which is to detect style breaches for unknown number of authors within a single document in English. The presented model is an unsupervised approach that will detect style breaches and mark text boundaries on the basis of different stylistic features. This model will use some classical stylistic features like POS analysis and sentence lexical analysis. Also some new features naming common English word frequencies within sentence text, sentence expression and sentence attitude have been proposed. The new features may not be directly linked to author's style of writing but to the subject/topic of sentence under analysis. Moreover the model uses sentence window for style detection. The sentence window may be extended to neighboring sentences during its unsupervised analysis.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Stylometry is an important tool in the field of digital text forensics, especially in cases where we have unidentified or dubious text documents <ref type="bibr">[1]</ref> written by one or more authors. These documents do not have an external link, tool or repository to prove that which text passage relates to which author. In other words, we use stylometric approaches when we may have to ascertain if the acclaimed authorship of text document actually exists in circumstances where we do not have any external verification resources.</p><p>Stylometric approaches generally achieve higher accuracy for long documents <ref type="bibr" target="#b0">[2]</ref> because longer documents contain more text to reveal stylistic features of authors like in the field of Intrinsic Plagiarism detection problem solving <ref type="bibr" target="#b1">[3,</ref><ref type="bibr" target="#b2">4]</ref>. But in cases of short documents or texts e.g. in cases of social media like twitter where there may be fewer sentences by each author, Stylometric approaches my not get more accurate results. Although much work has been done in cases of scam emails <ref type="bibr" target="#b3">[5]</ref>, cyber-crimes <ref type="bibr" target="#b4">[6]</ref> and fake service provision reviews <ref type="bibr" target="#b5">[7]</ref> using Stylometric models.</p><p>One way of using stylometric approach in case of author attribution and author profiling is by training the computer applications over specific writing style of some specific author in a number of documents. But as discussed above the task of detecting style breaches within a document without knowing in advance about the exact number of authors is difficult task and also an objective for ongoing research. Detection of style breach is related to text segmentation where text boundaries are marked with detection in change of topics <ref type="bibr" target="#b6">[8]</ref>.</p><p>The presented model uses unsupervised classification approach to detect and mark passage boundaries in given documents on the basis of style breaches. A combination of well-known stylometric features like Syntactic, Lexical and content specific features <ref type="bibr" target="#b7">[9]</ref> are used with features like ordinary words frequency, sentence expression and sentence attitude that may be related to textual topic specification and may not be directly related to author's style. But this approach may be very handy in cases where we want to relate one sentence to its neighboring sentences and thus detect exact passage boundaries within a given document.</p><p>Also this model is a good example of how a text as small as a sentence within a document may be helpful in finding its related sentences on the basis of stylometric and other parameters to help us figure-out the passage boundaries by unknown number of authors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Dataset</head><p>The training dataset of PAN at CLEF 2017 <ref type="bibr" target="#b6">[8]</ref> for the task of style breach detection under main task of author identification. The dataset contained about 187 English text documents of different lengths and sizes over different topics like biography, politics, travel, hotels etc. Along with each text document a truth file was provided which contained exact character positions indicating style breach occurrences within that document, topic of document however remains unchanged.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System Methodology</head><p>The presented model uses different types of classical stylometric methods along with some new methods in order to find text borders where style breach is identified. The system used sentences as text segmentation unit. The sentence window keeps extending over its neighboring sentences until style breach is detected. Following are the methodology steps used by the system in order to find out style breaches. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Words Lists Preparation</head><p>Different types of lists of words were prepared from different internet sources <ref type="bibr">[10,</ref><ref type="bibr">11,</ref><ref type="bibr">12,</ref><ref type="bibr">13</ref>] that express specific moods or human feelings. Seven expression lists of words were used including anger, confusion, curiosity, urgency, satisfaction, inspiration and happiness; where all lists comprised of about 200 words each. One reason for choosing only these seven expressions was the availability of proper expressive words over internet sources for these expressions. The second reason was to use limited set of expressions that may express human feelings while writing some text. More expressions may be included for future research. Two words additional words lists of about 500 words each of which reflecting positive or negative attitudes [14, 15] were included. An example of these expressive and attitude lists is shown in table <ref type="table" target="#tab_1">1 and table 2</ref> </p><formula xml:id="formula_0">---------- ---------- ---------- ---------- ---------- ----------</formula><p>These lists became the part of model and will be used for labeling of sentences in next methodology steps.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Text Segmentation into Sentences</head><p>Each individual document D in the repository was segmented into sentences , , , ,…. . A simple algorithm was used to break a document into array of sentences. It first traverse through each character of document D from start until the any of the two characters '.' or '?' are encountered, which indicates sentence endings. The sentence is extracted and the algorithm continues from next character as start of next sentence.</p><formula xml:id="formula_1">D = + + + + …. + (<label>1</label></formula><formula xml:id="formula_2">)</formula><p>Where i is the starting index of each sentence and n is the number of total sentences in D. The first three sentences of any document D will be the starting window (j = 1) for initializing point that may or may not extend and merge with next adjacent sentence windows (two at a time) depending on further analysis, also the adjacent sentence windows will also share boundary sentence as shown in equation 2 and 3.</p><formula xml:id="formula_3">= + + (2) = + (<label>3</label></formula><formula xml:id="formula_4">)</formula><p>The sentence is common boundary sentence in first and second windows and . This common sentence among two adjacent windows will increase the similarity index when comparing both windows for a possible merger/extension.</p><p>As discussed above n is the total number of sentences in any document and each sentence window W can have only three sentences in start (as shown in equations 2 and 3); hence the maximum number of text windows in any document will be as shown in equation 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Max. Windows (m) = (4)</head><p>Let's consider for an example j = 1, so first two sentence windows and are chosen for further analysis. The next steps performed by model are as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Sentence Window based syntactic analysis: Text in both adjacent</head><p>windows is converted to its respective part of speech (POS) tags for each word present in texts as shown in table <ref type="table" target="#tab_3">4</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Sentence Window based Lexical Analysis:</head><p>At this step, the model performs a lexical analysis for both text windows. In this analysis following features are extracted:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Most frequent alphanumeric and non-space character</head><p>in the text window is extracted e.g.</p><p>= "e" in both text windows in shown table <ref type="table" target="#tab_3">4</ref>.</p><p>Most frequent non-alphanumeric and non-space character ( in the text window is extracted e.g. , .</p><p>="," in both text windows and .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Most frequent word</head><p>in the text window is extracted where i in equation below is the index of word w e.g.</p><p>= "in" and = "of" in both text windows respectively as mentioned in table <ref type="table" target="#tab_3">4</ref>. The frequency of each word is calculated as shown in equation 5.</p><p>Word Frequency (</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>∑ (5)</head><p>Character to Space Ratio is calculated for each text window as shown in equation 6.</p><p>Character to Space Ratio ( ) = (6)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Content Based Analysis of Sentence Window: At this step commonality index</head><p>of each window is calculated using the list L of 5000 common words. Let be a common word existing in both L and any text window where i specifies the index (i = 1… 5000) in L in eq. 7.</p><p>√ ∑ <ref type="bibr" target="#b5">(7)</ref> Where k is the total number of coexisting words in both L and , and be the frequency of in , is the frequency of in list L (as shown in table <ref type="table" target="#tab_2">3</ref>) and l is the total number of words in .</p><p>Next two steps can be considered as sub-steps of Content based analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Sentence Window Expression Labeling:</head><p>The model will label each window with a specific feeling or human mood expression . Let i is the index (i = 1… 7) of expression list as shown in table <ref type="table" target="#tab_0">1</ref>, Let be a coexisting word in both and text window where m specifies the index in . Expression score is measured on the basis of following equation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>∑ (8)</head><p>Where k is the total number of coexisting words in both and , and be the frequency of in . After calculating all seven expression scores the model will calculate e through following equation. <ref type="bibr" target="#b7">(9)</ref> In cases where two or more expression scores are equal, or all expression scores are zero, the model will assign a "neutral" expression for window .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Sentence Window Attitude Labeling:</head><p>The model will label each window with a specific attitude or human behavior . Let i is the index (i = 1… 2) of attitude list as shown in table <ref type="table" target="#tab_1">2</ref>, Let be a coexisting word in both and text window where m specifies the index in . Attitude score is measured on the basis of following equation.</p><formula xml:id="formula_5">∑ (<label>10</label></formula><formula xml:id="formula_6">)</formula><p>Where k is the total number of coexisting words in both and , and be the frequency of in . After calculating both positive and negative attitude scores the model will calculate a through following equation.</p><p>(11)</p><p>In case both scores are equal or zero, the model will assign a neutral attitude for e.g. both and have neutral attitude. </p><formula xml:id="formula_7">[ ]<label>(14)</label></formula><p>The system will now measure stylistic similarity score as shown in following equations <ref type="bibr">(16)</ref> Where, for each x in equation 15, the similarity score is incremented accordingly. and are treated separately as matrices because these two contains decimal values. A matrix subtraction is applied to and</p><formula xml:id="formula_9">[ ]<label>(17)</label></formula><p>If cr and ci lie within a threshold range described in next section, then similarity score is incremented accordingly. Finally, it's time to decide whether or not to merge and on the basis of value of lies within a threshold range described in next section. At this point two cases will emerge:</p><p>Case-1: lies within a threshold range In this case and are considered merged, and a new resultant window will be created where r is the index of resultant window. The model will continue from step 1 of methodology for sentence and .</p><formula xml:id="formula_10">= + + + + (<label>18</label></formula><formula xml:id="formula_11">)</formula><p>will keep expanding until case-1 keeps occurring and this resultant window will reflect a single style for all sentences contained within.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Case-2: does not lie within a threshold range</head><p>In this case the coexisting sentence in both adjacent windows will stay either in window or in e.g. let's assume in equations 2 and 3. 1.</p><p>will become a separate single sentence window . </p><p>After the style breach detection among first two consecutive sentence windows, new windows and will be compared starting from step 1 of methodology.</p><p>In the end we have a set of resultant windows known as R = where m is the maximum number of sentence windows and each in R is considered a breach detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>A number of experiments were carried out in order to adjust the threshold values and for which the final F-Measure score was highest. Once the values were adjusted over the training dataset, the system was ready to run for test dataset provided at TIRA [17] in order to detect style breaches.</p><p>Following are the evaluator results shown in table <ref type="table" target="#tab_6">5</ref>. The results were improved for the final test dataset, however the model precision remained low from recall and that affected the final F-Measure score, which shows that more experiments over different data sources for adjusting threshold values may be required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>In this paper an unsupervised model for the detection of style breach is presented, this research field is rather new and more difficult to implement because non availability of any external resources for reference and also we only have to rely on stylistic attributes of unknown number authors that may or may not have contributed in the creation of text document under inquiry, hence this model presents new directions or ways i.e. Expression and Attitude labeling of textual windows in order to find style breach within sentences without the pre-assumption of authors style of writing and relying more on text content. In future the results can be improved with discovery of more text labels or with the addition of more expression lists and reduction of conventional stylistic approaches, this model can hence be applied to other languages as well.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Words lists preparation  Text segmentation into sentences  Sentence window based syntactic analysis  Sentence window based lexical analysis  Content based analysis of sentence window  Sentence window expression labeling  Sentence window attitude labeling  Style breach calculation</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 . Example of words expressing different feelings Index Expression Words 1</head><label>1</label><figDesc>.</figDesc><table><row><cell>3</cell><cell>Curiosity</cell></row><row><cell>4</cell><cell>Inspiration</cell></row><row><cell>5</cell><cell>Happiness</cell></row><row><cell>6</cell><cell>Satisfaction</cell></row><row><cell>7</cell><cell>Urgency</cell></row></table><note>Anger ordeal, outrageousness, provoke, repulsive …. 2 Confusion doubtful, uncertain, indecisive , perplexed…. secret, confidential, controversial, underground….. motivated, eager, keen, earnest…. blissful, joyous, delighted, overjoyed….. accurate, satisfied, advantage, always….. magical, instantly, missing, quick……</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 . Example of words expressing positivity or negativity Attitude Words Positive</head><label>2</label><figDesc>An additional list of 5000 most common English words with word frequencies was also included[16]  an example of which is shown in table3. This list contributes in order to measure the commonality index in a sentence.</figDesc><table /><note>admiring, adoring, affectionate, appreciative, approving…. Negative abhorring, acerbic, ambiguous, ambivalent, angry, annoyed…...</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 . Example of common English words with frequencies</head><label>3</label><figDesc></figDesc><table><row><cell>Word</cell><cell>Frequency</cell><cell>Word</cell><cell>Frequency</cell><cell>Word</cell><cell>Frequency</cell></row><row><cell>A</cell><cell>10144200</cell><cell>casual</cell><cell>6946</cell><cell>Naval</cell><cell>4990</cell></row><row><cell>abandon</cell><cell>15323</cell><cell>casualty</cell><cell>6439</cell><cell>Near</cell><cell>54869</cell></row><row><cell>ability</cell><cell>51476</cell><cell>cat</cell><cell>21135</cell><cell>Nearby</cell><cell>13820</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 . Example of POS tagging in adjacent text windows</head><label>4</label><figDesc></figDesc><table><row><cell>Window#</cell><cell></cell><cell></cell><cell>Text</cell><cell></cell><cell>POS Tags</cell></row><row><cell></cell><cell cols="5">Obama's mother returned to Hawaii in</cell><cell>NNP POS VBN TO NNP</cell></row><row><cell></cell><cell cols="5">1972 for five years, and then in 1977</cell><cell>IN CD IN CD NNS, CC</cell></row><row><cell></cell><cell cols="5">went back to Indonesia, where she worked as an anthropological fieldworker. She stayed there most of the rest of her life, returning to Hawaii in 1994. She died of ovarian cancer in</cell><cell>RB IN CD NN TO NNP, WRB PRP VBD IN DT JJ NN. PRP VBD RB JJS IN DT NN IN PRP$ NN, VBG NNS IN CD. PRP VBD IN JJ NN IN CD.</cell></row><row><cell></cell><cell>1995.</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="5">She died of ovarian cancer in 1995. Of</cell><cell>PRP VBD IN JJ NN IN</cell></row><row><cell></cell><cell cols="5">his early childhood, Obama has recalled, "That my father looked nothing like the people around me that he was black as pitch, my mother white as milk barely registered in my mind." In his 1995 memoir, he</cell><cell>CD. IN PRP$ JJ NN, NNP VBD, `` IN PRP$ NN VBD NN IN DT NNS IN IN PRP VBD JJ IN NN, PRP$ NN JJ IN NN VBN IN PRP$ NN. IN PRP$ CD NN, PRP VBD NNS IN</cell></row><row><cell></cell><cell cols="5">described his struggles as a young</cell><cell>DT JJ NN TO VB JJ NNS</cell></row><row><cell></cell><cell cols="5">adult to reconcile social perceptions of</cell><cell>IN JJ NN.</cell></row><row><cell></cell><cell cols="4">his multiracial heritage.</cell></row><row><cell cols="6">From the two examples presented in table 4, the model extracts following text</cell></row><row><cell>features:</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="5">Starting and ending POS tags ( ,</cell><cell>) for each sentence in each sentence</cell></row><row><cell cols="4">window e.g. starting POS tags for</cell><cell>are</cell><cell>= {NNP, PRP, PRP} and ending</cell></row><row><cell>POS tags are</cell><cell cols="3">= {NN, CD, CD}.</cell><cell></cell></row><row><cell cols="6">Most frequent POS tags and POS tag pairs ( ,</cell><cell>) are extracted e.g. most</cell></row><row><cell cols="2">frequent POS tag in</cell><cell>and</cell><cell>is</cell><cell>,</cell><cell>= IN and most frequent POS tag pairs</cell></row><row><cell cols="2">in both windows are</cell><cell cols="3">= {IN, CD} and</cell><cell>= {IN, PRP$} respectively.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 . Training and Test Results over Style Breach detection Datasets</head><label>5</label><figDesc></figDesc><table><row><cell>Corpus</cell><cell>Win. Diff</cell><cell>Win. Precision</cell><cell>Win. Recall</cell><cell>Win.F-</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>Measure</cell></row><row><cell cols="2">Training dataset 0.5184</cell><cell>0.3656</cell><cell>0.4841</cell><cell>0.2671</cell></row><row><cell>Test dataset</cell><cell>0.4799</cell><cell>0.39900</cell><cell>0.48710</cell><cell>0.2888</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Authorship verification for short messages Using stylometry</title>
		<author>
			<persName><forename type="first">Marcelo</forename><surname>Brocardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Issa</forename><surname>Luiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sherif</forename><surname>Traore</surname></persName>
		</author>
		<author>
			<persName><surname>Saad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer, Information and Telecommunication Systems (CITS), International Conference</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the 3 rd international competition on plagiarism detection</title>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Barrón</forename><surname>Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Eiselt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings. CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Methods for Intrinsic Plagiarism Detection and Author Diarization Notebook for PAN at CLEF</title>
		<author>
			<persName><forename type="first">Mikhail</forename><surname>Kuznetsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anastasia</forename><surname>Motrenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rita</forename><surname>Kuznetsova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vadim</forename><surname>Strijov</surname></persName>
		</author>
		<ptr target=".org" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2016 Evaluation Labs and Workshop -Working Notes Papers</title>
				<editor>
			<persName><forename type="first">Krisztian</forename><surname>Balog</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Linda</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Craig</forename><surname>Macdonald</surname></persName>
		</editor>
		<meeting><address><addrLine>Évora, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
	<note>CEUR-WS</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Data mining challenges for electronic safety. The case of fraudulent intent detection in e-mails</title>
		<author>
			<persName><forename type="first">Edoardo</forename><surname>Airoldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bradley</forename><surname>Malin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Privacy and Security Aspects of Data Mining</title>
				<meeting>the Workshop on Privacy and Security Aspects of Data Mining</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Seduced into scams: Online lovers often duped</title>
		<author>
			<persName><forename type="first">B</forename><surname>Sullivan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">MSNBC</title>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A survey of trust and reputation systems for online service provision</title>
		<author>
			<persName><forename type="first">Audun</forename><surname>Josanga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roslan</forename><surname>Ismailb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Colin</forename><surname>Boyda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Decis. Support Syst</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="618" to="644" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Identification Task at PAN 2017: Style Breach Detection and Author Clustering</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Verhoeven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Walter</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gunther</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<ptr target="org" />
	</analytic>
	<monogr>
		<title level="m">CLEF Labs and Workshops</title>
		<title level="s">Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10456</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace</title>
		<author>
			<persName><forename type="first">Ahmed</forename><surname>Abbasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hsinchun</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Article No</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
