<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Toronto</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <abstract>
        <p>We identify sexual predators in a large corpus of web chats using SVM classification with a bag-of-words model over unigrams and bigrams. We find this simple lexical approach to be quite effective with an F1 score of 0.77 over a 0.003 baseline. By also encoding the language used by an author's partners and some small heuristics, we boost performance to an F1 score of 0.83. We identify the most “predatory” messages by calculating a score for each message equal to the average of the weights of the n-grams therein, as determined by a linear SVM model. We boost performance with a manually constructed “blacklist”.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our tasks were to distinguish sexual predators from non-predators in a corpus of online
chats which was highly biased toward the negative class, and then to identify lines
written by these alleged predators which were most indicative of their bad behaviour.
Of the approximately 98,000 chat participants (whom we will generically refer to as
“authors”) in the PAN training corpus, 142 are identified as being sexual predators. We
subdivide the complement class of non-predators into “victims” (anyone who ever talks
to a predator — we have 142 in our training corpus), and “bystanders” (those who have
no interactions with predators).</p>
      <p>
        Although we read existing literature on the linguistic characteristics of sexual
predators, such as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], unlike some of the other teams we make no a priori
assumptions about the language of sexual predators, and only the barest assumptions
about predator behaviour (we merely assume that predators and victims chat in pairs,
rather than in larger groups). Rather, we use a naive machine learning approach, wherein
predators are defined solely and completely by the behaviour and language of the 142
predators identified by the ground truth of the training set.
      </p>
      <p>We use a set of standard lexical features and features that generically describe the
behaviour of chat participants. It’s our hope that, given the success of our approach,
a post hoc analysis of feature weights will suggest an empirically defensible model
of “predatory language”, and perhaps add or remove evidentiary weight to existing
theories of predator language and behaviour.</p>
      <p>We hypothesize that our classifier will be more effective if it can be attuned to both
the language of predatoriness and the language of victimhood. For example, we imagine
that an adult engaging in a sexually explicit chat with another consenting adult might
use language not unlike that of a sexual predator. However, we would expect the other
participant to be an eager participant in the former case, and reticent or evasive in the
latter case.</p>
      <p>
        Thus, an important aspect of our approach is that a given author’s feature vector
reflects not just that author’s language and behaviour, but also the language and behaviour
of his or her interlocutor(s). This gives our machine learning algorithm roughly twice
as much information to base its model on; we expect at least some of this additional
information to be useful for discrimination, since we don’t expect the language of one
author to wholly determine the language of his or her interlocutor, notwithstanding the
effect of lexical entrainment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>For the second task of predatory message identification, we return to our set of
lexical features from the classification task. We train a linear SVM model for distinguishing
predators from non-predators using just lexical features, and use the resulting weights
over unigrams and bigrams to induce a weighting of “predatoriness” over all terms. We
flag all predator messages where the sum of the weights of the terms in the message is
above a certain hand-tuned threshold, along with all messages which contain any terms
in a hand-assembled “blacklist”.</p>
    </sec>
    <sec id="sec-2">
      <title>Features</title>
      <p>Our feature set can broadly be divided into lexical features and what we’ll term
“behavioural features”, which capture patterns in the ebb and flow of conversation. Feature
vectors are calculated on a per-author basis.</p>
      <sec id="sec-2-1">
        <title>Lexical features</title>
        <p>We use a standard bag-of-words model, since this has been shown to be robust in the
face of a wide variety of text classification problems. Having also experimented with
term presence, tf-idf, and log of term frequency, we ultimately settled on simple term
frequency as our metric. We used both unigrams and bigrams.</p>
        <p>As noted above, a key aspect of our approach to lexical features was our
consideration of the language of the focal author’s interlocutors as well as that of the focal
author themselves. Thus every token t that appears more often than our threshold
(empirically set to 10) yields two features: the number of times the focal author utters t, and
the number of times any of the focal author’s interlocutors utters t. We will henceforth
refer to features of the latter type as “mirror” features. If we take the following short,
imagined exchange as an example:</p>
        <p>Author1: hi alice</p>
        <p>Author2: hi hi</p>
        <p>then Author1 would be associated with the following vector:</p>
        <p>{hi : 1; alice : 1; hi alice : 1; OTHER_hi : 2; OTHER_hi hi : 1}</p>
        <p>and Author2 would be associated with a mirror vector:</p>
        <p>{hi : 2; hi hi : 1; OTHER_hi : 1; OTHER_alice : 1; OTHER_hi alice : 1}.</p>
        <p>We experimented with a number of standard text preprocessing routines including
lowercasing, stripping punctuation, and stemming. None of these routines improved
performance, so our final results use simple space-separated tokens as features.</p>
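        <p>The mirror-feature construction described above can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names are hypothetical, and the frequency threshold of 10 is omitted so that the toy exchange stays small.</p>
        <preformat>
```python
from collections import Counter, defaultdict


def ngrams(message):
    """Return the unigrams and bigrams of a whitespace-separated message."""
    tokens = message.split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def mirror_features(conversation):
    """Map each author to own n-gram counts plus OTHER_-prefixed partner counts.

    `conversation` is a list of (author, message) pairs.
    """
    own = defaultdict(Counter)
    for author, message in conversation:
        own[author].update(ngrams(message))

    vectors = {}
    for author in own:
        vector = Counter(own[author])
        # Every n-gram uttered by an interlocutor becomes a "mirror" feature.
        for other, counts in own.items():
            if other != author:
                for term, count in counts.items():
                    vector["OTHER_" + term] += count
        vectors[author] = dict(vector)
    return vectors


chat = [("Author1", "hi alice"), ("Author2", "hi hi")]
features = mirror_features(chat)
print(features["Author1"])
print(features["Author2"])
```
        </preformat>
        <p>Run on the imagined exchange, this yields the same two vectors as the worked example.</p>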
        <p>We also tried to add “smarts” to our lexical features with some transformation rules.
We introduced the following special tokens:</p>
        <p>\SMILEY For smiley faces matching a collection of emoticons assembled from Wikipedia
(http://en.wikipedia.org/wiki/List_of_emoticons). We also introduce the following
refinements: \SMILEY_happy, \SMILEY_sad, \SMILEY_silly, and \SMILEY_other.</p>
        <p>\MALE_name For tokens matching a list of the 1,000 most common male given names
in the United States.<sup>1</sup> We manually removed around 10 names which are more
likely to appear as common nouns (e.g. “Guy”).</p>
        <p>\FEMALE_name As above, for female names. In cases where a name can be both
male and female, we choose the sex for which the name is more popular.</p>
        <p>\NUM For any sequence of digits. We also introduce the following refinements on this
category: \NUM_small for n &lt; 13; \NUM_teen for 13 ≤ n &lt; 18; \NUM_adult for 18 ≤ n &lt; 80;
and \NUM_large for n ≥ 80.</p>
        <p>\PHONE_num For tokens matching any of a number of patterns for a phone number, with
or without area code, with a variety of possible delimiters.</p>
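        <p>As an illustration of one transformation rule, the sketch below maps digit sequences to the refined \NUM tokens. It assumes that each lower bound is inclusive (13 and 18 count as teen and adult respectively, and 80 as large) and that the refined token replaces the original token; the text does not say whether the generic \NUM token is emitted as well. The function name is hypothetical.</p>
        <preformat>
```python
def num_token(token):
    """Return the refined NUM token for a digit sequence, else the token unchanged."""
    if not (token.isascii() and token.isdigit()):
        return token
    n = int(token)
    # Bands are checked from the largest lower bound downwards.
    if n >= 80:
        return "\\NUM_large"
    if n >= 18:
        return "\\NUM_adult"
    if n >= 13:
        return "\\NUM_teen"
    return "\\NUM_small"


print([num_token(t) for t in "im 14 and he is 35".split()])
```
        </preformat>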
        <p>To our disappointment, these transformations seemed to add little discriminative power
to our model; we will elaborate on and discuss this later in our results section.
Unless otherwise specified, all results given below use only the simplest lexical features,
without preprocessing or transformation rules.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Behavioural features</title>
        <p>In addition to using the language of our authors, we explored high-level conversational
patterns in order to exploit the small amount of metadata associated with conversations
(mostly in the form of timestamps). In addition to looking at what words authors use,
we’re interested to see how they use them.</p>
        <p><sup>1</sup> We sourced our name lists from http://www.galbithink.org/names/us200.htm, using births
from 1990 to 1999. The figures ultimately come from the United States Social Security
Administration.</p>
        <p>Because we became interested in the secondary problem of distinguishing predators
from victims (see section 3.1), many of these features are concerned with the problem
of “symmetry-breaking”. That is, given two authors who speak to one another using
very similar language (which we found is often the case with predators and victims),
what non-lexical aspects of the conversation can be used to distinguish them?</p>
        <p>We used two “author-level” features which were straightforward to calculate on a
per-author basis:</p>
        <p>NMessages The total number of messages sent by this author in the corpus.</p>
        <p>NConversations The total number of conversations in the corpus in which this author
participates.</p>
        <p>These two features were quite strongly correlated with predatorhood. This is probably
an unintended side effect of the corpus construction, and we shouldn’t use this fact to
draw any conclusions about predator behaviour, such as “predators talk a lot”.</p>
        <p>Because of the large imbalance between the positive and negative class in the
corpus and because there were anomalies on both sides (that is, predators with very few
messages or conversations, and non-predators with many messages and conversations),
these features alone are not enough to attain a reasonable F-score.</p>
        <p>Initiative. We employ a number of features which can be thought of as approximating
an author’s tendency to “initiate” with their partner:</p>
        <p>Initiations The number of times this author initiates a conversation by sending the first
message (this is usually something like “hey” or “what’s up?”).</p>
        <p>Initiation rate The above variable normalized by number of conversations.</p>
        <p>Questions The number of times this author asks a question, where we roughly define
a question as any message ending in a question mark or interrobang.</p>
        <p>Question rate As above, but normalized by number of messages.</p>
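        <p>The four initiative features can be sketched as below. The data layout (a list of conversations, each a chronological list of author and message pairs) and the function name are assumptions made for illustration, as is the decision to treat both the typed sequence “?!” and the single interrobang character as an interrobang.</p>
        <preformat>
```python
def initiative_features(conversations, author):
    """Compute initiation and question counts and rates for one author.

    `conversations` is a list of conversations, each a chronological
    list of (author, message) pairs.
    """
    own = [c for c in conversations if any(a == author for a, _ in c)]
    messages = [m for c in own for a, m in c if a == author]
    # An initiation is a conversation whose first message is the author's.
    initiations = sum(1 for c in own if c[0][0] == author)
    endings = ("?", "?!", "\u203d")
    questions = sum(1 for m in messages if m.rstrip().endswith(endings))
    return {
        "initiations": initiations,
        "initiation_rate": initiations / len(own) if own else 0.0,
        "questions": questions,
        "question_rate": questions / len(messages) if messages else 0.0,
    }


example = [
    [("A", "hey"), ("B", "what's up?"), ("A", "not much")],
    [("B", "hi"), ("A", "really?!")],
    [("C", "yo"), ("B", "sup")],
]
print(initiative_features(example, "A"))
```
        </preformat>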
        <p>Attentiveness. Another set of features corresponds to an author’s attempts to keep a
conversation going, and perhaps their level of commitment to the conversation.</p>
        <p>Response time Messages in our corpus come with timestamps which are not
guaranteed to be correct in an absolute sense, but which we assume are at least correct
with respect to some time offset; thus, we expect the time deltas between messages
to be accurate. Unfortunately, we have only minute-level precision. In a
conversation between authors A and B we measure A’s response times as follows: when we
first see a message from B, we record the timestamp t<sub>0</sub>. We pass by any subsequent
messages from B until we encounter a message from A and record its timestamp t<sub>1</sub>.
The response time is t<sub>1</sub> − t<sub>0</sub>. We seek ahead to the next message from B and repeat
this process until the end of the conversation. We measure the mean, median, and
max response times for each author, aggregated over all response times (rather than
over all conversations).</p>
        <p>This measure falls apart somewhat with conversations involving more than two authors.
However, one of the few assumptions we make about predators and victims is that they
always speak in pairs — and this is certainly true in the training data.</p>
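        <p>The response-time procedure for a two-author conversation can be sketched as a single pass over the messages. Timestamps are represented here as integer minutes, and the function names are hypothetical; this is an illustration of the description above, not the original implementation.</p>
        <preformat>
```python
from statistics import mean, median


def response_times(conversation, focal):
    """Return the focal author's response times (in minutes) for one conversation.

    `conversation` is a chronological list of (author, minute) pairs.
    """
    times = []
    waiting_since = None  # timestamp t0 of the partner message awaiting a reply
    for author, minute in conversation:
        if author != focal:
            # Only the first unanswered partner message sets t0.
            if waiting_since is None:
                waiting_since = minute
        elif waiting_since is not None:
            times.append(minute - waiting_since)  # t1 - t0
            waiting_since = None
    return times


def summarize(all_times):
    """Aggregate response times pooled over all of an author's conversations."""
    return {"mean": mean(all_times), "median": median(all_times), "max": max(all_times)}


chat = [("B", 0), ("B", 1), ("A", 3), ("A", 4), ("B", 4), ("A", 9)]
print(response_times(chat, "A"))
print(summarize(response_times(chat, "A")))
```
        </preformat>
        <p>In this toy conversation A replies 3 minutes after B’s first message and 5 minutes after B’s later message, so A’s response times are 3 and 5.</p>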
        <p>Repeated messages We measure the lengths of “streaks” of messages from the focal
author which are uninterrupted by an interlocutor. The shortest allowable streak
length is 1. Again, we record the mean, max, and median repeated messages.</p>
        <p>Conversation dominance. Our last set of features can be thought of as reflecting the
degree to which the focal author “dominates” his conversations.</p>
        <p>Message ratio The ratio of messages from the focal author to the number of messages
sent by the other authors in the conversation, aggregated over all conversations in
which the focal author participates.</p>
        <p>Wordcount ratio As above, but using the number of “words” (space-separated tokens)
written by each author.</p>
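        <p>Two of the symmetry-breaking features, streak lengths and message ratio, can be sketched as follows. The function names are hypothetical. The sketch assumes that “aggregated over all conversations” means pooling message counts before dividing, and it returns infinity when the partners sent no messages; neither choice is specified in the text.</p>
        <preformat>
```python
from itertools import groupby


def streak_lengths(conversation, focal):
    """Lengths of runs of consecutive messages sent by the focal author."""
    return [
        len(list(run))
        for author, run in groupby(conversation, key=lambda pair: pair[0])
        if author == focal
    ]


def message_ratio(conversations, focal):
    """Focal messages divided by partner messages, pooled over conversations."""
    own = other = 0
    for conversation in conversations:
        authors = [author for author, _ in conversation]
        if focal in authors:
            own += authors.count(focal)
            other += len(authors) - authors.count(focal)
    return own / other if other else float("inf")


chat = [("A", "hi"), ("A", "you there?"), ("B", "yes"), ("A", "good")]
print(streak_lengths(chat, "A"))
print(message_ratio([chat], "A"))
```
        </preformat>
        <p>The mean, max, and median of the streak lengths would then be taken per author. The wordcount ratio follows the same pattern with token counts in place of message counts.</p>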
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Machine learning techniques and tools</title>
      <p>
        Our machine learning algorithm of choice was support vector machines, using the
LIBSVM library [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We used a radial kernel, having also experimented with a linear
kernel. We return to the linear kernel in the predator message task (below), since unlike the
radial kernel, it allows us to inspect feature weights to get a rough idea of the
discriminative power of various features.
      </p>
      <p>In testing our models, we used cross-validation with n = 5.</p>
      <sec id="sec-3-1">
        <title>Results postprocessing</title>
        <p>After classifying unknown authors using our model, we experimented with two later
filters for boosting performance. Both steps were motivated by our observation that a
large proportion of false positives (usually more than 75%) were in fact victims; thus
predators and victims were quite similar in our dataset with respect to our lexical and
behavioural features.</p>
        <p>The first and most obviously effective step hinged on the assumption that the
likelihood of two predators talking to one another was negligibly small. Thus, with our set
of predicted predators, we returned to our corpus of conversations and found any pairs
that ever talked to one another. For every such pair, we flipped the label of the author
in whom the SVM had the least confidence (in addition to predicted labels, LIBSVM
yields the confidence of each prediction). This increased precision at a small cost to
recall.</p>
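        <p>The partner flip filter can be sketched as below. The inputs are assumptions for illustration: a set of predicted predators, a dictionary of per-author confidence values (standing in for the confidence reported by LIBSVM), and one set of participants per conversation. Ties are broken arbitrarily in favour of the second author of a pair, which the text does not specify.</p>
        <preformat>
```python
def partner_flip(predicted, confidence, conversations):
    """Drop the less confident author from each pair of predicted predators
    who ever talked to one another.

    predicted: set of author ids labelled as predators.
    confidence: dict mapping author id to classifier confidence.
    conversations: iterable of sets of author ids, one per conversation.
    """
    pairs = set()
    for participants in conversations:
        flagged = sorted(participants.intersection(predicted))
        for i, first in enumerate(flagged):
            for second in flagged[i + 1:]:
                pairs.add((first, second))

    kept = set(predicted)
    for first, second in sorted(pairs):
        # Flip the label of whichever author the classifier trusts less.
        loser = first if confidence[second] >= confidence[first] else second
        kept.discard(loser)
    return kept


predicted = {"p1", "v1", "p2"}
confidence = {"p1": 0.9, "v1": 0.6, "p2": 0.8}
conversations = [{"p1", "v1"}, {"p2", "x"}]
print(sorted(partner_flip(predicted, confidence, conversations)))
```
        </preformat>
        <p>Here v1 is removed because it shares a conversation with the more confidently predicted p1, while p2 is untouched because its partner was never predicted to be a predator.</p>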
        <p>The second filter used a second SVM model with the specialized task of
distinguishing predators from victims (rather than predators from non-predators). After the
first classification, we would run our predator-victim classifier on the alleged predators,
and keep only the authors that were again labelled as predators. The rationale behind
this step was that the differences between predators and bystanders are quite coarse.
This is due to the nature of the training set, where the non-predatory conversations tend
to be very different from predatory conversations in terms of topic (e.g. IRC chatrooms
on web programming), or in the relationship between interlocutors (e.g. short chats
between anonymous strangers on Omegle, which contrast with predators and victims who
tend to have repeated, sustained conversations).</p>
        <p>Because predators and victims are discussing the same topics and are virtually
identical in terms of number and length of conversations, we need to look to more
fine-grained differences. This is what motivated our “symmetry-breaking” behavioural
features such as message ratio, number of repeated messages, and number of initiations.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Predatory messages task</title>
        <p>We trained a linear model for discriminating predators from non-predators using only
our lexical features. We then treated the weight assigned to each term as an
approximation of the “predatoriness” of that term. We assigned a predator score to each message
equal to the sum of the weights of all unigrams and bigrams in the message, and flagged
as predatory all messages with a predator score above a certain threshold. We
hand-tuned this threshold so that what we deemed was a reasonable proportion of messages
were flagged (approximately 2% to 5%).</p>
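        <p>The scoring rule in this paragraph can be sketched as follows. The weights and threshold below are invented toy values, not weights from the trained model, and the function names are hypothetical. The abstract describes the score as an average rather than a sum; dividing by the number of terms would give that variant.</p>
        <preformat>
```python
def predator_score(message, weights):
    """Sum the weights of the unigrams and bigrams found in a message."""
    tokens = message.split()
    terms = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    # Terms absent from the model contribute nothing to the score.
    return sum(weights.get(term, 0.0) for term in terms)


def flag_messages(messages, weights, threshold):
    """Return the messages whose predator score is above the threshold."""
    return [m for m in messages if predator_score(m, weights) > threshold]


# Toy weights for illustration only.
toy_weights = {"meet": 0.5, "meet me": 0.25, "homework": -0.25}
toy_messages = ["meet me tonight", "doing homework", "hello"]
print(flag_messages(toy_messages, toy_weights, threshold=0.5))
```
        </preformat>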
        <p>We also built by hand a “blacklist” of 122 n-grams (including morphological
variations and spelling variants) which automatically flag a message as predatory. Because
we begin from the assumption that the messages we’re classifying are all from
predators to victims, we can choose words which have no conceivable place in an
appropriate conversation between an adult and a child. Thus, these words don’t automatically
signal a message as predatory (since they may be employed in conversations between
consenting adults), but they do signal a message as predatory when the message is from
a predator to a victim.</p>
        <p>
          Our blacklist focuses on terms which are sexually explicit, pertain to the exchange
of photos, or pertain to arranging meetings. In an analysis of 51 chats between sexual
predators and victims, Briggs et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] found that 100% of predators initiated sexually
explicit conversations, 68.6% sent nude photos, and 60.8% scheduled a face-to-face
meeting. We expect this blacklist to strictly increase recall, at a trivial cost to precision,
if any.
        </p>
        <p>Finally, we heavily penalize very short messages (those consisting of four or fewer
space-separated tokens). This is based on the assumption that such short messages are
unlikely to convey enough propositional content to be “predatory” (except, perhaps,
with respect to the surrounding context), and on the volatility of taking averages over a
small set of values.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <sec id="sec-4-1">
        <title>Predator classification</title>
        <p>Using the default parameter settings for LIBSVM (γ = 1/n<sub>features</sub>, C = 1) gave
precision of 0.91, recall of 0.28, and F1 score of 0.43 on the PAN training data. The large
disparity between recall and precision suggested that we needed to penalize errors in
one class above those in the other. Setting the parameter w1 to 15, thus penalizing false
negatives 15 times more than false positives, gave precision 0.63, recall 0.65 and F1
score 0.64, thus optimizing F1 score.</p>
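        <p>For reference, the F-scores quoted in this section follow the standard F-beta definition, the weighted harmonic mean of precision and recall. The short sketch below uses a hypothetical helper name and reproduces the two F1 values above from their reported precision and recall.</p>
        <preformat>
```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_beta(0.91, 0.28), 2))  # default LIBSVM settings
print(round(f_beta(0.63, 0.65), 2))  # with w1 set to 15
```
        </preformat>
        <p>Values of beta below 1 weight precision more heavily and values above 1 weight recall more heavily, which is the difference between the F0.5 and F3 measures used for the two subtasks.</p>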
        <p>We performed a grid search to optimize the setting of parameters C and γ, varying
them on a logarithmic scale. We settled on C = 100 and γ = 10<sup>−4</sup>.</p>
        <p>Table 1 gives our basic cross-validated results on the training data, along with the
results associated with certain variations. Section 3.1 describes the “partner flip” and
“predator-victim classification” filters. Our set of transformation rules is described in
section 2.1. “Only focal lexical features” means that we only count the words used by
the author under consideration (the “focal author”) and not their interlocutors – see
section 2.1.</p>
        <p>Precision and recall alone don’t give a full picture of the nature of our errors, since
there is a hidden “third class” beyond predators and non-predators. There is a relatively
high degree of confusion between predators and “victims” (those who chat with
predators). Table 2 gives the confusion matrix for these classes in a basic run, and table 3
gives the confusion matrix for the same run following our “partner flip” filter. Note that
these confusion matrices aren’t square because in our classification scheme the “victim”
and “bystander” classes are conflated into the class of “non-predators”.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Message classification</title>
        <p>Our results for the message classification subtask are given in table 4, evaluated on the
ground truth given for the test data. Because our training data contains no ground truth
for the message classification task, we’re unable to give cross-validated results.</p>
        <p><sup>a</sup> For the sake of clarity and completeness, we include here our results as reported on the
competition website, which are hindered by a bug which caused messages by alleged predators and
victims to be considered. All other results reported here were obtained after this bug was fixed.</p>
        <p>In preparing our submission, we didn’t know that F3 score would be the evaluation
metric, nor what proportion of predator messages would be flagged. Thus our particular
“standard” threshold, which resulted in high precision and low recall, put us in a
relatively poor position. The “Low predatoriness threshold” run uses the same methods but
a much lower minimum predatoriness score for messages (−0.03 rather than 0.012),
with the aim of improving recall and therefore F3 score.</p>
        <p>Note that our baseline involves selecting every message as predatory, even though
it does not have 1.0 recall. This is because the pool of “predators” whose messages we
classified was based on our classification in the previous step, rather than the ground
truth (and thus, because we didn’t achieve perfect recall in the first subtask, some
predators don’t even have their messages considered in this subtask). The interdependence
of the subtasks also means that our baseline applies uniquely to our results, and not to
those of other teams, who may have higher or lower baselines.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <sec id="sec-5-1">
        <title>Predator classification</title>
        <p>Perhaps the most interesting feature of table 1 is the robustness of simple lexical
features. Of our innovations – the mirror lexical features for conversational partners (see
section 2.1), the partner flip and predator-victim classification filters, transformation
rules, and behavioural features – only the mirror lexical features have an
unambiguously positive effect on results, and some seem to diminish F-score when compared to
the lexical baseline.</p>
        <p><sup>2</sup> Feature weights are not necessarily distributed symmetrically about 0; thus it would be facile
to say that positive weights are predatory and negative weights are “anti-predatory”.</p>
        <p>The partner flip step was generally effective, especially in maximizing F<sub>0.5</sub> score,
which was the evaluation measure for the competition. The improvement shown in
table 1 is small because the partner flip is a step that’s most effective in high-recall,
low-precision runs, whereas ours tended to be the opposite. While our predator-victim
classifier was quite accurate (having a cross-validated accuracy of 0.93 when applied to
the predators and victims in our training data), it wasn’t ultimately able to increase our
F-score in the classification of predators and non-predators. Again, we suspect that the
picture might have been different if our results had been skewed toward high recall and
low precision rather than the opposite.</p>
        <p>Omitting behavioural features seems to give a slight (0.03) increase in cross-validated
F-score. A naive interpretation of this might be that behavioural features are actually
harmful to accuracy. In fact, they do convey useful information about predatoriness,
since our 12 behavioural features alone attain an F-score of 0.56, which is well above
baseline (and which would place in the middle of the competition results). We suspect
that the score increase when omitting these features is due to random noise. Applied to
the evaluation data, it was the purely lexical model that gave a slightly lesser F-score.</p>
        <p>We suspect that the negligible effects of our innovations are because the Pareto
principle is at play in the data, wherein 20% of our features capture 80% of the instances
in our corpus (in fact, the ratio may be more like 1% to 99%). This is supported by
the fact, as noted above, that our mere 12 behavioural features can attain a stunningly
high F-score of 0.56 on our highly imbalanced dataset (where the random baseline
is 0.003). We claim that our transformation rules and behavioural features carry useful
information about predatorhood, but that they unfortunately don’t provide enough new
information on top of our simple lexical features to increase performance.</p>
        <p>Table 5 gives the 10 top and bottom lexical features associated with predatorhood.
While we know that our simple lexical features are very effective at identifying
predators, the feature weightings are surprisingly opaque. While the top 100 features contain
a handful of obviously sexual n-grams (e.g. 18:sexy, 23:wanna fuck), the vast majority
are common function words (e.g. 10:there, 24:you, 28:my, 40:and). Thus, it’s not
obvious how to draw a meaningful picture of predator language based on these weights.</p>
        <p>Table 6 gives the average of some of our behavioural features across our three
classes of authors. Although our behavioural features ultimately offered no
improvement on top of our lexical features, they were able to form a reasonably accurate
classification model alone, and their distribution may offer some insights into predator and
victim behaviour (in a way that our lexical features have not). As noted earlier, the
trends in number of messages and conversations are artefactual and not much should
be read into them. However, it’s interesting that predators consistently send more and
longer messages than their victim counterparts. Predators also initiate conversations
almost twice as often as victims, and take, on average, less than half as long to respond
to messages. The standard deviation for average response time among victims is 6.733,
quite large compared to 1.053 for predators and 2.267 for bystanders. This suggests that
the distribution for victims has a long tail, with victims often waiting long periods of
time to respond.</p>
        <p>These numbers paint a behavioural picture of the predator as someone who
dominates conversations, and who is the more “eager” participant, tending to initiate
conversations, and keep them going by responding quickly and voluminously.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Message classification</title>
        <p>Although our ranking of n-grams based on linear SVM weights is difficult to
interpret, it was fairly effective at classifying messages. Our approach gives the highest
precision of all submissions: our original submission achieved 0.445 precision, and
0.544 following a bugfix, above the next highest precision submission of 0.350, and
well above the baseline of 0.092.</p>
        <p>Our initial parameter setting gives an F1 score of 0.284, which is well above the
baseline of 0.169. Our best F3 score is achieved by setting a low threshold for predator
score, giving F3 = 0.403. To our surprise, our baseline of labelling every message as
predatory achieves an F3 score of 0.363, which bests all but the aforementioned run,
and which handily exceeds all submissions to the competition.</p>
        <p>The “Low threshold, only weights” row of table 4 shows that our SVM weights
alone achieve a respectable F1 score (0.232, exceeding the baseline of 0.160).</p>
        <p>As we would expect, the blacklist alone achieves the highest precision, at a cost to
recall. We were surprised to see that precision was only 0.565, since we had constructed
our blacklist in such a way that we thought all terms would be unambiguously
“predatory”. Examining the false positives from this run reveals that most could be argued to
belong to the class of predatory messages, for example:</p>
        <preformat>&lt;conversation id=027600c74917a8d2438070be950fc2b6&gt;
&lt;message line=40&gt;i wanna kiss, etc&lt;/message&gt;
&lt;message line=42&gt;lick&lt;/message&gt;
&lt;/conversation&gt;
&lt;conversation id=0730400af8a1b5a8aa88146baf417191&gt;
&lt;message line=15&gt;so you wont be sleeping naked tonight I take it&lt;/message&gt;
&lt;message line=71&gt;so what are you wearing?&lt;/message&gt;
&lt;message line=84&gt;so does she have a cam?&lt;/message&gt;
&lt;message line=90&gt;what would you show me on cam?&lt;/message&gt;
&lt;/conversation&gt;</preformat>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          :
          <article-title>Conceptual pacts and lexical choice in conversation</article-title>
          .
          <source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>
          <volume>22</volume>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Briggs</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>W.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonsen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An exploratory study of internet-initiated sexual offenses and the chat room sex offender: Has the internet enabled a new typology of sex offender?</article-title>
          <source>Sexual Abuse: A Journal of Research and Treatment</source>
          <volume>23</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          , article 27
          (
          <year>2011</year>
          ), software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Malesky</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          :
          <article-title>Predatory online behavior: Modus operandi of convicted sex offenders in identifying potential victims and contacting minors over the Internet</article-title>
          .
          <source>Journal of Child Sexual Abuse</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ),
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Marcum</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Interpreting the intentions of Internet predators: An examination of online predatory behavior</article-title>
          .
          <source>Journal of Child Sexual Abuse</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>99</fpage>
          -
          <lpage>114</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>McGhee</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bayzick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontostathis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McBride</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakubowski</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Learning to identify Internet sexual predation</article-title>
          .
          <source>International Journal of Electronic Commerce</source>
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <fpage>103</fpage>
          -
          <lpage>122</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pendar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Toward spotting the pedophile: Telling victim from predator in text chats</article-title>
          .
          <source>In: First IEEE International Conference on Semantic Computing</source>
          . pp.
          <fpage>235</fpage>
          -
          <lpage>241</lpage>
          . Irvine, CA (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>