SU@PAN’2016: Author Obfuscation
Notebook for PAN at CLEF 2016

Tsvetomila Mihaylova1, Georgi Karadjov1, Yasen Kiprov1, Georgi Georgiev1, Ivan Koychev1, and Preslav Nakov2

1 Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Bulgaria
{tsvetomila.mihaylova, georgi.m.karadjov}@gmail.com, {yasen.kiprov, g.d.georgiev}@gmail.com, koychev@fmi.uni-sofia.bg
2 Qatar Computing Research Institute, HBKU, Qatar
pnakov@qf.org.qa

Abstract. The anonymity of a text’s writer is an important topic for some domains, such as witness protection and anonymity programs. Stylometry can be used to reveal the true author of a text even if s/he wishes to hide his/her identity. In this paper, we present our approach for hiding an author’s identity by masking their style, which we developed for the Author Obfuscation task, part of the PAN-2016 competition. The approach consists of three main steps: the first one is the evaluation of different metrics of the text that can indicate authorship; the second one is the application of various transformations that adjust these metrics of the target text towards the average level, while preserving the meaning and the soundness of the text; as a final step, we add random noise to the text. Our system showed the best performance at masking the author’s style.

1 Introduction

Stylometry is a well-studied topic. Detecting an author’s style, in particular, has been studied for years, and many different approaches have been explored. However, the reverse process, i.e., hiding the style of an author, is less explored. It poses many challenges: not only does the author’s style have to be hidden, but the text also needs to remain grammatically correct, and the meaning of the original text needs to be preserved.

The PAN-2016 [2] Author Obfuscation task [16] is divided into two subtasks: Author Masking and Obfuscation Evaluation. The Author Masking subtask seeks solutions to the following problem: “Given a document, paraphrase it so that its writing style does not match that of its original author, anymore.” The documents given for obfuscation have to be split into parts of up to 50 words each, and each part is then subject to obfuscation. The outcome is evaluated by three criteria: safety, i.e., whether author verification systems can still detect the author from the obfuscated text; soundness, which measures whether the obfuscated text is entailed by the original; and sensibleness, which checks whether the obfuscated text is meaningful. The latter two were evaluated with peer review. The Obfuscation Evaluation subtask asks the participants to propose automated measures for evaluating the first subtask; measures for one or more of the criteria can be suggested. We participated in both subtasks.

2 Related Work

Author identification is a well-studied topic. For instance, Juola and Vescovi [10] analyzed the features in JGAAP (Java Graphical Authorship Attribution Program), e.g., words, parts of speech, characters, and word bigrams, and built a model using them. Author identification has been explored as a task at the PAN competition since 2011. The PAN-2015 task description paper [18] summarizes the approaches and features used for author identification. Among the most used features are the lengths of words, sentences, and paragraphs, type-token ratios, hapax legomena, character n-grams (including unigrams), words, punctuation marks, stopwords, and part-of-speech n-grams. Other features analyze the text more deeply by checking style and grammar.
Kacmarcik and Gamon [11] explored author masking by detecting the words most used by the author and trying to change them. They also mention machine translation as a possible approach for author obfuscation. Some authors describe using machine translation as a means for author obfuscation [17,4], i.e., translating passages of text from English to one or more other languages and then back to English. Brennan et al. [4] investigate three different approaches for adversarial stylometry: obfuscation (masking the author’s style), imitation (trying to copy another author’s style), and machine translation. They also summarized the features people use most when trying to obfuscate their own writing style.

Juola and Vescovi [10] experimented with different techniques for author obfuscation. Their system consists of three main modules: canonization (unifying case, normalizing whitespace, spelling correction, etc.), event set determination (extraction of events significant for author detection, such as words or part-of-speech bi- and tri-grams), and statistical inference (measures that determine the results and the confidence in the final report). The same approach was also used to detect deliberate style obfuscation [9]. Some other features used for author recognition are personal pronouns, sentence length, unique words, and parts of speech [1].

In our work, we study most of the features mentioned in the research described above in order to mask the author’s style, i.e., to address the Author Masking subtask.

3 Method

Our approach measures some of the most significant text features used for author identification, as discussed in the work of Brennan et al. [4]. We then apply transformations that change the calculated metrics, so that the text has average values for those metrics.

The system consists of three main parts. First, we calculate “average” metrics based on the training corpus provided for the Author Obfuscation task and on a corpus of several public-domain books from Project Gutenberg [7]. Then, before transforming a document, we calculate the corresponding metrics for it, and a transformation for each metric is applied, depending on whether its value is below or above the calculated average. Finally, after the targeted transformations are applied, additional transformations are added that change the text beyond the targeted metrics. We chose very safe transformations, so that the meaning of the text would not change. We used dictionaries to transform abbreviations, equations, and short forms to their textual alternatives.

3.1 Calculating text metrics

We used the following metrics:

1. Average sentence word count;
2. Punctuation to word count ratio;
3. Stop words to word count ratio;
4. Type-token ratio;
5. Part-of-speech to word count ratio, measured for four part-of-speech groups: nouns, verbs, adjectives, and adverbs; we used the Python NLTK toolkit [3] with the Universal Tagset for part-of-speech tagging;
6. Words in all capital letters to word count ratio;
7. Count of each word in the text.

The “average values” were obtained by calculating the average of the above metrics over the training corpus and the corpus of several public-domain books from Project Gutenberg [7]. Before splitting a document into parts to be obfuscated, we calculate the above measures for it. For each part, we then compare the document’s measured values to the calculated averages, and we apply transformations to increase a value if it is below the corresponding average, or to decrease it when it is above.
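To make the above concrete, here is a minimal sketch of how such per-document metrics could be computed. It assumes NLTK with the standard tokenizer, tagger, and stopword resources installed; the function and key names are illustrative and are not taken from our submitted code.

from collections import Counter
import string

import nltk
from nltk.corpus import stopwords

STOPWORDS = set(stopwords.words('english'))
PUNCTUATION = set(string.punctuation)

def text_metrics(text):
    """Compute the per-document metrics listed in Section 3.1 (sketch)."""
    sentences = nltk.sent_tokenize(text)
    tokens = nltk.word_tokenize(text)
    words = [t for t in tokens if t not in PUNCTUATION]
    tagged = nltk.pos_tag(words, tagset='universal')
    pos_counts = Counter(tag for _, tag in tagged)
    n = len(words) or 1  # guard against empty parts
    return {
        'avg_sentence_word_count': n / max(len(sentences), 1),
        'punctuation_ratio': sum(1 for t in tokens if t in PUNCTUATION) / n,
        'stop_words_ratio': sum(1 for w in words if w.lower() in STOPWORDS) / n,
        'type_token_ratio': len({w.lower() for w in words}) / n,
        'noun_rate': pos_counts['NOUN'] / n,
        'verb_rate': pos_counts['VERB'] / n,
        'adjective_rate': pos_counts['ADJ'] / n,
        'adverb_rate': pos_counts['ADV'] / n,
        'all_caps_ratio': sum(1 for w in words if w.isupper()) / n,
        'word_counts': Counter(w.lower() for w in words),
    }

The corpus-level “average values” can then be obtained by averaging these dictionaries over the training documents and the Project Gutenberg books.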
3.2 Splitting the text into parts

The given texts are split into parts of up to 50 words each, according to the task requirements. First, the text is split into sentences using the NLTK sentence splitter. Each part of the text is then obtained by merging sentences while the part has fewer than 50 words. We ignore paragraph separation for this splitting.

3.3 Text Transformations

1. Splitting or merging sentences
If the average sentence length of the whole document is below the corpus average, we merge the sentences of each text part: all sentences of a given part are merged into one sentence by adding a random connecting word (and, as, yet) and randomly inserting punctuation, either a comma (,) or a semicolon (;). When the average sentence length of the document is above the corpus average, we split sentences into shorter ones using a simple algorithm: we go through all POS-tagged words in the sentence, counting the nouns and the verbs; when we reach the conjunction and, if the sentence so far contains a noun and a verb, we replace the and with a comma (,) and capitalize the next word’s initial, as it will now start a new sentence. (A sketch of this transformation is given after this list.)
2. Stop Words
Stop words can be strong indicators for author identification, since some authors tend to use specific stop words or to have a characteristic stop words to other words ratio. Thus, we perform two kinds of transformations regarding stop words:
– Removing stop words that carry little to no information.
– Replacing stop words with alternatives or with phrases with the same meaning.
3. Spelling
The spelling score of a document is high if there are no spelling mistakes, and low when there are some.
– To increase the spelling score, we apply spelling correction. The spell-checker uses a probability model built on the previously mentioned corpus of publicly available books.
– To decrease the score, we use a dictionary to insert common mistakes into the text. This dictionary was created manually using data from various sources.
4. Punctuation
If the punctuation use is above average, we remove all punctuation used within the sentence. This is limited to the symbols comma (,), semicolon (;), and colon (:). If the punctuation use is below average, we apply two techniques to increase it:
– We randomly insert a comma or a semicolon before prepositions, with a higher probability for inserting a comma than a semicolon.
– We insert redundant symbols using the following schema: ! can be replaced with one of [!, !!, !!!]; ? can be replaced with one of [?, ??, ???, ?!?, !?!].
5. Word Substitution
In order to change the ratio of unique words, we replace the most or the least common words. Replacement is done with synonyms, hypernyms, or word descriptions from WordNet [5,15]. If the document’s type-token ratio is above average, the most used words in the document are randomly replaced with a synonym or a hypernym. If the unique words ratio of the document is below average, we randomly replace the least used words with their definition from WordNet.
6. Paraphrase Corpus
We randomly replace phrases from the text with their substitutions from a paraphrase corpus. We use the short version of the phrasal corpus of PPDB, the Paraphrase Database [6]. This transformation proved very useful for the results: by changing small phrases, the meaning of the text was still preserved, while there was an improvement in changing the metrics for unique word count and parts of speech.
7. Uppercase Words
To decrease the uppercase words ratio, we only transform words that are in all capital letters and are longer than three characters. We assume that if a word is in upper case and is at most three characters long, it is an acronym and is thus supposed to stay in uppercase. The transformation is straightforward: all uppercase letters are substituted with lowercase ones.
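Below is an illustrative sketch of the splitting and merging heuristic from item 1 of the list above. It is a simplification under our reading of that description rather than the submitted code; spacing around the inserted punctuation is ignored for brevity.

import random
import nltk

def split_long_sentence(sentence):
    """Replace 'and' with a comma once a noun and a verb have been seen."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence), tagset='universal')
    out = []
    seen = {'NOUN': 0, 'VERB': 0}
    for i, (word, tag) in enumerate(tagged):
        if (word.lower() == 'and' and tag == 'CONJ'
                and seen['NOUN'] > 0 and seen['VERB'] > 0
                and i + 1 < len(tagged)):
            out.append(',')  # per the description, 'and' becomes a comma
            next_word, next_tag = tagged[i + 1]
            tagged[i + 1] = (next_word.capitalize(), next_tag)  # new "sentence"
            seen = {'NOUN': 0, 'VERB': 0}  # start counting for the next clause
        else:
            out.append(word)
            if tag in seen:
                seen[tag] += 1
    return ' '.join(out)

def merge_sentences(sentences):
    """Merge all sentences of a text part into one, using random connectives."""
    connectives = ['and', 'as', 'yet']
    merged = sentences[0].rstrip('.')
    for s in sentences[1:]:
        merged += (random.choice([',', ';']) + ' ' + random.choice(connectives)
                   + ' ' + s[0].lower() + s[1:].rstrip('.'))
    return merged + '.'

Whether splitting or merging is applied depends on how the document’s average sentence length compares to the corpus average, as described above.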
3.4 Noise

After the transformations that mask the author identification features are applied, we apply some transformations that insert noise into the text.

1. Switching British and American English
We randomly change words from British to American English and vice versa. The words are taken from a vocabulary.
2. Inserting random functional words
Randomly selected functional words are inserted at the beginning of sentences. The words are taken from a discourse marker vocabulary.

3.5 General Transformations

We also apply some general transformations that keep the meaning of the text, but mask the author’s style.

1. Replacing short forms
We replace short forms such as I’ve, I’d, I’m, I’ll, don’t, etc. with their full forms.
2. Replacing numbers with words
We replace the parts of the text POS-tagged as numbers with their word representation in English.
3. Replacing equations
As there were some examples of scientific text in the training corpus, if the text contains equations, the operations in them are replaced with words. An equation is detected if the text contains both comparison and inner-equation symbols, i.e., it matches both ".[<>=]+." and ".[\+\-\*\/]+.". The following symbols are replaced if an equation is found: + (plus), - (minus), * (multiplied by), / (divided by), = (equals), > (greater than), < (less than), <= (less than or equal to), >= (greater than or equal to).
4. Replacing symbols and abbreviations with words
We replace symbols and abbreviations with their word representations. Such symbols are currency symbols, % (percent), @ (at), and abbreviations of personal titles (such as Prof., Mr., Dr., etc.).
5. Simple transformations with regular expressions
Possessive constructions are replaced with their short forms: "(\w+) of (\w+)" is replaced with "\2's \1". This slightly changes the stop words rate and can also obfuscate a specific manner of writing, as the longer way of expressing possession is less commonly used. (A sketch of these regular-expression transformations is given at the end of Section 3.)

3.6 Experiments with Machine Translation

We also experimented with applying machine translation, as described in [4]. We translate from English to two other languages (Croatian and Estonian) and then back to English, using the Microsoft Translation API. The results reported in Section 4 show that the transformations we apply work better for most of the metrics, with machine translation being comparable only for the part-of-speech metrics. Moreover, manual evaluation of the text obtained with machine translation shows that the meaning of the obfuscated text very often differs from that of the original text.
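The following is a rough sketch of how the regular-expression transformations from Section 3.5 could look. The replacement tables are small illustrative subsets rather than the full vocabularies used in the system, and the helper names are ours.

import re

# Order matters: two-character operators must be replaced before their parts.
OPERATOR_WORDS = [
    ('<=', ' less than or equal to '), ('>=', ' greater than or equal to '),
    ('=', ' equals '), ('>', ' greater than '), ('<', ' less than '),
    ('+', ' plus '), ('*', ' multiplied by '), ('/', ' divided by '),
    ('-', ' minus '),
]

SYMBOL_WORDS = {'%': ' percent', '@': ' at ',
                'Prof.': 'Professor', 'Mr.': 'Mister', 'Dr.': 'Doctor'}

def replace_equations(text):
    # An equation is assumed only if both comparison and arithmetic symbols occur.
    if re.search(r'.[<>=]+.', text) and re.search(r'.[\+\-\*\/]+.', text):
        for operator, words in OPERATOR_WORDS:
            text = text.replace(operator, words)
    return text

def replace_symbols(text):
    for symbol, words in SYMBOL_WORDS.items():
        text = text.replace(symbol, words)
    return text

def shorten_possessives(text):
    # e.g., "capital of Bulgaria" -> "Bulgaria's capital"
    return re.sub(r'(\w+) of (\w+)', r"\2's \1", text)

Such blanket replacements have to be applied with care (e.g., the minus sign also occurs in hyphenated words), which is why the equation check is performed before any operator is rewritten.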
4 Evaluation and Discussion

There is no adequate metric that can automatically measure the soundness of the text, but we can evaluate how much the text metrics have changed after the obfuscation process. Table 1 shows the average values calculated on the training set and on the books from the Project Gutenberg corpus that we used. For each measure, the average over all documents in the training set is displayed before and after obfuscation. The average change rate over all documents is shown as well, together with the minimum and the maximum change rate over the documents in the training set.

We compared the results from the transformations described in Section 3 with those from machine translation. Table 1 shows the average, minimum, and maximum change achieved by our transformations; Table 2 shows the same measures for obfuscation with machine translation. We can see that our transformations change the measured indicators of author style more than simply applying two-way machine translation does; machine translation achieves comparable changes only for the part-of-speech ratios and a larger change for the verb ratio.

Text Metric              Average   Before   After    Avg. change   Min     Max
Sentence word count      19        18.42    28.43    136.71%       2.04%   800.85%
Stop words ratio         0.5       0.52     0.45     12.30%        0.63%   28.79%
Type-token ratio         0.44      0.44     0.47     7.32%         0.49%   22.68%
Adjective rate           0.06      0.08     0.09     19.46%        0.27%   73.26%
Adverb rate              0.076     0.07     0.09     28.16%        0.94%   140.00%
Noun rate                0.24      0.23     0.24     9.62%         0.88%   32.28%
Verb rate                0.19      0.20     0.21     5.26%         0.58%   29.04%
Punctuation ratio        0.15      0.14     0.14     48.51%        9.26%   157.68%
Words in all caps ratio  0.02      0.03     0.01     43.43%        0.93%   100.00%

Table 1. Text measures on the training set and results from obfuscation with our transformations. The column Average shows the calculated average on the training set and the Project Gutenberg corpus. The columns Before and After show the average metrics of the training corpus before and after obfuscation. The last three columns show the change rate achieved by the transformations.

Text Metric              Avg. change   Min     Max
Sentence word count      4.27%         0.42%   12.14%
Stop words ratio         5.54%         0.50%   15.09%
Type-token ratio         2.50%         0.23%   5.35%
Adjective rate           13.85%        9.25%   19.55%
Adverb rate              10.72%        1.85%   27.13%
Noun rate                4.38%         0.13%   11.55%
Verb rate                7.63%         0.30%   19.63%
Punctuation ratio        28.40%        0.58%   66.88%
Words in all caps ratio  30.54%        0.00%   107.42%

Table 2. Results from obfuscation with round-trip machine translation. The columns show the average, the minimum, and the maximum change for the corresponding measure.

Participant                     PAN 2013   PAN 2014 EE   PAN 2014 EN   PAN 2015
Mihaylova et al. (our system)   -0.10      -0.13         -0.16         -0.11
Keswani et al. [12]             -0.09      -0.11         -0.12         -0.06
Mansoorizadeh et al. [14]       -0.05      -0.04         -0.03         -0.04

Table 3. Average performance drops, in terms of ‘final scores’, of the authorship verifiers submitted at PAN 2013 to PAN 2015 when run on obfuscated versions of the corresponding test datasets produced by the submitted obfuscators. The smaller the number (i.e., the higher the performance drop), the better.

The results from the evaluation by the Author Obfuscation task organizers [16] are shown in Table 3. Our system performs best in terms of fooling the state-of-the-art systems that participated in the Author Identification tasks in previous years.

The metrics that change the most are the average sentence length and the punctuation to word count ratio. The metrics whose values change the least are the rates of the different parts of speech (nouns, verbs, and adjectives) and the unique words ratio.

The soundness of the obfuscated texts was checked manually for randomly selected documents from the corpus. The observation is that, after applying the transformations mentioned above, the resulting text stays close to the meaning of the original.
The most useful transformations were word replacement using the paraphrase corpus and WordNet. Splitting and merging sentences also contributes to changing the author’s style. The insertion of random noise, i.e., spelling and punctuation mistakes, lowers the quality of the resulting text, but contributes to changing the measures used for author identification.

After the submission, we further checked our results for sensibleness and noticed that some of the transformations were applied too often and resulted in texts of lower quality. These are the replacement of numbers, the replacement of words with their definitions from WordNet, and the insertion of too many spelling and punctuation errors. Applying those transformations less often improves the quality of the resulting texts.

5 Obfuscation Evaluation

5.1 Evaluation Metrics

For the Obfuscation Evaluation subtask, we provide metrics for safety and soundness.

The metrics used for safety measure how much each of the text metrics described in the previous sections has changed in the obfuscated text compared to the original text. For each metric, we compute

|original_value - obfuscated_value| / original_value,

or 0 if the metric value for the original text is 0.

One metric for measuring soundness is proposed: it measures the semantic similarity between the original and the obfuscated text, following Li et al. [13]. The similarity is computed for each original-obfuscation pair of text parts, and we use the average over all parts as the metric for the obfuscated document. The similarity between two text parts is measured as a weighted sum of their semantic similarity and their word order similarity. The semantic similarity between the two texts is measured as the cosine similarity between their semantic vectors. For the calculation of the semantic vectors, the union of all words in the two texts is taken, and each sentence is passed in as a collection of words. The size of the semantic vector is the same as the size of the joint word set. An element is 1 if the corresponding word from the joint word set occurs in the sentence; otherwise, it is the similarity between that word and the most similar word in the sentence. The word order similarity is computed from a word order vector for each sentence, passed in as a sequence of words. The size of the word order vector is also the size of the joint word set. An element of the word order vector is the position in the sentence of the corresponding word from the joint set if that word occurs in the sentence; if it does not, the value of the element is the position of the most similar word in the sentence. However, our experiments showed that this metric works too slowly, and we did not include it in the final evaluation of the systems.
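As an illustration, the safety metrics above reduce to a relative-change computation over the per-document metrics; a minimal sketch, reusing the illustrative text_metrics() function sketched in Section 3.1, could look as follows.

def relative_change(original_value, obfuscated_value):
    """Relative change of a metric, or 0 if the original value is 0."""
    if original_value == 0:
        return 0.0
    return abs(original_value - obfuscated_value) / original_value

def safety_scores(original_text, obfuscated_text):
    """Per-metric safety scores for an original/obfuscation pair (sketch)."""
    original = text_metrics(original_text)
    obfuscated = text_metrics(obfuscated_text)
    return {name: relative_change(original[name], obfuscated[name])
            for name in original if name != 'word_counts'}

Higher values indicate that the obfuscated text has moved further away from the original author profile with respect to the corresponding metric.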
5.2 Results from Peer Review

Part of our task was to evaluate the results of the systems of all participants who submitted results for the Author Obfuscation task. This subsection shows our results from this evaluation.

We measure the three dimensions specified in the task: safety, soundness, and sensibleness. Soundness and sensibleness are measured manually: we take random samples of original-obfuscation pairs and count the pairs that satisfy the given criterion. Safety is measured in two ways. The first one is running the system GLAD [8], one of the best systems for author identification at PAN-2015 [18]. The results on the original texts were compared with the results on the obfuscated texts of each participant; the model was trained on the PAN-2015 corpus for English. The second way is our own approach, which measures the change in the metrics mentioned in the previous sections, which are commonly used for author identification: part-of-speech ratios, type-token ratio, average sentence length, punctuation rate, and rate of words in all capital letters.

The submissions were anonymized for the evaluation: the participant team names were replaced with “Participant A”, “Participant B”, and “Participant C”. Our results for each of the participants are shown below.

1. Participant A: mansoorizadeh:2016 [14]
– Safety: This participant performs worst in terms of safety. The results from the evaluation with the author identification software, as well as with the custom metrics, show that it causes the smallest change in the evaluated values.
– Soundness: sound: 212, not sound: 13, total: 225.
– Sensibleness: sensible: 195, not sensible: 30, total: 225.
– Overall thoughts: Overall, the documents are not obfuscated except for several phrases. The phrases that are switched often do not carry the same meaning and/or are out of context. There are occasional grammatical errors. The obfuscation is sound and sensible, but one can detect the style of the original author.
2. Participant B: keswani:2016 [12]
– Safety: This participant performs second best in terms of safety.
– Soundness: sound: 2, not sound: 200, total: 202.
– Sensibleness: sensible: 1, not sensible: 201, total: 202.
– Overall thoughts: The texts are obfuscated, but almost all of them do not make any sense. The obfuscation is neither sound nor sensible.
3. Participant C: mihaylova:2016 (our system)
– Safety: This participant (i.e., our system) performs best in terms of safety, achieving the highest scores from both the author identification software and the custom metrics.
– Soundness: sound: 200, not sound: 2, total: 202.
– Sensibleness: sensible: 100, not sensible: 100, total: 200.
– Overall thoughts: The text is safe, as the author’s style is masked. Most of the text fragments preserve the meaning of the original text. Grammatical mistakes and typos are common, and one can tell that the text is obfuscated. Some phrases are out of context.

5.3 Results for Safety

Below we show the results from our evaluation of safety.

Evaluation with author identification software. We ran one of the best-performing systems from the PAN-2015 Author Identification task, GLAD [8]. We trained it on the training set from PAN-2015, and then used as test sets the original texts and the texts obfuscated with each of the systems of participants A, B, and C. We measured how much the prediction changes for each document, and we report the average, the maximum, and the minimum change per participant. The results are given in Table 4.

                  Participant A   Participant B   Participant C
Average change    0.14            0.20            0.24
Min change        0.00            0.00            0.00
Max change        0.68            0.82            1.93

Table 4. Results from the evaluation with author identification software. Shown are the average, min, and max change for each participant.

Evaluation with custom metrics. For this part of the evaluation of safety, we measure how much each of the metrics we use has changed for each of the obfuscated documents. Then we compare the average, min, and max change for each participant. Table 5 shows the results of this evaluation. Participant A does not achieve the largest average change for any of the metrics.
Participant B achieves the largest average change for the noun, verb, punctuation, and unique words rates. Participant C achieves the largest average change for the adjective and adverb rates, the stop words ratio, the average sentence length, and the ratio of words in all capital letters.

                             Average change       Min change           Max change
Text Metric                  A     B     C        A     B     C        A     B     C
Average adjective rate       0.03  0.15  0.19     0.00  0.00  0.00     0.35  0.96  0.94
Average adverb rate          0.03  0.12  0.29     0.00  0.00  0.02     0.34  1.18  2.57
Average noun rate            0.01  0.25  0.10     0.00  0.07  0.00     0.11  0.59  0.27
Average verb rate            0.01  0.20  0.05     0.00  0.00  0.00     0.07  0.45  0.27
Average punctuation rate     0.01  1.42  0.49     0.00  0.26  0.08     0.04  6.68  1.90
Average sentence length      0.01  0.87  1.37     0.00  0.26  0.04     0.08  0.99  8.21
Stop words ratio             0.02  0.05  0.12     0.00  0.00  0.00     0.09  0.25  0.28
Unique words ratio           0.01  0.12  0.07     0.00  0.02  0.00     0.04  0.35  0.23
Words all capitals ratio     0.02  0.29  0.42     0.00  0.00  0.00     0.43  4.04  1.00

Table 5. Results from the evaluation with custom metrics. For each metric, the columns show the average, the minimum, and the maximum change per participant (A, B, C).

6 Conclusion and Future Work

We have described the system of the Sofia University’s mihaylova16 team for the PAN-2016 Author Obfuscation task. Our main approach is based on measuring popular text characteristics used for author identification and on applying transformations that aim to change those measures for the given text.

Further development includes adding more features used for author identification. The existing transformations should also be improved in terms of producing more meaningful text. The task requirements included splitting the text into smaller parts and applying obfuscation on those parts, and we have implemented transformations suitable for such smaller text parts. We would like to experiment with transformations that can be applied to the entire text or to paragraphs, which is closer to the way people transform texts. What our approach lacks is a proper evaluation measure of whether it performs well in terms of soundness; designing one is a challenging but necessary and enabling research direction. Finally, we plan to use the techniques from this paper for author imitation. One key difference is that the target for the transformations would not be the average metrics, but the metrics of the author to be imitated.

Acknowledgments

This research was performed by a team of students from the MSc programs in Computer Science at Sofia University “St. Kliment Ohridski”. We thank Sofia University “St. Kliment Ohridski” for the support and guidance of our team’s participation in the CLEF 2016 conference.

References

1. Afroz, S., Brennan, M., Greenstadt, R.: Detecting hoaxes, frauds, and deception in writing style online. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy. pp. 461–475. SP ’12, IEEE Computer Society, Washington, DC, USA (2012)
2. Balog, K., Cappellato, L., Ferro, N., Macdonald, C. (eds.): CLEF 2016 Evaluation Labs and Workshop – Working Notes Papers, 5–8 September, Évora, Portugal. CEUR Workshop Proceedings, CEUR-WS.org (2016)
3. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media (2009)
4. Brennan, M., Afroz, S., Greenstadt, R.: Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Trans. Inf. Syst. Secur. 15(3), 12:1–12:22 (Nov 2012)
5. Fellbaum, C.: WordNet: An Electronic Lexical Database. Bradford Books (1998)
6. Ganitkevitch, J., Van Durme, B., Callison-Burch, C.: PPDB: The Paraphrase Database. In: Proceedings of NAACL-HLT. pp. 758–764. Atlanta, Georgia (June 2013)
7. Hart, M.: Project Gutenberg. Project Gutenberg (1971)
8. Hürlimann, M., Weck, B., van den Berg, E., Suster, S., Nissim, M.: GLAD: Groningen Lightweight Authorship Detection. In: CLEF (2015)
9. Juola, P.: Detecting stylistic deception. In: Proceedings of the Workshop on Computational Approaches to Deception Detection. pp. 91–96. Avignon, France (2012)
10. Juola, P., Vescovi, D.: Analyzing stylometric approaches to author obfuscation. In: Advances in Digital Forensics VII: 7th IFIP WG 11.9 International Conference on Digital Forensics. pp. 115–125. Orlando, FL, USA (2011)
11. Kacmarcik, G., Gamon, M.: Obfuscating document stylometry to preserve author anonymity. In: Proceedings of COLING/ACL: Poster Sessions. pp. 444–451. Sydney, Australia (2006)
12. Keswani, Y., Trivedi, H., Mehta, P., Majumder, P.: Author Masking through Translation—Notebook for PAN at CLEF 2016. In: Balog et al. [2]
13. Li, Y., McLean, D., Bandar, Z.A., O’Shea, J.D., Crockett, K.: Sentence similarity based on semantic nets and corpus statistics. IEEE Trans. on Knowl. and Data Eng. 18(8), 1138–1150 (2006)
14. Mansoorizadeh, M., Rahgooy, T., Aminiyan, M., Eskandari, M.: Author Obfuscation using WordNet and Language Models—Notebook for PAN at CLEF 2016. In: Balog et al. [2]
15. Miller, G.A.: WordNet: A lexical database for English. Commun. ACM 38(11), 39–41 (1995)
16. Potthast, M., Hagen, M., Stein, B.: Author Obfuscation: Attacking State-of-the-Art Authorship Verification Approaches. In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CLEF and CEUR-WS.org (2016)
17. Quirk, C., Brockett, C., Dolan, W.: Monolingual machine translation for paraphrase generation. In: Proceedings of EMNLP 2004. pp. 142–149. Barcelona, Spain (2004)
18. Stamatatos, E., Daelemans, W., Verhoeven, B., Juola, P., López-López, A., Potthast, M., Stein, B.: Overview of the author identification task at PAN 2015. In: CLEF (2015)

A Appendix 1 - Project Gutenberg books

– The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle
– History of the United States by Charles A. Beard and Mary R. Beard
– Manual of Surgery Volume First: General Surgery by Alexis Thomson and Alexander Miles. Sixth Edition.
– War and Peace by Leo Tolstoy