<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Safety, Soundness and Sensibleness of Obfuscation Systems</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science Heinrich Heine University Düsseldorf D-40225 Düsseldorf</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Matthias Liebeck</institution>
          ,
          <addr-line>Pashutan Modaresi, and Stefan Conrad</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>Author masking is the task of paraphrasing a document so that its writing style no longer matches that of its original author. This task was introduced as part of the 2016 PAN Lab on Digital Text Forensics, for which a total of three research teams submitted their results. This work describes our methodology to evaluate the submitted obfuscation systems based on their safety, soundness and sensibleness. For the first two dimensions, we introduce automatic evaluation measures and for sensibleness we report our manual evaluation results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Author masking is the task of paraphrasing a document so that its writing style no longer
matches that of its original author. Due to advances in fields such as authorship
attribution and author verification, it is not clear whether authors (particularly in the
age of the Internet and social media) can still ensure their anonymity [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. While
in some scenarios, such as verifying the authorship of disputed novels or revealing the
author of harassing messages in social media [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], author unmasking might be useful,
there are situations where authors have the right to protect their privacy, among them
the desire to avoid retribution from an employer or government agency [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The task of author masking was introduced as part of the 2016 PAN Lab on Digital
Text Forensics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], for which a total of three research teams, namely Mansourizade et
al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Keswani et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Mihaylova et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (called Teams A, B, and C,
respectively, in the rest of this work) submitted their results. The evaluation was
completely anonymous: the identities of the teams were revealed only after the
submission of our evaluation results.
      </p>
      <p>
        Together with the task of author masking, obfuscation evaluation has been
introduced as another task to evaluate the performance of the author masking submissions.
Three dimensions have been defined by the task organizers for the performance
evaluation of the obfuscation systems: safety to ensure that a forensic analysis does not reveal
the original author of an obfuscated text; soundness to evaluate whether the obfuscated texts
are textually entailed by their originals; and sensibleness to ensure that the obfuscated
texts are inconspicuous [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>In this work, we describe our methodology to evaluate the performance of the
submitted systems based on the aforementioned dimensions. In Section 2 we define the
problem of author masking more concretely and describe the provided training data.
The evaluation results for the dimensions safety, soundness, and sensibleness are reported
in Sections 3, 4, and 5, respectively. Finally, we conclude our work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Problem Definition</title>
      <p>Given a document, an author masking software has to paraphrase it so that its writing
style no longer matches that of its original author. Although the organizers of the author
masking task do not directly define this task as a supervised machine learning problem,
a training set is provided so that the participants can evaluate their algorithms
based on this dataset. The same dataset is also used as the test dataset for the final
evaluation.</p>
      <p>
        The provided dataset is a collection of 205 problems selected from author
verification tasks from PAN2013 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], PAN2014 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and PAN2015 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Each problem
is a collection of at most five known documents (written by the same author) and a
questioned document. Normally, in author verification problems, the author of the
questioned document is unknown and the task of an author verifier is to determine whether
the questioned document has the same author as the known documents. In
the training dataset of the author masking task, however, all problems are selected from
positive instances, meaning all questioned documents have the same author as the known
documents. The language of all provided problems is English.
      </p>
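      <p>To make the data layout concrete, a single problem can be thought of as a small record of document references plus a gold label. The following Python sketch uses illustrative field names of our own choosing, not the official corpus format:</p>
      <preformat>
# Hypothetical representation of one author masking problem
# (field names are ours; the official corpus format differs).
problem = {
    "known": ["known01.txt", "known02.txt"],  # up to five documents by the same author
    "questioned": "original.txt",             # the document to be obfuscated
    "same_author": True,                      # all training problems are positive instances
}
      </preformat>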
      <p>The participants were asked to develop software that outputs a detailed list of how
each piece of the original text has been paraphrased. For a detailed description of the
desired system output, the reader is referred to the official task page
(http://pan.webis.de/clef16/pan16-web/author-obfuscation.html).</p>
    </sec>
    <sec id="sec-3">
      <title>3 Safety</title>
      <p>
        An obfuscation software is called safe if a forensic analysis does not reveal the
original author of the obfuscated texts. We evaluate the safety of the obfuscation software
using an automatic author verifier called GLAD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The idea behind this automatic
evaluation measure is that if an obfuscation system successfully masks the authors of
the questioned documents in the training set (remember that all problems in the training
set belong to the positive class), the author verifier will classify the problems as
negative (meaning that the obfuscated document no longer has the same author as the other
documents).
      </p>
      <p>
        The GLAD algorithm was one of the top ranked systems at PAN2015 and treats
the author verification problem as an intrinsic binary classification machine learning
task. GLAD uses SVM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as a learning algorithm and makes use of simple feature
classes such as N-Grams, tokens, sentences, visual, compression, entropy and syntactic
features [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>To train the GLAD algorithm we used the English problems from the training
datasets from PAN2013 to PAN2015. The statistics of the training dataset used are
shown in Table 1.</p>
      <p>Notice that the training dataset from PAN2014 consisted of novels and essays;
we used both categories to train our model.</p>
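      <p>As a rough illustration of such a verifier (a minimal sketch using scikit-learn, not the actual GLAD implementation, whose feature classes are far richer), one could train a linear SVM over character n-gram features:</p>
      <preformat>
# Minimal sketch of an SVM-based verifier in the spirit of GLAD.
# Assumption-laden illustration: GLAD's real feature classes
# (visual, compression, entropy, ...) go well beyond n-grams.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def problem_to_text(known_docs, questioned_doc):
    # Hypothetical simplification: represent a verification problem
    # as the concatenation of the known documents and the questioned one.
    return " ".join(known_docs) + " [SEP] " + questioned_doc

verifier = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LinearSVC(),
)
# problems: list of (known_docs, questioned_doc); labels: same-author booleans
# verifier.fit([problem_to_text(k, q) for k, q in problems], labels)
      </preformat>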
      <p>
        Using the trained model, we measure the performance of the GLAD algorithm once
on the original documents from the training set of the author masking task and once
on the submissions of each team; in both cases, the gold-standard labels are all
considered to be true. As the evaluation measure we use c@1 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], as defined in
Equation 1. The c@1 measure is a modified version of accuracy, where c@1 rewards
approaches that maintain the same number of correct answers and decrease the number
of incorrect answers by leaving some problems unanswered.
      </p>
      <p>$$c@1 = \frac{1}{n} \left( n_c + n_u \cdot \frac{n_c}{n} \right) \qquad (1)$$</p>
      <p>In the definition of the c@1 measure, $n$ is the number of problems, $n_c$ is the number of
correct answers, and $n_u$ represents the number of unanswered problems.</p>
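      <p>A small Python sketch of Equation 1 (our illustration, not a reference implementation), where answers are True/False and None marks an unanswered problem:</p>
      <preformat>
def c_at_1(answers, gold):
    """c@1 (Eq. 1): accuracy variant that rewards leaving hard
    problems unanswered (None) instead of answering them wrongly."""
    n = len(gold)
    n_c = sum(1 for a, g in zip(answers, gold) if a is not None and a == g)
    n_u = sum(1 for a in answers if a is None)
    return (n_c + n_u * n_c / n) / n

# Example: 3 correct, 1 wrong, 1 unanswered out of 5 problems
print(c_at_1([True, True, False, None, True],
             [True, True, True,  True, True]))  # 0.72
      </preformat>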
      <p>Table 2 summarizes the evaluation results of the safety dimension. On the original
dataset (the training dataset of author masking), the GLAD algorithm has a c@1 score of
0.656, meaning that the algorithm correctly classifies roughly 65% of the problems as
positive. Notice that the gold standard labels are all set to be true and that teams having
lower c@1 scores are more successful at the masking task than the other teams. It is also
observable that none of the participants has a c@1 score below 0.5. This means that the
proportion of successful obfuscations for every submission is below 50%. At the same time
it can be seen that all submissions have a c@1 score below the baseline of 0.656, meaning
that all teams were partly successful at the author masking task.</p>
      <p>
        Although in previous PAN competitions, AUC (Area Under the Curve) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was also
used to evaluate the automatic verifiers, the use of this measure was not possible in our
scenario as the test datasets contain either only positive or only negative instances.
      </p>
      <p>Another interesting analysis is to investigate the relation between true positives and
false negatives. The idea behind this analysis is to determine the proportion of documents
classified as positive before obfuscation that are classified as negative after
obfuscation. For this, we select the true positives from the original dataset and count the ones
that have been classified as negative by the GLAD algorithm. Table 3 summarizes the
results.</p>
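      <p>In code, this analysis reduces to a simple count over the verifier's predictions (a sketch; the variable names are ours):</p>
      <preformat>
def flip_rate(pred_original, pred_obfuscated):
    """Share of problems verified as positive on the original texts
    that the verifier labels negative after obfuscation (higher is
    better for the obfuscator, since all gold labels are positive)."""
    tp = [i for i, p in enumerate(pred_original) if p]
    flipped = sum(1 for i in tp if pred_obfuscated[i] is False)
    return flipped / len(tp) if tp else 0.0
      </preformat>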
      <p>Notice that higher values in Table 3 are preferred. Team C has the highest score
among the teams and has managed to obfuscate roughly 30% of the true positive
problems into false negative ones. These results are consistent with the results shown in Table
2.
</p>
    </sec>
    <sec id="sec-4">
      <title>4 Soundness</title>
      <p>We assume that the goal of author masking is to reword a text segment into a
paraphrased one while retaining as much semantic similarity as possible. Therefore, we
propose to quantify soundness by measuring the semantic textual similarity (STS)
between the original text segment and its corresponding obfuscation.</p>
      <p>
        The prediction of semantic textual similarity has been a recurring task in SemEval
challenges since 2012 [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1–4</xref>
        ]. The aim of the STS task is to determine the semantic
similarity of two sentences on the continuous interval [0; 5], where 0 represents complete
dissimilarity and 5 denotes complete semantic equivalence between the sentences. The
task organizers provide sentence pairs with gold standards from different categories.
The task is evaluated by calculating the Pearson correlation between the predicted
values and a crowdsourced gold standard.
      </p>
      <p>
        In this paper, we use the unsupervised semantic similarity approach called Overlap
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to automatically determine the semantic similarity between the original segment
and its paraphrase. There are two advantages of using an unsupervised approach: (i)
an automatic approach can evaluate all original-paraphrase pairs, whereas human
annotators could only annotate a subset of the paraphrases within a reasonable amount
of time, and (ii) unlike a supervised approach, it does not require labeled training data.
      </p>
      <p>
        The idea of the Overlap method is simple: it measures the overlap between
the tokens in the original segment $s_1$ and the tokens in the paraphrase $s_2$ by aligning
each token to its best match in the other text segment. The authors first define a similarity
function for two tokens which uses synsets from WordNet [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and word embeddings
from word2vec [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], as denoted in Equation 2.
      </p>
      <p>$$\mathrm{sim}(t_1, t_2) :=
\begin{cases}
1 &amp; \text{if } t_1.\mathrm{lemma} = t_2.\mathrm{lemma} \\
1 &amp; \text{if } t_1 \text{ and } t_2 \text{ have the same most common synset} \\
0.5 &amp; \text{if } t_1 \text{ and } t_2 \text{ share any other synset} \\
\cos(t_1, t_2) &amp; \text{if } t_1 \text{ and } t_2 \text{ have word2vec embeddings} \\
0.15 &amp; \text{otherwise}
\end{cases} \qquad (2)$$</p>
      <p>Afterwards, the similarity score between two text segments in [0; 5] is defined as
follows:
$$\mathrm{sim}(s_1, s_2) := 5 \cdot \left( \frac{\sum_{t_1 \in s_1} \max_{t_2 \in s_2} \mathrm{sim}(t_1, t_2)}{2\,|s_1|} + \frac{\sum_{t_2 \in s_2} \max_{t_1 \in s_1} \mathrm{sim}(t_2, t_1)}{2\,|s_2|} \right) \qquad (3)$$</p>
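      <p>The following Python sketch illustrates Equations 2 and 3, assuming NLTK's WordNet interface, a gensim word2vec model, and pre-lemmatized tokens; the preprocessing of the original system may differ:</p>
      <preformat>
# Sketch of the Overlap score (Eqs. 2 and 3). Assumptions: NLTK WordNet
# for synsets, a gensim KeyedVectors model `w2v` for embeddings, and
# tokens given as (surface form, lemma) pairs.
from nltk.corpus import wordnet as wn

def token_sim(t1, t2, w2v):
    tok1, lem1 = t1
    tok2, lem2 = t2
    if lem1 == lem2:
        return 1.0
    syn1, syn2 = wn.synsets(lem1), wn.synsets(lem2)
    if syn1 and syn2 and syn1[0] == syn2[0]:  # same most common synset
        return 1.0
    if set(syn1).intersection(syn2):          # share any other synset
        return 0.5
    if tok1 in w2v and tok2 in w2v:           # cosine of word2vec vectors
        return float(w2v.similarity(tok1, tok2))
    return 0.15

def segment_sim(s1, s2, w2v):
    """Eq. 3: best-match overlap in both directions, scaled to [0; 5]."""
    def directed(a, b):
        return sum(max(token_sim(t, u, w2v) for u in b) for t in a) / len(a)
    return 5 * (directed(s1, s2) + directed(s2, s1)) / 2
      </preformat>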
      <p>Since we assume the obfuscations to be semantically as close as possible to the
originals, the STS score between both segments should ideally be 5. We predict the semantic
similarity for all pairs for each team. Afterwards, we average over the predicted scores
for each team. Table 4 summarizes the results for the soundness dimension.</p>
      <p>
        For the soundness dimension, the best semantic paraphrases were created by team A
with an average STS score of 4.87. This is not surprising since team A only substituted
a few words and often kept the original segment as a paraphrase. Therefore, the
paraphrases are semantically very close or even identical to the original. Team C achieved a
mean STS score of 4.48 and team B had the lowest score with 4.04. Since the Overlap
approach from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is independent of the word order, the results of team B cannot be
explained by changing the word order of the phrases. One factor that definitely
influenced the semantic similarity is the appearance of German words in the paraphrases,
which cannot be matched to the English tokens in the original texts.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Sensibleness</title>
      <p>The dimension sensibleness describes the language quality of the obfuscations and
whether that quality still allows us to understand them. An author masking software might mask the
author of a text at the cost of its comprehensibility. Therefore, it is also crucial to evaluate
the quality of the produced obfuscations.</p>
      <p>We observed that teams A and C used dictionaries to perform simple substitutions
and team B usually changed the order of phrases. It is surprising to see that the
paraphrases by team B sometimes contain random German words, as in the following
example: “it is difficult to across, Once the Mitbürgers unschön is faint, odor street, on
the village so massed mold Verfalls and centuries.” (Mitbürger, unschön, and
Verfall are German for “fellow citizen”, “unsightly”, and “decay”.)</p>
      <p>Although there are approaches to automatically predict the grammatical quality of
text, we chose to manually evaluate the sensibleness because portions of the text have
a low language quality but still allow for a limited understanding of the content. For
example, this can be compared to a non-native speaker who asks a question in an online
forum that is poorly worded but still comprehensible.</p>
      <p>After a manual inspection of a subset of the paraphrases from all three teams, we
decided to annotate each pair with a score $s \in \{0, 1, 2\}$ to measure the language quality.
We then drew a small sample and discussed annotation guidelines. Our three labels and
their definitions are described in Table 5.</p>
      <p>In our evaluation, sensibleness is only evaluated by looking at the obfuscated text.
This is due to the fact that only the paraphrased text after author masking is used in a
real-world scenario. Therefore, it is reasonable to evaluate only the output of the system.
We ignore spacing and line breaks during the annotation process. Furthermore, we also
ignore the substitutions of the words “oof” and “tto” from team C because they do not
impact the understanding of the text.</p>
      <p>We randomly drew a subset of 20 problems. For each team, we then drew three
obfuscations per problem. All of these obfuscations were manually annotated by three
annotators. In order to report a single value per team, we averaged all the scores from
the annotators. Table 6 summarizes the results for the sensibleness dimension.</p>
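      <p>The aggregation itself is a plain average over annotators and sampled obfuscations; a minimal sketch with our own data layout:</p>
      <preformat>
from statistics import mean

def team_score(annotations):
    """Average sensibleness over all annotators and sampled obfuscations.
    `annotations` maps annotator name to a list of 0/1/2 labels, one per
    sampled obfuscation (here: 20 problems x 3 obfuscations = 60 labels)."""
    return mean(label for labels in annotations.values() for label in labels)
      </preformat>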
      <p>Team A achieved the best results in the sensibleness dimension with an average
score close to 2. The paraphrases from team B were the least comprehensible of all
three teams, with an average score of 0.57, which lies between partially comprehensible
and incomprehensible.</p>
      <p>We should note that there are at least two problems for the evaluation of the
sensibleness dimension: (i) it is difficult to formalize language quality and understanding and
(ii) the sensibleness dimension is subjective. Although we observed a high agreement
on the category incomprehensible, we had a lower agreement on whether a paraphrase
is fully or partially comprehensible. This is plausible since one annotator might
perfectly understand a text segment while another annotator may have some trouble with
it.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>In this work, we discussed our methodology to evaluate the performance of the
obfuscation systems submitted to the PAN2016 Author Masking shared task. More concretely,
submissions were evaluated based on their safety (Section 3), soundness (Section 4),
and sensibleness (Section 5). The scripts for our evaluation are available on GitHub
(https://github.com/pasmod/obfuscation).</p>
      <p>An automatic author verifier was used to measure the safety of the submissions. The
ranking of the teams in terms of safety is as follows: team C, B, and A.</p>
      <p>We proposed to quantify soundness by automatically measuring the semantic textual
similarity between the original text fragments and their obfuscations. The best score
was achieved by team A, followed by teams C and B.</p>
      <p>Unlike the first two dimensions, the sensibleness of the submissions was evaluated
manually. As sensibleness is subjective and difficult to formally define, we consider its
measurement a nontrivial task. Regarding sensibleness, teams A, C and B were ranked
first, second, and third, respectively.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially funded by the PhD program Online Participation, supported by
the North Rhine-Westphalian funding scheme Fortschrittskollegs and by the German
Federal Ministry of Economics and Technology under the ZIM program (Grant No.
KF2846504). We would like to thank Daniel Braun for his help in the evaluation of the
sensibleness dimension.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , Lopez-Gazpio, I.,
          <string-name>
            <surname>Maritxalar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uria</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ). pp.
          <fpage>252</fpage>
          -
          <lpage>263</lpage>
          . Association for Computational Linguistics (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
          </string-name>
          , J.:
          <article-title>SemEval-2014 Task 10: Multilingual Semantic Textual Similarity</article-title>
          .
          <source>In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2014</year>
          ). pp.
          <fpage>81</fpage>
          -
          <lpage>91</lpage>
          .
          Association for Computational Linguistics
          and Dublin City University (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity</article-title>
          .
          <source>In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval</source>
          <year>2012</year>
          ). pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
          </string-name>
          , W.: *
          <article-title>SEM 2013 shared task: Semantic Textual Similarity</article-title>
          .
          <source>In: Second Joint Conference on Lexical and Computational Semantics (*SEM)</source>
          , Volume
          <volume>1</volume>
          :
          <source>Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity</source>
          . pp.
          <fpage>32</fpage>
          -
          <lpage>43</lpage>
          . Association for Computational Linguistics (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
          </string-name>
          , C. (eds.):
          <source>CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers</source>
          . CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bradley</surname>
            ,
            <given-names>A.P.:</given-names>
          </string-name>
          <article-title>The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms</article-title>
          .
          <source>Pattern Recogn</source>
          .
          <volume>30</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1145</fpage>
          -
          <lpage>1159</lpage>
          (
          <year>Jul 1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hürlimann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weck</surname>
          </string-name>
          , B., van den Berg, E.,
          <string-name>
            <surname>Suster</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>GLAD: Groningen Lightweight Authorship Detection</article-title>
          .
          <source>In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
          </string-name>
          , E.:
          <article-title>Overview of the Author Identification Task at PAN 2013</article-title>
          .
          <source>In: CLEF 2013 Evaluation Labs and Workshop - Working Notes Papers</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kacmarcik</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Obfuscating Document Stylometry to Preserve Author Anonymity</article-title>
          .
          <source>In: Proceedings of the COLING/ACL on Main Conference Poster Sessions</source>
          . pp.
          <fpage>444</fpage>
          -
          <lpage>451</lpage>
          . COLING-ACL '06, Association for Computational Linguistics
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Keswani</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Author Masking through Translation-Notebook for PAN at CLEF 2016</article-title>
          .
          <source>In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Liebeck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollack</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Modaresi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conrad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : HHU at SemEval
          <article-title>-2016 Task 1: Multiple Approaches to Measuring Semantic Textual Similarity</article-title>
          .
          <source>In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          . pp.
          <fpage>607</fpage>
          -
          <lpage>613</lpage>
          . Association for Computational Linguistics (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mansourizade</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahgooy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aminiyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eskandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Author Obfuscation using WordNet and Language Models-Notebook for PAN at CLEF 2016</article-title>
          .
          <source>In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mihaylova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karadjov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiprov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiev</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koychev</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>SU@PAN'2016: Author Obfuscation-Notebook for PAN at CLEF 2016</article-title>
          .
          <source>In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>ICLR Workshop</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>WordNet: A Lexical Database for English</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>38</volume>
          (
          <issue>11</issue>
          ),
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Peñas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A Simple Measure to Assess Non-response</article-title>
          .
          <source>In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1</source>
          . pp.
          <fpage>1415</fpage>
          -
          <lpage>1424</lpage>
          . HLT '11, Association for Computational Linguistics
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Author Obfuscation: Attacking the State of the Art in Authorship Verification</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Overview of the 3rd Author Profiling Task at PAN 2015</article-title>
          .
          <source>In: CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohatgi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Can Pseudonymity Really Guarantee Privacy?</article-title>
          <source>In: Proceedings of the 9th USENIX Security Symposium</source>
          . pp.
          <fpage>85</fpage>
          -
          <lpage>96</lpage>
          . USENIX (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Stamatatos, E., Daelemans, W., Verhoeven, B., Juola, P., López-López, A., Potthast, M., Stein, B.: Overview of the Author Identification Task at PAN 2015. In: CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org (2015)</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Stamatatos, E., Daelemans, W., Verhoeven, B., Potthast, M., Stein, B., Juola, P., Sanchez-Perez, M., Barrón-Cedeño, A.: Overview of the Author Identification Task at PAN 2014. In: CLEF 2014 Evaluation Labs and Workshop - Working Notes Papers. CEUR-WS.org (Sep 2014)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>