<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Author Masking using Sequence-to-Sequence Models</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Antiplagiat CJSC</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
          ;
          <institution>Moscow Institute of Physics and Technology (MIPT)</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>The paper describes the approach adopted for the Author Masking Task at PAN 2017. To mask the original author, we combine methods based on deep learning with traditional obfuscation methods. We obtain a sample of obfuscated sentences from each original sentence and choose the best of them using a language model. We try to change both the content and the length of the original sentence while preserving its meaning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The related tasks that were proposed at PAN 2017 are author identification [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and
author profiling [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The evaluation of all the tasks is conducted using TIRA [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a service for evaluating data analysis tasks.
      </p>
      <p>
        At the PAN’16 conference [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], participants in the "Author Obfuscation" task proposed three different approaches to author masking. The first approach translates the text from the source language (English) into an intermediate language and then back into English [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The main advantage of this method is that it strongly modifies the original text; its main disadvantages are a large number of untranslated words and weak semantic coherence of the resulting text. The second approach, used in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], replaces the most frequent words of the original text with synonyms. It preserves the original meaning in most cases but modifies the text only slightly. The third approach combines strong context modification with preservation of the original sense [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This algorithm is based on several types of text obfuscation and achieved the best result on the metrics used in the contest.
      </p>
      <p>
        Modern authorship detection approaches, for example GLAD [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], use both statistical and context features. Our solution tries to obfuscate both kinds. We combine traditional author masking methods, such as synonymizing and splitting or joining sentences, with more recent methods based on recurrent neural networks. For the deep neural network part we took into account the papers [
        <xref ref-type="bibr" rid="ref13 ref17 ref19 ref20 ref22 ref4">13,19,22,4,17,20</xref>
        ] on the use of recurrent neural networks in paraphrase generation and detection. We use an LSTM-based model [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] in an Encoder-Decoder fashion.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Approach</title>
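      <p>Our approach is based on per-sentence obfuscation. First, we split the text into sentences. We then paraphrase each sentence using the methods described below. We keep paraphrasing a sentence until the Jaccard similarity between the token set of the original sentence <inline-formula><tex-math>s_{\text{original}}</tex-math></inline-formula> and the token set of the obfuscated sentence <inline-formula><tex-math>s_{\text{obfuscated}}</tex-math></inline-formula> falls below a threshold, or until we have tried all the obfuscation methods on the original sentence:
        <disp-formula id="eq-1"><label>(1)</label><tex-math>J(s_{\text{original}}, s_{\text{obfuscated}}) = \frac{|s_{\text{original}} \cap s_{\text{obfuscated}}|}{|s_{\text{original}} \cup s_{\text{obfuscated}}|}.</tex-math></disp-formula>
      </p>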
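      <p>The stopping rule can be sketched in Python as follows. This is a minimal illustration rather than the system's actual code: the function names, the representation of obfuscation methods as callables on token lists, and the handling of empty sentences are assumptions; the default threshold of 0.75 is the value reported in the experiments.</p>
      <preformat>
```python
def jaccard_similarity(original_tokens, obfuscated_tokens):
    """Jaccard similarity between two token collections, treated as sets."""
    original, obfuscated = set(original_tokens), set(obfuscated_tokens)
    union = original | obfuscated
    if not union:
        return 1.0  # assumption: two empty sentences count as identical
    return len(original.intersection(obfuscated)) / len(union)


def obfuscate_sentence(tokens, methods, threshold=0.75):
    """Apply hypothetical obfuscation methods in order until the result
    is dissimilar enough or every method has been tried."""
    current = list(tokens)
    for method in methods:
        current = method(current)
        if threshold > jaccard_similarity(tokens, current):
            break  # similarity fell below the threshold, stop paraphrasing
    return current
```
      </preformat>
      <p>Each method receives the current token list and returns a new one, so later methods build on the output of earlier ones.</p>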
      <p>Each of the described obfuscation methods operates on one or two sentences. The priority of a method depends on how often it has already been applied successfully: we try to keep the distribution of method usage close to uniform, since different obfuscation methods can mask different style features of the original text. Infrequently used methods are therefore tried first on each new sentence.</p>
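      <p>One simple way to realize this priority rule is to count successful applications and sort the methods by that count. The sketch below is an assumption about how such bookkeeping could look; the class name, method names, and tie-breaking by insertion order are hypothetical.</p>
      <preformat>
```python
from collections import Counter


class MethodScheduler:
    """Order obfuscation methods so that the least used ones come first."""

    def __init__(self, method_names):
        # Every method starts with zero successful applications.
        self.usage = Counter({name: 0 for name in method_names})

    def ordered(self):
        # sorted() is stable, so ties keep their original order.
        return sorted(self.usage, key=lambda name: self.usage[name])

    def record_success(self, name):
        self.usage[name] += 1
```
      </preformat>
      <p>Calling record_success after each successful application pushes that method towards the end of the order returned by ordered, which keeps usage close to uniform over time.</p>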
      <p>The methods we use to obfuscate sentences fall into two groups:
        <list list-type="order">
          <list-item><p>Methods that change the content of the sentences while trying to preserve their sense.</p></list-item>
          <list-item><p>Methods that change the structure and length of the sentences.</p></list-item>
        </list>
      </p>
      <sec id="sec-2-1">
        <title>Changing the Structure and Length of the Original Text</title>
        <p>We change sentence length in several ways. As part of preprocessing, we replace contracted forms with their full forms: words ending in ’ll, ’ve, ’m, etc. are replaced with will, have, am, etc.</p>
        <p>Our main approach to changing text length is to split and join sentences. To trigger a split we use a rather simple heuristic: we try to split sentences at coordinating (and, but) and subordinating (because, since, so, therefore) conjunctions. To join sentences we use the following rule: two sentences can be joined with the same conjunctions if both are rather short; as the length constraint we use a range between 30 and 150 characters.</p>
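        <p>The split and join heuristics can be sketched as follows. This is an illustrative approximation, not the system's implementation: the regular expression, the choice to split only at the first conjunction, and the capitalization handling are assumptions; the conjunction list and the 30 to 150 character range come from the description above.</p>
        <preformat>
```python
import re

CONJUNCTIONS = ("and", "but", "because", "since", "so", "therefore")
# A conjunction surrounded by whitespace, optionally preceded by a comma.
SPLIT_PATTERN = re.compile(
    r",?\s+(?:" + "|".join(CONJUNCTIONS) + r")\s+", re.IGNORECASE
)


def split_sentence(sentence):
    """Split at the first conjunction; return one or two sentences."""
    match = SPLIT_PATTERN.search(sentence)
    if match is None:
        return [sentence]
    left = sentence[:match.start()].rstrip(" ,") + "."
    right = sentence[match.end():]
    return [left, right[:1].upper() + right[1:]]


def can_join(first, second, min_len=30, max_len=150):
    """Both sentences must have a length within the allowed range."""
    return all(max_len >= len(s) >= min_len for s in (first, second))


def join_sentences(first, second, conjunction="and"):
    """Join two short sentences with a conjunction, or return None.

    Lower-casing the first letter of the second sentence is a
    simplification that is wrong for proper nouns and the pronoun I.
    """
    if not can_join(first, second):
        return None
    tail = second[:1].lower() + second[1:]
    return first.rstrip(". ") + ", " + conjunction + " " + tail
```
        </preformat>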
        <p>The third method is the adjustment or removal of introductory phrases from sentences. We use only phrases of general meaning, such as <italic>it is important to note that</italic>, <italic>anyway</italic>, <italic>in fact</italic>, <italic>also</italic>, etc.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Changing Content of the Original Text</title>
        <p>We use two methods for changing the content of a sentence.</p>
        <p>Synonym replacing. The first method is based on the traditional idea of synonymizing, where some words of the input sentence are replaced by their synonyms. However, instead of using existing dictionaries or ontologies, we use word embeddings as the source of synonyms. We sample a set of candidate sentences from combinations of the nearest words and take the best of the generated sentences according to the language model score.</p>
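        <p>Let <inline-formula><tex-math>(w_1, \dots, w_n)</tex-math></inline-formula> be the sequence of word embeddings of the sentence. For each word <inline-formula><tex-math>w_i</tex-math></inline-formula> except stopwords, we take the <inline-formula><tex-math>k</tex-math></inline-formula> nearest words by cosine similarity: <inline-formula><tex-math>v_i = (w'_{i1}, \dots, w'_{ik})</tex-math></inline-formula>. We generate <inline-formula><tex-math>s</tex-math></inline-formula> sentences <inline-formula><tex-math>s_1, \dots, s_s</tex-math></inline-formula> by sampling words from <inline-formula><tex-math>v_i</tex-math></inline-formula> in place of the original word <inline-formula><tex-math>w_i</tex-math></inline-formula>. We then select the sampled sentence with the maximal language model score:
          <disp-formula id="eq-2"><label>(2)</label><tex-math>s_{\text{obfuscated}} = \operatorname*{arg\,max}_{s \in \{s_1, \dots, s_s\}} \mathrm{LM}(s),</tex-math></disp-formula>
          where LM is the logarithm of the language model probability [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>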
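        <p>A minimal sketch of this sampling procedure is given below. It is not the system's code: the embeddings are a plain dictionary of toy vectors rather than FastText vectors, lm_score is a stand-in for the language model log-probability, and all function and parameter names are assumptions made for illustration.</p>
        <preformat>
```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, embeddings, k):
    """The k words closest to `word` by cosine similarity."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(embeddings[w], target), reverse=True)
    return others[:k]


def sample_candidates(tokens, embeddings, stopwords, k, num_samples, rng):
    """Build candidate sentences by replacing each non-stopword with a
    randomly chosen word from its k nearest neighbours."""
    neighbours = {
        t: nearest_words(t, embeddings, k)
        for t in set(tokens)
        if t not in stopwords and t in embeddings
    }
    return [
        [rng.choice(neighbours[t]) if neighbours.get(t) else t for t in tokens]
        for _ in range(num_samples)
    ]


def best_candidate(candidates, lm_score):
    """Pick the candidate with the highest language model score."""
    return max(candidates, key=lm_score)
```
        </preformat>
        <p>Passing a seeded random.Random instance as rng makes the sampling reproducible; in the setting described here, k and num_samples would correspond to the number of neighbours and the number of sampled sentences.</p>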
        <p>
          In our experiments we used <inline-formula><tex-math>k = 5</tex-math></inline-formula> and <inline-formula><tex-math>s = 100</tex-math></inline-formula>. The language model was trained on 3-grams from the Shakespeare’s Sonnets corpus from Project Gutenberg [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In our opinion, the original author's style is masked because this procedure gives the best scores to the sentences closest to Shakespeare's style. We did not use a higher-order language model because of the small size of the corpus.
        </p>
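        <p>Encoder-Decoder approach. The second method is based on an LSTM recurrent neural network. The basic LSTM model can be described by the following equations:
          <disp-formula><tex-math>\begin{gathered}
i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \\
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \\
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \\
c_{in} = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \\
c_t = f_t \odot c_{t-1} + i_t \odot c_{in}, \\
h_t = o_t \odot \tanh(c_t), \\
g(h_{t-1}, w_{t-1}, c_{t-1}) = h_t,
\end{gathered}</tex-math></disp-formula>
          where <inline-formula><tex-math>\sigma</tex-math></inline-formula> is the logistic sigmoid, <inline-formula><tex-math>\odot</tex-math></inline-formula> denotes element-wise multiplication, and the weight matrices and bias vectors are written as <inline-formula><tex-math>W</tex-math></inline-formula> and <inline-formula><tex-math>b</tex-math></inline-formula>.</p>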
        <p>
          We train our model in the Encoder-Decoder fashion [
          <xref ref-type="bibr" rid="ref19 ref20">20,19</xref>
          ] with the modification of LSTM described in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]: we decompose our model into an Encoder model and a Decoder model.
        </p>
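        <p>The Encoder recursively combines the sequence of word embeddings <inline-formula><tex-math>w_1, \dots, w_n</tex-math></inline-formula> into a fixed-length vector <inline-formula><tex-math>h^e_{n-1}</tex-math></inline-formula>:
          <disp-formula><tex-math>h^e_t = g^e(h^e_{t-1}, w_{t-1}, c^e_{t-1}),</tex-math></disp-formula>
          where <inline-formula><tex-math>g^e</tex-math></inline-formula> is a stack of LSTM functions, <inline-formula><tex-math>h^e_{t-1}</tex-math></inline-formula> is a hidden state, and <inline-formula><tex-math>c^e_{t-1}</tex-math></inline-formula> is a cell state vector.</p>
        <p>The Decoder tries to reproduce the input sequence <inline-formula><tex-math>w_1, \dots, w_n</tex-math></inline-formula> from the hidden vector sequence <inline-formula><tex-math>h^e_{n-1}, \dots, h^e_1</tex-math></inline-formula> and the vector <inline-formula><tex-math>c</tex-math></inline-formula>:
          <disp-formula><tex-math>\hat{w}_t = f^d(h^d_{t-1}, \hat{w}_{t-1}, c^d_{t-1}, c),</tex-math></disp-formula>
          where <inline-formula><tex-math>f^d</tex-math></inline-formula> is computed by a stack of LSTM functions <inline-formula><tex-math>g^d</tex-math></inline-formula>, <inline-formula><tex-math>h^d_{t-1}</tex-math></inline-formula> is a hidden state, <inline-formula><tex-math>c^d_{t-1}</tex-math></inline-formula> is a cell state vector, and <inline-formula><tex-math>c</tex-math></inline-formula> is the cell state vector from the last step of the encoder.</p>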
        <p>The Encoder and Decoder models are trained jointly to minimize the reconstruction error:
          <disp-formula><tex-math>\sum_{i=1}^{n} \| w_i - \hat{w}_i \|^2.</tex-math></disp-formula>
        </p>
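        <p>As a small worked example, the reconstruction error can be computed as below, assuming it is the sum of squared Euclidean distances between the original and the reproduced embeddings. The function name and the list-of-lists representation are assumptions.</p>
        <preformat>
```python
def reconstruction_error(original_vectors, reproduced_vectors):
    """Sum of squared Euclidean distances between paired vectors."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(w, w_hat))
        for w, w_hat in zip(original_vectors, reproduced_vectors)
    )
```
        </preformat>
        <p>For the pairs ([1, 2], [1, 0]) and ([3, 4], [0, 4]) the per-word squared distances are 4 and 9, so the total error is 13.</p>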
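        <p>To determine the end of a sentence, we added an “End of sentence” token to our embedding model, so that in general the lengths of the original sentence <inline-formula><tex-math>s_{\text{original}}</tex-math></inline-formula> and the obfuscated sentence <inline-formula><tex-math>s_{\text{obfuscated}}</tex-math></inline-formula> may differ. We then use the reproduced sequence <inline-formula><tex-math>\hat{w}_1, \dots, \hat{w}_{n_o}</tex-math></inline-formula> in the same way as in our synonym replacing approach (2), where <inline-formula><tex-math>n_o</tex-math></inline-formula> is the number of tokens before the “End of sentence” token.</p>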
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation</title>
        <p>
          We considered two automatic metrics for evaluating the final obfuscation. To evaluate sensibleness we used the average language model score from a KenLM language model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The language model used in our obfuscation method differs from the one used for evaluation: the obfuscation model was trained on the Shakespeare corpus, whereas the evaluation model was trained on a Wikipedia corpus. Therefore, although we tried to mask the original author's style with Shakespeare's style, at the evaluation step we measured how well the obfuscated text fits common English.
        </p>
        <p>
          To evaluate safety we used a method similar to the one described in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]: we measured how much the prediction of the GLAD [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] author verification system changed. We used the random forest classifier in GLAD.
        </p>
        <p>We did not use any automatic metric for soundness and relied on peer review instead.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiment Details and Results</title>
      <p>
        At the preprocessing step we used the NLTK toolbox [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to extract separate sentences from the original text. We used the FastText library [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for word embeddings. Our embedding model was trained on the latest dump of the Wikipedia corpus, with a word vector dimension of 300. For training the recurrent neural network we used the Seq2Seq library<sup>1</sup>; this network was also trained on the Wikipedia corpus. Based on peer review we set the threshold in (1) to 0.75. We used a 2-layer LSTM, as it showed better results than the 1-layer model.
      </p>
      <p>Our average language model score for sensibleness was 99.4 ± 61.9, whereas the score for the original sentences was 79.4 ± 55.8. The scores are rather close, since the mean of each distribution lies within one standard deviation of the other.</p>
      <p>The average change in GLAD probabilities is 0.11 ± 0.22. The number of correctly verified texts dropped from 189 to 153 after obfuscation. We observe that our obfuscation method works and lowers the verification probabilities for the obfuscated texts.</p>
      <p><sup>1</sup> https://github.com/farizrahman4u/seq2seq</p>
      <p>An example of our obfuscation method is given in Table 3. As we can see, the sentences produced by the Encoder-Decoder can contain grammatical errors. However, a significant part of the sentences we examined was grammatically correct. Another interesting feature of the sentences produced by synonym replacement and by the Encoder-Decoder method is the appearance of the word “scabbard” in the obfuscated sentences. We believe this results from using the Shakespeare corpus in the final sentence scoring (2).</p>
      <p>The paper describes our system for the PAN 2017 Author Masking Task. Our main approach is based on recurrent neural networks for text obfuscation. We also use more traditional obfuscation methods, such as synonymizing and changing statistical text features. We use a language model to select the best masking result.</p>
      <p>Future work includes improving the obfuscation quality of the seq2seq model by tuning its parameters and incorporating additional heuristics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>Project Gutenberg</source>
          . http://www.gutenberg.org/wiki/Main_Page
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural language processing with Python: analyzing text with the natural language toolkit</source>
          .
          <publisher-name>O'Reilly Media, Inc.</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilnis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Generating sentences from a continuous space</article-title>
          .
          <source>arXiv preprint arXiv:1511.06349</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Mandl</surname>
          </string-name>
          , T. (eds.)
          <source>Working Notes Papers of the CLEF 2017 Evaluation Labs</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Heafield</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>KenLM: faster and smaller language model queries</article-title>
          .
          <source>In: Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation</source>
          . pp.
          <fpage>187</fpage>
          -
          <lpage>197</lpage>
          . Edinburgh, Scotland, United Kingdom
          (
          <year>July 2011</year>
          ), https://kheafield.com/papers/avenue/kenlm.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hürlimann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weck</surname>
          </string-name>
          , B., van den Berg, E.,
          <string-name>
            <surname>Šuster</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>GLAD: Groningen Lightweight Authorship Detection-Notebook for PAN at CLEF 2015</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          , San Juan, E. (eds.)
          <source>CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, 8-11 September, Toulouse, France</source>
          . CEUR-WS.org (Sep
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawless</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.):
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Keswani</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Author Masking through Translation-Notebook for PAN at CLEF 2016</article-title>
          . In: Balog,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Macdonald</surname>
          </string-name>
          , C. (eds.)
          <source>CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, 5-8 September, Évora, Portugal</source>
          . CEUR-WS.org (Sep
          <year>2016</year>
          ), http://ceur-ws.org/Vol-1609/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Koehn</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>Statistical Machine Translation</article-title>
          . Cambridge University Press, New York, NY, USA, 1st edn. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mansoorizadeh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahgooy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aminiyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eskandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Author Obfuscation using WordNet and Language Models-Notebook for PAN at CLEF 2016</article-title>
          . In: Balog,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Macdonald</surname>
          </string-name>
          , C. (eds.)
          <source>CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, 5-8 September, Évora, Portugal</source>
          . CEUR-WS.org (Sep
          <year>2016</year>
          ), http://ceur-ws.org/Vol-1609/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mihaylova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karadjov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiprov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiev</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koychev</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>SU@PAN'2016: Author Obfuscation-Notebook for PAN at CLEF 2016</article-title>
          . In: Balog,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Macdonald</surname>
          </string-name>
          , C. (eds.)
          <source>CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, 5-8 September, Évora, Portugal</source>
          . CEUR-WS.org (Sep
          <year>2016</year>
          ), http://ceur-ws.org/Vol-1609/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thyagarajan</surname>
            ,
            <given-names>A.:</given-names>
          </string-name>
          <article-title>Siamese recurrent architectures for learning sentence similarity</article-title>
          .
          <source>In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>2786</fpage>
          -
          <lpage>2792</lpage>
          . AAAI Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In: Kanoulas,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Sanderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Toms</surname>
          </string-name>
          , E. (eds.)
          <article-title>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization</article-title>
          .
          <source>5th International Conference of the CLEF Initiative (CLEF 14)</source>
          . pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer, Berlin Heidelberg New York (
          <year>Sep 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Author Obfuscation: Attacking the State of the Art in Authorship Verification</article-title>
          .
          In:
          <source>Working Notes Papers of the CLEF 2016 Evaluation Labs</source>
          . CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep
          <year>2016</year>
          ), http://ceur-ws.org/Vol-1609/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN'17: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In:
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawless</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          Experimental IR Meets Multilinguality, Multimodality, and Interaction
          .
          <source>8th International Conference of the CLEF Initiative (CLEF 17)</source>
          . Springer, Berlin Heidelberg New York (Sep
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Prakash</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qadir</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Neural paraphrase generation with stacked residual LSTM networks</article-title>
          .
          <source>arXiv preprint arXiv:1610.03098</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)
          <source>Working Notes Papers of the CLEF 2017 Evaluation Labs</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Dynamic pooling and unfolding recursive autoencoders for paraphrase detection</article-title>
          . In:
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartlett</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>24</volume>
          , pp.
          <fpage>801</fpage>
          -
          <lpage>809</lpage>
          . Curran Associates, Inc. (
          <year>2011</year>
          ), http://papers.nips.cc/paper/4204-dynamic-pooling-and-unfolding-recursive-autoencoders-for-paraphrase-detection.pdf
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          In:
          <source>Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)
          <source>Working Notes Papers of the CLEF 2017 Evaluation Labs</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wieting</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Livescu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Towards universal paraphrastic sentence embeddings</article-title>
          .
          <source>CoRR abs/1511.08198</source>
          (
          <year>2015</year>
          ), http://arxiv.org/abs/1511.08198
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>