<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Text Alignment Algorithm Based on Prediction of Obfuscation Types Using SVM Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fatemeh Mashhadirajab</string-name>
          <email>f.mashhadirajab@mail.sbu.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehrnoush Shamsfard</string-name>
          <email>m-shams@sbu.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NLP Research Lab, Faculty of Computer Science and Engineering, Shahid Beheshti University</institution>
          ,
          <country country="IR">Iran</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe our text alignment algorithm, which ranked first in the Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies, and information about the type of obfuscation helps plagiarism detection systems apply their most suitable algorithm to each type. For this purpose, we use an SVM neural network to classify documents according to the obfuscation strategy used in a document pair. We then set the parameter values of our text alignment algorithm based on the detected type of obfuscation. The results of our algorithm on the training and test datasets of Persian Plagdet 2016 are reported in this article.</p>
      </abstract>
      <kwd-group>
        <kwd>Plagiarism detection</kwd>
        <kwd>Text alignment</kwd>
        <kwd>SVM neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In recent years, automatic discovery of plagiarism has been
considered by many researchers, and many plagiarism detection
systems have been developed [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. Plagiarism detection has
been a task of the PAN competition (http://pan.webis.de/), which has been
held every year since 2009 to evaluate the participants' plagiarism
detection algorithms. At the PAN competition, the plagiarism detection
task is divided into the source retrieval and text alignment subtasks.
The task of source retrieval is to retrieve documents similar to the
suspicious document from the set of source documents, and the
task of text alignment is to extract all the plagiarized
passages from a given source-suspicious document pair. Figure
1 shows the different parts of a plagiarism detection system.
As mentioned, in text alignment, the documents in the datasets
used to evaluate similarity detection systems are divided into two
categories: source documents and suspicious documents. Each
suspicious document contains one or more parts of a source
document in its original, edited, or rephrased form. The task of
text alignment, which is the focus of this paper, is to find the
plagiarized parts of the source document in the suspicious
document for each pair of source and suspicious documents [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
The Persian Plagdet 2016 competition (http://ictrc.ac.ir/plagdet/), a
subtask of the PAN FIRE 2016 competition
(http://fire.irsi.res.in/fire/2016/home), is held for the Persian
language; that is, the text alignment algorithms are evaluated on a
Persian corpus.
      </p>
      <sec id="sec-1-1">
        <title>1 http://pan.webis.de/</title>
      </sec>
      <sec id="sec-1-2">
        <title>2 http://ictrc.ac.ir/plagdet/</title>
        <p>In this paper we discuss our proposed algorithm, which
participated in Persian Plagdet 2016 and ranked first among the
participants. Our approach first uses a neural network to
detect the type of obfuscation in each document pair, and then
sets the parameters of the text alignment algorithm based on the
detected type of obfuscation. The rest of the paper explains the
proposed algorithm, with special focus on the obfuscation type
detection module, and then discusses the results of evaluating the
system on the Persian Plagdet 2016 corpus.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        At the PAN competition, the text alignment algorithms are
evaluated on corpora that contain different types of
obfuscation. For example, in the PAN 2013-2014 competitions, the
evaluation corpus contained the obfuscation types none,
random, translation, and summary. In the PAN text alignment corpora
it is assumed that only one type of obfuscation is employed in each
document pair. Based on this assumption, most participants try to
predict the type of obfuscation strategy used in a document pair
and detect similarities based on the predicted type. At the PAN 2014
competition, Glinos' algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] divides all plagiarism cases
into two categories: order-based and non-order-based.
Order-based plagiarism involves the none and random
obfuscations; non-order-based plagiarism involves the translation
and summary obfuscations. They use the Smith-Waterman algorithm
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to detect aligned sequences in document pairs and thus detect
the order-based plagiarism cases. If no aligned sequences are
found, the document pair is given to a clustering
component to detect non-order-based plagiarism cases. Sanchez-Perez
et al. [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ] categorize the document pairs of the PAN 2014 corpus into three
categories: Verbatim, Summary, and Other plagiarism cases, and
set the parameters of their algorithm based on these categories. They
use the Longest Common Substring (LCS) algorithm to find every
single common sequence of words (th-Verbatim). If at least one
Verbatim case has been found, the document pair is considered
Verbatim plagiarism. If no Verbatim case has been found and
the length of the plagiarized fragments in the suspicious document is
much smaller than the length of the source fragments, the document
pair is considered Summary plagiarism; otherwise, the
document pair is considered an Other plagiarism case. Also,
Palkovskii et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] use a graphical clustering
algorithm in their approach to detect the type of plagiarism in a
document pair. They classify the document pairs of the PAN 2014 text
alignment corpus into four categories: Verbatim Plagiarism, Random
Plagiarism, Summary type Plagiarism, and Undefined type. Afterward,
they set the parameters based on the detected type of plagiarism. In
the Persian Plagdet 2016 corpus there are three types of obfuscation:
none, random, and simulated. In our proposed approach, the
document pairs of the Persian Plagdet 2016 corpus are classified
into two categories: Verbatim plagiarism and Simulated
plagiarism. We use an SVM neural network to detect the type of
plagiarism; it has been trained on the obfuscation types of
the Persian Plagdet 2016 training corpus. Then we
set the parameters based on the detected type of plagiarism.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. METHODOLOGY</title>
      <p>
        Our proposed text alignment algorithm like many other text
alignment algorithms [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], includes four stages: preprocessing,
seeding, extension, and filtering. Each of these four stages is
explained in this section. In addition, Figure 2 gives an overall
scheme of our text alignment algorithm and shows these four
stages.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Preprocessing</title>
      <p>
        In the preprocessing stage, first, the text is segmented into
sentences and then tokenized by STeP_1 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and Stopwords [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
are removed, and the inflectional and derivational stems of the tokens
are extracted by STeP_1. Preprocessing is applied to each pair of
suspicious and source documents, and the sentences of the suspicious
and source documents are then given to the seeding stage.
      </p>
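      <p>The preprocessing stage can be sketched as follows. This is an illustrative Python sketch, not the authors' C#.Net code: the stopword list is a toy English one, and the stem function is a stand-in for STeP_1, which performs the actual Persian stemming.</p>

```python
import re

# Toy stopword list; the real system uses a Persian stopword list [4].
STOPWORDS = {"the", "a", "an", "of", "and", "is"}

def stem(token):
    # Stand-in for STeP_1, which extracts inflectional/derivational stems.
    return token

def preprocess(document):
    """Segment a document into sentences of stemmed, stopword-free tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    processed = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        tokens = [stem(t) for t in tokens if t not in STOPWORDS]
        if tokens:
            processed.append(tokens)
    return processed

print(preprocess("The cat sat. The dog barked!"))
# [['cat', 'sat'], ['dog', 'barked']]
```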
    </sec>
    <sec id="sec-5">
      <title>3.2 Seeding</title>
      <p>
        In this stage, the purpose is to extract the similar sentence pairs
from source and suspicious documents that we call them seed. For
seeding, our method is initially based on the method introduced
by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We expanded that method with an SVM neural
network that predicts the obfuscation type in order to adjust the
parameters and obtain better results. In this approach, based on the
vector space model (VSM), a tf-idf vector is first calculated for all
sentences of the suspicious and source documents, where tf is the term
frequency in the corresponding sentence and idf is the inverse sentence
frequency. Then the similarity of each sentence pair of the suspicious
and source documents is calculated using the cosine measure and the
Dice coefficient according to Eqs. 1, 2, and 3.
      </p>
      <p>
        cos(s_i, s_j) = (s_i · s_j) / (|s_i| |s_j|)  (1)

        dice(s_i, s_j) = 2 |T(s_i) ∩ T(s_j)| / (|T(s_i)| + |T(s_j)|)  (2)

        seed(i, j) = 1 if both cos(s_i, s_j) and dice(s_i, s_j) exceed the
threshold, and 0 otherwise  (3)
      </p>
      <p>
        where s_i is the tf-idf vector of the ith sentence of the suspicious
document, s_j is the tf-idf vector of the jth sentence of the source
document, T(.) is the set of terms of a sentence, and |.| is the
Euclidean length. The cosine measure and Dice coefficient are
calculated for all pairs of sentences; if the similarity of the two
vectors s_i and s_j is more than the threshold of 0.3 (chosen
based on [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) under both criteria above, the pair of
sentences is considered a seed, and for pairs of sentences
whose similarity is more than 0.1 and less than 0.3 (chosen based
on our experiments), the similarity is evaluated semantically.
For this purpose, the type of obfuscation strategy used in the
document pair is determined using the SVM neural network (LIBSVM,
http://www.csie.ntu.edu.tw/~cjlin/libsvm/).
We use the cosine similarity between all pairs of sentences
of the suspicious and source documents to create the SVM input
vector, which is calculated as follows.
An 8-bit vector b = (b_0, ..., b_7) is considered for each document
pair. The range of similarity values is divided into 8 intervals, and
each bit b_k of the vector indicates whether the document pair contains
two sentences whose similarity falls in the corresponding interval: if
there is a sentence pair whose cosine similarity is between k/8 and
(k+1)/8, then b_k = 1; otherwise, b_k = 0.
This vector is given to the SVM neural network, previously trained on
the Persian Plagdet 2016 training dataset, and the obfuscation strategy
of the document pair is predicted. We set the maximum and minimum
similarity thresholds of the semantic similarity measure based on the
type of obfuscation and the amount of similarity between the pairs
of sentences. To calculate the semantic similarity we use FarsNet
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to extract the synsets of each term and STeP_1 to extract
its inflectional and derivational stems. Thus, for each
term t, a set of words syn(t) is considered, as shown in Figure
3. Then, for each term t of the suspicious sentence vector, if syn(t)
overlaps syn(t') for some term t' of the source sentence vector, t is
replaced by t'. Finally, the cosine and Dice similarities are
calculated for the two resulting vectors, and the results at this stage
are averaged with the cosine and Dice results of the
previous stage; if the result is greater than the
threshold, the pair of sentences is considered a seed. The
set of seeds obtained in this stage enters the extension stage.
      </p>
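      <p>The similarity measures of Eqs. 1-3 and the 8-bit SVM input vector can be sketched as follows. This is an illustrative Python sketch under our own naming; it is not the authors' implementation, and the sparse-dict representation of tf-idf vectors is our assumption.</p>

```python
import math

def cosine(u, v):
    """Cosine measure (Eq. 1) between two sparse tf-idf vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dice(u, v):
    """Dice coefficient (Eq. 2) over the term sets of the two sentences."""
    common = len(set(u) & set(v))
    return 2.0 * common / (len(u) + len(v)) if (u or v) else 0.0

def is_seed(u, v, th=0.3):
    """Seed criterion (Eq. 3): both similarities must exceed the threshold."""
    return cosine(u, v) > th and dice(u, v) > th

def svm_input_vector(similarities, bits=8):
    """8-bit SVM input: bit k is 1 iff some sentence pair's cosine
    similarity falls in the interval [k/8, (k+1)/8)."""
    vec = [0] * bits
    for s in similarities:
        k = min(int(s * bits), bits - 1)
        vec[k] = 1
    return vec

u = {"cat": 1.0, "sat": 0.5}
print(is_seed(u, u))                        # True: identical sentences
print(svm_input_vector([0.05, 0.5, 0.95]))  # [1, 0, 0, 0, 1, 0, 0, 1]
```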
    </sec>
    <sec id="sec-6">
      <title>3.3 Extension</title>
      <p>
        The purpose of the extension stage is the extraction of the longest
similar passages from the suspicious and source documents. As
shown in Figure 2, extension consists of two parts: clustering and
validation. In the clustering stage, the document is clustered into
pieces, so that each piece contains a number of seeds where the
(similarity) distance between them does not exceed a threshold. In
the validation stage, among the pair of passages created in the
clustering stage; those that are not similar enough are removed.
Again, for the extension stage, we adopt and enhance the method
proposed by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The difference is that in the validation step we
use the semantic similarity measure instead of the cosine measure to
determine the similarity between pairs of passages.
      </p>
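      <p>The clustering step can be sketched as follows. This is a toy Python sketch under our own assumptions: the seed representation (pairs of sentence indices) and the gap threshold are ours, not the authors' exact parameters.</p>

```python
def cluster_seeds(seeds, max_gap=4):
    """Group seeds (suspicious, source sentence-index pairs) into passage
    clusters: start a new cluster whenever the gap to the previous seed
    exceeds max_gap on either side. A stand-in for the clustering step."""
    clusters, current = [], []
    for seed in sorted(seeds):
        if current and (seed[0] - current[-1][0] > max_gap
                        or abs(seed[1] - current[-1][1]) > max_gap):
            clusters.append(current)
            current = []
        current.append(seed)
    if current:
        clusters.append(current)
    return clusters

print(cluster_seeds([(1, 2), (2, 3), (10, 11), (11, 12)]))
# [[(1, 2), (2, 3)], [(10, 11), (11, 12)]]
```

Each resulting cluster delimits a candidate passage pair, which the validation step then keeps or discards based on its overall similarity.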
      <sec id="sec-6-1">
        <title>4 http://www.csie.ntu.edu.tw/~cjlin/libsvm/</title>
        <sec id="sec-6-1-1">
          <title>Preprocessing</title>
          <p>Sentence Splitting</p>
          <p>Tokenizing
Remove Stopwords
Stemming
STeP_1</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>Extension</title>
          <p>Clustering
Validation</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>Seeding</title>
          <p>tf_idf
Cosine Measure
Dice Coefficient
Classification
Setting Parameters</p>
          <p>Semantic
Similarity Measure
Training
Dataset
SVM
neural
network
FarsNet</p>
        </sec>
        <sec id="sec-6-1-4">
          <title>Filtering</title>
          <p>Resolving Overlapping
Removing Small Cases
Plagiarism
Passages</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.4 Filtering</title>
      <p>
        The filtering stage removes some passages that either overlap or
are too short. To remove overlapping passages we use the
proposed method in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To remove too-short passages, we use a
recursive algorithm. If the length of a passage is less than a
threshold, we first assume that other seeds existed in this
passage but were not identified. So we
decrease the threshold of the semantic similarity measure, go
back to the seeding stage, extract the seeds based on the
new threshold, and repeat all the stages. If the passage is still not
long enough this time, it is removed.
      </p>
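      <p>The recursive filter can be sketched as follows. This is an illustrative Python sketch: the length and threshold constants and the regrow callback (standing in for re-running seeding and extension at a lower threshold) are our assumptions, not the authors' values.</p>

```python
MIN_PASSAGE_LEN = 150   # assumed minimum passage length (characters)
MIN_THRESHOLD = 0.05    # assumed floor that stops the recursion

def filter_passage(passage, threshold, regrow):
    """Keep a passage if it is long enough; otherwise halve the
    semantic-similarity threshold, re-run seeding/extension (the
    `regrow` callback), and try again. Drop the passage if it is
    still too short once the threshold floor is reached."""
    if len(passage) >= MIN_PASSAGE_LEN:
        return passage
    if threshold <= MIN_THRESHOLD:
        return None
    new_threshold = threshold / 2
    return filter_passage(regrow(new_threshold), new_threshold, regrow)

# Toy regrow: a lower threshold yields more seeds, hence a longer passage.
regrow = lambda th: "x" * int(30 / th)
result = filter_passage("x" * 40, 0.3, regrow)
print(result is not None)  # True: the regrown passage is long enough
```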
    </sec>
    <sec id="sec-8">
      <title>4. RESULTS</title>
      <p>
        We implemented our algorithm in C#.Net and evaluated it based
on the PAN evaluation setup [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ]. In the evaluation stage
we ran our algorithm on the Persian Plagdet 2016 training and test
datasets [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The results of this evaluation are shown in Table
1. As can be seen, the results of our algorithm on the training
and test corpora are very close. The training corpus in this
competition includes a variety of obfuscation strategies: the
none, random, and simulated obfuscation categories. Table 2
shows the results of our algorithm on each of the obfuscation
strategies in the training dataset. In Table 2, column P_1 shows the
results of our algorithm on the types of obfuscation in the training
dataset when the semantic similarity measure is not used. Column P_2
shows the results of the algorithm using the semantic similarity
measure. Column P_3 shows the results of our algorithm after
adding the semantic similarity criterion and also adjusting the
parameters based on the type of obfuscation detected by the neural
network. As can be seen in column P_2, adding the semantic
similarity criterion improves recall for all types of obfuscation in
the training corpus, but precision declines in some cases, while
column P_3 shows that adding a neural network that diagnoses the type
of obfuscation, with the parameters set based on the detected type,
improves precision and recall dramatically on all types of obfuscation.
      </p>
    </sec>
    <sec id="sec-9">
      <title>5. Conclusions and Future Work</title>
      <p>We described our algorithm for the task of text alignment and
presented the results of evaluating this algorithm on the training and
test datasets of Persian Plagdet 2016, where it achieved the best
result among the participants. In our method, we
used an SVM neural network to identify the type of obfuscation and
then set the parameters on the basis of the obfuscation type; the
results showed that this is effective in improving precision and recall.
In the future, we are going to improve the semantic similarity
measure in the seeding stage of our system. We want to use a
neural network to estimate the semantic similarity of pairs of
sentences. We also want to use methods such as genetic
algorithms to adjust the parameters automatically.</p>
    </sec>
    <sec id="sec-10">
      <title>6. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sanchez-Perez</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Dynamically adjustable approach through obfuscation type recognition</article-title>
          .
          <source>In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum</source>
          , (Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1391</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Shamsfard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Shahedi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>STeP-1: standard text preparation for Persian language</article-title>
          , CAASL3 Third Workshop on Computational Approaches to Arabic Script-based Languages.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Shamsfard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Developing FarsNet: A lexical ontology for Persian</article-title>
          .
          <source>proceedings of the 4th global WordNet conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Davarpanah</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          , Sanji, M. and
          <string-name>
            <surname>Aramideh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Farsi lexical analysis and StopWord list</article-title>
          .
          <source>Library Hi Tech</source>
          , vol.
          <volume>27</volume>
          , pp
          <fpage>435</fpage>
          -
          <lpage>449</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Fiedler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kaner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2010</year>
          . Plagiarism Detection Services:
          <article-title>How Well Do They Actually Perform</article-title>
          .
          <source>IEEE Technology And Society Magazine</source>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Alzahrani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Abraham</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Understanding plagiarism linguistic patterns, textual features, and detection methods</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews</source>
          , vol.
          <volume>42</volume>
          , no. 2.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>A. M. E. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdulla</surname>
            ,
            <given-names>H. M. D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Snasel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Survey of plagiarism detection methods</article-title>
          .
          <source>IEEE Fifth Asia Modelling Symposium (AMS)</source>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Göring</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Towards data submissions for shared tasks: first experiences for the task of text alignment</article-title>
          .
          <source>Working Notes Papers of the CLEF 2015 Evaluation Labs, CEUR Workshop Proceedings, ISSN 1613-0073.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Sanchez-Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014</article-title>
          .
          <source>In: Notebook for PAN at CLEF 2014 (15-18 September, Sheffield, UK)</source>
          .
          <source>CEUR Workshop Proceedings, ISSN 1613-0073</source>
          , Vol.
          <volume>1180</volume>
          , CEUR-WS.org, pp.
          <fpage>1004</fpage>
          -
          <lpage>1011</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Glinos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>A Hybrid Architecture for Plagiarism Detection-Notebook for PAN at CLEF 2014</article-title>
          .
          <article-title>CLEF 2014 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, (
          <volume>15</volume>
          -
          <fpage>18</fpage>
          September, Sheffield, UK).
          <source>CEUR-WS.org. ISSN 1613-0073.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Palkovskii</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Belov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Developing HighResolution Universal Multi-Type N-Gram Plagiarism Detector-Notebook for PAN at CLEF 2014</article-title>
          .
          <article-title>CLEF 2014 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, (
          <volume>15</volume>
          -
          <fpage>18</fpage>
          September, Sheffield, UK).
          <source>CEUR-WS.org. ISSN 1613-0073.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Busse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Overview of the 6th International Competition on Plagiarism Detection</article-title>
          . In: Working Notes for CLEF 2014 Conference, (Sheffield, UK,
          <fpage>15</fpage>
          -
          <lpage>18</lpage>
          September).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1180</volume>
          , pp.
          <fpage>845</fpage>
          -
          <lpage>876</lpage>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waterman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>1981</year>
          .
          <article-title>Identification of common molecular subsequences</article-title>
          .
          <source>Journal of molecular biology</source>
          . Vol.
          <volume>147</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>195</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Asghari</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohtaj</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fatemi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faili</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016</article-title>
          . In Working notes of FIRE 2016 -
          <article-title>Forum for Information Retrieval Evaluation, Kolkata</article-title>
          , India, December 7-
          <issue>10</issue>
          ,
          <year>2016</year>
          , CEUR Workshop Proceedings, CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>An Evaluation Framework for Plagiarism Detection</article-title>
          .
          <source>In 23rd International Conference on Computational Linguistics (COLING 10)</source>
          , pp.
          <fpage>997</fpage>
          -
          <lpage>1005</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2012</year>
          . Ousting Ivory Tower Research:
          <article-title>Towards a Web Framework for Providing Experiments as a Service</article-title>
          .
          <source>In 35th International ACM Conference on Research and Development in Information Retrieval (SIGIR 12)</source>
          , pp.
          <fpage>1125</fpage>
          -
          <lpage>1126</lpage>
          . ACM.
          <source>ISBN 978-1- 4503-1472-5.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          .
          <source>In Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th International Conference of the CLEF Initiative (CLEF 14)</source>
          , pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          , Berlin Heidelberg New York. Springer.
          <source>ISBN 978-3-319-11381-4.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>