<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Plagiarism Detection Based on a Novel Trie-based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alireza Talebpour</string-name>
          <email>Talebpour@sbu.ac.ir</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Shirzadi</string-name>
          <email>m.shirzadi@email.kntu.ac.ir</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zahra Aminolroaya</string-name>
          <email>z.aminolroaya@Mail.sbu.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Cyberspace Research Institute, Shahid Beheshti University</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Department of Computer Engineering, Faculty of Computer Science and Engineering, Shahid Beheshti University</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>Laskoukelayeh, Cyberspace Research Institute, Shahid Beheshti University</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Nowadays, plagiarism detection has become one of the major problems in the text mining field. New technologies have made plagiarism easier and more feasible, so it is vital to develop automatic systems that detect plagiarism in different kinds of content. In this paper, we propose a trie-based method to compare source and suspicious text documents, using the PersianPlagDet text documents as a case study. Both character-based and knowledge-based detection techniques improve our method. Besides, our fast insertion and retrieval algorithm makes it possible to compare long documents at high speed.</p>
      </abstract>
      <kwd-group>
        <kwd>Plagiarism detection</kwd>
        <kwd>Trie-based method</kwd>
        <kwd>Text mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
Plagiarism means trying to pass off somebody else's words
as your own [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Plagiarism detection is the process of
locating text reuse within a suspicious document [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Nowadays, with the advent of technologies like the internet
and the growth of digital content creation, plagiarism, especially
plagiarism of text from existing content, has become one of the major
problems in the text mining field. For example, plagiarism used as a
way to relieve the pressure to publish pushes down the quality of
scientific papers. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Lesk reports that, in some countries, 15% of
submissions to arXiv contain duplicated material or are plagiarized.
Because of these problems, it is urgent to provide a system that
automatically detects plagiarism and validates documents.
      </p>
      <p>
There have been many approaches based on lexical and semantic
methods. On the one hand, the plagiarism problem can be reduced to
the problem of finding exactly matched phrases; on the other hand,
it can be as hard as finding restated phrases. Depending on what a
problem asks, different knowledge-based or character-based techniques
can be applied. One lexical database for knowledge-based approaches
is WordNet, in which different words are grouped by their cognitive
synonymy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This database can be used to find
restated phrases. Words in different positions in a sentence may play
different roles, so knowing the syntactic category (POS) of a word,
i.e. noun, verb, etc., can simplify the plagiarism detection problem.
      </p>
      <p>
Plagiarized documents can be in any language, and different languages
need different detection policies because of their different semantics
and grammars. In this paper, we propose a novel approach for the PAN
FIRE shared task of Persian plagiarism detection in the international
contest PersianPlagDet 2016. We use a hybrid method considering both
character-based and knowledge-based approaches. A Persian wordnet
database, FarsNet, serves as our knowledge database
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Besides, we apply POS tagging with the Hazm
package [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. By finding nouns and their synsets in FarsNet,
we can store and retrieve suspicious words from our proposed tree
structure more precisely. In our plagiarism detection methodology, we
apply a novel extended prefix tree, i.e. a trie, to store and retrieve
documents. We consider not only the task of text plagiarism detection
but also the algorithm's computation time as an important factor.
      </p>
    </sec>
    <sec id="sec-2">
<title>1.1 Related works</title>
      <p>
There are many studies that seek solutions to the problems of
plagiarism detection and document matching. In the nineties, studies
on copy detection mechanisms for digitized documents led to
computerized plagiarism detection [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. With the growth of generated data, the speed of
plagiarism detectors has become an important criterion. In
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], parameterized backward trie matching is presented
as a fast method for the problem of aligning source and suspicious
documents.
      </p>
      <p>
The plagiarism detection problem has also been studied in different
languages. For Persian plagiarism detection,
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], after preprocessing the source and suspicious
documents, different similarity measures such as the "Jaccard
similarity coefficient", the "Clough &amp; Stevenson metric" and "LCS"
are used for similarity comparisons between source and
suspicious documents. Also, applying FarsNet, Rakian et al.
propose an approach, "Persian Fuzzy Plagiarism Detection
(PFPD)", to detect plagiarized cases [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
<p>Our fast trie-based approach addresses the problem of Persian
plagiarism detection. We describe the problem data and the
preprocessing applied to the documents in section 2. Then, in section
3, the novel plagiarism detection approach is described; in section 4,
the evaluation measures are described and our approach is evaluated.
Finally, the results are summarized in section 5.</p>
    </sec>
    <sec id="sec-3">
<title>2. DATA</title>
      <p>
The data is a set of suspicious and source text documents
released by the PersianPlagDet competition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the PersianPlagDet data, plagiarism can happen in
different ways: parts of a source text document can be copied exactly
into a suspicious text, parts of a source text document can be copied
into a suspicious text with some random changes, or parts of a
restated source text document can appear in a suspicious text.
      </p>
    </sec>
    <sec id="sec-4">
<title>2.1 Data preparation</title>
<p>Before applying the plagiarism detection method, the source and
suspicious text documents must be prepared. We explain the required
processes step by step:</p>
      <sec id="sec-4-1">
        <title>Text tokenization and POS tagging</title>
        <p>
We tokenize text documents into words. Tokenization is the
procedure of splitting a text into words, phrases, or other
meaningful parts, namely tokens [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In addition to tokenization, the exact positions of
tokens, their word offsets, are stored. A token offset represents the
token's character-based distance from the beginning of the document.
By applying the Hazm POS tagger, we also specify the part of speech
of each word. The nouns are important for us and help us compare
phrases for plagiarism detection purposes, so nouns are flagged for
the next stages of processing.
        </p>
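        <p>As a minimal sketch (the paper uses the Hazm tokenizer; a plain regular expression stands in for it here), token offsets can be kept alongside the tokens:</p>

```python
import re

def tokenize_with_offsets(text):
    """Return (token, offset) pairs, where the offset is the token's
    character-based distance from the beginning of the document."""
    return [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]

print(tokenize_with_offsets("copy detection of documents"))
# [('copy', 0), ('detection', 5), ('of', 15), ('documents', 18)]
```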
      </sec>
      <sec id="sec-4-2">
        <title>Text cleansing and normalization</title>
<p>First, we normalize the text documents. Normalization is the task
of transforming text characters into a unique, normal form of a
language. For example, for preprocessing Persian text documents we
convert all Arabic "yaa" and "kaaf" characters to Persian "ye" and
"kaaf", and we unify all numbers that appear with different Persian
and English Unicode code points. Punctuation is also removed from the
text documents.</p>
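        <p>The character unification can be sketched as below; the two letter mappings are the standard Persian ones named in the text, while the digit-unification target (Persian digits) and the table's coverage are our own assumptions:</p>

```python
# Sketch of the normalization table used before comparison.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # Arabic "yaa" to Persian "ye"
    "\u0643": "\u06A9",  # Arabic "kaaf" to Persian "kaaf"
}
for i in range(10):  # unify Arabic-Indic and ASCII digits with Persian ones
    ARABIC_TO_PERSIAN[chr(0x0660 + i)] = chr(0x06F0 + i)
    ARABIC_TO_PERSIAN[str(i)] = chr(0x06F0 + i)

def normalize(text):
    """Map every character through the table, leaving others unchanged."""
    return "".join(ARABIC_TO_PERSIAN.get(ch, ch) for ch in text)
```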
      </sec>
      <sec id="sec-4-3">
        <title>Removing stop words and frequent words</title>
<p>Stop words are also removed from the text data. Stop words are
words that are removed from text data during processing because they
do not carry significant information. First, a group of stop words
proposed by an expert is removed. Then, frequent words are also
chosen and removed according to a frequency threshold.</p>
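        <p>A sketch of the two-stage removal, assuming a hypothetical expert list and threshold value (neither is given in the paper):</p>

```python
from collections import Counter

def remove_stop_and_frequent(tokens, expert_stopwords, freq_threshold):
    """Drop expert-listed stop words, then drop any word whose corpus
    frequency reaches the threshold."""
    counts = Counter(tokens)
    return [t for t in tokens
            if t not in expert_stopwords and freq_threshold > counts[t]]
```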
      </sec>
      <sec id="sec-4-4">
        <title>Stemming words</title>
<p>The next step is to determine word stems. There are many kinds of
word inflections and derivations. The suffixes "haa", "aan", "yaan",
"aat", "ien" and sometimes "gaan" can make a single word plural, and
we remove these suffixes from nouns. Arabic broken plurals are the
most challenging kind of noun pluralization because they cannot be
recognized by removing suffixes. An expert has provided the stems of
such words with the help of the Dehkhoda and Moein dictionaries,
which lets us convert Arabic broken plural nouns to singular
ones.</p>
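        <p>The suffix stripping can be sketched as follows; this ignores the dictionary-based handling of Arabic broken plurals described above, and the guard against over-stripping short words is our own assumption:</p>

```python
# The plural suffixes named in the text, longest first so that "yaan"
# and "gaan" are tried before "aan" and "haa".
PLURAL_SUFFIXES = [
    "\u06CC\u0627\u0646",  # "yaan"
    "\u06AF\u0627\u0646",  # "gaan"
    "\u0647\u0627",        # "haa"
    "\u0627\u062A",        # "aat"
    "\u0627\u0646",        # "aan"
    "\u06CC\u0646",        # "ien"
]

def singularize(noun):
    """Strip the first matching plural suffix from a Persian noun."""
    for suf in PLURAL_SUFFIXES:
        if noun.endswith(suf) and len(noun) > len(suf) + 1:
            return noun[:-len(suf)]
    return noun
```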
      </sec>
      <sec id="sec-4-5">
        <title>Acquiring words synsets</title>
<p>After determining each word's part of speech, we search FarsNet
for the nouns' cognitive synonym sets, synsets. We find synsets
because words may be used in place of their synonyms in different
positions. For example, "computers" may appear as "estimators" or
"data processors". Like the word offsets, the synset offsets are
stored; note that the synset offsets are equal to the original words'
offsets.</p>
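        <p>The synset expansion can be sketched with a toy synonym table standing in for FarsNet (whose lookup interface is not shown in the paper); note how every synset member inherits the original word's offset:</p>

```python
# Hypothetical stand-in for a FarsNet lookup.
SYNSETS = {"computer": ["estimator", "data processor"]}

def expand_with_synsets(tokens_with_offsets):
    """Append each noun's synset members, reusing the noun's offset."""
    expanded = []
    for word, offset in tokens_with_offsets:
        expanded.append((word, offset))
        for syn in SYNSETS.get(word, []):
            expanded.append((syn, offset))  # synonym keeps the word's offset
    return expanded
```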
<p>To solve the plagiarism problem, the offsets of word tokens
together with the collected noun synsets and word stems are the basic
satellite data used in our proposed tree model, the trie, which is
explained in the next section.</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>3. METHODOLOGY</title>
<p>After the source and suspicious documents have been preprocessed,
we use a method to find similar fragments and their exact offsets in
both the suspicious and source files. Before the source and
suspicious documents are compared, the documents are saved to and
retrieved from a trie data structure. The next subsections give a
brief survey of trie trees and an explanation of our newly proposed
trie.</p>
    </sec>
    <sec id="sec-6">
<title>3.1 Brief survey of trie trees</title>
<p>A tokenized document is a set of words which can be stored in a
dictionary. A trie data structure can be used to insert and find
words in a dictionary in O(n), where n is the length of a single
word. The word "trie" actually comes from "retrieval", which is its
usage. In a trie, the prefix tree, each node is a word or a prefix:
all prefix characters of a word are inserted as nodes, and the last
letter is flagged as the word end. Trie trees can store (key, value)
pairs, and words with similar prefixes share subpaths. In the example
shown in Figure 1, the value of the word "xy" is "2", and the words
"xy" and "xyzb" share a subpath. The node values are defined based on
the problem; in the following, we describe the proposed trie and its
different key values.</p>
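        <p>A minimal (key, value) trie of this kind can be sketched as below, mirroring the Figure 1 example in which "xy" holds the value "2" and shares a subpath with "xyzb":</p>

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # one edge per character
        self.value = None   # set only on word-end nodes

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, value):
        """O(n) in the word length n: one node per prefix character."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value  # flag the last letter as a word end

    def find(self, word):
        """Return the stored value, or None if the word is absent."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node.value
```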
<p>In this paper, we use a trie data structure to insert and retrieve
document words because of the trie's properties, i.e. fast insertion
and searching and its close fit to our problem. For example, for a
potential copied phrase P = {w1, w2, ..., w5} in the source and
suspicious documents, if a synonym of w2 were used instead of w2,
both w2 and its synonym would be added to the trie.</p>
<p>Furthermore, if w2 were deliberately added to or deleted from the
suspicious document, our plagiarism detector could still detect the
plagiarized section P correctly. This feature comes from the nature
of the linked lists, with which we can trace the words in front of
and behind a given word. According to their different POS roles in a
sentence, words can be weighted differently so that they are added or
removed intelligently.</p>
    </sec>
    <sec id="sec-7">
<title>4. EVALUATION</title>
      <p>
We use macro-averaged precision and recall, granularity
measurements, and the plagdet score described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The precision and recall measurements evaluate detection
performance at the character level, while granularity considers the
contiguity of the plagiarized phrases detected in the source and
suspicious documents. The granularity of detections R under true
plagiarisms S is defined as follows [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]:
gran(S, R) = (1 / |S_R|) ∑_{s ∈ S_R} |R_S(s)|
      </p>
      <p>Our method for plagiarism detection is divided into two
different processes:</p>
      <sec id="sec-7-1">
        <title>Inserting documents to data structures</title>
<p>After preprocessing both the source and suspicious documents, all
source-document words with their exact positions are inserted into
the trie, and the suspicious words are added to a list ordered by
their positions in the document. According to the trie definition,
each trie node is a part of a preprocessed word. In the proposed
trie, each word has a "word positions" list which holds the positions
where the word occurs in the document. Note that a word may occur at
different positions in the document, but it is inserted into the trie
only once, and its occurrence positions are appended to its word
positions list. Also, the word positions lists are kept only on the
nodes that represent the last character of a word. The more repeated
words a document contains, the faster the trie can be
constructed.</p>
<p>To enhance search speed, it is better to save the longer document
in the trie; however, we always save the source document in the trie
for simplicity.</p>
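        <p>The insertion step can be sketched with nested dictionaries standing in for trie nodes; the "word_positions" key name is our own, hypothetical choice:</p>

```python
# Each word is inserted into the trie once; every occurrence appends its
# position to the end node's "word positions" list.
def insert_word(root, word, position):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # one nested dict per character
    node.setdefault("word_positions", []).append(position)

root = {}
for pos, word in [(0, "abc"), (7, "de"), (12, "abc")]:
    insert_word(root, word, pos)
# "abc" occupies a single path, but its list records both positions: [0, 12]
```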
      </sec>
      <sec id="sec-7-2">
        <title>Finding the longest plagiarized fragments</title>
<p>To report plagiarized sections, it is important to find similar
words based on their sequential occurrences in the source and
suspicious documents. The contiguity of words in the suspicious
document is kept simply by the applied data structure, i.e. the
ordered list. For the source document, the word positions lists
stored in the trie help us recover the order of words in the
plagiarized source sections.</p>
<p>After constructing the document data structures, the longest
plagiarized fragments should be found in both the source and
suspicious documents. Thus, we iterate over the suspicious document's
words one by one and find the corresponding words in the source trie.
Clearly, finding a word in the trie yields the word's positions in
the source document. Also, the detected plagiarized positions in the
suspicious document are added to the corresponding word's "suspicious
positions" list in the trie.</p>
<p>When a similar word is found in both documents, the information
about the words in front of and behind it in the source document is
also kept in the trie:</p>
<p>Consider Wp = {wp1, wp2, ..., wpn}, the ordered list of suspicious
words, and Ws = {ws1, ws2, ..., wsm}, the ordered list of source
words inserted in the trie, where n is the number of words in the
suspicious document and m is the number of distinct words in the
source one. If wp1 = ws1 and wp2 = ws2, then "ws2", "the position of
ws2 in the source" and "the position of wp2 in the suspicious
document" are added as the "value", "word positions list" and
"suspicious positions list" of the node in front of ws1. Moreover,
ws1 is added to the "sentence list" as the first word of a sentence.
The same process applies to rear nodes.</p>
<p>Traversing the suspicious document thoroughly generates a set of
linked lists that help find the plagiarized fragments. The "sentence
list" holds the first words of the plagiarized sections; by looking
up the first word of each sentence in the sentence list and finding
it in the trie, all the plagiarized fragments can be found.</p>
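        <p>The fragment-finding pass can be sketched as follows; for readability a plain dictionary of source word positions stands in for the trie lookups, and runs of consecutive source positions become candidate fragments:</p>

```python
def find_fragments(suspicious_words, source_positions):
    """Walk the ordered suspicious word list, look each word up in the
    source (a dict stands in for the trie), and grow runs of
    consecutive source positions into candidate fragments."""
    fragments, current, prev_pos = [], [], None
    for i, word in enumerate(suspicious_words):
        positions = source_positions.get(word, [])
        cont = [p for p in positions
                if prev_pos is not None and p == prev_pos + 1]
        if cont:                       # the run continues contiguously
            current.append((i, cont[0]))
            prev_pos = cont[0]
            continue
        if len(current) > 1:           # close a run of two or more words
            fragments.append(current)
        if positions:                  # start a new run at this word
            current, prev_pos = [(i, positions[0])], positions[0]
        else:
            current, prev_pos = [], None
    if len(current) > 1:
        fragments.append(current)
    return fragments
```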
<p>Adding both the exact words and their synonyms (with the help of
FarsNet) to the trie makes it possible to find potentially similar
sections that were plagiarized by restatement.</p>
        <sec id="sec-7-2-1">
          <title>Scoring definitions</title>
          <p>where S_R are the true plagiarism cases found by at least
one detection and R_S(s) are the detections covering a given case
s:</p>
          <p>S_R = {s | s ∈ S ∧ ∃ r ∈ R : r detects s},</p>
          <p>
            R_S(s) = {r | r ∈ R ∧ r detects s}.
The plagdet score is an overall score which combines the other
mentioned measurements. It is defined as follows [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]:
plagdet(S, R) = F / log2(1 + gran(S, R)),
          </p>
          <p>in which S and R are the true cases and the detections of
plagiarism, and F is the F-measure, the weighted harmonic mean of
precision and recall, which can be defined as below:
F_β = (1 + β²) · (precision · recall) / ((β² · precision) + recall).</p>
          <p>If β is not predefined, we consider β = 1.</p>
          <p>
            Table 1 shows the evaluation of our approach on the test data
released by the PersianPlagDet 2016 competition, which is based on
TIRA and the PAN evaluation setup [
            <xref ref-type="bibr" rid="ref10 ref12 ref18">10, 18, 12</xref>
            ]. Our approach's high precision and recall and acceptable
granularity values contribute to an admissible plagdet score.
          </p>
        </sec>
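        <p>The scoring formulas above can be sketched as below (the granularity value itself comes from counting detections per true case, as defined earlier):</p>

```python
import math

def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta defaults to 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def plagdet(precision, recall, granularity):
    """Overall score: the F-measure discounted by log2(1 + granularity)."""
    return f_measure(precision, recall) / math.log2(1 + granularity)

# With granularity 1 (every case detected in one piece), plagdet equals F.
```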
      </sec>
    </sec>
    <sec id="sec-8">
<title>5. CONCLUSIONS</title>
<p>The advent of digitalization and technology has simplified the act
of plagiarizing. Thus, it is crucial to develop automatic systems to
detect plagiarism in different kinds of content.</p>
<p>We first prepared the text data released by the international
PersianPlagDet 2016 contest, making the data ready through
preprocessing, tokenization and morphological analysis (e.g. POS
tagging) before comparing documents. In this paper, we have proposed
a novel trie-based approach for saving and retrieving the prepared
source and suspicious documents to solve the plagiarism detection
problem. Fast insertion and retrieval of long sentences were our
reasons for exploiting trie structures for the detection problem.
Finding noun words and their synsets and saving both to our extended
trie have helped us improve our text comparison, especially in the
case of restated phrase matching.</p>
<p>To evaluate our algorithm, we used macro-averaged precision and
recall, granularity measurements, and the plagdet score, as proposed
by the PersianPlagDet competition. High precision and recall and
acceptable granularity made the overall plagdet score of our
algorithm admissible. Besides, thanks to our proposed trie, large
documents can be compared for the purpose of plagiarism
detection.</p>
<p>In future work, we will improve the contiguity of detected
plagiarized phrases for better granularity results. Besides, we will
consider the synsets of other parts of speech, such as verbs, to
improve our algorithm's performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          2013. Hazm. https://github.com/sobhe/hazm. (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <year>2016</year>
          .
          <article-title>PersianPlagDet 2016</article-title>
          . http://www.ictrc.ac.ir/ plagdet. (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Uysal</surname>
            <given-names>A. K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gunal</surname>
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The impact of preprocessing on text classification</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>50</volume>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <volume>104</volume>
          –
          <fpage>112</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Miller</surname>
            <given-names>G. A.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Commun. ACM</source>
          <volume>38</volume>
          ,
          <issue>11</issue>
          (
          <year>1995</year>
          ),
          <volume>39</volume>
          –
          <fpage>41</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Asghari</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khoshnava</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fatemi</surname>
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Faili</surname>
            <given-names>H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Developing Bilingual Plagiarism Detection Corpus Using Sentence Aligned Parallel Corpus</article-title>
          .
          <article-title>Notebook for PAN at CLEF (</article-title>
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Asghari</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohtaj</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fatemi</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faili</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Potthast</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016</article-title>
          . In Working notes of FIRE 2016 -
          <article-title>Forum for Information Retrieval Evaluation (CEUR Workshop Proceedings</article-title>
          ).
          <source>CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Lesk</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>How many scientific papers are not original?</article-title>
          <source>Proceedings of the National Academy of Sciences 112</source>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <volume>6</volume>
          –
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Mozgovoy</surname>
            <given-names>M.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Enhancing computer-aided plagiarism detection</article-title>
          . University Of Joensuu.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mahmoodi</surname>
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Varnamkhasti</surname>
            <given-names>M. M.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Design a Persian Automated Plagiarism Detector (AMZPPD)</article-title>
          .
          <source>arXiv preprint arXiv:1403.1618</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Potthast</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            <given-names>A.</given-names>
          </string-name>
          , and Rosso P.
          <year>2010</year>
          .
          <article-title>An Evaluation Framework for Plagiarism Detection</article-title>
          .
          <source>In 23rd International Conference on Computational Linguistics (COLING 10)</source>
          ,
          <article-title>Chu-Ren Huang</article-title>
          and Dan Jurafsky (Eds.).
          <article-title>Association for Computational Linguistics</article-title>
          , Stroudsburg, Pennsylvania,
          <volume>997</volume>
          –
          <fpage>1005</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Potthast</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            <given-names>A.</given-names>
          </string-name>
          , and Rosso P.
          <year>2010</year>
          .
          <article-title>An evaluation framework for plagiarism detection</article-title>
          .
          <source>In Proceedings of the 23rd international conference on computational linguistics: Posters. Association for Computational Linguistics</source>
          ,
          <volume>997</volume>
          –
          <fpage>1005</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Potthast</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stein</surname>
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          .
          <source>In Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th International Conference of the CLEF Initiative (CLEF 14)</source>
          , Evangelos Kanoulas, Mihai Lupu, Paul Clough, Mark Sanderson, Mark Hall, Allan Hanbury, and Elaine Toms (Eds.). Springer, Berlin Heidelberg New York,
          <volume>268</volume>
          –
          <fpage>299</fpage>
          . DOI:
          <source>http://dx.doi.org/10.1007/978-3-319-11382-1_22</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Shamsfard</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hesabi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fadaei</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansoory</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Famian</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagherbeigi</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fekri</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monshizadeh</surname>
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Assi</surname>
            <given-names>S. M.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Semi automatic development of farsnet; the persian wordnet</article-title>
          .
          <source>In Proceedings of 5th Global WordNet Conference</source>
          , Mumbai, India, Vol.
          <volume>29</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Lea</surname>
            <given-names>M. R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Street</surname>
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Understanding textual practices in higher education</article-title>
          .
          <source>Writing: Texts, processes and practices</source>
          (
          <year>2014</year>
          ),
          <fpage>62</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>More</surname>
            <given-names>N.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Trie Data Structure - Insert and Search</article-title>
          . http://www.ideserve.co.in/learn/trie-insert-and-search. (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Brin</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Garcia-Molina</surname>
            <given-names>H.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Copy detection mechanisms for digital documents</article-title>
          .
          <source>In ACM SIGMOD Record</source>
          , Vol.
          <volume>24</volume>
          . ACM,
          <fpage>398</fpage>
          –
          <lpage>409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Rakian</surname>
            <given-names>Sh.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esfahani</surname>
            <given-names>F. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rastegari</surname>
            <given-names>H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A Persian Fuzzy Plagiarism Detection Approach</article-title>
          .
          <source>Journal of Information Systems and Telecommunication (JIST) 3</source>
          ,
          <issue>3</issue>
          (
          <year>2015</year>
          ),
          <fpage>182</fpage>
          –
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Gollub</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Burrows</surname>
            <given-names>S.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Ousting Ivory Tower Research: Towards a Web Framework for Providing Experiments as a Service</article-title>
          .
          <source>In 35th International ACM Conference on Research and Development in Information Retrieval (SIGIR 12)</source>
          , Bill Hersh, Jamie Callan, Yoelle Maarek, and Mark Sanderson (Eds.). ACM,
          <fpage>1125</fpage>
          –
          <lpage>1126</lpage>
          . DOI:http://dx.doi.org/10.1145/2348283.2348501
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>