<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Imene Bensalem</string-name>
          <email>bens.imene@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lahsen Abouenour</string-name>
          <email>abouenour@yahoo.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Imene Boukhalfa</string-name>
          <email>boukhalfa_imene@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kareem Darwish</string-name>
          <email>kdarwish@qf.org.qa</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Salim Chikhi</string-name>
          <email>slchikhi@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MISC Lab, Constantine 2 University</institution>
          ,
          <country country="DZ">Algeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mohammadia School of Engineers, Mohamed V Rabat University</institution>
          ,
          <country country="MA">Morocco</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NLE Lab, PRHLT, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <addr-line>Qatar Foundation, Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <fpage>111</fpage>
      <lpage>122</lpage>
      <abstract>
        <p>AraPlagDet is the first shared task that addresses the evaluation of plagiarism detection methods for Arabic texts. It has two subtasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs have been submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants' methods, and highlights their building blocks that could be language dependent.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Despite the lack of large-scale studies on the prevalence of
plagiarism in the Arab world, the large number of news reports on this
phenomenon in the media attests to its pervasiveness. There are also
some studies that show a lack of awareness of the definition and
seriousness of plagiarism among Arab students [
        <xref ref-type="bibr" rid="ref3">3, 18</xref>
        ]. These
same studies suggest the use of plagiarism detection software as
one of the solutions to tackle the problem. In the last few years,
some papers have been published on Arabic plagiarism detection
[
        <xref ref-type="bibr" rid="ref6">6, 10, 19–21, 26, 28, 41</xref>
        ]. However, the proposed methods have
been evaluated using different corpora and strategies, which
makes the comparison between them very difficult. AraPlagDet is
the first shared task that addresses the detection of plagiarism in
Arabic texts. Our motivations to organize such a shared task are
to:
─ Contribute to raising awareness in the Arab world of the
seriousness of plagiarism and the importance of its detection.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. ARAPLAGDET TASK DESCRIPTION</title>
      <p>The AraPlagDet shared task involves two sub-tasks, namely external
plagiarism detection and intrinsic plagiarism detection. Each
participant was allowed to submit up to three runs in one or both
sub-tasks. From 2009 to 2011, the PAN<sup>2</sup> plagiarism detection
competitions were organized with these two sub-tasks<sup>3</sup>. The
evaluation corpora in these competitions were mostly English.
Thus, AraPlagDet is the first plagiarism detection competition on
Arabic documents.</p>
      <p>External and intrinsic plagiarism detection are significantly
different approaches to plagiarism detection. In the external
plagiarism detection sub-task, participants were provided with two
collections of documents, namely suspicious and source, and the
task was to identify the overlaps (exact or not) between them. In the
intrinsic plagiarism detection sub-task, participants were provided
with suspicious documents and the task was to identify in each
document the inconsistencies with respect to writing style. This
approach is useful when the potential sources of plagiarism are
unknown, and it is still a less explored area than
the external approach.</p>
      <p>A total of 18 teams and individuals from different countries (six of
them are not Arab) registered for the shared task, which shows the
interest of practitioners and researchers in this topic. However,
only three participants submitted their runs.</p>
      <p><sup>2</sup> http://pan.webis.de</p>
      <p><sup>3</sup> Since 2012, the PAN plagiarism detection competition has focused on
the external approach.</p>
    </sec>
    <sec id="sec-3">
      <title>3. EXTERNAL AND INTRINSIC</title>
    </sec>
    <sec id="sec-4">
      <title>PLAGIARISM DETECTION</title>
      <p>
        Given a document d and a potential source of plagiarism D’,
detecting plagiarism by the external approach consists in
identifying pairs (s, s’) from d and d’ (d’ ∈ D’) respectively, such
that s and s’ are highly similar. This similarity can have several
levels: s is an exact copy of s’; s was obtained by obfuscating s’
(e.g. paraphrasing, summarizing, restructuring, etc.); or s is
semantically similar to s’ but uses different words or even
a different language. This problem has been tackled by many
researchers in the last decade using a plethora of techniques
related to information retrieval and near-duplicate detection.
Techniques are used, on the one hand, to retrieve the source d’
from D’, and on the other hand, to make an extensive comparison
between d and d’. Examples of techniques used to compare
passages include character n-grams and kernels [16] and
skip n-grams and exact matching [30]. The latest trend is to adapt methods
to detect a specific kind of plagiarism obfuscation. For instance,
the method of Sanchez-Perez et al. [39] is oriented to detect plagiarism cases
that summarize the source passage. See Section 4 for more details
on the building blocks of external plagiarism detection methods.
      </p>
      <p>
Given a document d, detecting plagiarism by the intrinsic
approach consists in identifying in d the set of passages S, such
that each s ∈ S is different from the rest of the document with
respect to writing style. Techniques used in this approach
therefore aim at finding the textual features that best
distinguish the writing style of different authors in one document.
Intrinsic plagiarism detection is thus strongly related
to authorship attribution [42], paragraph authorship clustering [12]
and detection of inconsistencies in multi-author documents [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Techniques used are related to feature extraction and
classification. For instance, Stamatatos [43] used character
n-grams as features and a distance function for classification. Stein
et al. [45] used a vector space model of lexical and syntactic
features and supervised classification. See Section 5 for more
details on the building blocks of intrinsic plagiarism detection
methods.
      </p>
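      <p>As an illustration of the intrinsic approach described above, the following minimal sketch compares the character n-gram profile of each sliding window with the profile of the whole document and flags the windows that deviate most. It is not the method of Stamatatos [43]: the cosine distance, the window size, the step, and the threshold are assumptions chosen for illustration, and all function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return a Counter of the overlapping character n-grams of `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def style_outliers(document, window=200, step=100, n=3, threshold=0.5):
    """Return (start, end, distance) for windows far from the document profile.

    The window, step, and threshold values are illustrative assumptions.
    """
    doc_profile = char_ngrams(document, n)
    flagged = []
    for start in range(0, max(len(document) - window, 0) + 1, step):
        chunk = document[start:start + window]
        distance = cosine_distance(char_ngrams(chunk, n), doc_profile)
        if distance > threshold:
            flagged.append((start, start + window, distance))
    return flagged
```
]]></preformat>
      <p>In practice the threshold would have to be tuned on a training corpus, since the distance of a window to the document profile also depends on the window length.</p>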
      <p>
        All the aforementioned methods were tested on English corpora,
namely PAN plagiarism detection corpora. Methods developed
and tested on Arabic documents are very few [
        <xref ref-type="bibr" rid="ref6">6, 10, 19–21, 26,
28, 41</xref>
        ]. As mentioned above, they were evaluated using
different strategies and corpora, which makes it difficult to draw a
clear conclusion on their performance. Recently, an effort has
been made to build annotated corpora for external plagiarism
detection [40] and also intrinsic plagiarism detection [8].
However, they have been used only by their authors so far [11,
21].
      </p>
    </sec>
    <sec id="sec-5">
      <title>4. EXTERNAL PLAGIARISM</title>
    </sec>
    <sec id="sec-6">
      <title>DETECTION SUB-TASK</title>
      <p>We describe in this section the evaluation corpus and the
submitted methods in the external plagiarism detection sub-task.</p>
    </sec>
    <sec id="sec-7">
      <title>4.1 Corpus</title>
      <p>The collection of a large number of documents incorporating real
plagiarism may be difficult and hence not very practical.
Therefore, plagiarism detection corpora are usually built
automatically or semi-automatically by creating artificial
plagiarism cases and inserting them in host documents<sup>4</sup>. To this
end, it is essential to compile two sets of documents: i) the source
documents, from which passages of text are extracted; and ii) the
suspicious documents, in which the aforementioned passages are
inserted after undergoing (optionally) obfuscation processing.</p>
      <sec id="sec-7-1">
        <title>4.1.1 Source of Text</title>
        <p>
          To build our corpus for the external plagiarism detection sub-task
(the ExAra-2015 corpus), we used documents from the Corpus of
Contemporary Arabic (CCA)<sup>5</sup> [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Arabic Wikipedia<sup>6</sup>. The
CCA involves hundreds of documents on a variety of topics and
genres. Most of them were collected from magazines. Our
motivation for using the CCA as the main source of text for our
corpus is three-fold:
        </p>
        <p>─ The corpus documents cover a variety of topics and genres.
Such variety is desirable because it makes the plagiarism
detection corpus more realistic.</p>
        <p>─ Each document is tagged with its topic, which is a favorable
feature in the process of creating artificial suspicious
documents. In this process, which attempts to imitate real
plagiarism, the inserted plagiarism cases should match
the topic of the suspicious (host) document.</p>
        <p>─ The corpus is freely available, and its developers were keen
to obtain copyright permissions from the owners of the collected
texts to use them for research purposes<sup>7</sup>.</p>
        <p>Besides the CCA, we included in our corpus (specifically in the
source document set) hundreds of documents from Arabic
Wikipedia. We collected them manually by selecting documents
that match the topics of the suspicious documents. These
documents were incorporated in the corpus to make the
detection harder, and only a few cases were created from them.
Surprisingly, we realized<sup>8</sup> that many of the collected Wikipedia
articles (notably biographies) contain exact or near-exact copies of
large passages from the CCA documents. This fact resulted in
plagiarism cases that are not annotated in the corpus. To address
this issue, we applied a simple 5-gram method to identify these
cases of ‘real’ plagiarism between the suspicious documents and
the collected Wikipedia documents, and we discarded from the
corpus the Wikipedia documents involving the detected passages<sup>9</sup>.</p>
      </sec>
      <sec id="sec-7-2">
        <title>4.1.2 Obfuscations</title>
        <p>We created two kinds of plagiarism cases: artificial (created
automatically) and simulated (created manually). For the
automatically created cases, we used the strategies of phrase
shuffling and word shuffling. To avoid producing cases that have
the same pattern of shuffling, we applied to the cases of the test
corpus a different algorithm than the one used for the training
corpus.</p>
        <p><sup>4</sup> There is also the manual approach, which consists in asking a
group of people to write essays and plagiarize. This method
produces realistic plagiarism; however, it is costly in terms of
material and human resources and time [36].</p>
        <p><sup>5</sup> http://www.comp.leeds.ac.uk/eric/latifa/research.htm</p>
        <p><sup>6</sup> http://ar.wikipedia.org</p>
        <p><sup>7</sup> We contacted Eric Atwell (the co-developer of the CCA), who gave
us permission to use CCA documents in our corpus.</p>
        <p><sup>8</sup> We became aware of this issue thanks to AraPlagDet
participants, who pointed out the existence of some plagiarism
cases that had not been annotated in the ExAra sample corpus,
which was released before the official training corpus.</p>
        <p><sup>9</sup> Annotating the plagiarism in these documents would have been a better
solution, but we chose to discard them because of time
limitations.</p>
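        <p>The two automatic obfuscation strategies used in this section, word shuffling and phrase shuffling, can be sketched as follows. The paper does not specify the shuffling algorithms, so this is only a plausible illustration: treating a phrase as a fixed-length run of words, the parameter phrase_len, and both function names are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def shuffle_words(passage, seed=None):
    """Return the passage with its whitespace-separated words in random order."""
    rng = random.Random(seed)
    words = passage.split()
    rng.shuffle(words)
    return " ".join(words)


def shuffle_phrases(passage, phrase_len=4, seed=None):
    """Return the passage with its phrases reordered, word order inside each kept.

    A phrase is assumed here to be a run of `phrase_len` consecutive words.
    """
    rng = random.Random(seed)
    words = passage.split()
    phrases = [words[i:i + phrase_len] for i in range(0, len(words), phrase_len)]
    rng.shuffle(phrases)
    return " ".join(word for phrase in phrases for word in phrase)
```
]]></preformat>
        <p>Using a different seed, or a different phrase length, for the training and test corpora is one simple way to avoid repeating the same shuffling pattern.</p>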
        <p>Regarding manually created plagiarism, we employed two
obfuscation strategies: synonym substitution and paraphrasing.
Both of them are described below.</p>
      </sec>
      <sec id="sec-7-3">
        <title>4.1.2.1 Manual Synonym Substitution</title>
        <p>To create plagiarism cases with this obfuscation, we did the
following:</p>
        <p>─ Manually replaced some words with their synonyms. We used
as sources of synonyms the Almaany dictionary<sup>10</sup>, the Microsoft
Word synonym checker, the Arabic WordNet Browser<sup>11</sup>, and the
synonyms provided by Google Translate<sup>12</sup>. It should be noted
that an Arabic singular noun may have multiple plural forms
that are synonymous. For example, the word 'جزيرة' (jazira,
island) has the plurals 'جزائر' (jazair) and 'جزر' (juzur).</p>
        <p>─ Added diacritics (short vowels) to some words. Diacritics
in Arabic are optional, and their inclusion or
exclusion is orthographically acceptable. Consequently, a word w
whose length is n letters can have at least 2<sup>n</sup>
different representations. For example, the different
representations of the word 'حق' (haq, truth) with and without
diacritics are depicted in Fig. 1.</p>
        <p><sup>10</sup> http://www.almaany.com/</p>
        <p><sup>11</sup> http://globalwordnet.org/arabic-wordnet/awn-browser/</p>
        <p><sup>12</sup> https://translate.google.com</p>
        <p>Fig. 1. Different representations of the same word with and without letters’ diacritics.</p>
        <p>
We decided to substitute words with their synonyms manually (even
though it is time-consuming) after many attempts to perform this
task automatically. Despite our efforts to obtain exact synonyms
by using part-of-speech tagging and lemmatization, our attempts
produced either passages with totally different meanings from the
original ones (poor precision) or very few passages with
substituted words (poor recall). These unsuccessful attempts could
be respectively attributed to:
(i) The high ambiguity of the Arabic language: researchers have estimated
that the average number of ambiguities for a token in Arabic
is 8 times higher than in most other languages [15].
Therefore, it is not surprising that it is difficult to select
automatically the appropriate synonym in a given context.
(ii) The limited coverage of lexical resources: in our experiments
we used Arabic WordNet as a source of synonyms.
Unfortunately, this resource, which is one of the most
important and freely available linguistic resources for Arabic,
contains only 9.7% of the estimated Arabic vocabulary [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Hence, the very low recall of the automatic synonym
substitution is quite justified.
        </p>
      </sec>
      <sec id="sec-7-4">
        <title>4.1.2.2 Manual Paraphrasing</title>
        <p>Cases produced with this obfuscation strategy are the most
realistic ones in our corpus. This is because the passages to be
obfuscated were selected manually from the source and then
paraphrased manually. The results are plagiarism cases that are
very close in terms of topic to the suspicious documents that host
them. In this type of obfuscation, all kinds of modifications were
applied (restructuring, synonym substitution, removing repetitions,
etc.), provided that the meaning of the original passage is
maintained.</p>
        <p>Due to the tedium and slowness of the manual process<sup>13</sup>, we
produced 338 cases with synonym substitution obfuscation and
only 44 cases with paraphrasing obfuscation. See Table 1 for more
detailed statistics.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4.2 Methods Description</title>
      <p>Three participants submitted their runs. Since multiple
submissions were allowed, two participants submitted three runs each.
Therefore, we collected a total of seven runs. Two participants
among the three submitted working notes describing their
methods. In the following, we summarize the work of these two
participants.</p>
      <p><sup>13</sup> We are aware of the possibility of using crowdsourcing to
allow the creation of a large number of plagiarism cases
manually [37]. However, apart from a few volunteers, we
crafted the cases (of synonym substitution and paraphrasing)
ourselves because of the lack of financial resources.</p>
      <sec id="sec-8-1">
        <title>4.2.1 Generic Process</title>
        <p>External plagiarism detection methods involve mainly two phases:
source retrieval and text alignment [35]. For a given
suspicious document d, the source retrieval phase consists of
selecting, from the available set of source documents D, a subset
D' of documents that are the most likely sources of plagiarism.
Text alignment is the process of extensively comparing d with
each document in D' in order to determine the similar passages.
Fig. 2 depicts the building blocks of these two phases. The PAN
competition series on plagiarism detection has contributed
significantly to defining these phases and setting their
terminology<sup>14</sup>. Therefore, a detailed explanation of these phases
and their building blocks can be found in the PAN overview papers
[31–35, 38]. In this paper, we simply adopt this terminology to
describe the methods of the participants.</p>
        <sec id="sec-8-1-1">
          <title>Source retrieval Text alignment</title>
        </sec>
        <sec id="sec-8-1-2">
          <title>Chunking: segmenting the suspicious document into chunks.</title>
          <p>Keyphrase extraction:
extracting keyphrases
from each chunk.</p>
          <p>Queries formulation:
combining keyphrases
and creating one (or
more) query for each
chunk.</p>
          <p>Search Control:
scheduling and submitting
the queries to a search
engine that indexes the
source documents.</p>
          <p>Candidate Filtering:
selecting from the search
results the (source)
documents that are
worthy of the text
alignment phase.</p>
          <p>Seeding: extracting units
(relatively short) of text
from the suspicious and
the source documents and
detecting matches between
them.</p>
        </sec>
        <sec id="sec-8-1-3">
          <title>Extension: merging the adjacent matched seeds to form aligned plagiarism passages.</title>
          <p>Passage Filtering:
discarding passages
judged irrelevant.</p>
        </sec>
      </sec>
      <sec id="sec-8-2">
        <title>4.2.2 Participants Methods</title>
        <p>
          We describe in this subsection the methods of Magooda et al. [23]
and Alzahrani [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Magooda et al. used two different approaches
for source retrieval and three for text alignment and combined
them in different ways in the three submitted methods:
Magooda_1, Magooda_2, and Magooda_3. Alzahrani submitted
one method. Tables 2 and 3 provide details on these approaches.
In what follows, we discuss the submitted methods with regard to two
aspects, scalability and language dependence, regardless of their
performance, which will be discussed later.
        </p>
        <p><sup>14</sup> The source retrieval phase is often also called heuristic retrieval
and candidate retrieval. The text alignment phase has also been
called detailed analysis and detailed comparison.</p>
        <p><sup>15</sup> Diacritics removal and letter normalization are not reported in
the working notes of Magooda et al. [23]. We learned about them
through a discussion with the first author.
        </p>
      </sec>
      <sec id="sec-8-3">
        <title>4.2.2.1 Scalability</title>
        <p>First, it should be noted that our evaluation corpus could be
considered medium-sized, especially in comparison with the PAN
competition corpora [31–35, 38]. Furthermore, we did not
prescribe in the competition the retrieval techniques to use.
Nonetheless, to avoid being merely a lab method, it is important
for any plagiarism detection approach to deal with large sets of
documents by using appropriate retrieval techniques. In their three
methods, Magooda et al. used the Lucene search engine and two
indexing approaches, as shown in Table 2. Therefore, their
methods could be used with a large collection of source
documents, and could be adapted to be deployed online with a
commercial search engine, which is an obvious solution to adopt
if the source of plagiarism is the web, as pointed out by Potthast et
al. [33].</p>
        <p>As for Alzahrani’s method, it is clearly not ready to be
employed if the web is the source of plagiarism, for two reasons: i)
its retrieval model is not structured to be used with search engines
(for example, there is no query formulation, see Table 2); and ii) it
is based on fingerprinting all the source documents and entails an
exhaustive comparison between the n-grams of the suspicious
document and each source document, which is not workable if the
source of plagiarism is extremely large, such as the web. Nonetheless,
her method could be feasible when the source of plagiarism is
local and not too large, as in the case of detecting plagiarism
between students’ assignments. Still, even with the intention to be
used offline, this method could possibly use retrieval techniques
based, for example, on inverted indexes instead of fingerprint
similarity to allow for the processing of a large number of
documents in reasonable time. Malcolm and Lane [25] discuss the
importance of scalability even for offline plagiarism detectors.</p>
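        <p>To make the inverted-index alternative mentioned above concrete, the following minimal sketch indexes the word n-grams of the source documents once and then retrieves candidate sources for a suspicious document by lookup, instead of comparing it exhaustively with every source. It does not reproduce any participant's system: the function names, the n-gram length, and the min_shared threshold are assumptions made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) of `text`."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(sources, n=3):
    """Map each word n-gram to the ids of the source documents containing it.

    `sources` is a dict {doc_id: text}.
    """
    index = defaultdict(set)
    for doc_id, text in sources.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidates(suspicious, index, n=3, min_shared=2):
    """Return source ids sharing at least `min_shared` n-grams, best first."""
    hits = Counter()
    for gram in word_ngrams(suspicious, n):
        for doc_id in index.get(gram, ()):
            hits[doc_id] += 1
    return [doc_id for doc_id, count in hits.most_common() if count >= min_shared]
```
]]></preformat>
        <p>With such an index, the cost of a query depends on the number of n-grams in the suspicious document and on the length of their posting lists, not on the total number of source documents.</p>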
      </sec>
      <sec id="sec-8-4">
        <title>4.2.2.2 Language Dependence</title>
        <p>Regarding this aspect, Magooda et al. reported the use of
two language-dependent processing steps in the source retrieval phase:
stemming queries before submitting them to the search engine and
extracting named entities. In the text alignment phase, words are
stemmed in the skip-gram approach. Moreover, their methods
preprocess the text by removing diacritics and normalizing letters<sup>15</sup>.
Alzahrani's method is nearly language independent. The only
reported language-specific process was stop word removal. It
was applied as a pre-processing step on suspicious and source
documents.</p>
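        <p>The orthographic preprocessing mentioned above (diacritics removal and letter normalization) can be sketched as follows. This is a minimal illustration, not the participants' code: the set of diacritics removed and the direction of each letter mapping (for example, alef maksura to ya, ta marbuta to ha) are assumptions based on common practice in Arabic IR.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Arabic diacritics: fathatan to sukun (U+064B-U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Letters that are frequently used interchangeably, mapped to one form.
# The choice of target letter is an assumption made for this sketch.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maksura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})


def normalize(text):
    """Remove diacritics, then unify interchangeable letters."""
    return DIACRITICS.sub("", text).translate(LETTER_MAP)
```
]]></preformat>
        <p>Applying the same function to both the suspicious and the source documents before seeding lets passages that differ only in diacritics or in interchangeable letters match exactly.</p>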
        <p>Fig. 3. Two passages with the same words, but the 2nd passage
contains some letters with diacritics (highlighted in green) and
a substitution of some interchangeable letters (highlighted in
yellow). A simple plagiarism detector may fail to match them.</p>
        <p>
Since external plagiarism detection is a retrieval task, we think
that the challenges of Arabic IR also hold for Arabic plagiarism detection.
Arabic IR is challenging because of the high inflection of Arabic and
the complexity of its morphology. Arabic stems are derived from
a set of a few thousand roots by fitting the roots into stem
templates. Stems can accept attached prefixes and suffixes that
include prepositions, determiners, and pronouns. Those are
sometimes obstacles to matching similar texts [22]. Moreover, unlike
many other languages, Arabic writing includes diacritics that are
pronounced but often not written. As opposed to Latin-script
languages, the use of diacritics in Arabic is not restricted to some
letters; they can be placed on every letter. For this reason, in
Arabic IR, diacritics are typically removed [13, 17]. Another issue
that affects Arabic IR and consequently Arabic plagiarism
detection is the fact that Arabic has some letters that are
frequently used interchangeably, such as (ي, ى), (ا, أ, إ, آ) and (ه, ة),
hence the need for a letter normalization pre-processing step. If
orthographic normalization (diacritics removal and letter
normalization) is not employed, a plagiarism detection system
may fail to match similar passages even if they have exactly the
same words. See Fig. 3 for an illustration.
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4.3 Evaluation</title>
      <sec id="sec-9-1">
        <title>4.3.1 Baseline</title>
        <p>We employed a simple baseline, which entails detecting common
chunks of word 5-grams between the suspicious documents and
the source documents and then merging the adjacent detected
chunks if the distance between them is smaller than 800
characters. Short passages (&lt; 100 characters) are then filtered out.
Since it is primarily based on matching n-grams, it should detect
mainly plagiarism cases that are not obfuscated.</p>
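        <p>The baseline described above can be sketched as follows. The sketch uses the parameters stated in the text (word 5-grams, merging when the gap is smaller than 800 characters, discarding passages shorter than 100 characters), but it is not the organizers' implementation: whitespace tokenization, reporting character spans only on the suspicious side, and all function names are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re


def word_spans(text):
    """Return (word, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def matching_chunks(suspicious, source, n=5):
    """Return character spans in `suspicious` whose word n-gram occurs in `source`."""
    src_words = source.split()
    src_grams = {tuple(src_words[i:i + n]) for i in range(len(src_words) - n + 1)}
    tokens = word_spans(suspicious)
    chunks = []
    for i in range(len(tokens) - n + 1):
        gram = tuple(word for word, _, _ in tokens[i:i + n])
        if gram in src_grams:
            chunks.append((tokens[i][1], tokens[i + n - 1][2]))
    return chunks


def merge_and_filter(chunks, max_gap=800, min_length=100):
    """Merge chunks closer than `max_gap` characters, then drop short passages."""
    merged = []
    for start, end in sorted(chunks):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_length]


def baseline_detect(suspicious, source):
    """Return the detected (start, end) character spans in the suspicious document."""
    return merge_and_filter(matching_chunks(suspicious, source))
```
]]></preformat>
        <p>Because a single reordered or substituted word breaks every 5-gram that contains it, a detector of this kind mostly finds verbatim copies, which is consistent with the remark above.</p>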
      </sec>
      <sec id="sec-9-2">
        <title>4.3.2 Measures</title>
        <p>The methods were evaluated using character-based macro
precision and recall in addition to granularity, and ranked
using plagdet, which combines these measures into a single score.
All these measures are computed using the set of the plagiarism
cases annotated in the corpus (the actual cases) and the set of the
plagiarism cases detected by the method (the detected cases).
Precision and recall measure the proportion of the true positive
part in each detected and actual case, respectively. An average of
these proportions is then computed. Their formulas are presented
in equations 1 and 2, where S is the set of the actual plagiarism
cases, R is the set of the detected plagiarism cases, and s ⊓ r denotes
the characters shared by s and r (the empty set when r does not detect s).
<disp-formula><label>(1)</label><tex-math>\mathrm{prec}(S, R) = \frac{1}{|R|} \sum_{r \in R} \frac{\left| \bigcup_{s \in S} (s \sqcap r) \right|}{|r|}</tex-math></disp-formula>
<disp-formula><label>(2)</label><tex-math>\mathrm{rec}(S, R) = \frac{1}{|S|} \sum_{s \in S} \frac{\left| \bigcup_{r \in R} (s \sqcap r) \right|}{|s|}</tex-math></disp-formula>
A plagiarism detection method may generate overlapping or
multiple detections for a single plagiarism case. Thus, granularity
is used to average the number of detected cases for each actual
case, as given in formula 3, where S<sub>R</sub> is the set of the actual
cases that have been detected, and R<sub>s</sub> is the set of the detected cases
that intersect with a given actual case s.
<disp-formula><label>(3)</label><tex-math>\mathrm{gran}(S, R) = \frac{1}{|S_R|} \sum_{s \in S_R} |R_s|</tex-math></disp-formula>
It is clear that the optimal
value of granularity is 1, which means that for each actual case,
at most one case has been detected (i.e. not many
overlapping or adjacent cases).</p>
        <p>To rank methods, the three measures are combined in
plagdet as expressed in formula 4, where F<sub>1</sub> is the
harmonic mean of precision and recall.
<disp-formula><label>(4)</label><tex-math>\mathrm{plagdet}(S, R) = \frac{F_1}{\log_2 (1 + \mathrm{gran}(S, R))}</tex-math></disp-formula></p>
        <sec id="sec-9-2-1">
          <title>Generate word 3-grams for both suspicious and source documents and compute Jaccard similarity between them.</title>
        </sec>
        <sec id="sec-9-2-2">
          <title>Keep the source document if Jaccard ≥ 0.1</title>
          <p>Keep the pair where passages are equivalent, else discard it if:
- passages length &lt; threshold
- the number of the words matches &lt; threshold</p>
        </sec>
        <sec id="sec-9-2-3">
          <p>
See [37] for more information on plagiarism detection evaluation
measures. Table 4 provides the performance results of the
participants’ methods as well as the baseline on the test corpus.</p>
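          <p>The evaluation measures of Section 4.3.2 can be sketched as follows. This is a simplified illustration, not the official evaluation script: each case is represented only as a non-empty set of character offsets in a single suspicious document, and a detection is taken to detect an actual case whenever the two sets overlap. Both simplifications, and the function names, are assumptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def precision(S, R):
    """Macro-averaged fraction of each detected case that is truly plagiarized."""
    if not R:
        return 0.0
    actual = set().union(*S) if S else set()
    return sum(len(r & actual) / len(r) for r in R) / len(R)


def recall(S, R):
    """Macro-averaged fraction of each actual case that has been detected."""
    if not S:
        return 0.0
    detected = set().union(*R) if R else set()
    return sum(len(s & detected) / len(s) for s in S) / len(S)


def granularity(S, R):
    """Average number of detections per actual case that was detected at all."""
    counts = [sum(1 for r in R if s & r) for s in S]
    counts = [c for c in counts if c > 0]
    if not counts:
        return 1.0
    return sum(counts) / len(counts)


def plagdet(S, R):
    """F1 of precision and recall, penalized by log2(1 + granularity)."""
    p, r = precision(S, R), recall(S, R)
    if p + r == 0:
        return 0.0
    f1 = 2 * p * r / (p + r)
    return f1 / math.log2(1 + granularity(S, R))
```
]]></preformat>
          <p>For one actual case covering characters 0 to 99 and three detections covering 0 to 49, 50 to 79 and 200 to 219, precision is 2/3, recall is 0.8 and granularity is 2, so plagdet is (8/11) / log<sub>2</sub>(3), roughly 0.46.</p>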
        </sec>
      </sec>
      <sec id="sec-9-3">
        <title>4.3.3 Overall Results</title>
        <p>As shown in Table 4, four methods outperform the baseline in
terms of plagdet. In terms of precision, the majority of
methods perform well, but none of them performs better than the
baseline. Regarding recall, the best three methods have
acceptable scores, but the scores of the remaining methods are more or less
close to the baseline. All the methods have a granularity of more
than 1.05, which is not a very good score in comparison with what
has been achieved by state-of-the-art methods (see, for
example, the PAN 2014 competition results [34]).</p>
      </sec>
      <sec id="sec-9-4">
        <title>4.3.4 Detailed Results</title>
        <p>The goal of this section is to provide an in-depth look at the
behavior of the methods. Table 5 presents the performance of the
participants’ methods on the test corpus according to some
parameters, namely case length, type of plagiarism and
obfuscation.</p>
        <p>Interestingly, Table 5 reveals that the three methods of Magooda
et al. are the only ones that detect cases with word shuffling
obfuscation. This explains the low overall recall of the Palkovskii
[29] and Alzahrani methods. It seems that the algorithm employed
to shuffle words generates cases that are difficult to detect with the
fingerprinting approach used in Alzahrani's source retrieval phase.
The Magooda_1 and Magooda_2 methods perform better than
Magooda_3 with respect to word shuffling cases. This is thanks to
the common words approach, which is able to match similar
passages regardless of the order of words. Regarding the impact of
case length, all the methods perform better with medium
cases.</p>
        <p>All the methods achieved a very high recall in detecting cases
without obfuscation, whereas the manual paraphrasing cases are
the most challenging to detect after the word shuffling cases.</p>
      </sec>
      <sec id="sec-9-5">
        <title>4.3.5 Analysis of the False Positive Cases</title>
        <p>Typically, it is easy to obtain a reasonable precision. This can be
observed in the majority of the results in Table 4. This behavior
was also observed in the PAN shared task on plagiarism detection
[34]. Since the Palkovskii_2 method is the least precise among all the
submitted methods, we were keen to understand the
reason behind its poor precision score. An examination
of its output revealed that around 60% of the utterly false positive
cases (cases whose precision is 0) stem from documents with
religious content. We went one step further and looked into the
text of these cases. It turned out that the phrase "صلى الله عليه وسلم"
was the underlying seed of many false positive cases. This phrase,
which translates as "may Allah honor him and grant him peace", is
a commonly used expression in Arabic (written and even spoken)
after each mention of the prophet Muhammad. Another kind of
false positive case that stems from religion-related texts is
quotations from the Quran and Hadith (sayings of the prophet
Muhammad). Some false positive cases in the Palkovskii_2 run
and even in the other methods’ runs belong to that kind. For
instance, Quranic verses represent 6% of the utterly false positive
cases in the Magooda_2 run.</p>
        <p>The detected plagiarism case in the suspicious document
ٌةيفوصلاٌ تاعامجلاٌ كلذكوٌ ،ةنسلاٌ راصنأو ٌ ملسوٌ هيلعٌ ٌاله ىلص ٌ دمحم
ٌريثأتبٌةينيدٌةعزنٌهيدلٌتناكٌيذلاٌظفاحٌنكل ٌ...ةددعتملاٌاهفئاوطوٌاهقرطب
ٌيهو ٌ ملسوٌ هيلعٌٌالهىلص ٌدمحمٌانديسٌ بابشٌ ةعامجٌيفٌهتيغبٌدجوٌهتأشن
ٌ.8391ٌ ماعٌ ةاتفلاٌ رصموٌ نيملسملاٌ ناوخلإاٌ نعٌ نوقشنمٌ اهسسأٌ ةعامج
ٌظفاحٌ مضناٌ ناوخلإاٌ ةلظمٌ تحتٌ اوناكٌ كاذنآٌ ن ينيدتملاٌ ةيبلاغٌ نأٌ مغرو
ٌنورهجيٌ مهنأٌ اهئانبأٌ يفٌ ىريٌ ناكٌ هنلأٌ ؛ 8391ٌ ماعٌ دمحمٌ بابشٌ ةعامجل
ٌ،ركنملاٌنعٌنوهنيوٌفورعملابٌنورمأيٌمئلاٌةمولٌٌالهيفٌنوشخيٌلاوٌقحلاب</p>
        <p>ٌملسو هيلعٌٌالهىلص ٌي بنلاٌل وقبًٌلا مع ٌريمأوٌك لمٌن يبٌمهدنع ٌق رفٌلا
The detected plagiarism case in the source document
ٌنوكيٌنأٌلمتحي ٌ"ةعامجللٌقرافملاٌنيدلاٌنمٌقراملا "ٌ:ملسوٌهيلعٌٌالهىلص
ٌنأٌ ىلإٌ اذهٌ هيأرٌ يفٌ ةيميتٌ نباٌ دنتسيو ٌ .دترملاٌ لا ٌ ق يرطلاٌ عطاقٌ ب راحملا
ٌي ضرٌةشئاع ٌن ع ٌوحنلاٌاذهٌىلع ٌةرسَّفمٌت ءاجٌدقٌ،روكذملاٌث يدحللٌةياور
ٌنأ ٌ-اهنعٌٌالهيضر -ٌةشئاعٌنعٌهدنسبٌدوادٌوبأٌهاورٌامٌوهٌكلذوٌ،اهنعٌاله
ملسوٌهيلعٌٌالهىلص ٌٌالهلوس
ر
Fig. 4. A plagiarism case detected by the Palkovskii_2 method.</p>
        <p>It is obvious that this case was detected because the
common phrase "صلى الله عليه وسلم" ("may Allah honor him and
grant him peace") was used as a seed. The extension step
then produced a pair of passages that are not similar.</p>
        <p>An important feature of any plagiarism detection system is
not to consider common phrases and quotations as plagiarism cases
unless they appear as part of a larger plagiarism case. In Arabic
texts, and notably in texts about religious topics, quotations from
the Quran and Hadith are very common. Moreover, some
religious phrases may be repeated many times in a document.
The expression "صلى الله عليه وسلم" ("may Allah honor him and
grant him peace") is an example of such common phrases. In the
ExAra test corpus, it appears 185 times in the suspicious
documents and 171 times in the source documents. This increases
the risk of obtaining many short false positive cases. Still, this
issue can be addressed simply by filtering out very short
detected cases. In the baseline method, for example, we apply such
a filter and obtain very high precision. The problem is that a
common religious phrase may appear many times even within the
same document. For example, the expression "صلى الله عليه وسلم"
("may Allah honor him and grant him peace") occurs 29 times in
‘suspicious-document0014’ and 52 times in
‘source-document00223’. This increases not only the risk of obtaining
short false positive cases (of a few words) but also that of longer
cases when adjacent seeds are merged in the extension step
(see Section 4.2.1). We observed many cases of this kind in
the output of the Palkovskii_2 method. See Fig. 4 for an illustration.</p>
        <p>Citing religious texts is common in Arabic writing. Moreover,
many Arab countries incorporate religion in their
public school curricula [14]. Therefore, we believe there is a need for
plagiarism detectors that are able to cope with the
characteristics of this kind of Arabic text.</p>
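        <p>The idea of not treating common phrases as evidence of plagiarism can be sketched as follows. This is only an illustrative sketch and not the method of any participant: the phrase list <monospace>COMMON_PHRASES</monospace>, the function names and the threshold <monospace>min_shared_words</monospace> are hypothetical, and a real system would need a curated list of phrases and quotations.</p>
        <preformat>
```python
# Hypothetical stop-phrase list; a real system would need a curated one.
COMMON_PHRASES = ("صلى الله عليه وسلم",)


def strip_common_phrases(text, phrases=COMMON_PHRASES):
    """Remove every occurrence of the listed phrases and normalize spaces."""
    for phrase in phrases:
        text = text.replace(phrase, " ")
    return " ".join(text.split())


def keep_case(source_passage, suspicious_passage, min_shared_words=5):
    """Keep a detected case only if the two passages still share enough
    distinct words once the common phrases have been removed.

    The threshold is an assumption made for illustration only.
    """
    src = set(strip_common_phrases(source_passage).split())
    susp = set(strip_common_phrases(suspicious_passage).split())
    return len(src.intersection(susp)) >= min_shared_words
```
        </preformat>
        <p>With such a check, a pair of passages whose only overlap is the common phrase would be discarded, whereas a pair that also shares surrounding text would be kept.</p>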
      </sec>
    </sec>
    <sec id="sec-10">
      <title>5. INTRINSIC PLAGIARISM DETECTION SUB-TASK</title>
    </sec>
    <sec id="sec-11">
      <p>Only one participant submitted a run to this sub-task. Below,
we describe the corpus, the method, and its evaluation.</p>
    </sec>
    <sec id="sec-12">
      <title>5.1 Corpus</title>
      <p>Sources of plagiarism are omitted in the intrinsic plagiarism
detection evaluation corpus. Thus, a plagiarism case in this corpus
is defined only by its position and its length in the suspicious
document. For the AraPlagDet intrinsic plagiarism detection sub-task, we
used the InAra corpus [8] for the training phase. For the test
phase, we built another corpus with characteristics similar to
those of the training one. Table 6 provides statistics on both the training and
test corpora. As shown in this table, all the cases are without
obfuscation. This is because the goal is to evaluate the ability of
methods to detect the style shift, and obfuscating the plagiarism
cases would add further difficulty to the task. Further information
on the creation of these corpora can be found in [9] and [8].</p>
      <sec id="sec-12-1">
        <title>5.2.1 Generic Process</title>
        <p>Most intrinsic plagiarism detection methods in the literature
entail five main building blocks, which are depicted in Fig. 5.</p>
        <p>These building blocks are inspired by the authorship verification approach [44]
and have not changed in the past decade.</p>
        <p>Case length
very short
short
.753 .763 .693 .935 .978 .616 .493 .404 .747 .600 .679 .483 .431 .470 .548 .548 1.017 1.000 1.019 1.000 1.011 1.005 1.000 1.000 .741 .672 .677 .637 .594 .531 .519 .465
.862 .853 .807 .997 .998 .925 .647 .551 .850 .783 .818 .513 .505 .494 .554 .554 1.011 1.003 1.009 1.008 1.083 1.020 1.002 1.002 .850 .814 .807 .674 .634 .635 .596 .551
word shuffling
.890 .871 .890 .000 .000 .000 .000 .000 .657 .492 .657 .000 .000 .000 .000 .000 1.081 1.044 1.081</p>
        <sec id="sec-12-1-1">
          <title>Intrinsic plagiarism detection building blocks</title>
          <p>Pre-processing: cleaning the text of noisy information.</p>
        </sec>
        <sec id="sec-12-1-2">
          <p>Document chunking: segmenting the suspicious
document into uniform units such as paragraphs, sentences,
or sliding windows of N words or characters.</p>
          <p>Style features extraction: representing each chunk (in
some methods the whole document as well) as a vector of
features.</p>
          <p>Plagiarized fragments identification: using heuristics to
decide whether the chunk is plagiarized or not based on
its features.</p>
          <p>Post-processing: merging adjacent chunks and/or filtering
out some detected passages.</p>
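          <p>The post-processing block listed above can be illustrated with a minimal sketch. It assumes that each chunk flagged as plagiarized is given as a pair of character offsets (start, end); the function names and the <monospace>min_length</monospace> parameter are hypothetical and do not correspond to any specific method from the literature.</p>
          <preformat>
```python
def merge_adjacent(spans):
    """Merge flagged chunks that touch or overlap into single passages.

    spans: list of (start, end) character offsets (an assumed representation).
    """
    merged = []
    for start, end in sorted(spans):
        if merged and merged[-1][1] >= start:
            # The chunk touches or overlaps the previous passage: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_short(spans, min_length):
    """Discard passages shorter than min_length characters."""
    return [(s, e) for s, e in spans if e - s >= min_length]
```
          </preformat>
          <p>For instance, two flagged chunks covering offsets 0 to 10 and 10 to 20 would be merged into a single passage covering offsets 0 to 20, which could then be kept or discarded according to the chosen minimum length.</p>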
        </sec>
      </sec>
      <sec id="sec-12-2">
        <title>5.2.2 Participant Method</title>
        <p>In this section, we describe the method of Mahgoub et al. [24],
the only participant in the intrinsic plagiarism detection
sub-task. Mahgoub et al. reported in their working notes that their
method is similar to the one proposed by Zechner et al. [46]. It is
based on computing the cosine distance between the Vector Space
Model (VSM) representation of the suspicious document and that of each
chunk. Table 7 describes the method according to the generic
framework depicted in Fig. 5.</p>
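        <p>The core comparison, the cosine distance between the vector of the whole document and the vector of each chunk, can be sketched as follows. This is a simplified illustration and not the participants' implementation: plain word counts stand in for the style features actually used, and the fixed chunk size and all function names are assumptions.</p>
        <preformat>
```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for empty vectors
    return 1.0 - dot / (norm_u * norm_v)


def chunk_words(words, size):
    """Split a list of words into consecutive chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def chunk_distances(text, size=4):
    """Distance of each chunk's vector from the whole-document vector."""
    words = text.split()
    doc_vec = Counter(words)
    return [cosine_distance(doc_vec, Counter(chunk))
            for chunk in chunk_words(words, size)]
```
        </preformat>
        <p>Chunks whose distance from the document vector is unusually large are the candidates for plagiarized fragments; how the decision threshold is set is left open in this sketch.</p>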
      </sec>
      <sec id="sec-12-3">
        <title>5.2.3 Language Dependence</title>
        <p>It seems that feature extraction is the building block most affected by the
language of the processed document. Three features extracted in
Mahgoub et al.'s method depend on the language: every
language has its own approaches to POS
tagging and its own list of stop words, and Arabic, being a
right-to-left language, has some punctuation marks adapted to
that direction, such as the comma (،) and the question mark (؟).</p>
      </sec>
      <sec id="sec-12-4">
        <title>5.3.1 Baseline</title>
        <p>We used a method based on character n-gram classes as features
and naïve Bayes as the classification model. It is almost the same
as the method described in [11], with some modifications to the
length of the sliding window in the segmentation strategy. This
method is language-independent, and it achieves
performance comparable to that of the best intrinsic
plagiarism detection methods, namely those of Oberreuter and Velásquez
[27] and Stamatatos [43]. The evaluation measures are
the same as those used for external plagiarism detection (see Section
4.3.2).</p>
      </sec>
      <sec id="sec-12-5">
        <title>5.3.2 Overall Results</title>
        <p>As shown in Table 8, the performance of Mahgoub et al.’s method is
lower than that of the baseline. This is in line with the performance of the
original method [46], which obtained a plagdet score of 0.177 on the
PAN09 corpus [38].</p>
      </sec>
      <sec id="sec-12-6">
        <title>5.3.3 Detailed Results</title>
        <p>Unlike the external approach, we think that the performance of the
intrinsic approach could be influenced by the document length and
the percentage of plagiarism it contains. Table 9 presents the
performance of Mahgoub et al.'s method and the baseline on the
test corpus according to these two parameters, in
addition to the case length. The segmentation strategy of the
baseline does not produce short chunks, which is why
precision is not computed for short detected cases. However, the
actual short cases are detected with high recall. For both methods,
the best performance is obtained on the medium cases, the short
documents and the documents with much plagiarism. Nonetheless,
since we have only two methods, we cannot generalize any
observed pattern.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>6. CONCLUSION</title>
      <p>AraPlagDet is the first shared task on plagiarism detection in
Arabic texts. Participants were allowed to submit up to three runs
in both the external and intrinsic plagiarism detection sub-tasks,
and a total of eight systems were finally submitted. In the external
plagiarism detection sub-task, most of the submitted methods were
able to detect cases without obfuscation with high performance.
The obfuscated cases proved more challenging, to varying degrees. This is
consistent with methods tested on PAN corpora [7]. As for
intrinsic plagiarism detection, it is still a very challenging task.
We hope that the evaluation corpora we developed will help to
foster research on Arabic plagiarism detection from both
perspectives.</p>
    </sec>
    <sec id="sec-14">
      <title>7. ACKNOWLEDGMENTS</title>
      <p>Many thanks to the AraPlagDet participants for their devoted work on
developing and testing their methods on the AraPlagDet corpora.
The work of the 3rd author was carried out in the framework of the
DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts:
Applications (TIN2012-38603-C02-01) research project.
The research of the 4th author was carried out in the framework of
the grant provided by the Council for the Development of Social
Science Research in Africa (CODESRIA) Ref. SGRT. 38/T13.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Abouenour</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouzoubaa</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>On the evaluation and improvement of Arabic WordNet coverage and usability</article-title>
          .
          <source>Language Resources and Evaluation</source>
          .
          <volume>47</volume>
          ,
          <year>2013</year>
          (
          <year>2013</year>
          ),
          <fpage>891</fpage>
          -
          <lpage>917</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Akiva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Identifying Distinct Components of a Multi-Author Document</article-title>
          .
          <source>European Intelligence and Security Informatics Conference (EISIC) August</source>
          <volume>22</volume>
          -24, Odense, Denmark (
          <year>2012</year>
          ),
          <fpage>205</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Al-Jundy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Plagiarism Detection Software in the Digital Environment Available across the Web: an Evaluation Study (In Arabic)</article-title>
          .
          <source>International Journal of Library and Information Sciences. 1</source>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>34</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Al-Sulaiti</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Atwell</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>The design of a corpus of Contemporary Arabic</article-title>
          .
          <source>International Journal of Corpus Linguistics</source>
          .
          <volume>11</volume>
          ,
          <issue>2</issue>
          (
          <year>2006</year>
          ),
          <fpage>135</fpage>
          -
          <lpage>171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Alzahrani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Arabic Plagiarism Detection Using Word Correlation in N-Grams with K-Overlapping Approach Working Notes for PAN-AraPlagDet at FIRE 2015</article-title>
          .
          <article-title>Workshops Proceedings of the Seventh International Forum for Information Retrieval Evaluation (FIRE</article-title>
          <year>2015</year>
          ), Gandhinagar, India (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Alzahrani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Salim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Plagiarism Detection In Arabic Scripts Using Fuzzy Information Retrieval</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>Proceedings of 2008 Student Conference on Research and Development (SCOReD</source>
          <year>2008</year>
          ),
          <fpage>26</fpage>
          -
          <lpage>27</lpage>
          Nov.
          <year>2008</year>
          , Johor, Malaysia (
          <year>2008</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vila</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AntòniaMartí</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Plagiarism meets Paraphrasing : Insights for the Next Generation in Automatic Plagiarism Detection</article-title>
          .
          <source>Computational Linguistics</source>
          .
          <volume>39</volume>
          ,
          <issue>4</issue>
          (
          <year>2012</year>
          ),
          <fpage>917</fpage>
          -
          <lpage>947</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Bensalem</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chikhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection</article-title>
          .
          <source>CLEF</source>
          <year>2013</year>
          ,
          <article-title>LNCS</article-title>
          , vol.
          <volume>8138</volume>
          (Heidelberg,
          <year>2013</year>
          ),
          <fpage>53</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Bensalem</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chikhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Building Arabic corpora from Wikisource</article-title>
          .
          <source>2013 ACS International Conference on Computer Systems and Applications (AICCSA)</source>
          , Fes/Ifran (May.
          <year>2013</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Bensalem</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chikhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2012</year>
          . Intrinsic Plagiarism Detection in Arabic Text :
          <source>Preliminary Experiments. 2nd Spanish Conference on Information Retrieval (CERI</source>
          <year>2012</year>
          )
          <article-title>(Valencia</article-title>
          , Spain,
          <year>2012</year>
          ),
          <fpage>325</fpage>
          -
          <lpage>329</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Bensalem</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chikhi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Intrinsic Plagiarism Detection using N-gram Classes</article-title>
          .
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Doha, Qatar, October
          <volume>25</volume>
          -
          <fpage>29</fpage>
          (
          <year>2014</year>
          ),
          <fpage>1459</fpage>
          -
          <lpage>1464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Brooke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic VectorSpace Model with Extrinsic Features - Notebook for PAN at CLEF 2012</article-title>
          .
          <article-title>CLEF 2012 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers,
          <volume>17</volume>
          -
          <fpage>20</fpage>
          September, Rome, Italy (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Darwish</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Magdy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Arabic Information Retrieval</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval. 7</source>
          ,
          <issue>4</issue>
          (
          <year>2013</year>
          ),
          <fpage>239</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Faour</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Religious Education and Pluralism in Egypt and Tunisia</article-title>
          . Carnegie Papers.
          <source>Carnegie Endowment for International Peace.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Farghaly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Shaalan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Arabic Natural Language Processing: Challenges and Solutions</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          .
          <volume>8</volume>
          ,
          <issue>4</issue>
          (
          <year>2009</year>
          ),
          <volume>14</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          :
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Grozea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>ENCOPLOT: Pairwise sequence matching in linear time applied to plagiarism detection</article-title>
          .
          <source>Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09)</source>
          (
          <year>2009</year>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Habash</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Introduction to Arabic Natural Language processing</article-title>
          . Morgan &amp; Claypool.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Hosny</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fatima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Attitude of Students Towards Cheating</article-title>
          and Plagiarism: University Case Study.
          <source>Journal of Applied Sciences. 14</source>
          ,
          <issue>8</issue>
          (
          <year>2014</year>
          ),
          <fpage>748</fpage>
          -
          <lpage>757</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Hussein</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A Plagiarism Detection System for Arabic Documents</article-title>
          .
          <source>Intelligent Systems</source>
          '
          <year>2014</year>
          .
          <string-name>
            <given-names>D.</given-names>
            <surname>Filev</surname>
          </string-name>
          , J.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Rutkowski</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Sgurev</surname>
            , E. Sotirova,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Szynkarczyk</surname>
          </string-name>
          , and S. Zadrozny, eds. Springer International Publishing.
          <volume>541</volume>
          -
          <fpage>552</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Jadalla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Elnagar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>A Plagiarism Detection System for Arabic Text-Based Documents</article-title>
          .
          <source>PAISI</source>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          LNCS vol.
          <volume>7299</volume>
          (
          <year>2012</year>
          ),
          <fpage>145</fpage>
          -
          <lpage>153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siddiqui</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansoor Jambi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bagais</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Query Optimization in Arabic Plagiarism Detection : An Empirical Study</article-title>
          .
          <source>International Journal of Intelligent Systems and Applications. 7</source>
          ,
          <issue>1</issue>
          (Dec.
          <year>2015</year>
          ),
          <fpage>73</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Larkey</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Connell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Light stemming for Arabic information retrieval</article-title>
          .
          <source>Arabic Computational Morphology</source>
          . Springer.
          <fpage>221</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Magooda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahgoub</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashwan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fayek</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Raafat</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>RDI System for Extrinsic Plagiarism Detection (RDI_RED) Working Notes for PAN-AraPlagDet at FIRE 2015</article-title>
          .
          <article-title>Workshops Proceedings of the Seventh International Forum for Information Retrieval Evaluation (FIRE</article-title>
          <year>2015</year>
          ), Gandhinagar, India (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Mahgoub</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magooda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashwan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fayek</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Raafat</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>RDI System for Intrinsic Plagiarism Detection (RDI_RID) Working Notes for PAN-AraPlagDet at FIRE 2015</article-title>
          .
          <article-title>Workshops Proceedings of the Seventh International Forum for Information Retrieval Evaluation (FIRE</article-title>
          <year>2015</year>
          ), Gandhinagar, India (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Malcolm</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lane</surname>
            ,
            <given-names>P.C.R.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Tackling the PAN'09 external plagiarism detection corpus with a desktop plagiarism detector</article-title>
          .
          <source>Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09)</source>
          (
          <year>2009</year>
          ),
          <fpage>29</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Menai</surname>
            ,
            <given-names>M.E.B.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Detection of Plagiarism in Arabic Documents</article-title>
          .
          <source>International Journal of Information Technology and Computer Science</source>
          .
          <volume>10</volume>
          ,
          September
          (
          <year>2012</year>
          ),
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Oberreuter</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Velásquez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style</article-title>
          .
          <source>Expert Systems with Applications</source>
          .
          <volume>40</volume>
          ,
          <issue>9</issue>
          (Jul.
          <year>2013</year>
          ),
          <fpage>3756</fpage>
          -
          <lpage>3763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Omar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alkhatib</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dashash</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>The Implementation of Plagiarism Detection System in Health Sciences Publications in Arabic and English Languages</article-title>
          .
          <source>International Review on Computers and Software (I.RE.CO.S.)</source>
          .
          <volume>8</volume>
          , April
          (
          <year>2013</year>
          ),
          <fpage>915</fpage>
          -
          <lpage>919</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Palkovskii</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Submission to AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection</article-title>
          .
          <source>Workshops Proceedings of the Seventh International Forum for Information Retrieval Evaluation (FIRE</source>
          <year>2015</year>
          ), Gandhinagar, India.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Palkovskii</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Belov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Developing High-Resolution Universal Multi-Type N-Gram Plagiarism Detector</article-title>
          .
          <source>Working Notes Papers of the CLEF 2014 Evaluation Labs</source>
          (
          <year>2014</year>
          ),
          <fpage>984</fpage>
          -
          <lpage>989</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Overview of the 2nd International Competition on Plagiarism Detection</article-title>
          .
          <source>Notebook Papers of CLEF 2010 LABs and Workshops</source>
          (Padua, Italy,
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Overview of the 3rd International Competition on Plagiarism Detection</article-title>
          .
          <source>Notebook Papers of CLEF 2011 LABs and Workshops</source>
          , September 19-22 (Amsterdam, The Netherlands, Sep.
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <year>2012</year>
          .
          <article-title>Overview of the 4th International Competition on Plagiarism Detection</article-title>
          .
          <source>CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers</source>
          , 17-20 September, Rome, Italy (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Busse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Overview of the 6th International Competition on Plagiarism Detection</article-title>
          .
          <source>Working Notes Papers of the CLEF 2014 Evaluation Labs</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>Overview of the 5th International Competition on Plagiarism Detection</article-title>
          .
          <source>CLEF 2013 Evaluation Labs and Workshop - Working Notes Papers</source>
          , 23-26 September, Valencia, Spain (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Völske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>Crowdsourcing Interaction Logs to Understand Text Reuse from the Web</article-title>
          .
          <source>51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)</source>
          (
          <year>2013</year>
          ),
          <fpage>1212</fpage>
          -
          <lpage>1221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <year>2010</year>
          .
          <article-title>An Evaluation Framework for Plagiarism Detection</article-title>
          .
          <source>Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)</source>
          (Stroudsburg, USA,
          <year>2010</year>
          ),
          <fpage>997</fpage>
          -
          <lpage>1005</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Overview of the 1st International Competition on Plagiarism Detection</article-title>
          .
          <source>Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09)</source>
          (
          <year>2009</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <surname>Sanchez-Perez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <article-title>The winning approach to text alignment for text reuse detection at PAN 2014: Notebook for PAN at CLEF 2014</article-title>
          .
          <source>Working Notes Papers of the CLEF 2014 Evaluation Labs</source>
          .
          <volume>1180</volume>
          , (
          <year>2014</year>
          ),
          <fpage>1004</fpage>
          -
          <lpage>1011</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <surname>Siddiqui</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansoor Jambi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omar Elhaj</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bagais</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Developing an Arabic Plagiarism Detection Corpus</article-title>
          .
          <source>Computer Science &amp; Information Technology (CS &amp; IT)</source>
          .
          <volume>4</volume>
          , (
          <year>2014</year>
          ),
          <fpage>261</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <surname>Soori</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prilepok</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berhan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Snasel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Text Similarity Based on Data Compression in Arabic</article-title>
          .
          <source>AETA 2013: Recent Advances in Electrical Engineering and Related Sciences</source>
          .
          <string-name>
            <given-names>I.</given-names>
            <surname>Zelinka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.H.</given-names>
            <surname>Duy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cha</surname>
          </string-name>
          , eds. Springer Berlin Heidelberg.
          <fpage>211</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>A Survey of Modern Authorship Attribution Methods</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          .
          <volume>60</volume>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Intrinsic Plagiarism Detection Using Character n-gram Profiles</article-title>
          .
          <source>Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09)</source>
          (
          <year>2009</year>
          ),
          <fpage>38</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipka</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Intrinsic Plagiarism Analysis</article-title>
          .
          <source>Language Resources and Evaluation</source>
          .
          <volume>45</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2011</year>
          ),
          <fpage>63</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Meyer zu Eissen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Intrinsic Plagiarism Analysis with Meta Learning</article-title>
          .
          <source>Proceedings of the SIGIR'07 International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN 2007)</source>
          , Amsterdam, Netherlands (Jul.
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <string-name>
            <surname>Zechner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muhr</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Granitzer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <article-title>External and Intrinsic Plagiarism Detection Using Vector Space Models</article-title>
          .
          <source>Proceedings of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09)</source>
          (
          <year>2009</year>
          ),
          <fpage>47</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>