Automatic evaluation of existing plagiarism detection tools

Siwar Nadhri, Maryam Elamine and Lamia Hadrich Belguith
University of Sfax, MIRACL Research Laboratory, Sfax, Tunisia
Tunisian Algerian Conference on Applied Computing (TACC 2021), December 18–20, 2021, Tabarka, Tunisia

Abstract
The vast expansion of data over the Internet, as well as the ease with which people may access it, has given rise to several issues, including authorship attribution, copyright infringement, plagiarism, etc. Indeed, plagiarism is a growing problem in various domains, mainly journalism, politics and academia. Plagiarism is the act of attributing to oneself the work of another without citing the original source. Consequently, plagiarism detection tools are emerging. Nevertheless, choosing the most effective tool remains a serious matter for users. Thus, in this paper, we present our proposed method for automatically evaluating plagiarism detection systems. We tested three existing tools, WCopyfind, Compare It and Compare Suite, and observed their behavior on French and English corpora. The preliminary results indicate the superiority of Compare Suite in terms of accuracy, and of Compare It in terms of execution time. We also noted the wide difference in the behavior of the tools between the French and English corpora.

Keywords
Extrinsic plagiarism detection, plagiarism detection tools, automatic evaluation, software testing.

1. Introduction

The growth of the media has made it possible to obtain a large amount of data [8, 4]. In fact, information technology has evolved rapidly in the last decade [3]. The easy availability of the Internet poses a danger to information integrity, and any piece of data can be plagiarized [7]. Plagiarism is unethical and a serious offence [1]. It poses a danger to the educational process, since students might receive credit for someone else's work or complete courses without actually achieving the intended learning outcomes [11]. In practice, plagiarists use different methods to conceal their illegal activities, such as rewriting parts of the copied text or replacing a few words with their synonyms [13]. Recognizing plagiarism is a long-standing concern within universities, and recent years have witnessed remarkable advances in plagiarism detection tools [5]. Anti-plagiarism tools, also known as text-matching tools, are expected to use state-of-the-art methods to detect plagiarism. Current systems are rather good at identifying copy/paste cases. Nevertheless, given the variety of forms of plagiarism, ranging from simple reformulation to complex levels of obfuscation including translation, the capability of these tools is constantly called into question [6, 11]. The shortcomings of anti-plagiarism tools are worrying, especially since a recent study reported that 70% of students have confessed to plagiarizing, with about half being guilty of a serious cheating offence on a written assignment [2].

Our aim in this paper is to test the capability of three freely available tools, namely WCopyfind (https://plagiarism.bloomfieldmedia.com/software/wcopyfind/), Compare It (https://www.grigsoft.com/wincmp3.htms) and Compare Suite (https://comparesuite.com/). We chose these three systems because they are openly accessible, free, and operate on an offline database, unlike other free tools that only search the Internet for possible plagiarism cases. We tested them on a French corpus that we created from simple copy/paste cases and
an English corpus provided by the PAN@CLEF competition (https://pan.webis.de/) containing cases of obfuscation ranging from simple to complex levels. Our experiments show that Compare It is the fastest of the three systems. Compare Suite is the slowest, but it is the most effective. All three programs have a serious encoding problem with the French language, which prompted us to experiment with another language (English). Although the English corpus contains various levels of obfuscation, preliminary results show that the three plagiarism detection tools performed better on it than on the French corpus, which is mostly comprised of copy/paste cases.

The remainder of this paper is organized as follows. Section 2 presents a literature review. Section 3 describes our proposed method, followed by the presentation of our experiments and results in Section 4. Finally, Section 5 gives some concluding remarks and future work directions.

2. Related Works

Since the turn of the century, not only has the subject of plagiarism received considerable attention, but so has text-matching software, which is used to discover suspected plagiarized passages in manuscripts. Many scientific publications have discussed text-matching software in terms of classification, comparative research, overviews and comparisons.

In their work, [11] defined two comparison criteria for the evaluation of 15 tools tested on documents in eight languages. The first criterion is the coverage of the tool, rated on a scale of 0 (worst) to 5 (best). It comprises four main requirements: language coverage (i.e., the languages supported by the tools), types of plagiarism sources (Wikipedia extracts, open-access papers, student theses and online documents), plagiarism forms (copy-paste, synonym replacement, manual paraphrase, translation) and plagiarism detection based on single-source or multi-source documents. The second criterion is the usability of the tool, i.e., collecting the viewpoint of end users; it is based on a sum of points (0, 0.5 or 1 point). After many experiments, the authors concluded that the tools' usability performance is superior to their coverage performance, because they do not detect all text similarity and suffer from false positives.

Shkodkina and Pacauskas [12] compared three plagiarism detection systems based on a set of criteria and features, in the academic context in Ukraine. The compared tools are Unicheck (https://unicheck.com/), eTXT (https://www.etxt.biz/?lang=en) and Turnitin (https://www.turnitin.com/); the authors chose these systems because they are available in Ukraine. They proposed four criteria, each with a set of features: (1) affordability, (2) material support, (3) functionality and (4) showcasing. The authors enumerated some strengths and weaknesses of each program and concluded that eTXT is more appropriate for personal use, while either Turnitin or Unicheck is more suitable for institutional use. Specifically, Unicheck appears to be one of the most appropriate and efficient systems for Ukrainian universities.

In their investigation, [10] carried out a comparative analysis of five systems using the same eight articles in two test series.
The first test comprised articles that had not been altered; the second included articles that had been manually modified by rearranging terms in the text. The percentage of plagiarism discovered and the time spent by the systems checking the articles were the main focus of their investigation. The authors then employed a multi-criteria decision-making method for choosing the best system. However, they did not give a clear indication of the purpose of the comparison or of how much plagiarism was discovered by the systems. They also looked at usability through the lens of a criterion called "additional support", which included the ability to alter content directly on the website and multilingual checking.

In his study, [9] tested a variety of plagiarism detection tools, categorized into free and non-free systems, and compared the most popular tools from each category. For the first category, the author concluded that the features of free plagiarism detection tools differ from one another; therefore, it is best to try them all in order to choose the ideal one for individual needs. As for the second category, according to the author, the most thorough plagiarism detection tool is iThenticate (https://www.ithenticate.com/). However, it is also the most costly for individual users; nevertheless, universities, research centers and associations can afford the significant expense of the program.

In addition to scientific research, some anti-plagiarism vendors conduct, in a quest for self-evaluation, investigations of the performance of existing plagiarism detection systems. In 2019, Scribbr (https://www.scribbr.fr/logiciel-anti-plagiat/), a paid anti-plagiarism software, conducted a study to compare its performance against that of other systems. The comparative analysis involved 10 plagiarism detection tools (paid and free). The study included two forms of plagiarism: direct (using a 100% plagiarized document with extracts from magazines, books and Internet sites) and dispersed (using a real document with original paragraphs and 50% plagiarized segments). Table 1 presents the best anti-plagiarism software for 2019 according to the evaluation conducted by Scribbr (https://www.scribbr.fr/le-plagiat/meilleur-logiciel-anti-plagiat/).

Table 1
The top 10 anti-plagiarism software of 2019 (plagiarism identified by each tool)

Plagiarism detection tool    Identified plagiarism for a    Identified plagiarism for a
                             50% plagiarized document       100% plagiarized document
Scribbr                      44 %                           75 %
Ephorus                      23 %                           61 %
Quetext                      29 %                           53 %
Compilatio                   28 %                           51 %
BibMe                        19 %                           57 %
Plagscan                     17 %                           58 %
Plagramme                    16 %                           61 %
Grammarly                    0 %                            24 %
Smallseotools                5 %                            28 %
SE Reports                   4 %                            34 %

(Tool websites: Ephorus: https://www.ephorus.com/; Quetext: https://www.quetext.com/; Compilatio: https://www.compilatio.net/; BibMe: https://www.easybib.com/grammar-and-plagiarism/; Plagscan: https://www.plagscan.com/fr/; Plagramme: https://www.plagramme.com/; Grammarly: https://www.grammarly.com/plagiarism-checker; Smallseotools: https://smallseotools.com/plagiarism-checker/; SE Reports: https://searchenginereports.net/plagiarism-checker)

3. Proposed Method

In this section, we present our proposed method for evaluating plagiarism detection tools. It comprises five main steps, namely: document analysis, output unification, output tagging, post-processing and the tools evaluation process (Figure 1).

Figure 1: Main steps of our proposed method

3.1. Corpus creation

In our work, we aspire to automatically identify the best tool for detecting plagiarism cases in academia. We decided to work in French because it is the language in which the majority of student reports are written at our universities. Thus, we created a French corpus inspired by the PAN-PC-09 corpora (https://webis.de/data/pan-pc-09.html). We gathered a set of 200 documents from the site "Thèses.fr" (http://www.theses.fr/). Although the documents are all in French, they cover multiple genres (Economics, Computer Science, Physics, etc.). The collected documents are all in PDF; since we cannot work directly with this file format, we converted all of them into TXT format. Then, after close inspection, we removed a set of encrypted documents (Figure 2), leaving 140 documents in our corpus. For the purpose of our experiments, we created a set of 30 plagiarized documents from this collection of 140 documents. These documents are the result of copy/paste operations; no reformulation or obfuscation was performed. Our corpus comprises an average of 75,608 words per document, while the created fake documents contain approximately between 10,000 and 20,000 words each. We also created, for each fake document, an XML file comprising a thorough description of the source of plagiarism as well as the start and the end of the plagiarized passage. Figure 3 presents an example of an XML description of a plagiarized document from our corpus; a sketch of how such a description can be read follows below.

Figure 2: Example of an encrypted document

Figure 3: Example of an XML description for a plagiarized document from our corpus
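To make the structure of these description files concrete, the following minimal Python sketch parses a PAN-PC-09-style annotation. The tag and attribute names used here (document, feature, this_offset, this_length, source_reference) are hypothetical placeholders modeled on the PAN-PC-09 convention, not necessarily the exact schema shown in Figure 3.

# Minimal sketch: reading a PAN-PC-09-style XML description of one fake
# document. All element and attribute names are assumed, not taken from
# the actual corpus files.
import xml.etree.ElementTree as ET

example = """<document reference="fake-document-007.txt">
  <feature name="plagiarism" source_reference="these-042.txt"
           this_offset="1520" this_length="830"
           source_offset="9914" source_length="830"/>
</document>"""

root = ET.fromstring(example)
for feature in root.findall("feature"):
    start = int(feature.get("this_offset"))
    end = start + int(feature.get("this_length"))
    print(f'plagiarized span [{start}:{end}] copied from {feature.get("source_reference")}')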
3.2. Document Analysis

In this step, we analyzed each suspect document with the tools we aim to compare: WCopyfind, Compare It and Compare Suite. Each system generates an HTML output that we use in the subsequent steps.

For the first tool, the output consists of an HTML file comprising the chosen settings, the source(s) of plagiarism and the similarity rate. Plagiarized text is colored in red; everything else is black (Figure 4).

Figure 4: Output of WCopyfind

For the second tool, the HTML report contains the analysis statistics as well as a colored side-by-side comparison: the left side corresponds to the source document, whereas the right side corresponds to the suspect document. The black-colored segments indicate the plagiarized parts (i.e., these parts are plagiarized with certainty). Every other color denotes non-plagiarism: red indicates that these parts are unique, green is for parts belonging only to the source, and blue implies that some minor changes to the text have been identified (Figure 5).

Figure 5: Output of Compare It

Finally, for the third tool, the HTML file comprises the analysis statistics, the chosen comparison options and the colored plagiarism report. A white background designates the presence of plagiarism; a colored one indicates otherwise. A blue background implies that the designated parts are not plagiarized, red stands for parts that belong only to the source document, whereas a green background indicates that a few changes to the text have been detected (Figure 6).

Figure 6: Output of Compare Suite

3.3. Output unification

As previously mentioned, each system produces its own output for the same suspect document. In order to proceed with our evaluation, we need to clean the output of each tool. For WCopyfind, the HTML code of the file is compressed into a single line; we split the file at the HTML line-break tags to obtain a structured document. For Compare It, the output has a side-by-side structure: one part describes the plagiarized document and another the source. We eliminated the parts referring to the source document, thus keeping only the description of the plagiarized one. As for Compare Suite, we discovered that some of the lines were divided (i.e., a tagged line is written on two or more lines before the closing tag), therefore we reorganized the document so that each line is enclosed in its appropriate opening and closing tags. Then, we removed the parts of the file referring to the source document, in this case the parts with the red background.
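The sketch below illustrates the two basic unification operations on simplified inputs: splitting a compressed one-line report at line-break tags, and dropping the source side of a side-by-side report. The real WCopyfind and Compare It markup is more involved, and the class name "source" used to mark source-side cells is an assumption for illustration only.

# Minimal sketch of the unification step, assuming simplified markup.
import re

def split_compressed_html(html: str) -> list[str]:
    """Split a one-line WCopyfind-style report at <br> line-break tags."""
    return [line for line in re.split(r"<br\s*/?>", html) if line.strip()]

def drop_source_side(lines: list[str]) -> list[str]:
    """Keep only the suspect-document side of a side-by-side report.

    The class name "source" is a hypothetical marker for source-side cells.
    """
    return [line for line in lines if 'class="source"' not in line]

report = ('<html><body>statistics<br>suspect text<br>'
          '<td class="source">source text</td><br></body></html>')
print(drop_source_side(split_compressed_html(report)))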
3.4. Output Tagging

Starting from the documents obtained in the unification step, we tagged each output produced by the plagiarism detection tools. For the parts identified as plagiarized by a tool, we added the tag <tag_pl>; otherwise, we added the tag <tag_npl>. For WCopyfind, we searched for the parts of the HTML code indicating the presence of a red font and enclosed them in the tags "<tag_pl> ... </tag_pl>" to indicate that these parts are plagiarized; we enclosed the remaining parts in the tags "<tag_npl> ... </tag_npl>". For Compare It, we tagged the document according to the classes present in the code. Finally, for Compare Suite, we added our tags (either plagiarized or not) following the HTML attributes indicating the background color of the text.

3.5. Post-processing

We had many encoding issues because we were working with a French corpus. In fact, each of the examined tools produced a lot of noise, including several unusual, garbled characters such as "ST, PU2, x92, etc.". Given the nature of the obtained outputs, we cleaned the noise as much as possible. Some of the characters, however, persisted in the documents and would have required manual intervention; these cases were left as-is, since our aim is to implement a fully automatic evaluation. In addition, as part of the post-processing, we also eliminated unnecessary HTML tags. As a result, we obtained, for each system, a file containing only parts labeled with either <tag_pl> (for the plagiarized parts) or <tag_npl> otherwise. Figure 7 presents some examples of the encrypted characters, and Figures 8 and 9 illustrate an example of system output before and after post-processing, respectively.

Figure 7: Examples of some encrypted characters

Figure 8: Example of output before post-processing

Figure 9: Example of output after post-processing

3.6. Tools evaluation process

In this step, we created an evaluation file tagged according to the plagiarism identified by each of the tools, obtaining a single document cluttered with tags. We therefore performed a reduction of the added tags: instead of keeping a run of consecutive tags, we merged them according to whether or not each tool flagged a plagiarism case. As an illustrative example, if a given line carries the plagiarism tags of all three tools (indicating that this part was identified as plagiarized by every tool), we reduce them to a single tag marking full agreement. If the line carries the plagiarism tags of the first tool "sys1" and the second tool "sys2" but not of the third, the reduced tag indicates that sys1 and sys2 identified this part as plagiarized, whereas the third system did not. A sketch of this reduction follows below. Figures 10 and 11 show an example of a file before and after tag reduction.

Figure 10: Example of an evaluation file after adding the tags in accordance to each tool

Figure 11: Example of an evaluation file after tag reduction
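The following minimal sketch shows one way the tag reduction could work. The exact tag vocabulary of our evaluation files appears in Figures 10 and 11; the names used here (<sys1_pl>, <sys3_npl>, and the merged form) are hypothetical placeholders for it.

# Minimal sketch of tag reduction, assuming per-tool tags <sysN_pl>/<sysN_npl>.
import re

TOOLS = ["sys1", "sys2", "sys3"]

def reduce_tags(line: str) -> str:
    """Collapse one tag per tool into a single combined agreement tag.

    e.g. "<sys1_pl><sys2_pl><sys3_npl>text" -> "<sys1_sys2_pl-sys3_npl>text"
    """
    verdicts = {}
    for tool in TOOLS:
        match = re.search(rf"<{tool}_(pl|npl)>", line)
        if match:
            verdicts[tool] = match.group(1)
    plagiarized = [t for t, v in verdicts.items() if v == "pl"]
    clean = [t for t, v in verdicts.items() if v == "npl"]
    text = re.sub(r"<sys\d_(?:pl|npl)>", "", line)
    tag = "_".join(plagiarized) + "_pl" if plagiarized else ""
    if clean:
        tag += ("-" if tag else "") + "_".join(clean) + "_npl"
    return f"<{tag}>{text}" if tag else text

print(reduce_tags("<sys1_pl><sys2_pl><sys3_npl>Some sentence."))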
4. Experiments and Results

In this section, we present our experiments and our observations regarding the evaluation of the tools. The purpose of our work is to compare existing plagiarism detection tools and to observe the behavior of each one. In our assessment, we created an evaluation document displaying the agreement and disagreement of the tested tools; Figure 12 gives an example of such a document. After tagging our documents and creating the evaluation file, we colored each area of the document based on the new tags, using the original document (the given suspect document) as a reference. For instance, if all tools identified the same part as plagiarized, it is colored in red; if only the first and second tools identified a part as plagiarized, it is colored in pink; and so on.

Figure 12: Final output for the evaluation of a suspect document with the plagiarism detection tools

As previously mentioned, we faced many obstacles with the French corpus, which encouraged us to test the plagiarism detection tools on a different language. Consequently, we chose to work with English and experimented with the PAN-PC-09 corpus (PAN Plagiarism Corpus 2009, https://zenodo.org/record/3250083#.YUrXIrhKjIW), a collection of more than 28,000 documents: 14,429 source documents (sources of plagiarism) collected from Project Gutenberg (https://www.gutenberg.org/), and 14,428 suspicious documents into which artificial plagiarism has been automatically inserted. The plagiarism cases were constructed using a so-called random plagiarist, i.e., a computer program that constructs plagiarism according to a number of random variables: the percentage of plagiarism in the whole corpus, the percentage of plagiarism per document, the length of a single plagiarized section and the degree of obfuscation per section. In our work, we experimented with 110 suspicious documents, 55 of which included instances of plagiarism; the others are designated as plagiarism suspects but do not actually contain any plagiarism.

As we carried out our experiments, we discovered that, although the English corpus comprises different levels of obfuscation, the three plagiarism detection tools performed better on it than on the French corpus. Nevertheless, all three systems have issues with paraphrased segments. Table 2 presents the evaluation of a plagiarized document with the compared tools.

Table 2
Evaluation of the tested tools

                      WCopyfind                          Compare It        Compare Suite
License               Free                               Free              Free for 30 days
Supported languages   English, French, Italian,          No information    No information
                      Dutch, German, etc.
Execution time        ~53.48 seconds                     ~50.4 seconds     ~548.8 seconds
Accuracy              10.862 %                           84.493 %          96.356 %

All three tools are free, with the exception of Compare Suite, which is only free for 30 days. WCopyfind is capable of analyzing documents in different languages: English, Italian, French, Dutch, German, etc.; for the other tools, we have no clear information on the supported languages. The fastest tool is Compare It, with a response time of approximately 50.4 seconds. It is worth noting that each tool generates a similarity rate for the identified plagiarism; since we have the statistics of the plagiarized documents in our XML descriptions, we are able to compute the accuracy of each tool. Compare Suite is the slowest of the three tools; however, it is the most effective at identifying plagiarism cases, with an accuracy of 96.356%.
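The paper does not spell out the accuracy formula, so the sketch below is only one plausible character-level reading of it: the fraction of characters of the suspect document on which a tool's verdict matches the XML ground truth. The (start, end) offset representation of spans is an assumption for illustration.

# Plausible character-level accuracy sketch; not necessarily the exact
# measure used in Table 2. Spans are assumed (start, end) character offsets.
def char_accuracy(doc_length: int,
                  gold_spans: list[tuple[int, int]],
                  detected_spans: list[tuple[int, int]]) -> float:
    def mask(spans):
        flags = [False] * doc_length
        for start, end in spans:
            for i in range(start, min(end, doc_length)):
                flags[i] = True
        return flags
    gold, detected = mask(gold_spans), mask(detected_spans)
    matches = sum(g == d for g, d in zip(gold, detected))
    return matches / doc_length

# Toy example: a 1000-character document with one annotated plagiarism case.
print(char_accuracy(1000, gold_spans=[(100, 300)], detected_spans=[(120, 310)]))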
5. Conclusion

In this paper, we presented our method for the automatic evaluation of existing plagiarism detection tools. In the present work, we focused mainly on French and English documents. With French, we faced an encoding problem that affected the accuracy of the tools: although our French corpus comprises mainly cases of word-for-word (copy/paste) plagiarism, the tools were not able to identify some of the plagiarized parts. Although the English documents contain cases of obfuscation, the tools performed better on them; however, we noticed that they are weak at identifying paraphrases. It is worth noting that, in the majority of research works, the tools are tested by a multitude of users and the final decision is influenced by the reports they provide. In our work, by contrast, given that we have the output of each anti-plagiarism system, we are able to compare it and evaluate its performance automatically. As future work, we aim to consider other languages (different corpora) and more tools, mainly free ones; the focus of our evaluation will be systems that are, if possible, both online (searching the Web for plagiarism cases) and offline (searching a set of given documents).

6. References

[1] A. Nair, A. Nair, G. Nair, P. Prabhu and S. Kulkarni: Semantic Plagiarism Detection System for English Texts, International Research Journal of Engineering and Technology (IRJET), 7(5), e-ISSN: 2395-0056, p-ISSN: 2395-0072, 2020.
[2] A. Patil and N. Bomanwar: Survey on Different Plagiarism Detection Tools and Software, International Journal of Computer Science and Information Technologies (IJCSIT), 7(5), pp. 2191-2193, 2016.
[3] A. Pratomo, A. Irawan and M. Risa: Similarity detection design using Winnowing Algorithm as an effort to apply green computing, Journal of Physics: Conference Series, doi:10.1088/1742-6596/1450/1/012065, 2020.
[4] D. Sakamoto and K. Tsuda: A Detection Method for Plagiarism Reports of Students, in: 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Procedia Computer Science, 159 (2019), pp. 1329-1338.
[5] I. Ben Salem, P. Rosso and S. Chikhi: On the use of character N-grams as the only intrinsic evidence of plagiarism, Language Resources and Evaluation, 53(3), pp. 363-396, 2019.
[6] K. Vani and D. Gupta: Study on Extrinsic Text Plagiarism Detection Techniques and Tools, Journal of Engineering Science and Technology, 9(4), pp. 150-164, doi:10.25103/jestr.094.23, 2016.
[7] M. Elamine, F. Bougares, S. Mechti and L. Belguith: Extrinsic plagiarism detection for French language with word embeddings, in: 19th International Conference on Intelligent Systems Design and Applications (ISDA), 2019.
[8] M. Elamine, S. Mechti and L. Hadrich Belguith: Hybrid plagiarism detection method for French language, International Journal of Hybrid Intelligent Systems, 16(3), pp. 163-175, September 2020.
[9] M. N. Nahas: Survey and Comparison between Plagiarism Detection Tools, American Journal of Data Mining and Knowledge Discovery, 2(2), pp. 50-53, doi:10.11648/j.ajdmkd.20170202.12, 2017.
[10] Š. Křížková, H. Tomášková and M. Gavalec: Preference comparison for plagiarism detection systems, in: O. Cordón (Ed.), Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Vancouver, Canada, pp. 1760-1767, doi:10.1109/FUZZ-IEEE.2016.7737903, 2016.
[11] T. Foltynek, D. Dlabolova, J. Mudra, D. Weber-Wulff, A.
Anohina-Naumeca, L. Kamzola, S. Kleanthous, S. Razi, J. Kravjar and J. G. Dib: Testing of support tools for plagiarism detection, in: Proceedings of the 5th International Conference "Plagiarism across Europe and Beyond", 2020.
[12] Y. Shkodkina and D. Pacauskas: Comparative Analysis of Plagiarism Detection Systems, Business Ethics and Leadership, 1(3), pp. 27-35, doi:10.21272/bel.1(3), 2017.
[13] Z. Iqbal, S. Murtaza and H. Ayub: Handling Illusive Text in Document to Improve Accuracy of Plagiarism Detection Algorithm, preprint, doi:10.31219/osf.io/hq2j8, 2020.