<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Automated Student Programming Homework Plagiarism Detection</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Aleksejs</forename><surname>Grocevs</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Riga Technical University</orgName>
								<address>
									<settlement>Riga</settlement>
									<country key="LV">Latvia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Natālija</forename><surname>Prokofjeva</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Riga Technical University</orgName>
								<address>
									<settlement>Riga</settlement>
									<country key="LV">Latvia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Automated Student Programming Homework Plagiarism Detection</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E698D2E87F657A13CC14990EC8F9ED1E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>plagiarism detection, automation</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the emerging world of information technologies, a growing number of students are choosing this specialization for their education. Consequently, the number of homework and laboratory research assignments that have to be tested is also growing. The majority of these tasks require implementing some algorithm as a small program. This article discusses possible solutions to the problem of automated testing of programming laboratory research assignments.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The course "Algorithmization and Programming of Solutions" is offered to all first-year students of the Faculty of Computer Science and Information Technology (~500 students) at Riga Technical University. It teaches the basics of the algorithmization of computing processes and the technology of program design using the Java programming language (this course and the University will serve as an example for the implementation of automated testing). Eight laboratory research assignments are planned during the course, in each of which the student has to develop an algorithm, create a program and submit it to the education portal of the University. As one of the solutions, a VBA test program was designed: the requirements for each laboratory assignment were determined and special tests were created. At some point, however, the options offered by VBA could no longer meet the requirements, so work began on identifying the requirements for automating the whole cycle of receiving, testing and evaluating programming assignments <ref type="bibr" target="#b0">[Gro16]</ref>.</p><p>At the moment, all assignment checking, test execution and plagiarism detection/percentage evaluation is done manually. Obviously, no human can keep the source code of five hundred similar programs in mind, let alone measure their similarity. That is why related studies exist that also suggest implementing automated testing [Poze15], <ref type="bibr">[Hasen13]</ref>.</p><p>To aid manual testing, the academic department of the University has developed a VBA script that automatically executes and checks the test cases for a compiled program; the test results are recorded in an Excel file. This improvement has sped up the evaluation process, making it possible for the professor to focus more on the source code, which is more important than an actual test run.</p><p>Some authors [Cheng11], [Thieb15] propose using automated verification of test runs and a plagiarism check as a separate Moodle module that applies many predefined criteria at the time the assignment is uploaded to Moodle. However, none of these studies provides a complete solution to this problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Choosing the Metrics and Evaluating Implementation Possibilities</head><p>In order to get a grade, the student has to implement the required algorithm as a program, submit the program in binary (compiled) form and provide its source code. The professor has to test the program using a set of pre-calculated input/output data pairs, and also verify that the source code corresponds to the binary representation, including that it compiles and runs without errors. It is therefore essential to evaluate the factors affecting the whole process and to choose the metrics for comparison.</p></div>
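The input/output check described above can be sketched as follows. This is a minimal illustration, not the University's actual test harness: `check_solution`, `student_sort` and the sample pairs are hypothetical stand-ins for the real student program (which would be an external compiled binary run in a subprocess) and the professor's pre-calculated test data.

```python
# Minimal sketch of checking a solution against pre-calculated
# input/output pairs. In practice the solution is an external compiled
# program; here a plain function stands in for it.

def check_solution(solution, io_pairs):
    """Run the solution on each input and compare with the expected output.

    Returns a list of (input, expected, actual, passed) tuples.
    """
    results = []
    for given, expected in io_pairs:
        actual = solution(given)
        results.append((given, expected, actual, actual == expected))
    return results

# Hypothetical assignment: "implement a list sorting algorithm".
def student_sort(xs):
    return sorted(xs)

# Pre-calculated pairs, including border cases (empty list, duplicates).
io_pairs = [([3, 1, 2], [1, 2, 3]), ([], []), ([2, 2, 1], [1, 2, 2])]
results = check_solution(student_sort, io_pairs)
all_passed = all(passed for *_, passed in results)
```

Deliberately including border cases (empty input, duplicates) in the pre-calculated pairs is what gives the automated check its quality advantage over a time-limited manual run.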
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Identified Metrics</head><p>Algorithm implementation validation speed is a criterion that reflects how fast the professor can obtain the program and its source code, execute it and test it using the predefined test patterns. At the University, the student has to upload the result of his work to the "Homework" section of the Moodle education portal, from where the professor downloads, runs and evaluates it. Evaluation quality: even in the case of simple tasks, such as "implement a list sorting algorithm", there may be many border cases which can lead to incorrect execution results but are sometimes left unchecked due to time limitations. With automated testing solutions it is possible to pre-create a large number of different tests that check all aspects of the given assignment.</p><p>Report preparation: it is important to put together the data on successful/unsuccessful test runs and to inform the student of the result. With on-line solutions it is possible to send a test-passed notification as soon as an assignment has been uploaded and tested. It is also important to record the summary of all students' assignments in the form of a table that can later be uploaded to the University education portal.</p><p>Plagiarism check: even if the tasks are relatively similar, it is crucial to make sure the source code was written by the student himself and not plagiarized from a colleague. This check should be run only once, by the professor, to prevent the code from being altered repeatedly until it passes the check.</p><p>Safety check: the binary representation of the program and its source code are usually tested separately in order to save time, most commonly in that particular order. However, when the solution consists of several files or modules, it is not always possible to quickly detect malicious code meant to erase or alter the test results or even the testing system itself. Therefore, compiled programs under test should be treated as potential malware or viruses that may damage the test environment. Ideally they should be executed in a sandbox environment which ensures the isolation of the potential threat.</p><p>Bug fix tracking: if the code has been partially altered (either at the professor's request or during the debugging process), the modified parts of the file are indistinguishable from the previous code. The professor therefore has to manually compare two versions of the file to detect these changes, or even look through the entire source code file. This problem can be solved by using a version control system (VCS). The Moodle system itself provides neither an option nor a plugin for that, so this possibility can only be considered in the context of an on-site solution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Automated Detection Solution Development</head><p>By creating an on-site solution using third-party developer tools, it is possible to meet all the above-mentioned needs. We propose to use Gogs as a Git repository, storing the source code of all programming assignments; Jenkins as a Continuous Integration server; Docker as a sandbox environment; and Apache Solr or another full-text search system as a plagiarism checker. Many authors [Cosma12], [Poze15] have approached the problem of detecting plagiarism in source code, and some [Kiku15] propose solutions that are consistent with the University requirements and can be integrated into the suggested infrastructure. Our suggested infrastructure involves the following workflow:</p><p>1. The student codes the task and uploads/updates the program code using Git; 2. Jenkins checks all the students' repositories once per minute; whenever a new commit appears, it sends a plagiarism check request to Solr, creates a separate sandbox environment, compiles the program within it and runs the tests. If the tests pass, the Moodle API is used to mark the student's assignment as successfully completed and to upload the source code on his behalf. If the tests fail or harmful code is found, the student is notified by e-mail and has to go back to step one. The Docker sandbox container is automatically removed either when the tests are completed or by timeout. 3. After the submission deadline, the professor sets all Git repositories to read-only mode and runs a Jenkins plugin, which sends a plagiarism check request for each assignment to Solr and records the additional plagiarism score data for every task's source code in Moodle. 4. The only action left for the professor is to check the source code, or to compare the latest source code version with the previous one using the built-in Gogs tools.</p><p>Such an approach provides a level of checking speed and quality, as well as of report preparation, similar to that of a Moodle plugin. It is possible to use external systems for plagiarism checks, since the Jenkins API makes it possible to connect various external services.</p><p>The safety check is at a high level due to the use of isolated containers (sandboxes), so the risk of system infection by malicious code and other destructive actions is minimized.</p><p>Tracking of error correction is a standard feature of the Gogs repository, which facilitates the comparison of file versions and change tracking.</p></div>
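The plagiarism score recorded in step 3 can take many forms; the proposal above delegates it to a Solr full-text search. As a purely illustrative stand-in (not the Solr-based check itself), a pairwise similarity ratio over two submissions can be computed from the standard library:

```python
# Simplistic stand-in for the per-pair plagiarism score of step 3.
# The proposed infrastructure uses Apache Solr for this; here a
# character-sequence similarity ratio illustrates the idea only.
import difflib

def similarity(src_a: str, src_b: str) -> float:
    """Return a similarity score in [0, 1] between two source files."""
    return difflib.SequenceMatcher(None, src_a, src_b).ratio()

# Two hypothetical submissions that differ only in identifier names.
a = "for (int i = 0; i < n; i++) sum += a[i];"
b = "for (int j = 0; j < n; j++) total += a[j];"
score = similarity(a, b)
```

Note that simple renaming already lowers a text-level score, which is exactly the weakness that the refactoring-based masking discussed in Section 4 exploits.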
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Automated vs. Manual Verification</head><p>The speed and quality of a manual check of each assignment will definitely be lower than those of an automated check. An absolute numerical comparison is impossible in this case, since it depends on the concentration and performance abilities of each individual. Report preparation is done manually, so both the error rate and the time consumed will be considerably higher than with an automated check.</p><p>A plagiarism check will in general be less accurate with a large number of assignments; however, human perception makes it possible to detect such cases intuitively or by using additional contextual information (the plagiarism rate is higher if the students are friends). On the other hand, biased perception and evaluation of the work are also possible due to the human factor.</p><p>The safety check depends on the set of rules enforced by the University. These rules define how to verify the programming assignments and how to run the executable files. Most likely a professor will primarily check the compiled program, in order to return it for correction if it fails the tests; this saves the time needed to compile the program from its source code.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Plagiarism Detection Interference</head><p>What kinds of refactoring, code changes or cheating manipulations should the ideal tool detect? How many differences in the code may be considered accidental similarity, and when does the code become suspiciously identical? It is difficult to draw a clear line here. Multiple studies [Mann06], <ref type="bibr" target="#b8">[Ji07]</ref> suggest that ~25%-70% similarity should be considered normal variation (regardless of the code change types involved), and that plagiarism itself is source code reproduction with a small number of routine transformations, i.e., without core logic changes and without deep knowledge or understanding of the execution flow. Most modern IDEs provide a wide variety of refactoring approaches to change the code structure without affecting the "business logic", i.e., the external behavior of the code. However, contrary to the general intention of refactoring, which is to reduce code complexity, students can use it to obscure and hide code similarities.</p><p>It is obvious that a person who is not familiar with program development or with the programming language at all, but knows how to use refactoring tools and interpret IDE warnings/errors, can modify source code in such a way that it will go undetected most of the time by most of the available tools. For example, IntelliJ IDEA, one of the most powerful Integrated Development Environments on the market, provides a wide variety of refactoring tools for multiple supported languages, e.g. Java, PHP, Python, JavaScript, Kotlin and others. Intended to aid programmers (but, unfortunately, also making plagiarism masking easier), IDEA's refactoring capabilities can defeat almost all source code plagiarism detection tools. 
Hereafter, we plan to investigate integration with anti-plagiarism systems and the available implementations in this area in detail, as well as to create an on-site solution for integrating it into the educational process. Additionally, our team will continue research on this problem: further work will be directed towards the next program execution phase, i.e., bytecode and instruction execution flow analysis <ref type="bibr" target="#b9">[Gro17]</ref>. We believe that this approach will eliminate the need for source code comparison and filter out all kinds of refactoring in a natural way, using built-in code inlining and other compiler-specific optimizations. In this case, we can compare the final execution flow using abstract syntax trees, bytecode meta-model generation and other low-level techniques.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. IntelliJ IDEA first-level refactoring menu</figDesc><graphic coords="5,229.07,201.98,138.20,290.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Proposed plagiarism detection flow</figDesc><graphic coords="6,186.10,147.98,235.42,118.50" type="bitmap" /></figure>
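The structural (AST-level) direction outlined above can be illustrated at the source level with Python's `ast` module standing in for the proposed bytecode meta-model. Two hypothetical submissions that differ only by a rename refactoring produce identical trees once identifiers are normalized, even though naive text comparison sees them as different:

```python
# Sketch of why structural comparison survives rename refactorings
# that defeat naive text comparison. Python's ast module is used here
# as an illustration of the AST/bytecode approach; the two code
# snippets are hypothetical student submissions.
import ast

def normalized_dump(source: str) -> str:
    """Parse source and replace every identifier with a placeholder."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.name = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
    return ast.dump(tree)

original   = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
refactored = "def compute(vals):\n    acc = 0\n    for v in vals:\n        acc += v\n    return acc\n"

# Identical structure once names are stripped away.
same_structure = normalized_dump(original) == normalized_dump(refactored)
```

More aggressive refactorings (method extraction, loop rewriting) change the tree as well, which is why the paper argues for going one level lower, to bytecode and execution flow, where compiler optimizations such as inlining normalize such transformations away.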
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The capabilities of automated functional testing of programming assignments</title>
		<author>
			<persName><forename type="first">Aleksejs</forename><surname>Grocevs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Natālija</forename><surname>Prokofjeva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia-Social and Behavioral Sciences</title>
		<imprint>
			<biblScope unit="volume">228</biblScope>
			<biblScope unit="page" from="457" to="461" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Introduction of the automated assessment of homework assignments in a university-level programming course</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pozenel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Furst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mahnič</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2015-05">May 2015</date>
			<biblScope unit="page" from="761" to="766" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Paperless subjective programming assignment assessment: a first step</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Hansen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computing Sciences in Colleges</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="116" to="122" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">nExaminer: A semi-automated computer programming assignment assessment framework for Moodle</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Monahan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mooney</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Automatic evaluation of computer programs using Moodle&apos;s virtual programming lab (VPL) plug-in</title>
		<author>
			<persName><forename type="first">D</forename><surname>Thiébaut</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computing Sciences in Colleges</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="145" to="151" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">An approach to source-code plagiarism detection and investigation using latent semantic analysis</title>
		<author>
			<persName><forename type="first">G</forename><surname>Cosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Computers</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="379" to="394" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A Source Code Plagiarism Detecting Method Using Sequence Alignment with Abstract Syntax Tree Elements</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kikuchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Goto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wakatsuki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nishino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Software Innovation (IJSI)</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="41" to="56" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Similarity and originality in code: plagiarism and normal variation in student assignments</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Frew</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th Australasian Conference on Computing Education</title>
		<imprint>
			<publisher>Australian Computer Society, Inc.</publisher>
			<date type="published" when="2006-01-01">January 2006</date>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="143" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Understanding the evolution process of program source for investigating software authorship and plagiarism</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Woo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd International Conference on Digital Information Management (ICDIM '07)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2007-10-28">October 2007</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="98" to="103" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Modern programming assignment verification, testing and plagiarism detection approaches</title>
		<author>
			<persName><forename type="first">Aleksejs</forename><surname>Grocevs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Natālija</forename><surname>Prokofjeva</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
