<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Academic Integrity revealed by Machine Learning Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jennifer Landes</string-name>
          <email>Jennifer.Landes@hnu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonja Köppl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meike Klettke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hochschule Neu-Ulm, Faculty of Business and Economics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Regensburg University, Faculty of Computer Science and Data Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Academic integrity in higher education can be influenced by individual or by institutional factors. Cheating behavior undermines the academic integrity of the learning environment and can have negative consequences for both the individual student and the academic community. To understand the factors that influence the cheating behavior of students, a quantitative study was conducted, specifically focusing on the types of exams and assignments that are most susceptible to cheating. The collected data was analysed with Machine Learning methods and the results were visualised. This survey is part of a dissertation project, and the survey results will be used for an eye-tracking experiment to measure the cheating behavior of students. The long-term aim is to develop online exam methods which are not susceptible to certain cheating methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Academic Cheating</kwd>
        <kwd>Online Exam</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Clustering</kwd>
        <kwd>Empirical Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        [...] education, and it is essential for students to maintain ethical
behavior and honesty in their academic pursuits. It can
be influenced by individual student characteristics or by
institutional factors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To underpin the importance
of academic integrity, McCabe et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] discuss several
findings: integrity; cheating is prevalent and increasing;
college is a critical time for ethical development;
students face significant pressures to cheat; students are
being taught that cheating is acceptable; and
today’s college students will become tomorrow’s
leaders. However, there has been a growing concern
regarding academic dishonesty among students, especially
during the Covid-19 semesters. During these semesters,
in which courses were mainly taught online, the suspicion grew
that many students took advantage of the situation to cheat. Therefore,
there is a strong need to look deeper into the factors
that influence cheating and into cheating behavior
in online exams. Academic misconduct among students
has been a persistent concern for educational institutions.
Cheating behavior undermines the academic integrity of
the learning environment and can have
negative consequences for both the individual student and the
academic community.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Structure of the article.</title>
      <p>Section 2 gives
a short insight into two related works that
dealt with academic cheating. Section 3 presents
the collected influence factors for
academic cheating and the data collection.
Section 4 covers the data analysis: first the
descriptive values of the study, then the two clustering
methods k-Means and DBSCAN. The last section presents
a discussion and interpretation of both clustering results,
a study outlook, and the study limitations.</p>
      <p>Aim of the work. In this paper, the issue of academic
misconduct is analysed. To understand the
different factors that influence the cheating behavior of students,
a quantitative study was conducted at Hochschule Neu-Ulm,
specifically focusing on the types of exams and
assignments most susceptible to cheating. The collected
data was first visualised and in a second step analysed
with Machine Learning methods. The analysis was
conducted in the following steps: a descriptive analysis to reveal
statistical information about the dataset, a selection of the
dataset focusing on the cheating methods used, a clustering
of this selection with k-Means and DBSCAN, a matching of the
clustering results to the complete dataset, a comparison
of both clustering results, and finally the interpretation
of both results.</p>
      <sec id="sec-2-2">
        <title>2. Related Work</title>
        <p>
          A study by Janke et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] examined factors regarding
cheating behavior among students. The sudden shift to
online teaching and exams during the COVID-19
pandemic led to a rise in cheating rates. The study proposed
selective behavioral change, and the strong threat to in [...]
        </p>
        <p>[...] survey in Germany in November/December 2020,
reaching 3,005 students from all federal states and various
types of academic institutions. After reduction, the
sample comprised 1,608 students with diverse characteristics,
including gender, age, and academic background. The
results indicate that the majority of students had no prior
experience with online exams, and most of them
perceived online exams as less controllable and more prone
to cheating than traditional exams. However, the study
found no evidence of a general increase in academic
dishonesty, although the use of unauthorized aids during
online exams was more common than in traditional
exams. Overall, the study suggests that the shift to online
exams is not necessarily associated with a higher risk of
academic dishonesty, but it requires careful monitoring
and preventive measures to maintain academic integrity.</p>
        <p>
          McCabe et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] conducted a large-scale study on
cheating in academic institutions over a fifty-year period.
They found that most college-bound students are exposed
to cheating cultures during their high school years and
that more than two-thirds of college students engaged
in academic dishonesty in the previous year. Cheating is
prevalent in graduate and professional schools, with
varying levels in different fields. The authors also found that
there has been a shift in cheating-related attitudes and
definitions among students, and both individual and
contextual factors influence academic integrity and cheating
behavior. They suggest that a strong ethical
environment, fostered by factors such as peer disapproval and
a well-run honor code, can play a key role in reducing
cheating.
        </p>
        <p>3. Data Collection</p>
        <p>3.1. Methodology</p>
        <p>
          Prior work shows that the focus lay on measuring
the amount of cheating in online exams. It is therefore
necessary to examine the influence factors in detail. First,
the cheating methods used and the task types
that carry a higher risk of cheating can be explored.
Second, as the literature reveals, academic misconduct can be
influenced by a variety of factors, which can be classified
as extrinsic and intrinsic motivation. Intrinsic motivation
refers to subjective and individual factors stemming
from the student’s personality, including self-motivation,
self-efficacy, job opportunities, and adaptive comparative
behavior. Extrinsic motivation refers to situational
and organizational factors that affect the student from
outside, such as living conditions, family circumstances,
friends or classmates, learning mechanisms, examination
form, course structure, instructor, and technical issues.
Sanctions can also have an impact on academic misconduct.
Figure 1 depicts the main influence factors, which
are the basic concept for the survey design [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ].
        </p>
        <sec id="sec-2-2-1">
          <title>3.2. Survey design</title>
          <p>To examine individual as well as contextual factors
which influence cheating behavior, a student
survey was designed and conducted. The study involved
the creation of an online survey with LimeSurvey that
captured information on both personal and academic
activities of the students. The survey also captured the
different cheating methods that students were aware of and
when they would apply them. The survey was distributed
to all students of the Hochschule Neu-Ulm through an
email distribution list from 06.12.2022 to
02.01.2023. Additionally, the survey was presented
in four lectures of industrial engineering by Professor Dr.
Sonja Köppl to students from the first to fifth semester of
their bachelor's program. The survey consisted of 42 questions
divided into 5 groups, and it took approximately 12 minutes
to complete. The groups were divided as follows:
• Part A: General questions about the course of study
• Part B: General questions about personal life
• Part C: Questions about exams
• Part D: Questions about cheating
• Part E: Demographic questions</p>
          <p>Part A included questions about the course of study,
semester, and grade point average. The next section
examined student satisfaction with their studies and the
university, personal motivation, and academic pressure. Part
B comprised questions on lecture preparation, leisure
activities, interests, part-time jobs, volunteer work, social
media behavior, family obligations, and religiosity.
Part C focused on online exam participation, equipment
requirements, and comparisons between face-to-face and
online exams in terms of comfort, fairness, and
performance. Part D dealt with questions
about attitudes towards cheating, consequences of
cheating, known cheating methods, the influence of the
lecturer on cheating behavior, and the application of
cheating methods in exams and task types. Part E
collected demographic data such
as age, gender, and living arrangements.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>4. Data Analysis</title>
        <sec id="sec-2-3-1">
          <title>4.1. Descriptive Analysis</title>
          <p>The analysed demographic data of the students included
their course of study, semester, age, gender, and place
of residence. Most participants came from the course
Business Administration (19.21%), followed by
Industrial Engineering (15.68%), Business Psychology (13.72%),
Healthcare Management (13.33%), and Information
Management and Corporate Communication (11.37%).</p>
          <p>The students’ average age was 23.32 years, with a range
from 18 to 56 years old. Most participants were in their
4th semester and the average grade was 2.14. More
females (57.58%) than males (33.46%) completed the survey,
and most lived with their parents (43.2%).</p>
          <p>Regarding satisfaction (compare Figure 2), 77.73% were
satisfied with their studies, and 49.4% with the university.
22.83% felt high pressure to perform, 37.4% felt some
pressure, and 20.07% felt no pressure. 47.26% reported
feeling motivated in their studies.</p>
          <p>On average, participants spent 10.26 hours on hobbies
and sports. Meeting friends was the most popular hobby
(172 participants), followed by going to the gym (105),
reading (90), and going to a bar or club (89). Playing
poker (5), handball (4), and martial arts (4) were the least
popular hobbies.</p>
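          <p>The shares and averages reported above are simple aggregates. The following is a minimal sketch of how such values could be computed with pandas; the four-row sample and the column names (course, age) are hypothetical and are not taken from the survey data.</p>
          <preformat><![CDATA[
```python
import pandas as pd

# Hypothetical mini-sample; column names and values are illustrative only.
survey = pd.DataFrame({
    "course": ["Business Administration", "Industrial Engineering",
               "Business Administration", "Business Psychology"],
    "age": [21, 24, 19, 28],
})

# Share of participants per course of study, in percent.
course_share = (survey["course"].value_counts(normalize=True) * 100).round(2)

# Mean, minimum and maximum age of the participants.
age_summary = survey["age"].agg(["mean", "min", "max"])

print(course_share)
print(age_summary)
```
]]></preformat>
          <p>On the real survey table, the same two calls would be applied to the corresponding columns of the imported spreadsheet.</p>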
          <p>Tasks and Methods. An analysis of the cheating methods
reveals that the five most commonly
used methods are cheat sheets, communication with
others, preparation of material, use of multiple devices,
and translation programs. An additional analysis shows
the occurrence of cheating per task type and per exam
type for each cheating method (compare Figures 3 and
4).</p>
          <p>The digital exam forms are:
• Oral: An exam conducted through spoken
communication between the examiner and the
student in a video conference.
• Written: Students write their answers in a digital
format and upload them to a portal or send them to the
examiner.
• IT Pool: Students are all examined on computers
in an IT pool and have limited access to programs
and the internet.
• Take Home Moodle Test: An exam administered
through the Moodle learning management
system, completed by students outside the classroom.
The test has to be completed in a limited time like
a real exam.
• Take Home Moodle Assignment: An assignment
given to students through Moodle to be
completed outside the classroom. The time frame is
not limited to the duration of an exam.</p>
          <p>Figure 2: (a) Living Habits; (b) Satisfaction with Studies, Academic Pressure and Motivation.</p>
          <p>And the task types are:
• Definition Task: A task that requires students to
provide the meaning or definition of a concept or
term.
• Transfer Task: A task that assesses the ability
of students to apply knowledge or skills learned [...]</p>
          <p>[...] read from prepared texts during oral exams or refer to
notes during open-ended questions. The fourth method
is the use of multiple devices during the exam. Students
may use a second screen or another device to display
notes, definitions, or other materials during the exam.
This method is commonly used in take-home Moodle ex[...]</p>
          <p>[...] translation programs during the exam. This type of cheating
occurs in take-home exams, where students may use
online translation programs to translate questions and
provide answers in a different language. This method
is commonly used in open-ended questions. The results
strongly indicate that digital exam formats have a much
higher cheating potential.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>4.2. Clustering</title>
          <p>To gain insights into the collected data, two clustering
methods were chosen to combine data and to identify
similar groups of patterns in student behavior.
Clustering is a method of unsupervised learning and involves
the use of an unlabeled dataset consisting of a collection
of examples <inline-formula><tex-math>\{x_i\}_{i=1}^{N}</tex-math></inline-formula>. Here, each <inline-formula><tex-math>x_i</tex-math></inline-formula> represents a feature
vector, and the objective of an unsupervised learning
algorithm is to develop a model that can process a feature
vector x and transform it into either another vector or a
value that can be employed to address a practical problem.
The developed model assigns each feature vector in the
dataset an identification number for its respective cluster
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. k-Means was chosen due to its widespread usage and
reputation as a simple and efficient clustering algorithm.
Its popularity makes it an ideal choice for establishing
a benchmark and facilitating comparisons with other
clustering methods. As a second method, DBSCAN was
selected as a density-based algorithm, offering an
alternative approach to centroid-based techniques like k-Means.
The aim was to investigate whether this density-based
approach would yield notable distinctions in results and
capture clusters that may be overlooked by k-Means.</p>
          <p>In k-Means, the clusters are named in numerical order,
starting from 0. This naming convention is used to distinguish
and identify individual clusters in the algorithm’s results.
In DBSCAN, the clusters are named based on the
significance of cluster assignments. Outlier points, which do
not belong to any cluster, are often labeled as -1. The first
cluster is labeled as 0, and the second cluster is labeled as
1. This naming convention allows clear differentiation
of outliers from actual clusters and provides a unique
identification for each cluster.</p>
          <p>4.2.1. k-Means</p>
          <p>The well-known k-Means clustering algorithm [7] forms
k clusters around centroids in a feature space, whereby k
is a predefined input parameter. In each step, the distance
of each data point to each centroid is calculated and the
function
<disp-formula><tex-math>J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \left| x_j - \mu_i \right|^2</tex-math></disp-formula>
is optimized, whereby <inline-formula><tex-math>x_j</tex-math></inline-formula> represents a data point and <inline-formula><tex-math>\mu_i</tex-math></inline-formula>
represents the centroid of the cluster <inline-formula><tex-math>S_i</tex-math></inline-formula>. After each step,
the cluster centers are updated until there are no further
changes (convergence of the algorithm). With that, the
algorithm forms k non-overlapping clusters [8, 7, 9].</p>
          <p>Data Preparation. The first step in the process
involved importing an Excel spreadsheet using Pandas. The
variables were converted from binary responses (Yes/No)
to numerical values (0/1). The missing values for age,
semester, and grade point average (GPA) were replaced
with their mean. Next, the values were normalized
using the ”StandardScaler” method.
Then, one-hot encoding was performed on the
categorical variables (cheating attitude, major, gender, residence,
motivation, performance pressure, technical equipment,
preferred exam format, consequences of cheating,
satisfaction with studies, and interest in technology) to
convert them into numerical data.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Implementation.</title>
      <p>A dataframe object was created
containing only the cheating methods: analog cheat sheet,
manipulated exam materials, displaying content on main
or second screen, displaying content on other devices,
virtual camera, audio signals in ear, faking technical
problems, reading prepared texts, translation programs,
communicating with other students, and copying solutions
from others.</p>
      <p>Then, a principal component analysis was conducted
to reduce the dimensionality of this dataset. The
number of principal components was determined using the
”explained variance ratio”. The
analysis revealed that six principal components were needed
to obtain sufficient information for clustering: the first
principal component explains 32.07% of the total variance,
the second 11.27%, the third 8.46%, the fourth
7.96%, the fifth 6.53%, and the sixth 5.75%, which
together amounts to 72.04%.</p>
      <p>Based on this data, clustering with k-Means was
performed. The visualizations in 2D and 3D in Figure 5
show three distinct clusters. During the clustering
process, the value of the k parameter was varied to
explore its effect on the resulting clusters. Various
visualizations were explored using different numbers of
clusters. An evaluation of the results showed
that the grouping was most effective
and meaningful with k = 3 clusters. This
decision considered both the
interpretability and the distinctiveness of the resulting clusters. With
three clusters, the visual representation showed clear
boundaries and discernible patterns, facilitating a
comprehensive understanding of the underlying structure
present in the data.</p>
      <preformat>from sklearn.cluster import KMeans
import seaborn as sns

# k-Means modelling on the PCA-reduced data
km_model = KMeans(n_clusters=3, random_state=42).fit(principalDf1)

# Plot the first two principal components, coloured by cluster
sns.relplot(x="principal component 1", y="principal component 2",
            hue="cluster", data=cluster_y)</preformat>
      <p>Figure 5: (a) 2D Cluster Visualisation; (b) 3D Cluster Visualisation.</p>
      <p>Interpretation. To analyse the clusters, a column with
the respective cluster was appended to the original table.
After that, each cluster was filtered, and an individual
evaluation was made based on the mean values for each
category in each cluster.</p>
      <p>Cluster 0 shows an increased tendency to cheat. In this
cluster, almost all means of the cheating methods used
are the highest. The cluster can be characterized as follows:
the average GPA is the highest at 2.08 compared to the
other two clusters, and the average age of 22.98 indicates that
this group is the youngest compared to the other clusters.</p>
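      <p>The preparation, dimensionality reduction, clustering, and per-cluster evaluation steps described in this section can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the study's code: the table is synthetic, the column names are hypothetical, only three cheating-method columns are used, and two instead of six principal components are kept.</p>
      <preformat><![CDATA[
```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 120

# Synthetic stand-in for the survey table (hypothetical columns).
survey = pd.DataFrame({
    "cheat_sheet": rng.choice(["Yes", "No"], size=n),
    "second_screen": rng.choice(["Yes", "No"], size=n),
    "translation_program": rng.choice(["Yes", "No"], size=n),
    "age": rng.integers(18, 40, size=n).astype(float),
    "gender": rng.choice(["female", "male"], size=n),
})
# Simulate missing answers in every tenth row.
survey.loc[survey.index % 10 == 0, "age"] = np.nan

method_cols = ["cheat_sheet", "second_screen", "translation_program"]

# Binary answers to 0/1, mean imputation, one-hot encoding.
survey[method_cols] = (survey[method_cols] == "Yes").astype(int)
survey["age"] = survey["age"].fillna(survey["age"].mean())
encoded = pd.get_dummies(survey, columns=["gender"], dtype=int)

# Cluster only on the (standardised) cheating-method columns.
X = StandardScaler().fit_transform(encoded[method_cols])

# PCA: the explained variance ratio shows the share of variance per component.
pca = PCA(n_components=2)
components = pca.fit_transform(X)
print(pca.explained_variance_ratio_)

# k-Means with three clusters on the principal components.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(components)

# Map the cluster labels back to the full table and profile each cluster.
encoded["cluster"] = km.labels_
profile = encoded.groupby("cluster").mean()
print(profile)
```
]]></preformat>
      <p>Each row of the resulting profile table holds the column means of one cluster, which is the kind of per-cluster comparison the interpretation relies on.</p>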
      <p>The participants are predominantly male, have a high
technical interest, live in their own apartment or a shared
flat, are on average between the 3rd and 4th semester, and
study Digital Enterprise Management, Game-Production
Management, Information Management in Healthcare,
Business Informatics, or Industrial Engineering.
Furthermore, the evaluation shows that the average values for
extensive and predominantly very time-consuming and
active hobbies as well as time for voluntary and social
media activities are the highest. The participants are also
motivated and have high performance pressure, which
would increase the tendency to cheat. Regarding the
exam format, the participants perceive online exams as
fairer and more pleasant than, for example, Cluster 1,
which tends towards presence formats. Measures such as
failing the exam or being excluded from the exam would
deter cheating.</p>
      <p>In stark contrast to Cluster 0,
participants in Cluster 2 view cheating as unethical, are
religious, and prefer presence formats. The participants
predominantly study Business Administration, Digital
Medicine and Care Management, Physician Assistant, or
Business Psychology. The participants do not have a high
technical interest, which could lead to a decrease in the
incentive to use and experiment with technical cheating
methods.</p>
      <p>4.2.2. DBSCAN</p>
      <p>DBSCAN is a density-based clustering algorithm. This
algorithm requires the definition of two hyperparameters,
<inline-formula><tex-math>\varepsilon</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula>. <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> defines the radius of the neighborhood around
each data point and is used to associate the data points with
a cluster, and <inline-formula><tex-math>n</tex-math></inline-formula> defines the minimum number of data points
of each cluster. The clustering process can be defined as
follows:
• Let <inline-formula><tex-math>D</tex-math></inline-formula> be the set of <inline-formula><tex-math>m</tex-math></inline-formula> data points, and let <inline-formula><tex-math>x_i</tex-math></inline-formula> be the
<inline-formula><tex-math>i</tex-math></inline-formula>-th data point.
• The neighbourhood of <inline-formula><tex-math>x_i</tex-math></inline-formula> within the radius <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> is
defined as <inline-formula><tex-math>N_\varepsilon(x_i) = \{ x_j \mid \mathrm{dist}(x_i, x_j) \leq \varepsilon \}</tex-math></inline-formula>, where
<inline-formula><tex-math>\mathrm{dist}(x_i, x_j)</tex-math></inline-formula> is the distance between <inline-formula><tex-math>x_i</tex-math></inline-formula> and <inline-formula><tex-math>x_j</tex-math></inline-formula>.
• A core point is defined as a data point that has
at least <inline-formula><tex-math>n</tex-math></inline-formula> data points within its neighbourhood:
<inline-formula><tex-math>x_i \in D \mid |N_\varepsilon(x_i)| \geq n</tex-math></inline-formula>.
• A border point is a data point that is not a
core point but is within the neighbourhood of
a core point: <inline-formula><tex-math>x_i \in D \mid \exists\, x_j \in D,\ x_j \text{ is a core point and } x_i \in N_\varepsilon(x_j)</tex-math></inline-formula>.
• A noise point is a data point that is neither a core
point nor a border point [10, 11].</p>
      <p>Implementation. The algorithm performs the same
preprocessing steps as the k-Means method. Then, the
DBSCAN model is initialized with a value for <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> of 2.2 and a
minimum number of samples of 15 to form a dense
region (in the source code, the variable eps is used for <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> and
min_samples represents the value of <inline-formula><tex-math>n</tex-math></inline-formula>). The
model is then applied to a standardized dataset, X-stand1,
and the resulting cluster labels are printed.</p>
      <p>Next, the DBSCAN algorithm is applied to a dataset,
principalDf1, and the resulting clusters are visualized by
a scatter plot with the principal components on the x and
y axes, and the clusters indicated by different colors (see
Figure 6).</p>
      <p>The resulting cluster labels are converted into a Pandas
Series and added to the original one-hot-encoded dataset.
The observations are then grouped by their cluster
numbers, and the mean values of each column in each cluster
are calculated and printed to the console. Determining
the means of the points in DBSCAN allows for the
representation of a cluster by providing a central point that
can describe or visually represent the cluster.</p>
      <p>
        <preformat>from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import seaborn as sns

# Initialise and fit the DBSCAN model
dbscan = DBSCAN(eps=2.2, min_samples=15)
dbscan.fit(principalDf1)

# Visualize the clusters on the first two principal components
plt.figure(figsize=(5, 5))
principalDf1 = principalDf1.rename(
    columns={"principal component 1": "PC1", "principal component 2": "PC2"})
sns.scatterplot(x="PC1", y="PC2", data=principalDf1,
                hue=dbscan.labels_, palette="Set1")</preformat>
      </p>
      <p>Interpretation. The majority of male participants, who have
a high interest in technology and an increased
likelihood of using cheating methods, are found in Cluster -1. The
cheating methods employed by these participants include
displaying content on the main screen, second screen,
or other devices, using a virtual camera, receiving audio
signals in the ear, pretending technical problems, reading
prepared texts, using translation programs,
communicating with other students, completely copying solutions,
having someone else take the exam, cheating on
take-home exams and submissions, cheating on pool exams, and
cheating on written Zoom exams.</p>
      <p>Cluster -1 is characterized by a higher frequency of
social media activities and hobbies such as football, tennis,
dancing, yoga, fitness, martial arts, horse riding, jogging,
chess, painting, cinema, and bars/clubs. Participants in
this cluster report the highest number of volunteer hours,
which is almost double the number reported by
participants in other clusters. The attitude towards cheating
in this cluster is generally permissive, with a tendency
to cheat. Participants are primarily enrolled in Digital
Enterprise Management, Information Management and
Corporate Communications, Business Information
Systems, and Industrial Engineering programs.
Compared to Cluster 0, significant differences are
observed in the following categories: there is no academic
pressure in Cluster -1, and there is no preference for any
particular type of examination format. However, online
examinations are perceived as fairer. The consequence of
being expelled from university is a significant deterrent
against cheating.</p>
      <p>Cluster 0, on the other hand, comprises participants
with a negative attitude towards cheating, and the mean
values for cheating methods are not as high as those in
Cluster -1. The hobbies of these participants include rock
climbing, basketball, and poker playing. The represented
study programs are Healthcare Business Administration
and Data Science Management.</p>
      <p>Cluster 1 consists of participants, mostly female and
religious, with a low interest in technology and the lowest
likelihood of employing cheating methods. The preferred
cheating method in this cluster is using an analogue cheat
sheet. The study programs represented in this cluster
are Business Administration, Digital Medicine and Care
Management, and Game Production and Management.
Hobbies include yoga, pilates, handball, and socializing
with friends. Participants in this cluster are aware of
the consequences of cheating, such as being excluded
from the exam, failing the exam, and having to give an
oral explanation before the exam, which acts as a
deterrent against cheating. They perceive in-person exams as
fairer. These participants are highly motivated and feel
significant academic pressure.</p>
      <p>5. Discussion and Results</p>
      <p>This paper aimed to identify and reveal factors that
influence academic misconduct based on relevant
literature by using clustering algorithms. A survey was
developed to obtain the necessary information through a
quantitative study in multiple categories. 460 students
participated in the survey, of whom 263 completed
the survey in its entirety. The results revealed that
cheating behavior among students is influenced by
various factors, including personal factors such as
working time, family situation, and academic pressure, or
organisational factors like the exam format. The analysis
was carried out through a descriptive analysis and two
different clustering methods, k-Means and DBSCAN. The
clustering process involved clustering a dataset that
contained only cheating methods, and then assigning the
resulting groups to all categories. The clusters generated
by both methods exhibit significant similarities, but
there are also some differences.
• Both analyses identify clusters with participants
who have a negative attitude towards cheating
(Cluster 1 in DBSCAN, Cluster 2 in k-Means).
• Both methods identify a group with a tendency
to cheat: Cluster 0 in k-Means is characterized by
predominantly male participants with a high
technical interest and a tendency to cheat, whereas
Cluster -1 in DBSCAN is characterized by male
participants with a high interest in technology
and a permissive attitude towards cheating.
• Online exams are perceived as fairer compared
to in-person exams by certain clusters (Cluster 0
in k-Means, and in both clusters in DBSCAN).
• Both analyses identify participants with high
academic pressure and motivation in their studies.</p>
      <p>Differences
• The two clustering analyses identify different
numbers of clusters and their characteristics.
• The cheating methods used by participants in the
different clusters vary across the two analyses.
• Cluster 2 in k-Means is characterized by younger
male participants with a high technical interest
who have a tendency to cheat, whereas there is
no corresponding cluster in DBSCAN.
• The hobbies and study programs of participants
in the different clusters differ between the two
analyses.</p>
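      <p>The comparison of the two clusterings above is qualitative. One possible quantitative complement, which is not part of the study, is to cross-tabulate the two label vectors and compute an agreement score. The following minimal sketch uses synthetic data from make_blobs; the eps and min_samples values are chosen for this toy data only and are assumptions, not the study's settings.</p>
      <preformat><![CDATA[
```python
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic 2D data standing in for the PCA-reduced survey data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
# eps and min_samples are picked for this toy data, not taken from the study.
db_labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)

# Contingency table: rows are k-Means clusters, columns are DBSCAN labels
# (-1 marks noise points that DBSCAN assigns to no cluster).
table = pd.crosstab(pd.Series(km_labels, name="kmeans"),
                    pd.Series(db_labels, name="dbscan"))
print(table)

# Adjusted Rand index: 1.0 means identical groupings, values near 0 mean
# agreement no better than chance.
ari = adjusted_rand_score(km_labels, db_labels)
print(ari)
```
]]></preformat>
      <p>A table with one dominant entry per row indicates that a k-Means cluster maps onto a single DBSCAN cluster, while mass in the -1 column shows which k-Means members DBSCAN treats as outliers.</p>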
      <p>The study and analysis provided insights into the
factors that influence cheating behavior among students.</p>
      <p>The descriptive analysis revealed the preferred cheating
methods and exam formats, the time spent on private
hobbies, interests, living habits, working or volunteering
hours, and the motivation and academic pressure of the students.</p>
      <p>The results suggest that the exam format, academic
pressure and the perceived fairness are significant predictors
of cheating behavior. Students who reported high levels
of motivation and academic pressure were more likely to
engage in cheating behavior.</p>
      <p>For the cluster analysis with k-Means as well as DBSCAN,
the information was selected based on the cheating
methods and then mapped to the complete data set. Both
clustering results reveal tendencies that a high technical
interest and the online format are associated with a higher rate of
cheating. Furthermore, both methods identified
a group of younger male students with a large
number of hobbies and social media hours who use
several cheating methods. The results also showed that
participants with a lower cheating tendency have an ethical
attitude, prefer presence formats, are older, and
are aware of the consequences of getting caught
cheating in an exam.</p>
      <p>This study has implications for educators and academic
institutions, highlighting the need to address academic
integrity issues and to create a culture of academic honesty.</p>
      <p>Further research is necessary to explore how academic
institutions can effectively address academic integrity
issues and promote ethical behavior among students.</p>
      <p>Future Work. This analysis is part of a PhD project,
which identifies and evaluates the attitudes and habits of
students with regard to academic cheating. Further
analyses with other machine learning algorithms are planned.</p>
      <p>In a next step, a second data collection at Regensburg
University is planned to compare the data sets of
both institutions. Furthermore, an eye-tracking study
will be conducted to reveal patterns in eye movement while
students are cheating.</p>
      <p>Study Limitations. In this data analysis, missing
values were replaced by the mean value of the respective variable. The
PhD thesis will use alternative strategies to
deal with missing values, such as using zero values or
different case scenarios.
</p>
      <p>[6] P. Gupta, N. K. Sehgal, Machine Learning Concepts, in: P. Gupta, N. K. Sehgal (Eds.), Introduction to Machine Learning in the Cloud with Python: Concepts and Practices, Springer International Publishing, Cham, 2021, pp. 3–22. URL: https://doi.org/10.1007/978-3-030-71270-9_1.</p>
      <p>[7] A. K. Jain, R. C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., 1988.</p>
      <p>[8] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, A. Wu, An efficient k-means clustering algorithm: analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 881–892. doi:10.1109/TPAMI.2002.1017616.</p>
      <p>[9] K. P. Sinaga, M.-S. Yang, Unsupervised K-Means Clustering Algorithm, IEEE Access 8 (2020) 80716–80727. doi:10.1109/ACCESS.2020.2988796.</p>
      <p>[10] J. Sander, M. Ester, H.-P. Kriegel, X. Xu, Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications, Data Mining and Knowledge Discovery 2 (1998) 169–194. URL: https://doi.org/10.1023/A:1009745219419.</p>
      <p>[11] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise (1996).</p>
      <p>Full descriptive analysis: https://www.dropbox.com/s/nevkfbxueqakcwn/Deskriptive%20Auswertung.pdf?dl=0
https://www.dropbox.com/s/bnksjy0r1x6upt0/Umfrage_484229_Untersuchung_von_Einflussfaktoren_auf_die_Studienleistung_bei_online_Prfungen.pdf?dl=0</p>
      <p>Full set of survey data: https://www.dropbox.com/s/8p6sz7gfg4rnykv/Umfragedaten.xlsx?dl=0</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Category</th><th>Cluster 0</th></tr>
          </thead>
          <tbody>
            <tr><td>Academic semester</td><td>3.73529412</td></tr>
            <tr><td>Grade point average</td><td>2.08294118</td></tr>
            <tr><td>Age</td><td>22.98</td></tr>
            <tr><td>Cheating methods</td><td>Manipulated exam materials, Display content main, second screen, other devices, Virtual camera, Audio signal in ear, Pretending technical difficulties, Reading prepared texts, Translation programs, Communication with others, Copying complete solutions, Someone else takes exam</td></tr>
            <tr><td>Exam format where cheating occurred</td><td>Pool Exam</td></tr>
            <tr><td>Hobbies / Interest</td><td>Tennis, Dancing, Yoga, Gym, Pilates, Climbing, Martial arts, Meeting Friends, Painting, Bar/Club, Politics, Crafting, Riding</td></tr>
            <tr><td>Interest in Technology</td><td>very interested</td></tr>
            <tr><td>Working Hours</td><td>6.94117647</td></tr>
            <tr><td>Volunteering</td><td>0.40625</td></tr>
            <tr><td>Volunteering hours</td><td>3.08823529</td></tr>
            <tr><td>Time Social Media</td><td>12</td></tr>
            <tr><td>Family situation</td><td>Taking care of siblings</td></tr>
            <tr><td>Fields of study</td><td>Digital Enterprise Management, Game-Production Management, Information Management in Healthcare, Business Informatics, Industrial Engineering</td></tr>
            <tr><td>Gender</td><td>Male</td></tr>
            <tr><td>Motivation / Pressure / Satisfaction</td><td>Motivated, high pressure, high satisfaction with studies</td></tr>
            <tr><td>Exam Format / Technical Equipment</td><td>Technical equipment missing or limited, participation in online exams possible, comfortable with online exam, fairer with online exam</td></tr>
            <tr><td>Living situation</td><td>Own apartment, shared flat</td></tr>
            <tr><td>Consequences to deter cheating</td><td>Failing the exam, exclusion from exam</td></tr>
            <tr><td>Attitude</td><td>Attitude towards cheating (cheating = yes)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Figure 7: Selected DBSCAN mean values</p>
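      <p>Figure 7 reports mean values per DBSCAN cluster. How such a cluster profile can be derived is sketched below. The rows, the parameter values eps and min_pts, and the compact DBSCAN routine are assumptions chosen for illustration; they are not the settings or the code used in the study.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import mean

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN (Euclidean distance); returns one label per point, -1 = noise."""
    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if eps ** 2 >= sum((a - b) ** 2 for a, b in zip(points[i], q))]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if min_pts > len(seeds):
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:   # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical rows: (cheating-method indicator vector, semester, age).
rows = [
    ([1, 1, 0, 0], 3, 22),
    ([1, 1, 0, 0], 4, 23),
    ([1, 1, 1, 0], 4, 24),
    ([0, 0, 0, 0], 6, 28),
    ([0, 0, 0, 0], 7, 30),
    ([0, 0, 1, 1], 2, 21),
]

labels = dbscan([r[0] for r in rows], eps=1.0, min_pts=2)

# Group the remaining variables by cluster label and average them.
groups = defaultdict(list)
for (_, semester, age), label in zip(rows, labels):
    groups[label].append((semester, age))

profile = {
    label: {"n": len(members),
            "semester": mean(m[0] for m in members),
            "age": mean(m[1] for m in members)}
    for label, members in groups.items() if label != NOISE
}
print(labels)
print(profile)
```
      </preformat>
      <p>Points labelled as noise are left out of the profile, so the averages describe only the rows that DBSCAN assigned to a cluster.</p>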
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
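          <string-name>
            <given-names>I.</given-names>
            <surname>Krumpal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berger</surname>
          </string-name>
          (Eds.),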
          <source>Devianz und Subkulturen: Theorien, Methoden und empirische Befunde</source>
          ,
          <source>Kriminalität und Gesellschaft</source>
          , Springer Fachmedien Wiesbaden, Wiesbaden,
          <year>2020</year>
          . URL: http://link.springer.com/10.1007/978-3-658-27228-9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McCabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Butterfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Treviño</surname>
          </string-name>
          ,
          <article-title>Cheating in college: why students do it and what educators can do about it</article-title>
          , The Johns Hopkins University Press, Baltimore,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Janke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Rudert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ä.</given-names>
            <surname>Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Daumiller</surname>
          </string-name>
          ,
          <article-title>Cheating in the wake of COVID-19: How dangerous is ad-hoc online testing for academic integrity?</article-title>
          ,
          <source>Computers and Education Open</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          100055
          . URL: https://www.sciencedirect.com/science/article/pii/S2666557321000264.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hillebrecht</surname>
          </string-name>
          ,
          <article-title>Einflussfaktoren des Studienerfolgs im Vollzeit-Studium</article-title>
          ,
          <source>in: Studienerfolg von berufsbegleitend Studierenden</source>
          , Springer Fachmedien Wiesbaden, Wiesbaden,
          <year>2019</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>124</lpage>
          . URL: http://link.springer.com/10.1007/978-3-658-26164-1_3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Konegen-Grenier</surname>
          </string-name>
          ,
          <article-title>Studierfähigkeit und Hochschulzugang</article-title>
          ,
          <source>Kölner Texte &amp; Thesen</source>
          61,
          <publisher-name>Deutscher Instituts-Verl.</publisher-name>
          ,
          <publisher-loc>Köln</publisher-loc>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Sehgal</surname>
          </string-name>
          ,
          <article-title>Machine Learning Concepts</article-title>
          , in:
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Sehgal</surname>
          </string-name>
          (Eds.),
          <source>Introduction to Machine Learning in the Cloud with Python: Concepts and Practices</source>
          , Springer Inter-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>