CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Academic Integrity revealed by Machine Learning Methods

Jennifer Landes

Jennifer.Landes@hnu.de 0

Sonja Köppl

Meike Klettke

Academic Cheating, Online Exam, Machine Learning, Clustering, Empirical Evaluation

0 Hochschule Neu-Ulm, Faculty of Business and Economics 1 Regensburg University, Faculty of Computer Science and Data Science

Academic integrity in higher education can be influenced by individual or institutional factors. Cheating undermines the academic integrity of the learning environment and can have negative consequences for both the individual student and the academic community. To understand the factors that influence students' cheating behavior, a quantitative study was conducted that focused on the types of exams and assignments most susceptible to cheating. The collected data were analysed with machine learning methods, and the results were visualised. The survey is part of a dissertation project, and its results will be used for an eye-tracking experiment that measures the cheating behavior of students. The long-term aim is to develop online exam methods that are not susceptible to certain cheating methods.

1. Introduction

Academic integrity is a fundamental part of higher education, and it is essential for students to maintain ethical behavior and honesty in their academic pursuits. It can be influenced by individual student characteristics or by institutional factors [1]. To underpin the importance of academic integrity, McCabe et al. [2] discuss several findings: integrity; cheating is prevalent and increasing; college is a critical time for ethical development; students face significant pressures to cheat; students are being taught that cheating is acceptable; and the fact that today's college students will become tomorrow's leaders. However, concern about academic dishonesty among students has grown, especially during the Covid-19 semesters. Because the courses in these semesters were taught mainly online, the suspicion grew that many students took advantage of the situation to cheat. There is therefore a strong need to look more closely at the factors that influence cheating and at cheating behavior in online exams. Academic misconduct among students has been a persistent concern for educational institutions. Cheating behavior undermines the academic integrity of the learning environment and can have negative consequences for both the individual student and the academic community.

Structure of the article. Chapter 2 gives a short overview of two related works that dealt with academic cheating. Chapter 3 presents the collected influence factors for academic cheating and the data collection. Chapter 4 covers the data analysis: first the descriptive values of the study, then the two clustering methods k-Means and DBSCAN. The last chapter presents a discussion and interpretation of both clustering results, an outlook, and the study limitations.

Aim of the work. This paper analyses the issue of academic misconduct. To understand the different factors that influence the cheating behavior of students, a quantitative study was conducted at Hochschule Neu-Ulm, focusing specifically on the types of exams and assignments most susceptible to cheating. The collected data were first visualised and then analysed with machine learning methods. The analysis comprised the following steps: a descriptive analysis to reveal statistical information about the dataset, a selection of the dataset focusing on the cheating methods used, a clustering of this selection with k-Means and DBSCAN, a matching of the clustering results to the complete dataset, a comparison of both clustering results, and finally the interpretation of both results.

2. Related Work

A study by Janke et al. [3] examined factors regarding cheating behavior among students. The sudden shift to online teaching and exams during the COVID-19 pandemic led to a rise in cheating rates. The study proposed selective behavioral change, and the strong threat to [...]. The survey took place in Germany in November/December 2020 and reached 3,005 students from all federal states and various types of academic institutions. After reduction, the sample included 1,608 students with diverse characteristics, including gender, age, and academic background. The results indicate that the majority of students had no prior experience with online exams, and most of them perceived online exams as less controllable and more prone to cheating than traditional exams. However, the study found no evidence of a general increase in academic dishonesty, although the use of unauthorized aids during online exams was more common than in traditional exams. Overall, the study suggests that the shift to online exams is not necessarily associated with a higher risk of academic dishonesty, but that it requires careful monitoring and preventive measures to maintain academic integrity.

McCabe et al. [2] conducted a large-scale study on cheating in academic institutions over a fifty-year period. They found that most college-bound students are exposed to cheating cultures during their high school years and that more than two-thirds of college students engaged in academic dishonesty in the previous year. Cheating is prevalent in graduate and professional schools, with varying levels in different fields. The authors also found that there has been a shift in cheating-related attitudes and definitions among students, and that both individual and contextual factors influence academic integrity and cheating behavior. They suggest that a strong ethical environment, fostered by factors such as peer disapproval and a well-run honor code, can play a key role in reducing cheating.

3. Data Collection

3.1. Methodology

Prior work focused on measuring the amount of cheating in online exams. It is therefore necessary to examine the influence factors in detail. One aspect that can be explored is which cheating methods are used and which task types carry a higher risk of cheating. As the literature reveals, academic misconduct can be influenced by a variety of factors, which can be classified as extrinsic and intrinsic motivation. Intrinsic motivation refers to subjective and individual factors stemming from the student's personality, including self-motivation, self-efficacy, job opportunities, and adaptive comparative behavior. Extrinsic motivation refers to situational and organizational factors that affect the student from outside, such as living conditions, family circumstances, friends or classmates, learning mechanisms, examination form, course structure, instructor, and technical issues. Sanctions can also have an impact on academic misconduct. Figure 1 depicts the main influence factors, which are the basic concept for the survey design [4, 5].

3.2. Survey design

To examine the individual and contextual factors that influence cheating behavior, a student survey was designed and conducted. The study involved the creation of an online survey with LimeSurvey that captured information on both the personal and academic activities of the students. The survey also captured the different cheating methods that students were aware of and when they would apply them. The survey was distributed to all students of the Hochschule Neu-Ulm through an email distribution list from 6 December 2022 to 2 January 2023. Additionally, the survey was presented in four lectures of industrial engineering by Professor Dr. Sonja Köppl to students from the first to fifth semester of their bachelor's degree. The survey consisted of 42 questions divided into 5 groups, and it took approximately 12 minutes to complete. The groups were divided as follows:
• Part A: General questions about the course of study
• Part B: General questions about personal life
• Part C: Questions about exams
• Part D: Questions about cheating
• Part E: Demographic questions

Part A included questions about the course of study, semester, and grade point average. The next section examined student satisfaction with their studies and the university, personal motivation, and academic pressure. Part B comprised questions on lecture preparation, leisure activities, interests, part-time jobs, volunteer work, social media behavior, family obligations, and religiosity. Part C focused on online exam participation, equipment requirements, and comparisons between face-to-face and online exams in terms of comfort, fairness, and performance. Part D dealt with questions about attitudes towards cheating, consequences of cheating, known cheating methods, the influence of the lecturer on cheating behavior, and the application of cheating methods in exams and task types. Part E collected demographic data such as age, gender, and living arrangements.

4. Data Analysis

4.1. Descriptive Analysis

The analysed demographic data of the students included their course of study, semester, age, gender, and place of residence. Most participants came from the course Business Administration (19.21%), followed by Industrial Engineering (15.68%), Business Psychology (13.72%), Healthcare Management (13.33%), and Information Management and Corporate Communication (11.37%).

The students’ average age was 23.32 years, with a range from 18 to 56 years old. Most participants were in their 4th semester and the average grade was 2.14. More females (57.58%) than males (33.46%) completed the survey, and most lived with their parents (43.2%).

Regarding satisfaction (see Figure 2), 77.73% of the participants were satisfied with their studies and 49.4% with the university. High pressure to perform was felt by 22.83%, some pressure by 37.4%, and no pressure by 20.07%. In addition, 47.26% reported feeling motivated in their studies.

On average, participants spent 10.26 hours on hobbies and sports. Meeting friends was the most popular hobby (172 participants), followed by going to the gym (105), reading (90), and going to a bar or club (89). Playing poker (5), handball (4), and martial arts (4) were the least popular hobbies.
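Summary figures of this kind can be computed directly with pandas. The following is a minimal sketch on a made-up four-row table; the column names `course`, `age`, and `gender` are assumptions for illustration and are not taken from the survey file.

```python
import pandas as pd

# Hypothetical miniature survey table; values and column names are illustrative only.
survey = pd.DataFrame({
    "course": ["Business Administration", "Industrial Engineering",
               "Business Administration", "Business Psychology"],
    "age": [21, 25, 19, 30],
    "gender": ["female", "male", "female", "female"],
})

# Share of participants per course of study, in percent.
course_share = survey["course"].value_counts(normalize=True).mul(100).round(2)

# Mean and range of the participants' age.
age_summary = survey["age"].agg(["mean", "min", "max"])

print(course_share)
print(age_summary)
```

The same two calls apply unchanged to any categorical or numeric column of a larger table.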

Tasks and Methods. The analysis of cheating methods reveals that the five most commonly used methods are cheat sheets, communication with others, preparation of material, use of multiple devices, and translation programs. An additional analysis shows the occurrence of cheating per task type and per exam type for each cheating method (see Figures 3 and 4).

Figure 2: (a) Living Habits, (b) Satisfaction with Studies, Academic Pressure and Motivation

The digital exam forms are:
• Oral: An exam conducted through spoken communication between the examiner and the student in a video conference.
• Written: Students write their answers in a digital format and upload them to a portal or send them to the examiner.
• IT Pool: All students are examined on computers in an IT pool and have limited access to programs and the internet.
• Take Home Moodle Test: An exam administered through the Moodle learning management system and completed by students outside the classroom. The test has to be completed within a limited time, like a regular exam.
• Take Home Moodle Assignment: An assignment given to students through Moodle to be completed outside the classroom. The time allowed is not limited to the duration of an exam.

And the task types are:
• Definition Task: A task that requires students to provide the meaning or definition of a concept or term.
• Transfer Task: A task that assesses the ability of students to apply knowledge or skills learned [...]

[...] read from prepared texts during oral exams or refer to notes during open-ended questions. The fourth method is the use of multiple devices during the exam. Students may use a second screen or another device to display notes, definitions, or other materials during the exam. This method is commonly used in take-home Moodle exams [...]. [...] translation programs during the exam. This type of cheating occurs in take-home exams, where students may use online translation programs to translate questions and provide answers in a different language. This method is commonly used in open-ended questions. The results strongly indicate that digital exam formats have a much higher cheating potential.

4.2. Clustering

To gain insights into the collected data, two clustering methods were chosen to combine data and to identify groups with similar patterns in student behavior. Clustering is a method of unsupervised learning and involves the use of an unlabeled dataset consisting of a collection of examples $\{x_i\}_{i=1}^{N}$. Here, each $x_i$ represents a feature vector, and the objective of an unsupervised learning algorithm is to develop a model that can process a feature vector $x$ and transform it into either another vector or a value that can be employed to address a practical problem. The developed model assigns each feature vector in the dataset an identification number for its respective cluster [6].

k-Means was chosen due to its widespread usage and reputation as a simple and efficient clustering algorithm. Its popularity makes it an ideal choice for establishing a benchmark and facilitating comparisons with other clustering methods. As a second method, DBSCAN was selected as a density-based algorithm, offering an alternative approach to centroid-based techniques like k-Means. The aim was to investigate whether this density-based approach would yield notable distinctions in results and capture clusters that may be overlooked by k-Means.

In k-Means, the clusters are named in numerical order, starting from 0. This naming convention is used to distinguish and identify individual clusters in the algorithm's results. In DBSCAN, the clusters are named based on the significance of cluster assignments. Outlier points, which do not belong to any cluster, are often labeled as -1. The first cluster is labeled as 0, and the second cluster is labeled as 1. This naming convention allows clear differentiation of outliers from actual clusters and provides a unique identification for each cluster.

4.2.1. k-Means

The well-known k-Means clustering algorithm [7] forms $k$ clusters around centroids in a feature space, whereby $k$ is a predefined input parameter. In each step, the distance of each data point to each centroid is calculated and the function

$$J = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$$

is minimized, whereby $x_j$ represents a data point and $\mu_i$ represents the centroid of the cluster $C_i$. After each step, the cluster centers are updated until there are no further changes (convergence of the algorithm). With that, the algorithm forms $k$ non-overlapping clusters [8, 7, 9].

Data Preparation. The first step in the process involved importing an Excel spreadsheet using Pandas. The variables were converted from binary responses (Yes/No) to numerical values (0/1). The missing values for age, semester, and grade point average (GPA) were replaced with their mean. Next, the data were normalized using the "StandardScaler" method. Then, one-hot encoding was performed on the categorical variables (cheating attitude, major, gender, residence, motivation, performance pressure, technical equipment, preferred exam format, consequences of cheating, satisfaction with studies, and interest in technology) to convert them into numerical data.

Implementation. A dataframe object was created containing only the cheating methods: analog cheat sheet, manipulated exam materials, displaying content on main or second screen, displaying content on other devices, virtual camera, audio signals in ear, faking technical problems, reading prepared texts, translation programs, communicating with other students, and copying solutions from others.

Then, a principal component analysis was conducted to reduce the dimensionality of this dataset. The number of principal components was determined using the "explained variance ratio". The analysis revealed that six principal components were needed to obtain sufficient information for clustering: the first principal component explains 32.07% of the total variance, the second 11.27%, the third 8.46%, the fourth 7.96%, the fifth 6.53%, and the sixth 5.75% (72.04% in total).

Based on this data, clustering with k-Means was performed. The visualizations in 2D and 3D in Figure 5 show three distinct clusters. During the clustering process, the value of the k parameter was varied to explore its effect on the resulting clusters, and visualizations with different numbers of clusters were compared. The evaluation of the results showed that the grouping was most effective and meaningful with n=3 clusters. This decision was made by considering both the interpretability and the distinctiveness of the resulting clusters. With three clusters, the visual representation showed clear boundaries and discernible patterns, which makes the underlying structure of the data easier to understand.

```python
from sklearn.cluster import KMeans
import seaborn as sns

# k-Means modelling on the principal components
km_model = KMeans(n_clusters=3, random_state=42).fit(principalDf1)

# cluster_y is assumed to hold the principal components plus a "cluster" column
sns.relplot(x="principal component 1", y="principal component 2",
            hue="cluster", data=cluster_y)
```

Figure 5: (a) 2D Cluster Visualisation, (b) 3D Cluster Visualisation

Interpretation. To analyse the clusters, a column with the respective cluster was appended to the original table. After that, each cluster was filtered, and an individual evaluation was made based on the mean values for each category in each cluster.

Cluster 0 shows an increased tendency to cheat. In this cluster, almost all means of the cheating methods used are the highest. The cluster can be characterized as follows: the average GPA is the highest at 2.08 compared to the other two clusters, and the age of 22.98 indicates that this group is the youngest compared to the other clusters.
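The workflow of Section 4.2.1 (binary recoding, mean imputation, standardization, PCA, k-Means, and per-cluster means) can be sketched as follows. This is an illustration under stated assumptions, not the study's source code: the data are randomly generated, the column names are hypothetical, and two principal components are used instead of six to keep the example small. The sketch also evaluates the k-Means objective $J$ by hand so that it can be compared with the value scikit-learn reports as `inertia_`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 90

# Hypothetical stand-in for the survey: three Yes/No cheating-method columns
# plus an age column with some unanswered entries.
age = rng.integers(18, 40, size=n).astype(float)
age[::10] = np.nan
survey = pd.DataFrame({
    "cheat_sheet": rng.choice(["Yes", "No"], size=n),
    "second_screen": rng.choice(["Yes", "No"], size=n),
    "translation_program": rng.choice(["Yes", "No"], size=n),
    "age": age,
})

# Data preparation: Yes/No -> 1/0, missing numeric values -> column mean.
method_cols = ["cheat_sheet", "second_screen", "translation_program"]
survey[method_cols] = survey[method_cols].apply(
    lambda col: col.map({"Yes": 1, "No": 0}))
survey["age"] = survey["age"].fillna(survey["age"].mean())

# Cluster only on the cheating-method columns, standardized and PCA-reduced.
X = StandardScaler().fit_transform(survey[method_cols])
pca = PCA(n_components=2)
components = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # share of variance per component

km_model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(components)

# k-Means objective J: summed squared distance of each point to its centroid.
assigned_centroids = km_model.cluster_centers_[km_model.labels_]
objective = float(((components - assigned_centroids) ** 2).sum())

# Map the labels back to the full table and compare per-cluster means.
survey["cluster"] = km_model.labels_
cluster_means = survey.groupby("cluster").mean(numeric_only=True)
print(cluster_means)
```

Clustering on the method columns only and then grouping the full table by the resulting label is what allows variables such as age to be compared across clusters without having influenced the grouping itself.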

The participants in Cluster 0 are predominantly male, have a high technical interest, live in their own apartment or a shared flat, are on average between the 3rd and 4th semester, and study Digital Enterprise Management, Game-Production Management, Information Management in Healthcare, Business Informatics, or Industrial Engineering. Furthermore, the evaluation shows that the average values for extensive, predominantly very time-consuming and active hobbies, as well as the time for voluntary and social media activities, are the highest. The participants are also motivated and have high performance pressure, which would increase the tendency to cheat. Regarding the exam format, the participants perceive online exams as fairer and more pleasant than, for example, Cluster 1, which tends towards presence formats. Measures such as failing the exam or being excluded from the exam would deter cheating.

In stark contrast to Cluster 0, participants in Cluster 2 view cheating as unethical, are religious, and prefer presence formats. The participants predominantly study Business Administration, Digital Medicine and Care Management, Physician Assistant, or Business Psychology. The participants do not have a high technical interest, which could reduce the incentive to use and experiment with technical cheating methods.

4.2.2. DBSCAN

DBSCAN is a density-based clustering algorithm. This algorithm requires the definition of two hyperparameters, $\varepsilon$ and $n$. $\varepsilon$ defines the radius of the neighborhood around each data point and is used to associate the data points to a cluster; $n$ defines the minimum number of data points of each cluster. The clustering process can be defined as follows:
• Let $D$ be the set of data points, and let $x_i$ be the $i$-th data point.
• The neighbourhood of $x_i$ within the radius $\varepsilon$ is defined as $N_\varepsilon(x_i) = \{x_j \in D \mid \mathrm{dist}(x_i, x_j) \le \varepsilon\}$, where $\mathrm{dist}(x_i, x_j)$ is the distance between $x_i$ and $x_j$.
• A core point is defined as a data point that has at least $n$ data points within its neighbourhood: $x_i \in D$ with $|N_\varepsilon(x_i)| \ge n$.
• A border point is a data point that is not a core point but is within the neighbourhood of a core point: $x_i \in D$ such that $\exists\, x_j \in D$, $x_j$ is a core point and $x_i \in N_\varepsilon(x_j)$.
• A noise point is a data point that is neither a core point nor a border point [10, 11].

Implementation. The algorithm uses the same preprocessing steps as the k-Means method. Then, the DBSCAN model is initialized with a value for $\varepsilon$ of 2.2 and a minimum number of samples of 15 to form a dense region (in the source code, the variable eps is used for $\varepsilon$ and min_samples holds the value of $n$). The model is then applied to a standardized dataset, X-stand1, and the resulting cluster labels are printed.

Next, the DBSCAN algorithm is applied to a dataset, principalDf1, and the resulting clusters are visualized by a scatter plot with the principal components on the x and y axes, and the clusters indicated by different colors (see Figure 6).

The resulting cluster labels are converted into a Pandas Series and added to the original one-hot-encoded dataset. The observations are then grouped by their cluster numbers, and the mean values of each column in each cluster are calculated and printed to the console. Determining the means of the points in DBSCAN allows for the representation of a cluster by providing a central point that can describe or visually represent the cluster.

```python
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import seaborn as sns

# Init model
dbscan = DBSCAN(eps=2.2, min_samples=15)
dbscan.fit(principalDf1)

# Visualize the clusters
plt.figure(figsize=(5, 5))
principalDf1 = principalDf1.rename(
    columns={"principal component 1": "PC1", "principal component 2": "PC2"})
sns.scatterplot(x="PC1", y="PC2", data=principalDf1,
                hue=dbscan.labels_, palette="Set1")
```

Interpretation. The majority of male participants with a high interest in technology and an increased likelihood of using cheating methods are found in Cluster -1. The cheating methods employed by these participants include displaying content on the main screen, second screen, or other devices, using a virtual camera, receiving audio signals in the ear, pretending technical problems, reading prepared texts, using translation programs, communicating with other students, completely copying solutions, having someone else take the exam, cheating on take-home exams and submissions, cheating on pool exams, and cheating on written Zoom exams.

Cluster -1 is characterized by a higher frequency of social media activities and hobbies such as football, tennis, dancing, yoga, fitness, martial arts, horse riding, jogging, chess, painting, cinema, and bars/clubs. Participants in this cluster report the highest number of volunteer hours, which is almost double the number reported by participants in other clusters. The attitude towards cheating in this cluster is generally permissive, with a tendency to cheat. Participants are primarily enrolled in Digital Enterprise Management, Information Management and Corporate Communications, Business Information Systems, and Industrial Engineering programs. Compared to Cluster 0, significant differences are observed in the following categories: there is no academic pressure in Cluster -1, and there is no preference for any particular type of examination format. However, online examinations are perceived as fairer. The consequence of being expelled from university is a significant deterrent against cheating.

Cluster 0, on the other hand, comprises participants with a negative attitude towards cheating, and the mean values for cheating methods are not as high as those in Cluster -1. The hobbies of these participants include rock climbing, basketball, and poker playing. The represented study programs are Healthcare Business Administration and Data Science Management.

Cluster 1 consists of participants who are mostly female and religious, with a low interest in technology and the lowest likelihood of employing cheating methods. The preferred cheating method in this cluster is using an analogue cheat sheet. The study programs represented in this cluster are Business Administration, Digital Medicine and Care Management, and Game Production and Management. Hobbies include yoga, pilates, handball, and socializing with friends. Participants in this cluster are aware of the consequences of cheating, such as being excluded from the exam, failing the exam, and having to give an oral explanation before the exam, which acts as a deterrent against cheating. They perceive in-person exams as fairer. These participants are highly motivated and feel significant academic pressure.

5. Discussion and Results

This paper aimed to identify and reveal factors that influence academic misconduct, based on relevant literature and by using clustering algorithms. A survey was developed to obtain the necessary information through a quantitative study in multiple categories. A total of 460 students participated in the survey, of whom 263 completed it in its entirety. The results revealed that cheating behavior among students is influenced by various factors, including personal factors such as working time, family situation, and academic pressure, and organisational factors such as the exam format. The analysis was carried out through a descriptive analysis and two different clustering methods, k-Means and DBSCAN. The clustering process involved clustering a dataset that contained only cheating methods and then assigning the resulting groups to all categories. The clusters generated by both methods exhibit significant similarities, but there are also some differences.

Similarities
• Both analyses identify clusters with participants who have a negative attitude towards cheating (Cluster 1 in DBSCAN, Cluster 2 in k-Means).
• Both methods identify a group with a tendency to cheat: Cluster 0 in k-Means is characterized by predominantly male participants with a high technical interest and a tendency to cheat, whereas Cluster -1 in DBSCAN is characterized by male participants with a high interest in technology and a permissive attitude towards cheating.
• Online exams are perceived as fairer compared to in-person exams by certain clusters (Cluster 0 in k-Means, and in both clusters in DBSCAN).
• Both analyses identify participants with high academic pressure and motivation in their studies.

Differences
• The two clustering analyses identify different numbers of clusters and their characteristics.
• The cheating methods used by participants in the different clusters vary across the two analyses.
• Cluster 2 in k-Means is characterized by younger male participants with a high technical interest who have a tendency to cheat, whereas there is no corresponding cluster in DBSCAN.
• The hobbies and study programs of participants in the different clusters differ between the two analyses.
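One way to make such a comparison of two clusterings more systematic is a contingency table of the two label sets together with an agreement score. The sketch below is an illustration on synthetic data and not the study's evaluation: the points come from scikit-learn's `make_blobs`, and the DBSCAN parameters are chosen for this toy data rather than taken from the values 2.2 and 15 used above.

```python
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the PCA-reduced survey data (not the study's data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
# eps and min_samples are assumptions that suit this toy data set.
dbscan_labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)

# Contingency table: rows are k-Means clusters, columns are DBSCAN clusters
# (-1 marks the points DBSCAN treats as noise).
overlap = pd.crosstab(pd.Series(kmeans_labels, name="k-Means"),
                      pd.Series(dbscan_labels, name="DBSCAN"))
print(overlap)

# Adjusted Rand index: 1.0 means identical groupings,
# values near 0 mean agreement at chance level.
ari = adjusted_rand_score(kmeans_labels, dbscan_labels)
print(f"adjusted Rand index: {ari:.2f}")
```

A table with one dominant cell per row indicates that a k-Means cluster has a direct DBSCAN counterpart, while rows spread over several columns point to groups that only one of the methods separates.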

The study and analysis provided insights into the factors that influence cheating behavior among students.

The descriptive analysis revealed the students' preferred cheating methods and exam formats, the time spent on private hobbies, their interests, living habits, working and volunteering hours, motivation, and academic pressure.

The results suggest that the exam format, academic pressure and the perceived fairness are significant predictors of cheating behavior. Students who reported high levels of motivation and academic pressure were more likely to engage in cheating behavior.

For the cluster analysis with both k-Means and DBSCAN, the data were selected based on the cheating methods, and the results were then mapped to the complete data set. Both clustering results reveal tendencies that a high technical interest and the online format are associated with a higher rate of cheating. Furthermore, both methods identified a group of younger male students with a large number of hobbies and social media hours who use several cheating methods. The analysis also showed that participants with a lower cheating tendency have an ethical attitude, prefer presence formats, are older, and are aware of the consequences of being caught cheating in an exam.

This study has implications for educators and academic institutions, highlighting the need to address academic integrity issues and to create a culture of academic honesty.

Further research is necessary to explore how academic institutions can effectively address academic integrity issues and promote ethical behavior among students.

Future Work. This analysis is part of a PhD project that identifies and evaluates the attitudes and habits of students with regard to academic cheating. Further analyses with other machine learning algorithms are planned.

As a next step, a second data collection at Regensburg University is planned in order to compare the data sets of both institutions. Furthermore, an eye-tracking study will be conducted to reveal patterns in eye movement while students are cheating.

Study Limitations. In this data analysis, missing values were replaced by their mean values. In the further PhD thesis, alternative strategies for dealing with missing values will be used, such as zero values or different case scenarios.

References

[1] I. Krumpal, R. Berger (Eds.), Devianz und Subkulturen: Theorien, Methoden und empirische Befunde, Kriminalität und Gesellschaft, Springer Fachmedien Wiesbaden, Wiesbaden, 2020. URL: http://link.springer.com/10.1007/978-3-658-27228-9.
[2] D. L. McCabe, K. D. Butterfield, L. K. Treviño, Cheating in college: why students do it and what educators can do about it, The Johns Hopkins University Press, Baltimore, 2012.
[3] Janke, S. C. Rudert, Ä. Petersen, T. M. Fritz, Daumiller, Cheating in the wake of COVID-19: How dangerous is ad-hoc online testing for academic integrity?, Computers and Education Open 2 (2021) 100055. URL: https://www.sciencedirect.com/science/article/pii/S2666557321000264.
[4] Hillebrecht, Einflussfaktoren des Studienerfolgs im Vollzeit-Studium, in: Studienerfolg von berufsbegleitend Studierenden, Springer Fachmedien Wiesbaden, Wiesbaden, 2019, pp. 77–124. URL: http://link.springer.com/10.1007/978-3-658-26164-1_3.
[5] Konegen-Grenier, Studierfähigkeit und Hochschulzugang, Kölner Texte & Thesen 61, Deutscher Instituts-Verl., Köln, 2002.
[6] P. Gupta, N. K. Sehgal, Machine Learning Concepts, in: P. Gupta, N. K. Sehgal (Eds.), Introduction to Machine Learning in the Cloud with Python: Concepts and Practices, Springer International Publishing, Cham, 2021, pp. 3–22. URL: https://doi.org/10.1007/978-3-030-71270-9_1.
[7] A. K. Jain, R. C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., 1988.
[8] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, A. Wu, An efficient k-means clustering algorithm: analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 881–892. doi:10.1109/TPAMI.2002.1017616.
[9] K. P. Sinaga, M.-S. Yang, Unsupervised K-Means Clustering Algorithm, IEEE Access 8 (2020) 80716–80727. doi:10.1109/ACCESS.2020.2988796.
[10] J. Sander, M. Ester, H.-P. Kriegel, X. Xu, Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications, Data Mining and Knowledge Discovery 2 (1998) 169–194. URL: https://doi.org/10.1023/A:1009745219419.
[11] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise (1996).

Full descriptive analysis: https://www.dropbox.com/s/nevkfbxueqakcwn/Deskriptive%20Auswertung.pdf?dl=0
https://www.dropbox.com/s/bnksjy0r1x6upt0/Umfrage_484229_Untersuchung_von_Einflussfaktoren_auf_die_Studienleistung_bei_online_Prfungen.pdf?dl=0

Full set of survey data: https://www.dropbox.com/s/8p6sz7gfg4rnykv/Umfragedaten.xlsx?dl=0

| Category | Cluster 0 |
| --- | --- |
| Academic semester | 3.73529412 |
| Grade point average | 2.08294118 |
| Age | 22.98 |
| Cheating methods | Manipulated exam materials; display content on main screen, second screen, other devices; virtual camera; audio signal in ear; pretending technical difficulties; reading prepared texts; translation programs; communication with others; copying complete solutions; someone else takes exam |
| Exam format where cheating occurred | Pool Exam |
| Hobbies / Interest | Tennis, Dancing, Yoga, Gym, Pilates, Climbing, Martial arts, Meeting Friends, Painting, Bar/Club, Politics, Crafting, Riding |
| Interest in Technology | Very interested |
| Working Hours | 6.94117647 |
| Volunteering | 0.40625 |
| Volunteering hours | 3.08823529 |
| Time Social Media | 12 |
| Family situation | Taking care of siblings |
| Fields of study | Digital Enterprise Management, Game-Production Management, Information Management in Healthcare, Business Informatics, Industrial Engineering |
| Gender | Male |
| Motivation / Pressure / Satisfaction | Motivated, high pressure, high satisfaction with studies |
| Exam Format / Technical Equipment | Technical equipment missing or limited, participation in online exams possible, comfortable with online exams, online exams perceived as fairer |
| Living situation | Own apartment, shared flat |
| Consequences to deter cheating | Failing the exam, exclusion from exam |
| Attitude | Attitude towards cheating (cheating = yes) |

Figure 7: Selected DBSCAN mean values