=Paper=
{{Paper
|id=Vol-2945/53-VML-ConfWS21_paper_7
|storemode=property
|title=Counteracting Exam Cheating by Leveraging Configuration and Recommendation Techniques
|pdfUrl=https://ceur-ws.org/Vol-2945/53-VML-ConfWS21_paper_7.pdf
|volume=Vol-2945
|authors=Viet-Man Le,Thi Ngoc Trang Tran,Martin Stettinger,Lisa Weißl,Alexander Felfernig,Müslüm Atas,Seda Polat Erdeniz,Andrei Popescu
|dblpUrl=https://dblp.org/rec/conf/confws/LeTSWFAE021
}}
==Counteracting Exam Cheating by Leveraging Configuration and Recommendation Techniques==
Viet-Man Le, Thi Ngoc Trang Tran, Alexander Felfernig, Müslüm Atas, Lisa Weißl, Andrei Popescu, Martin Stettinger, and Seda Polat-Erdeniz

Graz University of Technology, Graz, Austria. Emails: {vietman.le, ttrang, alexander.felfernig, muesluem.atas, andrei.popescu, martin.stettinger, spolater}@ist.tugraz.at, lisa.weissl@student.tugraz.at

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Exam cheating denotes the behavior of students who fraudulently try to achieve their desired grades through various forms, such as item harvesting, item pre-knowledge, item memorizing, collusion and answer copying, and answer checking from available sources. Such dishonest behavior becomes manifest in e-learning scenarios, where exams are often conducted via online assessment platforms without the physical supervision of proctors. In this paper, we propose an approach to counteract exam cheating based on configuration and recommendation techniques. Our approach allows examiners to configure questions and exams using feature models. We support the configuration of parameterized questions, which helps to generate a large number of exam instances. Besides, a content-based recommendation mechanism is integrated into the exam configuration process, which helps examiners to select questions that have not appeared in the latest exams. We also propose mock-ups showing how question and exam generation processes can proceed in a real exam generator system.

1 INTRODUCTION

Cheating refers to a tendency of students to fraudulently achieve their desired grades rather than investing a sufficient amount of time and effort in learning and improving their knowledge [42]. In exam scenarios, cheating can take different forms, such as item harvesting, item pre-knowledge, item memorizing, collusion and answer copying, and answer checking [11, 13, 34, 41]. Item harvesting occurs when a concerted attempt is made to collect exam questions; students can do this by memorizing exam content, recording it, or transcribing it. Item pre-knowledge occurs when students obtain knowledge of the exam questions and/or answers (e.g., through the Internet or other multimedia sources) prior to the exam. Item memorizing occurs when a student answers the questions several times to reach an estimated ability level close to his/her true ability; he/she is assumed to use his/her time only to memorize a fixed number of questions. Collusion or answer copying denotes a scenario where two or more students work together to complete an exam; this type of cheating is triggered when students sit close to each other and try to copy answers from each other during the exam. The final exam cheating type is answer checking, in which students try to check the answers to the questions against available resources.

In e-learning scenarios, where learning and testing activities are done primarily via web-based platforms [5, 10], the mentioned exam cheating behaviors have become even more prevalent and are, therefore, more challenging to detect and counteract compared to traditional learning formats [37]. In this context, looking for effective approaches to prevent exam cheating has become one of the most critical challenges of educational institutions. Such approaches are crucial to assure the integrity of student work and to increase trust in online education systems [10].

While extensive research has been conducted on detecting the reasons for cheating as well as the factors affecting students' exam cheating behavior [9, 11, 15, 16, 30], only a few studies propose solutions for counteracting or avoiding such dishonest behavior. Most of these studies target the prevention of cheating in online exams (i.e., exams conducted via Internet-based platforms) [37]. Alessio et al. [2] and Dendir and Maxwell [14] proposed approaches to prevent exam cheating using proctoring software that activates the camera on a computer and records students during the exam. This software allows examiners to observe the behavior of students and thereby detect cheating. It also helps to prevent students from talking to each other or looking up relevant information in books or other sources. Although this approach helps to mitigate academic dishonesty in online exams, it could raise privacy issues. Another problem is the efficiency of the approach, especially in the context of big courses where exams are conducted with hundreds of students at the same time. Detecting exam cheating of a large number of students by just analyzing recorded videos might be a sub-optimal solution since it would consume too much effort of examiners or proctors.
A more efficient approach is to randomize exam questions and answers, which has been widely applied in Learning Management Systems such as WebCT and Blackboard (https://www.blackboard.com). This approach allows examiners to prepare randomized questions in such a way that no two exams are alike [31]. Besides, in order to increase the probability of generating different exam instances, this approach requires a question bank that consists of a large number of questions and answers [29]. Additionally, paraphrasing techniques might be needed to reformulate questions that have been selected from the question bank. Golden and Kohlbeck [22] show that paraphrasing questions selected from a question bank is, on the one hand, essential for reducing the benefits students can draw from cheating in online exams; on the other hand, it helps to increase the performance of students in completing the exam.

Inspired by the ideas discussed by McCabe [29] and Golden and Kohlbeck [22], we propose in this paper an approach to counteract exam cheating by generating a large question bank in which questions and corresponding answers are generated automatically. Our approach supports the generation of different instances of a question topic. For instance, we could create two instances of a question topic regarding "minimal conflict sets" using equivalent terms. The two instances could be (1) "What is a minimal conflict set?" and (2) "What is a minimal unsatisfiable subset?".

In order to support this, our approach enables question configuration mechanisms using feature models, one of the core technologies of configuration systems [24].
In the context of exam and question modeling, where examiners are not always technically proficient, feature models might be an appropriate choice. The reason is that the representation of feature models is straightforward and does not require any special expertise from the examiner [6]. Furthermore, a feature model utilizes a tree-based representation that provides a good overview of the knowledge structure and facilitates feature model management [6, 20, 27]. These advantages of feature models motivate us to leverage them in our approach to exam and question configuration.

Besides, we also support the configuration of parameterized questions, in which each question is configured using relationships or constraints defined in the corresponding feature model. With specific question settings, all instances are generated. Each instance represents a complete question with a question statement, correct answers, and incorrect answers (see also Figure 3). This way, our approach helps to significantly increase the solution space of questions and, therefore, the size of the question bank. After the question generation phase, an exam configuration process is activated, which allows an examiner to configure a set of exams by selecting questions that have been generated. The question selection can be made based on constraints specified by the examiner, such as the total number of exam instances, the number of questions in each exam instance, the duration, the similarity with previous exams, and the share of different question types in each exam instance. Furthermore, question and exam configuration processes are supported by a recommendation mechanism that helps to generate exams that differ as much as possible from previous exams.

The contributions of our work are therefore two-fold:

1. We propose an exam creation approach supporting question and answer parameterization, which significantly increases the solution space and automatically generates many exam instances. This way, each student receives a different exam, which helps to effectively counteract cheating behaviors, especially in exams for big courses.
2. We develop mock-ups of a real exam creator system to support the mentioned approach.

The remainder of the paper is organized as follows. In Section 2, we provide basic knowledge regarding feature models, feature model configuration, and recommendation techniques. Sections 3 and 4 are the main parts of our work, in which we present how configuration and recommendation techniques are exploited in our approach to generate exams. Finally, we conclude the paper and discuss open issues for future work in Section 5.

2 PRELIMINARIES

2.1 Feature Models

Feature models are used to specify the variability and commonality of complex items, such as software artifacts, configurable products, and highly-variant services [3, 6, 26]. Applications based on feature models help users to decide which features should be included in a specific configuration.

A feature model is a hierarchical representation of a set of features and their interrelationships [6, 26]. In such a representation, features are represented by nodes, and relationships between features are represented by links. The root of the feature model is a so-called root feature (fr), which is involved in every configuration (fr = true).

A feature model can be exploited in exam scenarios to represent a set of questions for an exam that share common features. For instance, given the two multiple-choice questions Q1 and Q2 shown in Table 1, a corresponding feature model representing these questions is depicted in Figure 1.

Table 1: An example of two multiple-choice questions that share common features.

Q1: What is a minimal diagnosis?
  1. an arbitrary subset
  2. a minimal deletion subset (correct answer)
  3. a minimal unsatisfiable subset
  4. a maximal subset

Q2: What is the definition of a minimal conflict set?
  1. an arbitrary subset
  2. a minimal deletion subset
  3. a minimal unsatisfiable subset (correct answer)
  4. a maximal subset

Figure 1: Feature model for a set of multiple-choice questions (root feature Question 1 (f0) with mandatory sub-features Question (f1) and Answers (f2); question phrases f3..f7; Correct Answers (f8) with f10, f11; Incorrect Answers (f9) with f12..f15 under group cardinality ⟨2..3⟩; cross-tree constraints cstr1..cstr3).

Feature models can be distinguished with regard to the used knowledge representation [6]. In this section, we present three well-known feature model types (basic feature models [26], cardinality-based feature models [12], and extended feature models [4]) whose notations are used in our approach.
Basic Feature Models. A basic feature model [26, 27] consists of two parts: a structural part and a constraint part. The former establishes a hierarchical relationship between features; the latter adds so-called cross-tree constraints. Structurally, the relationship between a feature and its sub-features can typically be classified as mandatory, optional, alternative, or or. A mandatory relationship indicates that a child feature is included in a configuration if and only if its parent feature is included in the configuration (e.g., see the relationship between f1 and f3). An optional relationship denotes that the inclusion of a child feature is optional if the parent feature is included (e.g., see the relationship between f1 and f4). An alternative relationship describes the fact that exactly one child feature has to be included if the parent feature has been included (e.g., see the relationships between f8 and its child features f10 and f11). Finally, an or relationship indicates that at least one of the child features should be included if the parent feature has been included (e.g., see the relationships between f9 and its child features f12..f15).

In the constraint part, cross-tree constraints are integrated graphically into the model to set cross-hierarchical restrictions on features. There are two constraint types, requires and excludes, that can be used for the specification of feature models [6]. A requires constraint states that if one feature is included in the configuration, then another feature must be included as well (e.g., f5 requires f10). An excludes constraint denotes that two certain features must not be included in the same configuration (e.g., f6 excludes f12).

Cardinality-based Feature Models. Cardinality-based feature models [12] extend basic ones by allowing cardinalities with an upper bound greater than 1 on feature relationships. These feature models support two new relationship types: feature cardinality and group cardinality. A feature cardinality is a sequence of intervals denoted [n..m] (n: lower bound, m: upper bound), determining the number of instances of a feature that can be part of a product. A group cardinality is an interval denoted ⟨n..m⟩ (n: lower bound, m: upper bound), limiting the number of child features that can be part of a product when the parent feature is selected. For instance, the group cardinality ⟨2..3⟩ between feature f9 and its child features f12..f15 indicates that a configuration for Question 1 has at least two and at most three incorrect answers.
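The relationship types above have a standard propositional encoding [6], which is what makes the CSP view introduced in Section 2.2 possible. The following is a sketch of that encoding for the features of Figure 1 (the group cardinality is written as a bound on the number of selected child features):

$$
\begin{aligned}
&\text{mandatory: } f_3 \leftrightarrow f_1, \quad f_7 \leftrightarrow f_1\\
&\text{optional: } f_4 \rightarrow f_1\\
&\text{alternative: } f_8 \rightarrow \text{exactly-one}(f_{10}, f_{11})\\
&\text{or: } f_9 \rightarrow (f_{12} \lor f_{13} \lor f_{14} \lor f_{15})\\
&\text{requires: } f_5 \rightarrow f_{10}, \quad f_6 \rightarrow f_{11}\\
&\text{excludes: } \lnot (f_6 \land f_{12})\\
&\text{group cardinality } \langle 2..3 \rangle \text{: } f_9 \rightarrow 2 \leq |\{f \in \{f_{12},\dots,f_{15}\} : f = \mathit{true}\}| \leq 3
\end{aligned}
$$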
Extended Feature Models. Extended feature models [4] support the description of features with attributes. For instance, in an exam feature model, each question can be described by two attributes: question complexity and question type. These feature models can also include complex constraints over attributes and features. One example constraint can be: "If the question complexity of Question 1 is 'important to know', then this question should be included in the exam".

2.2 Feature Model Configuration

For the discussions in the later sections, we introduce the definitions of a feature model configuration task and a feature model configuration (solution) [17, 24]. A feature model configuration task can be defined as a constraint satisfaction problem (CSP) [40].

Definition 1 (Feature model configuration task). A feature model configuration task is defined by a triple (F, D, C), where F = {f1, f2, ..., fn} is a set of features, D = {dom(f1), dom(f2), ..., dom(fn)} is a set of feature domains, and C = CF ∪ CR is a set of constraints restricting possible configurations; CF = {c1, c2, ..., ck} represents the set of feature model constraints, and CR = {ck+1, ck+2, ..., cm} represents a set of user requirements.

Definition 2 (Feature model configuration). A feature model configuration S for a given feature model configuration task (F, D, C) is an assignment of the features fi ∈ F, ∀i ∈ [1..n]. S is valid if it is complete (i.e., each feature in F has a value) and consistent (i.e., S fulfills the constraints in C).
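To make Definitions 1 and 2 concrete, the following is a minimal sketch (our illustration, not the paper's implementation) that enumerates all valid configurations of a toy feature model by treating each feature as a Boolean variable and each relationship and user requirement as a checkable constraint; a real tool would delegate this to a solver such as Choco Solver [33]:

```python
from itertools import product

# Feature model configuration task (F, D, C) from Definition 1:
# F: features, D: Boolean domains, C = CF (feature model constraints)
# united with CR (user requirements), each given as a predicate.
F = ["root", "question", "answers", "optional_phrase"]
D = {f: (False, True) for f in F}

CF = [
    lambda s: s["root"],                                    # root is always included
    lambda s: s["question"] == s["root"],                   # mandatory sub-feature
    lambda s: s["answers"] == s["root"],                    # mandatory sub-feature
    lambda s: (not s["optional_phrase"]) or s["question"],  # optional sub-feature
]
CR = [
    lambda s: s["optional_phrase"],                         # a user requirement
]

def valid_configurations(features, domains, constraints):
    """Enumerate complete, consistent assignments (Definition 2)."""
    for values in product(*(domains[f] for f in features)):
        s = dict(zip(features, values))
        if all(c(s) for c in constraints):
            yield s

for s in valid_configurations(F, D, CF + CR):
    print(s)
```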
2.3 Recommendation

Recommendation techniques have been employed in various domains such as movies, music, books, tourism destinations, financial services, and healthcare to recommend products/services that meet users' needs and preferences [8, 18, 28, 38, 39, 43]. More recently, recommendation techniques have also been applied in the e-learning domain to support learners in choosing courses, resources, or learning materials [43]. Besides, these techniques can be exploited to support teachers/lecturers/instructors in generating exams [23].

There exist three well-known recommendation approaches that have been extensively studied in recommender systems research: collaborative filtering, content-based, and knowledge-based recommendation [36]. Each of these approaches has its own characteristics and suitable application scenarios. Content-based recommendation builds a user profile based on the user's past preferences and recommends items that are similar to this profile; this approach is suitable for recommending items with abundant content information, such as documents or webpages [1]. Collaborative filtering suggests a specific item to a user based on the preferences of similar users; this approach is widely used and well-known through the Netflix competition [7]. Knowledge-based approaches are usually applied to generate recommendations in domains where the quantity of available item ratings is quite limited (such as cars, apartments, and financial services) or where the user wants to explicitly define his/her requirements for items (e.g., "the apartment should be close to the working area"); these approaches generate recommendations based on knowledge about the items, explicit user preferences, and a set of constraints describing the dependencies between user preferences and item properties [18].

In this study, we select content-based recommendation to be integrated into our approach since the items in our recommendation scenario are exams and questions, which are mostly represented in text form. Our recommendation approach helps to filter exams with a low number of questions that have been used in previous exams (see further details in Section 4).

3 QUESTION AND EXAM CONFIGURATION

3.1 Configuring Questions using Feature Models

In this section, we present our approach to modeling a set of questions using feature models. Although our approach is illustrated with multiple-choice questions, it is also applicable to other question types, such as matching, drag-drop, reordering, and freetext (i.e., questions whose answers are entered by students as free text).

Example question configuration scenario. Assume an examiner wants to create a feature model that represents the two multiple-choice questions Q1 and Q2 shown in Table 1. For the purpose of generating further instances that are different from Q1 and Q2, the examiner sets the minimum and maximum numbers of correct/incorrect answers: the number of correct answers to each question is exactly 1 (i.e., min = max = 1), and the number of incorrect answers stays in the range [2..3]. In the following, we analyze Q1 and Q2, which is the basis for constructing the feature model of these two questions:

- The phrases "What is" and "?" are located at the same relative positions and are obligatory parts of the questions. Therefore, they are referred to as mandatory phrases.
- The phrase "the definition of" appears only in question Q2, and this question does not change its meaning without the phrase. Hence, this phrase can be referred to as an optional phrase.
- The phrases "a minimal diagnosis" and "a minimal conflict set" can be replaced with each other; they are therefore referred to as alternative phrases.
- Both questions share incorrect answers, such as "an arbitrary subset" and "a maximal subset", which are referred to as or phrases.
- The correct answers ("a minimal deletion subset" and "a minimal unsatisfiable subset") are chosen depending on which phrase ("a minimal diagnosis" or "a minimal conflict set") has been selected to tailor the question. If "a minimal diagnosis" is selected, then the correct answer should be "a minimal deletion subset" (Q1). If "a minimal conflict set" is selected, then the correct answer should be "a minimal unsatisfiable subset" (Q2). These dependencies are requires relationships between the mentioned phrases.

Figure 2: A tokenization of questions Q1 and Q2 ("What is [the definition of] {a minimal diagnosis | a minimal conflict set} ?"), showing how the relationships of tokens between the questions are identified (mandatory, optional, alternative, mandatory).

Question feature model. Based on the above analysis, a corresponding feature model that specifies the variability and commonality of Q1, Q2, and all other instances can be generated (see Figure 1). The feature model shows two mandatory sub-features of the root feature, referring to the two main parts of a question: Question (f1) and Answers (f2). The statement of a question is modelled based on the sub-features of f1; the answers of a question are modelled based on the sub-features of f2 (see Footnote 3).

The feature Question has five sub-features f3..f7, where f3 and f7 are mandatory features, f4 is an optional feature, and f5 and f6 are alternative features. In the branch of the feature Answers, two sub-features f8 and f9 have to be added to distinguish between correct and incorrect answers (see Footnote 4). The feature Correct Answers connects to its sub-features f10 and f11 using an alternative relationship since there is only one correct answer to the question. This relationship could be replaced with an or relationship with a group cardinality if the examiner specified a maximum number of correct answers greater than 1. Since the number of incorrect answers stays in the range [2..3], a group cardinality ⟨2..3⟩ between the feature Incorrect Answers and its sub-features f12..f15 is needed. Two cross-tree constraints {cstr1: f5 requires f10} and {cstr2: f6 requires f11} should be defined to identify the correct answers. The constraint {cstr3: f6 excludes f12} indicates that if f6 is selected, then f12 cannot be an incorrect answer.

Footnote 3: In the context of free-text questions, the feature Answers does not have sub-features since no answers should be pre-specified (i.e., for a free-text/open question, the answer is entered by students).

Footnote 4: Another way is to create direct connections from f2 to correct and incorrect answers without splitting them into two branches.
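As an illustration of how such a model spans a space of question instances, the following is a minimal sketch (our own simplification of the Figure 1 model, not the paper's generator; instance counts depend on which answer options are modeled): it combines the optional and alternative statement phrases, derives the correct answer via cstr1/cstr2, and draws 2 to 3 incorrect answers under the group cardinality and cstr3.

```python
from itertools import combinations

# Statement phrases of the Figure 1 model (f4 optional; f5/f6 alternative).
OPTIONAL = ["", "the definition of "]                             # f4
ALTERNATIVE = ["a minimal diagnosis", "a minimal conflict set"]   # f5, f6

# cstr1/cstr2: the selected alternative phrase determines the correct answer.
CORRECT = {
    "a minimal diagnosis": "a minimal deletion subset",         # f5 requires f10
    "a minimal conflict set": "a minimal unsatisfiable subset"  # f6 requires f11
}
# Candidate incorrect answers (f12..f15); names follow the Figure 3 mock-up.
INCORRECT = ["a minimal unsatisfiable subset", "an arbitrary subset",
             "a maximal deletion subset", "a maximal subset"]

def question_instances():
    for opt in OPTIONAL:
        for alt in ALTERNATIVE:
            statement = f"What is {opt}{alt}?"
            correct = CORRECT[alt]
            # In the spirit of cstr3 (f6 excludes f12): a phrase serving as
            # the correct answer must not reappear as an incorrect answer.
            pool = [a for a in INCORRECT if a != correct]
            # Group cardinality <2..3>: choose 2 or 3 incorrect answers.
            for k in (2, 3):
                for wrong in combinations(pool, k):
                    yield statement, correct, list(wrong)

instances = list(question_instances())
print(len(instances), "instances; first:", instances[0])
```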
Supported tool for question configuration. The question configuration process of an examiner can be supported by an envisioned exam generator tool. In the following, we propose a mock-up showing how such a configuration proceeds. Figure 3 shows the mock-up for configuring a set of multiple-choice questions, which consists of the following parts:

- Part 1 - Question & Answers Editor: The editor represents the structural part of a feature model in a tree view control, where each feature is represented by a node in the tree. The sub-tree Question shows phrases used to tailor the question statement. The sub-tree Answers shows correct answers (marked with a check) and incorrect answers (see Footnote 5). At the bottom of this part, the editor asks the examiner to enter the number of correct/incorrect answers to the question.
- Part 2 - Constraint Editor: The editor allows an examiner to add, edit, or delete constraints used in the feature model. When clicking the "Add" or "Edit" button, the Constraint Editor dialog is shown to let the examiner create constraints (see Figure 5). Besides, when defining a constraint, an inconsistency detection mechanism is activated to identify constraints triggering inconsistencies [21, 35]. The identified constraints are highlighted to inform the examiner that they should be adapted to resolve the inconsistencies.
- Part 3 - Question Instances: The examiner can see the number of question instances generated based on the feature model and the constraints defined in Parts 1 & 2. In our example, six question instances have been generated ("#Instances: 6"), and the examiner can browse through all instances using the pagination control. For each instance, a recommendation mechanism is activated to specify how often the instance has been used in previous exams (e.g., "This question instance has been used 2 times in the two last exams"). Besides, the system calculates the number of instances used in previous exams; for example, "#Used instances: 3" means that three out of six instances have been used in previous exams. For further details of the recommendation mechanism, see Section 4.
- Part 4 - Question-Attribute Settings: This part allows an examiner to set the attributes of a question, such as answer randomization, importance level, question points, estimated duration, and question type.

Footnote 5: In the Question & Answers Editor, the correct and incorrect answers are not separated into two branches as shown in the feature model (see Figure 1). The reason is to visualize answers in the traditional form of a multiple-choice question.

Figure 3: Mock-up for question configuration, consisting of four parts: Part 1 - Question & Answers Editor, Part 2 - Constraint Editor, Part 3 - Question Instances, and Part 4 - Question-Attribute Settings. The content in Parts 1 & 2 is related to the feature model depicted in Figure 1.

Figure 5: Mock-up for the Constraint Editor dialog (operators and functions such as requires, excludes, not, and, or, xor, in_domain, random, collect, alldifferent, hsdag, conflict_hsdag, quickxplain, and fastdiag, applied to the terms of the feature model).

3.2 Configuring Parameterized Questions

A parameterized question is a template with mathematical expressions that change based on a specific set of replacement values. A straightforward template for a parameterized question can be: "What is the result of X + Y?", in which X and Y are parameters whose values are in the range [1..5]. Based on this template, many instances can be generated by randomly selecting different values for X and Y in their domains. This way, we can generate a set of questions related to the sum of the two parameters X and Y.
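The following is a minimal sketch of such template instantiation (an illustration of ours, not the paper's tool): it expands the "X + Y" template over the parameter domains and derives the correct answer automatically.

```python
import random
from itertools import product

# Parameter domains for the template "What is the result of X + Y?".
DOMAIN = range(1, 6)  # [1..5]

def all_instances():
    """Expand the template over all parameter combinations."""
    for x, y in product(DOMAIN, DOMAIN):
        yield f"What is the result of {x} + {y}?", x + y

def random_instance():
    """Draw one instance, as an exam generator would per student."""
    x, y = random.choice(list(product(DOMAIN, DOMAIN)))
    return f"What is the result of {x} + {y}?", x + y

print(len(list(all_instances())), "instances")  # 25 for the [1..5] domains
print(random_instance())
```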
In this work, we support the configuration of parameterized questions for the purpose of increasing the number of exam instances. A set of parameterized questions can be represented using a feature model. As discussed in Section 3.1, a feature model for a set of parameterized questions represents the relationships between features using basic feature concepts. However, one difference lies in the parameterized features, which are often used for a specific calculation (e.g., X + Y). Besides, the answers to a parameterized question are not predefined; they are instead automatically calculated depending on the selection of the parameterized features.

In the following, we present an example of parameterized question configuration using the feature model depicted in Figure 4:

- Features f4..f7, f11, and f12 are parameterized features.
- Features f4..f7 represent the constraints of a CSP [40]. Their relationship with the feature Question (f1) is represented using a group cardinality ⟨2..4⟩, which specifies the minimum and maximum numbers of constraints used to tailor the question statement.
- Features f11 and f12 represent correct and incorrect answers respectively, which can be automatically calculated depending on which parameterized features have been selected for the question. Assume features f4..f7 and the statement "What is/are the corresponding minimal conflict(s)?" have been selected, with #correct answers = #incorrect answers = 2. The corresponding correct answers would be {c1, c2, c3} and {c1, c4}, and the corresponding incorrect answers would be {c2} and {c3}.
- Due to the support of parameterized features and an automated answer calculation mechanism, the definition of cross-tree constraints is fairly complex. Instead of requires/excludes constraints, more complex constraints have to be defined. Their semantics is summarized in the following list (a sketch of the underlying conflict computation follows below):
  - cstr1 specifies the domain of the variables v1..v3.
  - cstr2 and cstr3 ensure that at least one inconsistency is triggered among the selected features f4..f7.
  - cstr4 ensures the existence of several conflicts.
  - cstr5 determines which of the features f4..f7 have been selected and collects the selected features in a new variable C.
  - cstr6 identifies all conflicts using the conflict_hsdag function [35]; the random function is used to randomly select minimal conflicts for the correct answers.
  - cstr7 indicates that if the question is "What is the preferred minimal conflict?", then the correct answer is the outcome of the quickxplain function [25].
  - cstr8 and cstr9 help to generate incorrect answers.

Figure 4: The feature model for a set of parameterized questions related to the identification of minimal conflicts or the preferred minimal conflict from a set of constraints (constraint features v1>v2, v2>v3, v3>v1, v2>v1 over variables v1, v2, v3 ∈ [1..5]; group cardinality ⟨2..4⟩ on the question statement; answer features CS and S; constraints cstr1: v1, v2, v3 ∈ [1..5], cstr2: (f4 ∧ f5) → f6, cstr3: (f4 ∧ ¬f5) → f7, cstr4: f10 → f4 ∧ f5 ∧ f6 ∧ f7, cstr5: C = collect(f4, f5, f6, f7), cstr6: CS = random(conflict_hsdag(C)), cstr7: f10 → CS = quickxplain(C), cstr8: S ≠ CS, cstr9: S = random(f4, f5, f6, f7)).
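To illustrate the semantics behind cstr6, the following is a brute-force sketch (our stand-in for conflict_hsdag [35] and QuickXPlain [25], not those algorithms themselves) that enumerates the minimal conflicts among the constraints c1: v1>v2, c2: v2>v3, c3: v3>v1, and c4: v2>v1 with v1, v2, v3 ∈ [1..5]:

```python
from itertools import combinations, product

# The candidate constraint set of Figure 4 (features f4..f7).
CONSTRAINTS = {
    "c1": lambda v1, v2, v3: v1 > v2,
    "c2": lambda v1, v2, v3: v2 > v3,
    "c3": lambda v1, v2, v3: v3 > v1,
    "c4": lambda v1, v2, v3: v2 > v1,
}
DOMAIN = range(1, 6)  # v1, v2, v3 in [1..5] (cstr1)

def satisfiable(names):
    """Check whether some assignment satisfies all named constraints."""
    return any(all(CONSTRAINTS[n](*vals) for n in names)
               for vals in product(DOMAIN, repeat=3))

def minimal_conflicts():
    """A conflict is an unsatisfiable subset; keep the inclusion-minimal ones."""
    conflicts = [set(s) for k in range(1, 5)
                 for s in combinations(CONSTRAINTS, k)
                 if not satisfiable(s)]
    return [c for c in conflicts
            if not any(other < c for other in conflicts)]

print(minimal_conflicts())  # expected: {'c1', 'c4'} and {'c1', 'c2', 'c3'}
```

The output matches the correct answers given in the example above ({c1, c4} and {c1, c2, c3}); the remaining subsets ({c2}, {c3}, ...) are satisfiable and can serve as incorrect answers.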
In order to support the configuration of parameterized questions, we propose a mock-up as shown in Figure 7, whose design is similar to the mock-up for configuring multiple-choice questions (see Figure 3).

Figure 7: Mock-up for parameterized question configuration representing the feature model depicted in Figure 4.

3.3 Exam Configuration

A set of exams can be modeled using a feature model in which each feature represents a topic and/or a question (see Figure 6). The relationships between questions, as well as the relationships between exam topics and their questions, can be represented by the constraints described in basic feature models, cardinality-based feature models, and extended feature models. Constraints of basic feature models can be used to describe the relationship between topics and questions; for instance, a mandatory relationship between a topic and a question shows that the question must belong to that topic. Constraints of cardinality-based feature models can be exploited to define the minimum and maximum numbers of questions in a specific topic; for instance, a group cardinality ⟨2..3⟩ between the feature Topic 2 and its sub-features (Question 8..Question 13) states that at least two and at most three of these questions are included in Topic 2. Constraints of extended feature models can be used to define question complexity constraints; for instance, the distribution of question complexity in the exam should be 50% "nice to know" questions, 30% "important to know" questions, and 20% "extremely important to know" questions (see constraints cstr8..cstr10). Further constraints regarding the number of questions, the duration, and the question types can also be defined (see constraints cstr6, cstr7, and cstr11).

Figure 6: An example feature model for a set of exams, in which cstr1..cstr5 represent relationships between questions, cstr6 (sum(question) = 40) and cstr7 (sum(question.estimated_duration) = 50) are resource constraints, cstr8..cstr10 (shares of nice_to_know, important_to_know, and extremely_important_to_know questions of 0.5, 0.3, and 0.2) are question complexity constraints, and cstr11 (sum(question.type=multiple_choice)/sum(question) > 0.6) denotes a constraint w.r.t. question type.

Based on the generated constraints, a set of exam instances can be generated using a constraint solver. Before activating the solver, the exam feature model has to be translated into a CSP [40]. On the basis of this representation, solutions (configurations) are directly determined by a solver such as Excel Solver [19] or Choco Solver [33]. Each configuration indicates an exam instance, which is generated by traversing the selected features in depth-first fashion.
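The following is a minimal sketch of this translation (a toy instance of ours, not the paper's exam model): question selection variables are Boolean, topic cardinalities and resource constraints restrict them, and each solution corresponds to one exam instance.

```python
from itertools import product

# A toy exam model: questions with estimated durations (minutes).
DURATION = {"q1": 5, "q2": 5, "q3": 10, "q4": 10, "q5": 10}
TOPIC1, TOPIC2 = ["q1", "q2"], ["q3", "q4", "q5"]

def exam_instances():
    qs = list(DURATION)
    for values in product((False, True), repeat=len(qs)):
        s = dict(zip(qs, values))
        picked = [q for q in qs if s[q]]
        if (
            1 <= sum(s[q] for q in TOPIC1) <= 2         # group cardinality <1..2>
            and 2 <= sum(s[q] for q in TOPIC2) <= 3     # group cardinality <2..3>
            and sum(DURATION[q] for q in picked) <= 30  # resource constraint
            and not (s["q3"] and s["q4"])               # q3 excludes q4
        ):
            yield picked

for exam in exam_instances():
    print(exam)
```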
To support the exam configuration process, we propose a mock-up as shown in Figure 8. Similar to the mock-up for question configuration, the Exam Editor (Part 1) and the Constraint Editor (Part 2) are shown on the left-hand side; they allow an examiner to describe the exam structure (based on a feature model) as well as the corresponding constraints between topics and questions. Settings placed on the right-hand side allow the examiner to specify further constraints regarding resources (Part 3), question complexity (Part 4), and the distribution of question types in the exam (Part 5). Besides these parts, the mock-up allows an examiner to specify the number of exam instances (e.g., #Exam instances = 120) and how similar each exam instance may be to previous exams (e.g., %Similar to questions in previous exams < 20%).

Figure 8: Mock-up for exam configuration, consisting of five parts: Part 1 - Exam Editor, Part 2 - Constraint Editor, Part 3 - Resource Constraints Settings, Part 4 - Question Complexity Settings, and Part 5 - Question Type Settings. The content in these parts is related to the feature model depicted in Figure 6.

4 RECOMMENDATION ALGORITHM

As mentioned in Section 1, to counteract exam cheating, besides increasing the question bank, the exam generation process should be supported by a recommendation mechanism that helps to select exams that are as dissimilar to previous exams as possible. To address this goal, we use a content-based recommendation approach that filters exams based on the similarity between the questions of the generated exams and the questions of previous exams.

Given a set of question instances, we need to identify the instances that have been used in previous exams as well as their frequency; instances that were frequently used in previous exams should be omitted. To do this, for a question instance P, we calculate the frequency fP with which instance P has been used in previous exams Ej. We first build the profile of the question using a vector space model [32]. The question is represented as an n-dimensional vector, in which each dimension corresponds to a term; the value of each term is the frequency with which the term appears in the question. The similarity between P and a question Qi in exam Ej is calculated using the cosine similarity [43] (see Formula 1).

$$ \mathrm{sim}(P, Q_i) = \frac{P \cdot Q_i}{\lVert P \rVert \cdot \lVert Q_i \rVert} \qquad (1) $$

The calculated similarity between the two questions P and Qi is then compared with a threshold θ that has been specified by the examiner. In the mock-up shown in Figure 8, the examiner can specify this threshold in the item "%Similar to questions in previous exams". If the similarity is greater than θ, we conclude that P is very similar to Qi and increase fP by 1. The same procedure is applied to the other previous exams. Finally, the frequency of P appearing in the n previous exams is given by Formula 2; the lower the fP value, the higher the probability of choosing question instance P for the exam.

$$ f_P = \bigl|\{\, \mathrm{sim}(P, Q_i) > \theta : \forall Q_i \in E_j,\ i \in [1..m],\ j \in [1..n] \,\}\bigr| \qquad (2) $$

where n is the number of previous exams and m is the number of questions in Ej.
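A minimal sketch of Formulas 1 and 2 (term-frequency vectors and a counting loop; our illustration, with a hypothetical threshold of 0.8 and toy exam data):

```python
import math
from collections import Counter

def tf_vector(question):
    """Build the term-frequency profile of a question (vector space model)."""
    return Counter(question.lower().split())

def cosine_sim(p, q):
    """Formula 1: sim(P, Qi) = (P . Qi) / (||P|| * ||Qi||)."""
    dot = sum(p[t] * q[t] for t in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def frequency(p_text, previous_exams, theta):
    """Formula 2: count how often P exceeds the similarity threshold
    over all questions Qi of all previous exams Ej."""
    p = tf_vector(p_text)
    return sum(1 for exam in previous_exams for q in exam
               if cosine_sim(p, tf_vector(q)) > theta)

# Toy data: the lower f_P, the better the candidate question instance.
exams = [["What is a minimal conflict set?", "What is a minimal diagnosis?"],
         ["What is a minimal unsatisfiable subset?"]]
print(frequency("What is a minimal conflict set?", exams, theta=0.8))
```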
5 CONCLUSION

The paper has proposed an approach that exploits configuration and recommendation techniques to counteract exam cheating. Thanks to question and exam configuration mechanisms, our approach is able to generate a large number of exam instances, which assures the distribution of different exams to students. Supported by a content-based recommendation algorithm, our approach also helps to generate exams that are different from previous exams. This way, it can prevent dishonest student behaviors regarding item harvesting, item pre-knowledge, and item memorizing.

Our approach, however, has some limitations. Automated question and exam generation could trigger issues regarding the preciseness of the generated questions and exams, a gap to be bridged within the scope of future work. Although we have developed mock-ups to support examiners' question and exam generation processes, the implementation of an exam generator prototype is still needed to further analyze user needs, the applicability of the proposed mock-ups, and the effectiveness of our approach.

Future work will include the analysis of the applicability of the presented concepts in the exam configuration domain (e.g., we will identify a complete set of typically relevant domain constraints) as well as in further multi-configuration scenarios. Furthermore, we will analyze new user interfaces and interaction requirements triggered by the application of multi-configuration concepts. The knowledge representation concepts discussed within the context of our exam configuration scenario are currently being integrated into the KnowledgeCheckR e-learning environment (www.knowledgecheckr.com) [13]. Our major motivation is to increase the flexibility of exam generation and also to counteract cheating in online exams through increased exam variability. In cases where individual user requirements induce an inconsistency with the exam model constraints, we propose the application of model-based diagnosis concepts [5, 9, 10], which can help to determine minimal conflict resolutions that also take into account aspects such as fairness and representativeness of the remaining questions.
ACKNOWLEDGMENTS

The presented work has been conducted in the PARXCEL project funded by the Austrian Research Promotion Agency (880657).

References

[1] Gediminas Adomavicius and Alexander Tuzhilin, 'Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions', IEEE Trans. on Knowl. and Data Eng., 17(6), 734–749, (Jun. 2005).
[2] Helaine Alessio, Nancy Malay, Karsten Maurer, A. Bailer, and Beth Rubin, 'Examining the effect of proctoring on online test scores', Online Learning, 21(1), (2017).
[3] Sven Apel, Don Batory, Christian Kästner, and Gunter Saake, Feature-Oriented Software Product Lines: Concepts and Implementation, Springer Science & Business Media, 2013.
[4] Don Batory, 'Feature Models, Grammars, and Propositional Formulas', in International Conference on Software Product Lines, eds., Henk Obbink and Klaus Pohl, pp. 7–20, Berlin, Heidelberg, (2005). Springer Berlin Heidelberg.
[5] Razan Bawarith, Abdullah Basuhail, Anas Fattouh, and Shehab Gamalel-Din, 'E-exam cheating detection system', International Journal of Advanced Computer Science and Applications, 8(4), 176–181, (2017).
[6] David Benavides, Sergio Segura, and Antonio Ruiz-Cortés, 'Automated Analysis of Feature Models 20 Years Later: A Literature Review', Information Systems, 35(6), 615–636, (2010).
[7] James Bennett and Stan Lanning, 'The Netflix Prize', in Proceedings of KDD Cup and Workshop, pp. 3–6, New York, (2007). ACM.
[8] Robin Burke, Alexander Felfernig, and Mehmet H. Göker, 'Recommender systems: An overview', AI Magazine, 32(3), 13–18, (Jun. 2011).
[9] Mason Chen, 'Detect multiple choice exam cheating pattern by applying multivariate statistics', in Proceedings of the International Conference on Industrial Engineering and Operations Management, pp. 173–181, Bogota, Colombia, (Oct. 2017).
[10] Chia Yuan Chuang, Scotty D. Craig, and John Femiani, 'Detecting probable cheating during online assessments based on time delay and head pose', Higher Education Research & Development, 36(6), 1123–1137, (2017).
[11] Gregory J. Cizek and James A. Wollack, eds., Detecting Potential Collusion among Individual Examinees using Similarity Analysis, chapter 3, 47–69, Routledge, Oct. 2016.
[12] Krzysztof Czarnecki, Simon Helsen, and Ulrich Eisenecker, 'Formalizing cardinality-based feature models and their specialization', Software Process: Improvement and Practice, 10(1), 7–29, (2005).
[13] Jennifer P. Davis, Using data forensics to detect cheating: An illustration, 2018.
[14] Seife Dendir and R. Maxwell, 'Cheating in online courses: Evidence from online proctoring', Computers in Human Behavior Reports, 2, 100033, (Aug. 2020).
[15] Martin Dick, Judy Sheard, Cathy Bareiss, Janet Carter, Donald Joyce, Trevor Harding, and Cary Laxer, 'Addressing student cheating: Definitions and solutions', SIGCSE Bull., 35(2), 172–184, (Jun. 2002).
[16] George M. Diekhoff, Emily E. LaBeff, Robert E. Clark, Larry E. Williams, Billy Francis, and Valerie J. Haines, 'College cheating: Ten years later', Research in Higher Education, 37, 487–502, (1996).
[17] Alexander Felfernig, David Benavides, José Galindo, and Florian Reinfrank, 'Towards Anomaly Explanation in Feature Models', in ConfWS-2013: 15th International Configuration Workshop, volume 1128, pp. 117–124, (Aug. 2013).
[18] Alexander Felfernig and Robin Burke, 'Constraint-based recommender systems: Technologies and research issues', in Proceedings of the 10th International Conference on Electronic Commerce, ICEC'08, pp. 1–10, New York, NY, USA, (2008). ACM.
[19] Alexander Felfernig, Gerhard Friedrich, Dietmar Jannach, Christian Russ, and Markus Zanker, 'Developing Constraint-Based Applications with Spreadsheets', in Developments in Applied Artificial Intelligence, eds., Paul W. H. Chung, Chris Hinde, and Moonis Ali, volume 2718 of IEA/AIE 2003, pp. 197–207, Berlin, Heidelberg, (2003). Springer.
[20] Alexander Felfernig, Viet Man Le, and Trang Tran, 'Supporting feature model-based configuration in Microsoft Excel', in 22nd International Configuration Workshop, (2020).
[21] Alexander Felfernig, Monika Schubert, and Christoph Zehentner, 'An efficient diagnosis algorithm for inconsistent constraint sets', Artif. Intell. Eng. Des. Anal. Manuf., 26(1), (Feb. 2012).
[22] Joanna Golden and Mark Kohlbeck, 'Addressing cheating when using test bank questions in online classes', Journal of Accounting Education, 52(C), (2020).
[23] Hicham Hage and Esma Aïmeur, 'Exam question recommender system', in Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology, pp. 249–257, NLD, (2005). IOS Press.
[24] Lothar Hotz, Alexander Felfernig, Markus Stumptner, Anna Ryabokon, Claire Bagley, and Katharina Wolter, 'Chapter 6 - Configuration Knowledge Representation and Reasoning', in Knowledge-Based Configuration, eds., Alexander Felfernig, Lothar Hotz, Claire Bagley, and Juha Tiihonen, 41–72, Morgan Kaufmann, Boston, (2014).
[25] Ulrich Junker, 'QuickXPlain: Preferred explanations and relaxations for over-constrained problems', in Proceedings of the 19th National Conference on Artificial Intelligence, AAAI'04, pp. 167–172. AAAI Press, (2004).
[26] Kyo Kang, Sholom Cohen, James Hess, William Novak, and A. Peterson, 'Feature-Oriented Domain Analysis (FODA) Feasibility Study', Technical Report CMU/SEI-90-TR-021, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, (1990).
[27] Viet-Man Le, Thi Ngoc Trang Tran, and Alexander Felfernig, 'A conversion of feature models into an executable representation in Microsoft Excel', in Intelligent Systems in Industrial Applications, eds., Martin Stettinger, Gerhard Leitner, Alexander Felfernig, and Zbigniew W. Ras, pp. 153–168, Cham, (2021). Springer International Publishing.
[28] Greg Linden, Brent Smith, and Jeremy York, 'Amazon.com recommendations: Item-to-item collaborative filtering', IEEE Internet Computing, 7(1), 76–80, (Jan. 2003).
[29] Donald McCabe, 'Cheating on tests: How to do it, detect it, and prevent it (review)', The Journal of Higher Education, 73, 297–298, (Jan. 2002).
[30] Donald L. McCabe, Kenneth D. Butterfield, and Linda Klebe Treviño, 'Academic dishonesty in graduate business programs: Prevalence, causes, and proposed action', Academy of Management Learning and Education, 5(3), 294–305, (Sep. 2006).
[31] James Moten Jr, Alex Fitterer, Elise Brazier, Jonathan Leonard, and Avis Brown, 'Examining online college cyber cheating methods and prevention measures', Electronic Journal of e-Learning, 11, 139–146, (2013).
[32] Michael J. Pazzani and Daniel Billsus, Content-Based Recommendation Systems, 325–341, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[33] Charles Prud'homme, Jean-Guillaume Fages, and Xavier Lorca, Choco Solver Documentation, TASC, INRIA Rennes, LINA CNRS UMR 6241, COSLING S.A.S., 2016.
[34] Hong Qian, Dorota Staniewska, Mark Reckase, and Ada Woo, 'Using response time to detect item preknowledge in computer-based licensure examinations', Educational Measurement: Issues and Practice, 35, (Feb. 2016).
[35] Raymond Reiter, 'A theory of diagnosis from first principles', Artificial Intelligence, 32(1), 57–95, (1987).
[36] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, Recommender Systems Handbook, Springer-Verlag, Berlin, Heidelberg, 1st edn., 2011.
[37] Neil C. Rowe, 'Cheating in online student assessment: Beyond plagiarism', Online Journal of Distance Learning Administration, 7(2), (2004).
[38] Thi Ngoc Trang Tran, Müslüm Atas, Alexander Felfernig, and Martin Stettinger, 'An overview of recommender systems in the healthy food domain', Journal of Intelligent Information Systems, 50(3), 501–526, (Jun. 2018).
[39] Thi Ngoc Trang Tran, Alexander Felfernig, Christoph Trattner, and Andreas Holzinger, 'Recommender systems in the healthcare domain: State-of-the-art and research issues', Journal of Intelligent Information Systems, 1–31, (Dec. 2020).
[40] Edward Tsang, Foundations of Constraint Satisfaction, Academic Press, London, 1993.
[41] W. J. van der Linden and Guo Fanmin, 'Bayesian procedures for identifying aberrant response-time patterns in adaptive testing', Psychometrika, 73, 365–384, (2008).
[42] George Watson and James Sottile, 'Cheating in the digital age: Do students cheat more in online courses?', Online Journal of Distance Learning Administration, (Jan. 2010).
[43] Qian Zhang, Jie Lu, and Guangquan Zhang, 'Recommender systems in e-learning', J Smart Environ Green Comput, 1, 76–89, (Jun. 2020).