<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automated Detection of Suspicious Behavior During Online Exams Using Artificial Intelligence and Computer Vision</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Moustapha DER</string-name>
          <email>moustapha.der@esmt.sn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmed D. KORA</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boudal NIANG</string-name>
          <email>boudal.niang@esmt.sn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmed Youssef KHLIL</string-name>
          <email>youssef.khlil@esmt.sn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samba NDIAYE</string-name>
          <email>samba.ndiaye@ucad.edu.sn</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Sciences and Technologies (STN) - Doctoral School of Computer Mathematics (EDMI) - UCAD - Research Laboratory (E-INOV LAB) at the Multinational Higher School of Telecommunications (ESMT) - Dakar</institution>
          ,
          <country country="SN">Senegal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Digital Sciences and Technologies (STN) - Doctoral School of Computer Mathematics (EDMI) - UCAD - Research Laboratory (E-INOV LAB) at the Multinational Higher School of Telecommunications (ESMT) - Dakar</institution>
          ,
          <country country="SN">Senegal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Digital Sciences and Technologies (STN) - Doctoral School of Computer Mathematics (EDMI) - UCAD - Research Laboratory (E-INOV LAB) at the Multinational Higher School of Telecommunications (ESMT) - Dakar</institution>
          ,
          <country country="SN">Senegal</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Faculty of Sciences and Technology of university Cheikh Anta DIOP (UCAD) - Doctoral School of Computer Mathematics (EDMI) - Dakar</institution>
          ,
<country country="SN">Senegal</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
<institution>Digital Sciences and Technologies (STN) - Doctoral School of Computer Mathematics (EDMI) - UCAD - Research Laboratory (E-INOV LAB) at the Multinational Higher School of Telecommunications (ESMT) - Dakar</institution>
          ,
          <country country="SN">Senegal</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Online exams are becoming more frequent, and they pose real challenges for academic integrity. This paper presents an automated system for monitoring students remotely using artificial intelligence and computer vision. The system recognizes faces, tracks gaze, observes posture, and detects forbidden objects, then feeds this information to a decision module that flags suspicious behavior and sends alerts. We tested the system on annotated videos simulating exam sessions. The results show an accuracy of 94.6% with few errors: the system detects several types of suspicious behavior, operates in real time, and integrates easily with online exam platforms. This study shows that AI can help monitor online exams, but it also raises questions about privacy and transparency. Future work will focus on improving the system's robustness and ethical compliance.</p>
      </abstract>
      <kwd-group>
        <kwd>Exam monitoring</kwd>
        <kwd>artificial intelligence</kwd>
        <kwd>computer vision</kwd>
        <kwd>behavior detection</kwd>
        <kwd>e-learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Online exams [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are now common in universities and certification centers. They change how students
are assessed. This is due to the growth of remote learning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Online exams offer more access and
flexibility. However, they raise issues of honesty, security, and supervision. In-person exams use
human proctors to watch students [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Online exams [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] rely on technology to monitor candidates.
Traditional methods use either human proctors who work remotely, or automatic systems based on
fixed rules. Automated proctoring systems can detect some suspicious behaviors [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
However, they often face challenges. These include privacy concerns, user scalability, and false
detections. With the rise of online learning, there is a growing need for smarter tools. These tools
must detect cheating accurately while respecting ethical boundaries.
      </p>
      <p>
        This paper proposes a fully automated system for monitoring online exams [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It uses face detection,
gaze tracking, posture analysis, and object recognition. These components work together to identify
possible cheating. Tests on labeled video datasets show good accuracy and low error rates. The system
also respects user privacy. It integrates easily into existing e-learning platforms with minimal
disruption.
      </p>
      <p>The rest of the paper is organized as follows:</p>
      <p>Section 2 reviews prior work on automated proctoring, computer vision methods, and ethical
issues in surveillance.</p>
      <p>
        Section 3 explains the system’s design and the experimental setup used [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Section 4 presents the results and discusses system performance and limitations.</p>
      <p>Section 5 concludes the paper and suggests directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1 Online Exam Proctoring: Evolution and Challenges</title>
        <p>
          Online exam proctoring was introduced to make digital assessments more secure and accessible. Early
systems focused on verifying identity or recording the exam session without real-time monitoring [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
Recent systems now try to detect cheating as it occurs. There are two main types of proctoring
systems. In human-assisted proctoring, a remote supervisor watches the student live through a video
feed during the exam [7]. In automated proctoring, computer programs track and analyze student
behavior without human involvement.
        </p>
        <p>
          Human proctoring can detect suspicious actions. However, it does not scale well when many students
take the exam at the same time. Automated systems scale better and work more efficiently. But they
face major challenges [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These include detecting subtle cheating behaviors and maintaining accuracy
under different lighting conditions, camera qualities, and environments.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Artificial Intelligence Methods for Behavior Detection</title>
      </sec>
      <sec id="sec-2-3">
        <title>2.2.1 Face Detection and Identity Recognition</title>
        <p>Facial recognition plays a key role in confirming student identity during online exams. Tools such as
MTCNN, FaceNet, and Dlib are commonly used for face detection and recognition. These methods
perform well in controlled environments, especially with good lighting and high-quality cameras [8].
However, their accuracy decreases in poor lighting or when camera quality is low. This makes face
detection less reliable in real-world conditions.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.2.2 Gaze Tracking</title>
        <p>
          Gaze tracking is used to monitor where the student is looking during the exam. Tools such as
MediaPipe Face Mesh, OpenFace, and EyeLike can track eye movement [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Frequent or prolonged
gaze away from the screen may indicate that the student is attempting to cheat by looking at
unauthorized materials [7].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.2.3 Posture and Movement Detection</title>
        <p>
          Posture analysis helps detect unusual or suspicious body movements during online exams. For
example, if a student turns their head, looks down repeatedly, or leaves the camera’s view, it may
suggest cheating. Tools such as BlazePose can track body position in real time using the webcam [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
This allows the system to detect subtle movements that may initially seem unimportant but could
indicate dishonest behavior.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>2.2.4 Object and Disturbance Recognition</title>
        <p>Object recognition detects items near the student during an online exam. Algorithms such as YOLOv5
can identify objects like mobile phones, paper notes, or other people nearby. This information helps
the system interpret the situation and improves the detection of potential cheating [9].</p>
      </sec>
      <sec id="sec-2-7">
        <title>2.3 Multimodal Approaches in Proctoring</title>
        <p>
          Some researchers have combined different data types to improve cheating detection accuracy. These
systems integrate video, audio, and other contextual information into a single framework. This
method is known as a multimodal approach [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>2.4 Limitations of Existing Systems</title>
      </sec>
      <sec id="sec-2-9">
        <title>2.4.1 Technical Challenges</title>
        <p>
          Environmental and behavioral factors can reduce the accuracy of cheating detection systems. Poor
lighting, low-quality cameras, and unusual camera angles negatively affect performance. Some
students may behave in unusual but harmless ways, causing false alerts. Additionally, real-time
systems must operate quickly with minimal delay to respond effectively [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-10">
        <title>2.4.2 Ethical and Legal Risks</title>
        <p>Artificial intelligence systems can exhibit biases that cause errors related to gender, skin color, or
cultural background. For instance, some facial expressions may be misinterpreted, resulting in
incorrect assessments. Moreover, these technologies raise significant privacy concerns. The use of
video recording and facial recognition must comply with data protection regulations, such as GDPR,
to ensure ethical handling of personal information [9].</p>
      </sec>
      <sec id="sec-2-11">
        <title>2.5 Summary</title>
        <p>Automated proctoring systems can effectively detect cheating in online exams. To perform optimally,
they must combine multiple data sources with high accuracy and real-time response. Protecting user
privacy and following ethical and legal standards is also essential. Future work should improve system
adaptability to various behaviors and real-world conditions [10].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed System Architecture</title>
      <p>
        The proposed system follows a modular and scalable architecture for automated online exam
proctoring [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It allows continuous processing of video streams using lightweight models that run on
standard computing devices [11]. The system starts by capturing video from the student's webcam
during the exam. It then preprocesses the video frames, applying operations such as normalization
and face detection, to prepare the data for analysis.
      </p>
      <p>After preprocessing, the system extracts key behavioral features, including gaze direction, body
posture, and the presence of unauthorized objects. It then classifies the observed behavior in real time
as normal or suspicious.</p>
      <p>Finally, the system generates alerts or summary reports for examiners. This helps them efficiently
review potentially problematic segments.</p>
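      <p>The pipeline described above can be sketched in simplified form. The Python sketch below is illustrative, not the paper's implementation: the feature extractor is a hypothetical stub standing in for the real MTCNN, MediaPipe, BlazePose, and YOLOv5 modules.</p>
      <preformat>
```python
# Illustrative sketch of the proctoring pipeline: capture, preprocess,
# feature extraction, behavior classification, and alerting.
# The feature extractor is a hypothetical stub; a real system would
# call MTCNN, MediaPipe, BlazePose, and YOLOv5 here.

def preprocess(frame):
    """Normalize raw pixel values (frame is a flat list of ints here)."""
    return [v / 255.0 for v in frame]

def extract_features(frame):
    """Stub: a real implementation would run the detection models."""
    return {"gaze_on_screen": True,
            "posture_normal": True,
            "prohibited_objects": []}

def classify(features):
    """Flag a frame as suspicious if any single cue fires."""
    suspicious = (not features["gaze_on_screen"]
                  or not features["posture_normal"]
                  or bool(features["prohibited_objects"]))
    return "suspicious" if suspicious else "normal"

def run_pipeline(frames, feature_fn=extract_features):
    """Process frames and collect alerts for the examiner's report."""
    alerts = []
    for idx, frame in enumerate(frames):
        features = feature_fn(preprocess(frame))
        if classify(features) == "suspicious":
            alerts.append({"frame": idx, "features": features})
    return alerts
```
      </preformat>
      <p>In a deployment, the alert list would feed the summary report; the stub-based structure keeps each detector replaceable, matching the modular design discussed later.</p>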
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1 Video Data</title>
        <p>This section describes the methodology used to design, implement, and evaluate the proposed
automated online exam proctoring system.</p>
        <p>The dataset includes recorded or simulated videos that replicate real online exam sessions. Each video
is carefully annotated to label behaviors as either “normal” or “suspicious” [12].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Dataset Construction</title>
        <p>The dataset was developed through a meticulous process. First, actors simulated typical exam
behaviors, including phone use, looking away from the screen, and speaking. Then, experts manually
annotated these behaviors, either frame by frame or across specific time intervals, using tools such as
CVAT and LabelImg [10].</p>
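        <p>The segment schema used during annotation can be sketched as follows. The helper function is illustrative (not from the paper), but the field names follow the dataset: video_id, segment_id, start_time, and end_time, with a label filled in later by the annotators.</p>
        <preformat>
```python
def build_segments(video_id, duration_s, segment_len_s=10):
    """Split one video into fixed-length annotation segments.

    Each record carries the dataset's fields: video_id, segment_id,
    start_time and end_time (in seconds). The label ("normal" or
    "suspicious") is assigned afterwards by expert annotators.
    """
    segments = []
    start, seg_id = 0, 0
    while duration_s - start >= segment_len_s:
        segments.append({
            "video_id": video_id,
            "segment_id": seg_id,
            "start_time": start,
            "end_time": start + segment_len_s,
            "label": None,  # set during manual annotation
        })
        seg_id += 1
        start += segment_len_s
    return segments
```
        </preformat>
        <p>Under this scheme a 5-minute (300 s) recording yields 30 segments, so 40 such videos produce the 1,200 annotated segments used in the evaluation.</p>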
        <p>The dataset also provides detailed data on posture, gaze direction, and visible objects within the scene.</p>
        <table-wrap id="tab-dataset">
          <caption><p>Dataset construction: segment annotation fields</p></caption>
          <table>
            <thead>
              <tr><th>Name</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>video_id</td><td>Unique identifier for the source video</td></tr>
              <tr><td>segment_id</td><td>Identifier for the 10-second segment within a video</td></tr>
              <tr><td>start_time</td><td>Start timestamp of the segment (in seconds)</td></tr>
              <tr><td>end_time</td><td>End timestamp of the segment (in seconds)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Technical Components</title>
      </sec>
      <sec id="sec-4-4">
        <title>4.3.1 Face Detection and Tracking</title>
        <p>The system first uses MTCNN to detect faces. Then, it applies Dlib to track facial landmarks. This
method allows precise localization of key facial features such as the eyes, nose, and mouth. These
features are essential for further analysis [7].</p>
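        <p>As a concrete illustration, Dlib's standard 68-point landmark layout places the left eye at indices 36 to 41 and the right eye at indices 42 to 47. The small helper below (illustrative, not the paper's code) averages those points to locate each eye's center for later gaze analysis.</p>
        <preformat>
```python
# Eye localization from Dlib's standard 68-point facial landmarks.
# Indices 36-41 cover the left eye, 42-47 the right eye.
LEFT_EYE = range(36, 42)
RIGHT_EYE = range(42, 48)

def eye_center(landmarks, indices):
    """Average the (x, y) landmark coordinates of one eye region."""
    xs = [landmarks[i][0] for i in indices]
    ys = [landmarks[i][1] for i in indices]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```
        </preformat>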
      </sec>
      <sec id="sec-4-5">
        <title>4.3.2 Gaze Tracking</title>
        <p>
          The system uses MediaPipe Face Mesh to estimate the candidate’s gaze direction. It tracks how often
and how long the eyes look away from the screen [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. These patterns serve as important indicators of
potentially suspicious behavior during exams [9].
        </p>
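        <p>The frequency and duration cues mentioned above can be computed from a per-frame flag. The sketch below is a hypothetical post-processing step, assuming the gaze tracker emits one on-screen boolean per frame at a known frame rate.</p>
        <preformat>
```python
def gaze_away_stats(on_screen_flags, fps=30):
    """Summarize off-screen gaze from per-frame on-screen flags.

    Returns the number of look-away events and the longest one in
    seconds; frequent or prolonged events are the indicators of
    possible cheating described in the text.
    """
    events = 0    # number of distinct look-away runs
    longest = 0   # longest run, in frames
    run = 0       # current run length
    for on_screen in on_screen_flags:
        if on_screen:
            run = 0
        else:
            run += 1
            if run == 1:        # a new look-away event starts
                events += 1
            longest = max(longest, run)
    return {"events": events, "longest_s": longest / fps}
```
        </preformat>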
      </sec>
      <sec id="sec-4-6">
        <title>4.3.3 Posture Estimation</title>
        <p>BlazePose detects key body landmarks, including the shoulders, head, and torso, to evaluate posture.
Using this data, the system identifies unusual or repetitive movements, such as frequent leaning
forward, which may indicate suspicious behavior [7].</p>
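        <p>As an illustration of how repeated downward tilts might be counted, the sketch below assumes the pose estimator reports a normalized head height per frame (larger values meaning lower in the image, as with BlazePose's normalized keypoints); the event-counting logic itself is hypothetical.</p>
        <preformat>
```python
def count_downward_tilts(head_y, baseline_y, threshold=0.15):
    """Count events where the head drops below its baseline position.

    head_y: per-frame normalized head height (larger = lower in the
    frame), e.g. taken from a nose keypoint. A sustained drop beyond
    the threshold is counted once per event, not once per frame.
    """
    events = 0
    in_event = False
    for y in head_y:
        dropped = (y - baseline_y) > threshold
        if dropped and not in_event:
            events += 1
        in_event = dropped
    return events
```
        </preformat>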
      </sec>
      <sec id="sec-4-7">
        <title>4.3.4 Unauthorized Object Detection</title>
        <p>The system uses a pretrained YOLOv5 model, fine-tuned to detect specific prohibited items such as
phones, headphones, papers, and extra faces in real time. This capability greatly enhances the system’s
situational awareness during exams [12].</p>
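        <p>The filtering logic applied on top of the detector's raw output can be sketched as follows. The class names mirror the fine-tuned categories mentioned above, but the detection dictionaries and threshold values are illustrative assumptions rather than YOLOv5's actual output format.</p>
        <preformat>
```python
# Prohibited classes, following the fine-tuned categories in the text:
# phones, headphones, papers, and extra people in the frame.
PROHIBITED = {"phone", "headphones", "paper", "person"}

def flag_prohibited(detections, conf_threshold=0.5, max_persons=1):
    """Reduce raw detections to exam violations.

    detections: [{"name": str, "confidence": float}, ...]. The
    candidate counts as one allowed person; any additional person
    is flagged, as are other prohibited classes above the threshold.
    """
    persons = 0
    violations = []
    for det in detections:
        if det["confidence"] >= conf_threshold and det["name"] in PROHIBITED:
            if det["name"] == "person":
                persons += 1
                if persons > max_persons:
                    violations.append(det)
            else:
                violations.append(det)
    return violations
```
        </preformat>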
      </sec>
      <sec id="sec-4-8">
        <title>4.4 Behavior Classification</title>
        <p>A decision module combines output from all detectors to evaluate the candidate’s overall behavior. To
improve accuracy, the system analyzes data in short windows, typically 5 to 10 seconds, enabling
smoother predictions.</p>
        <p>Depending on the input type (statistical features or sequential data), the system uses either a Random
Forest classifier or a CNN-LSTM architecture. Detection thresholds can be adjusted to reduce false
positives in real-world use [11].</p>
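        <p>The short-window aggregation can be sketched as a simple post-processing step. The window length and decision ratio below are illustrative parameters; raising the ratio trades recall for fewer false positives, which is the threshold adjustment described above.</p>
        <preformat>
```python
def windowed_decisions(frame_flags, fps=30, window_s=5, ratio=0.4):
    """Aggregate per-frame suspicion flags over fixed windows.

    A window (default 5 s) is labeled "suspicious" when the fraction
    of flagged frames reaches `ratio`; averaging over a window smooths
    out isolated per-frame errors.
    """
    win = fps * window_s
    labels = []
    for start in range(0, len(frame_flags), win):
        chunk = frame_flags[start:start + win]
        frac = sum(chunk) / len(chunk)
        labels.append("suspicious" if frac >= ratio else "normal")
    return labels
```
        </preformat>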
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>The system integrates a variety of tools and technologies to perform different tasks within the
behavioral monitoring pipeline. OpenCV is used for general video processing tasks, including frame
capture and image manipulation. MTCNN and Dlib handle face detection, enabling accurate
identification and tracking of facial features. For gaze tracking, the system employs MediaPipe and
EyeLike, which estimate the direction of eye movement in real time.</p>
      <p>BlazePose is used for posture estimation, allowing the detection of body landmarks such as shoulders
and head. YOLOv5, implemented in PyTorch, performs object detection, identifying items like phones
or earphones. For behavior classification, the system uses both Scikit-learn and TensorFlow,
depending on whether the input is statistical or sequential.</p>
      <p>Finally, the annotation of the dataset is performed using tools like CVAT and LabelImg, which allow
precise labeling of actions and objects in the video frames.</p>
      <sec id="sec-5-1">
        <title>5.1 System Evaluation</title>
      </sec>
      <sec id="sec-5-2">
        <title>5.1.1 Evaluation Metrics</title>
        <p>
          Accuracy measures the overall correctness of the system. Recall shows how well the system detects
suspicious behaviors. The F1-Score balances precision and recall. The False Positive Rate counts how
often normal behaviors are wrongly flagged as suspicious. Finally, Response Time indicates how
quickly the system operates in real time [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
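        <p>These metrics follow the standard confusion-matrix definitions and can be computed directly, as in the illustrative helper below (not the paper's evaluation code).</p>
        <preformat>
```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the metrics above from confusion-matrix counts:
    true/false positives (tp, fp) and true/false negatives (tn, fn),
    where "positive" means a segment flagged as suspicious."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)   # normal segments wrongly flagged
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}
```
        </preformat>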
      </sec>
      <sec id="sec-5-4">
        <title>5.2 Ethical and Technical Considerations</title>
        <p>The system is designed to address key ethical concerns [7]. Privacy is prioritized by processing all
data locally to minimize sharing with third parties. Users can anonymize their identity by blurring
faces during post-processing [11]. Additionally, transparency is maintained through clear
documentation of detection criteria, which can be adjusted to suit different requirements.</p>
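        <p>Face anonymization can be as simple as overwriting the detected face box before a report is exported. The sketch below works on a plain 2-D list of grayscale values and replaces the region with its mean intensity; it is an illustrative stand-in for the blurring step, not the system's actual implementation.</p>
        <preformat>
```python
def anonymize_region(image, top, left, height, width):
    """Anonymize a face bounding box by flattening it to its mean.

    image: 2-D list of grayscale pixel values (row-major). Replacing
    the box with one intensity makes the face unidentifiable in an
    exported thumbnail; a production system would apply a Gaussian
    blur to the detected face box instead.
    """
    region = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    mean = sum(region) // len(region)
    for r in range(top, top + height):
        for c in range(left, left + width):
            image[r][c] = mean
    return image
```
        </preformat>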
      </sec>
      <sec id="sec-5-5">
        <title>5.3 Experimental Setup</title>
        <p>To evaluate the proposed system, experiments were conducted using a dataset of videos simulating
online exam sessions [13]. Each session included a mix of normal behaviors, such as steady gaze and
upright posture, and suspicious behaviors, such as distracted gaze, phone use, or verbal interactions
[8].</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.3.1 Hardware Configuration</title>
        <p>The real-time monitoring system runs efficiently on a high-performance setup, including an Intel Core
i7 processor, 32 GB RAM, and an NVIDIA RTX 3060 GPU. This configuration ensures smooth video
processing and advanced analysis. Video streams are captured with an HD 720p webcam, with frames
processed at 640×480 pixels and 30 frames per second, sufficient for accurate face detection and
behavior feature extraction [12].</p>
        <p>Each recorded session lasts 5 to 7 minutes, allowing consistent monitoring and reliable classification
of behaviors as normal or suspicious. The system then generates alerts or summary reports for
examiners [14].</p>
      </sec>
      <sec id="sec-5-7">
        <title>5.3.2 Test Set Composition</title>
        <p>In the behavioral monitoring study, researchers analyzed 40 videos and divided them into 1,200
annotated segments of 10 seconds each. Expert annotators manually labeled all segments to ensure
accurate and reliable ground truth [15].</p>
        <p>Among them, 700 segments were labeled as normal, and 500 as suspicious, covering five types of
abnormal behaviors [13]. This behavioral diversity improves the system's robustness and helps it
detect various forms of suspicious activity in real time [14].</p>
        <table-wrap id="tab-testset">
          <caption><p>Test set composition</p></caption>
          <table>
            <thead>
              <tr><th>Attribute</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>Total number of videos</td><td>40 videos</td></tr>
              <tr><td>Total annotated segments</td><td>1,200 segments (each 10 seconds long)</td></tr>
              <tr><td>Annotation method</td><td>Manual labeling by expert annotators</td></tr>
              <tr><td>Normal behavior segments</td><td>700 segments</td></tr>
              <tr><td>Suspicious behavior segments</td><td>500 segments</td></tr>
              <tr><td>Types of suspicious behavior</td><td>5 distinct categories</td></tr>
              <tr><td>Ground truth quality</td><td>Verified through manual annotation for high accuracy and reliability</td></tr>
              <tr><td>Purpose</td><td>To train and evaluate the system's ability to detect various suspicious behaviors in real-time settings [14]</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>6.1 Performance Results</title>
      </sec>
      <sec id="sec-6-2">
        <title>6.1.1 Face and Gaze Detection</title>
        <p>The system demonstrates strong performance across key evaluation metrics. The face detection rate
reaches 99.1%, indicating that the system can consistently identify and locate faces under normal
conditions. The gaze tracking accuracy is 93.4%, which shows the system’s effectiveness in correctly
estimating eye direction, a critical factor in behavioral analysis.</p>
        <p>Despite these high scores, the system exhibits a failure rate of 4.2%, mainly due to visual obstructions
such as occlusion or poor lighting conditions. These results confirm the system’s robustness while
highlighting areas where improvements can be made to handle challenging visual environments.
The gaze tracking module performed satisfactorily under standard lighting conditions. However,
accuracy decreased in dark environments or when candidates wore reflective glasses [16].</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.1.2 Posture Detection</title>
        <table-wrap id="tab-posture">
          <caption><p>Posture detection results</p></caption>
          <table>
            <thead>
              <tr><th>Metric</th><th>Result</th></tr>
            </thead>
            <tbody>
              <tr><td>Body keypoint accuracy</td><td>95.8%</td></tr>
              <tr><td>Motion detection rate</td><td>92.1%</td></tr>
              <tr><td>False posture detection rate</td><td>3.5%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The system also performs well in body posture and movement analysis. The body keypoint accuracy
reaches 95.8%, confirming the system's ability to reliably detect key anatomical landmarks such as
the head, shoulders, and torso. The motion detection rate is 92.1%, indicating effective tracking of
student movements throughout the exam session.</p>
        <p>Additionally, the false posture detection rate is limited to 3.5%, showing that the system rarely
misclassifies normal postures as suspicious. These results reflect a strong capacity for interpreting
physical behavior with high precision and minimal error.</p>
        <p>The posture module effectively detected side-to-side head movements and frequent downward tilts,
behaviors commonly linked to the use of unauthorized materials.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.1.3 Suspicious Object Detection</title>
        <table-wrap id="tab-objects">
          <caption><p>YOLOv5 detection results by object</p></caption>
          <table>
            <thead>
              <tr><th>Object</th><th>Precision</th><th>Recall</th><th>F1-score</th></tr>
            </thead>
            <tbody>
              <tr><td>Phone</td><td>96.2%</td><td>93.8%</td><td>95.0%</td></tr>
              <tr><td>Second face</td><td>89.4%</td><td>85.7%</td><td>87.5%</td></tr>
              <tr><td>Earphones</td><td>81.2%</td><td>76.9%</td><td>79.0%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The object detection module delivers strong and consistent performance across several object
categories. The system identifies phones with high reliability, achieving a precision of 96.2%, a recall
of 93.8%, and an F1-score of 95.0%. These values indicate that the system can detect phones accurately,
with few false positives or missed cases.</p>
        <p>In the case of second face detection, the system reaches a precision of 89.4%, a recall of 85.7%, and an
F1-score of 87.5%, showing good effectiveness even under challenging conditions such as occlusion or
background clutter.</p>
        <p>For earphones, performance is slightly lower, with a precision of 81.2%, a recall of 76.9%, and an
F1-score of 79.0%. This result suggests that detecting earphones is more difficult, likely due to their
small size and visual similarity to surrounding objects. Overall, the system shows a strong ability to
recognize key objects relevant to online exam monitoring with a high level of accuracy.</p>
        <p>The YOLOv5 algorithm showed strong performance in detecting prominent objects, especially mobile
phones [15]. However, it occasionally fails to detect subtle items like discreet earphones. This
limitation is likely due to their low contrast with the background [17].</p>
      </sec>
      <sec id="sec-6-5">
        <title>6.2 Overall Behavior Classification</title>
        <p>A Random Forest model classified behaviors based on features extracted from each module.</p>
        <table-wrap id="tab-classification">
          <caption><p>Random Forest classification results</p></caption>
          <table>
            <thead>
              <tr><th>Metric</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>Accuracy</td><td>94.6%</td></tr>
              <tr><td>Recall</td><td>91.2%</td></tr>
              <tr><td>F1-score</td><td>92.9%</td></tr>
              <tr><td>False Positive Rate</td><td>4.7%</td></tr>
              <tr><td>False Negative Rate</td><td>5.1%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The overall performance metrics confirm the reliability of the proposed system. It achieves an
accuracy of 94.6%, reflecting strong overall correctness. The recall rate of 91.2% shows the system's
ability to detect most suspicious behaviors, while the F1-score of 92.9% indicates a balanced
trade-off between precision and recall.</p>
        <p>The false positive rate remains low at 4.7%, and the false negative rate is limited to 5.1%, suggesting
few missed detections. In terms of efficiency, the system maintains an average processing time of 85
milliseconds per frame, allowing for smooth real-time operation.</p>
        <p>The classification model performs well, successfully identifying numerous suspicious behaviors while
keeping false alerts to a minimum. Its processing speed allows for near real-time application.</p>
      </sec>
      <sec id="sec-6-6">
        <title>6.3 Critical Analysis</title>
      </sec>
      <sec id="sec-6-7">
        <title>6.3.1 Strengths of the System</title>
        <p>The system follows a modular design, allowing each component such as gaze tracking, posture
detection, and object recognition to be improved independently [9]. This structure simplifies
maintenance, upgrades, and feature adjustments.</p>
        <p>The model generalizes well with new video data, maintaining high accuracy even in unseen scenarios.
It also runs locally in real time with minimal delays, making it suitable for live monitoring and rapid
response.</p>
      </sec>
      <sec id="sec-6-8">
        <title>6.3.2 Identified Limitations</title>
        <p>Low lighting, busy backgrounds, and low-quality cameras can make the system less effective. It’s still
hard to spot small items like earphones or fast movements. Also, sometimes unusual but harmless
behaviors are wrongly flagged as suspicious [15].</p>
      </sec>
      <sec id="sec-6-9">
        <title>6.4 Comparison with Human Supervision</title>
        <p>The system’s performance was compared against two human proctors who manually reviewed and
annotated the videos.</p>
        <table-wrap id="tab-observers">
          <caption><p>Comparison with human proctors</p></caption>
          <table>
            <thead>
              <tr><th>Observer</th><th>Recall</th><th>False Positive Rate</th></tr>
            </thead>
            <tbody>
              <tr><td>Proctor A</td><td>90.4%</td><td>6.1%</td></tr>
              <tr><td>Proctor B</td><td>92.7%</td><td>5.5%</td></tr>
              <tr><td>Automated System</td><td>91.2%</td><td>4.7%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The comparison between human proctors and the automated system highlights consistent
performance across observers. Proctor A achieved a recall of 90.4% with a false positive rate of 6.1%,
while Proctor B reached a recall of 92.7% and a false positive rate of 5.5%. The automated system
performed comparably, with a recall of 91.2% and a false positive rate of 4.7%.</p>
        <p>These results suggest that the system can match human-level detection accuracy while generating
fewer false alerts, making it a reliable tool for real-time online exam monitoring.</p>
        <p>The system demonstrates performance comparable to human evaluators while ensuring greater
consistency and continuous monitoring [7].</p>
      </sec>
      <sec id="sec-6-10">
        <title>6.5 Summary of Results</title>
        <p>The experiments demonstrate that using AI and image processing allows for effective automated
proctoring of online exams [11]. By combining several detection methods, the system’s reliability is
significantly enhanced. Nonetheless, further adjustments are necessary to handle the variety of
real-world conditions and to minimize biases in behavior detection [18].</p>
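<p>As a minimal illustration of how several detection methods might be combined into a single decision, the sketch below fuses per-frame detector signals with a weighted score. The signal names, weights, and threshold are assumptions made for this example, not the system’s actual rules.</p>

```python
# Illustrative fusion of per-frame detector signals into one alert decision.
# Signal names, weights, and the 0.5 threshold are assumptions for this
# sketch, not the system's actual configuration.

WEIGHTS = {
    "gaze_off_screen": 0.3,   # student looking away from the screen
    "abnormal_posture": 0.2,  # leaning or turning detected by pose analysis
    "phone_detected": 0.4,    # YOLO-style forbidden-object detection
    "extra_person": 0.5,      # more than one face in frame
}
ALERT_THRESHOLD = 0.5

def alert_score(signals):
    """Sum the weights of the detector signals active in this frame."""
    return sum(WEIGHTS[name] for name in signals)

def should_alert(signals):
    return alert_score(signals) >= ALERT_THRESHOLD

print(should_alert({"gaze_off_screen"}))                    # False
print(should_alert({"gaze_off_screen", "phone_detected"}))  # True
```

<p>Requiring corroborating signals before alerting is one way such a fusion step can reduce false positives from any single, noisy detector.</p>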
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and future work</title>
      <p>
        This research presents a smart system designed to automatically monitor online exams [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. By
combining artificial intelligence with image processing, the system tracks student gaze, analyzes
posture, and detects suspicious objects or behaviors. The goal was to develop a solution that remains
unobtrusive and flexible without sacrificing effectiveness [21].
      </p>
      <p>Experimental results are promising. The system detects faces and tracks gaze with over 90% accuracy.
It recognizes unusual postures and movements that may indicate cheating. Using YOLO-based object
detection, it identifies forbidden items such as mobile phones or the presence of multiple people. By
integrating these data sources, the system classifies behavior and raises alerts, achieving an overall
F1-score close to 93%. This demonstrates potential to assist or partially replace human proctors in
secure environments [22].</p>
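<p>For reference, the F1-score cited above is the harmonic mean of precision and recall. The sketch below computes it from illustrative values chosen to land near the reported ~93%; the precision figure is an assumption, since only recall is reported per observer.</p>

```python
# Sketch: F1-score as the harmonic mean of precision and recall.
# The precision value is illustrative, not a figure from the study.

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# e.g. assumed precision 95%, recall 91.2% (the system's reported recall)
print(round(f1_score(0.95, 0.912), 3))  # 0.931
```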
<p>However, challenges remain. Detection quality can be affected by webcam quality, lighting, or camera
angles. The system may occasionally misinterpret harmless actions as suspicious, or miss subtle
behaviors such as the use of small in-ear devices or quick hand gestures. Ethical issues around privacy,
data handling, and bias require careful management to ensure trust and transparency [21].</p>
      <p>Future work will focus on training the system with more diverse and realistic data. Advanced
techniques, such as transformer-based models, could enhance visual recognition [7]. A multimodal
approach combining video, audio, and contextual information will be explored to improve reliability.
The goal is to deploy a web-based prototype integrated into exam platforms and test it in real-world
conditions. Throughout development, privacy protection will remain a priority, following GDPR and
privacy-by-design principles.</p>
      <p>
        As education increasingly moves online, the demand for reliable and ethical monitoring solutions
grows [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This study shows that real-time detection of suspicious behavior is feasible and scalable. It
represents an important step toward smart, adaptable tools that support academic integrity while
respecting fairness, security, and learners’ rights [22].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
<p>The authors did not use any generative AI tools.</p>
      </sec>
      <sec id="sec-9">
        <title>References</title>
        <p>[7] M. Der, A. D. Kora, and S. Ndiaye, "Study of AI-based architectures for remote examination monitoring using Machine Learning," in Proc. CEUR Workshop, vol. 3789, pp. 45–56, 2023. [Online]. Available: https://ceur-ws.org/Vol-3789/Paper5.pdf</p>
        <p>[8] M. S. Islam et al., "A Robust Online Exam Monitoring System Using Computer Vision," IEEE Access, vol. 8, pp. 145732–145744, 2020.</p>
        <p>[9] S. P. Tripathi et al., "Automated Cheating Detection in Online Exams Using Facial Expression Analysis," IEEE Access, vol. 8, pp. 194044–194057, 2020.</p>
        <p>[10] L. S. Lopes, P. Ferreira, and M. Ribeiro, "Behavioral Biometrics in Online Exams: Gaze and Mouse Dynamics," in Proc. IEEE Int. Conf. on Intelligent Computer Communication and Processing (ICCP), 2019, pp. 157–164.</p>
        <p>[11] A. Sharma and M. K. Singh, "Machine Learning Approaches for Cheating Detection in Online Exams," in Proc. IEEE Int. Conf. on Computing, Communication and Automation (ICCCA), 2020, pp. 1295–1300.</p>
        <p>[12] R. Tripathi and A. K. Tripathi, "An Intelligent Proctoring System for Online Examination Using Deep Learning," Journal of Intelligent Systems, vol. 30, no. 1, pp. 587–600, 2021.</p>
        <p>[13] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.</p>
        <p>[14] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.</p>
        <p>[15] T. Baltrusaitis, C. Ahuja, and L.-P. Morency, "Multimodal Machine Learning: A Survey and Taxonomy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423–443, Feb. 2019.</p>
        <p>[16] Z. Cao et al., "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 172–186, Jan. 2021.</p>
        <p>[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097–1105.</p>
        <p>[18] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in Proc. European Conf. Computer Vision (ECCV), 2006, pp. 404–417.</p>
        <p>[19] D. King, "Dlib-ml: A Machine Learning Toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.</p>
        <p>[20] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems," 2015. [Online]. Available: https://www.tensorflow.org/</p>
        <p>[21] F. Chollet, Deep Learning with Python, Manning Publications, 2017.</p>
        <p>[22] J. Deng et al., "ImageNet: A Large-Scale Hierarchical Image Database," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>A.</given-names> <surname>Strugatski</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Alexandron</surname></string-name>, <article-title>"Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments,"</article-title> <source>arXiv preprint arXiv:2412.02713</source>, Dec. <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>D.</given-names> <surname>Kundu</surname></string-name> et al., <article-title>"Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs,"</article-title> <source>arXiv preprint arXiv:2406.15335</source>, Jun. <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>Y.-S.</given-names> <surname>Shih</surname></string-name> et al., <article-title>"AI-assisted Gaze Detection for Proctoring Online Exams,"</article-title> <source>arXiv preprint arXiv:2409.16923</source>, Sep. <year>2024</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>X.</given-names> <surname>Yang</surname></string-name> et al., <article-title>"iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition,"</article-title> <source>arXiv preprint arXiv:2206.13356</source>, Jun. <year>2022</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>M.</given-names> <surname>Der</surname></string-name>, <string-name><given-names>A. D.</given-names> <surname>Kora</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Ndiaye</surname></string-name>, <article-title>"Two-factor biometric authentication system based on facial recognition using the SVC model: The case of Senegal,"</article-title> <source>IEEE Access</source>, vol. <volume>11</volume>, pp. <fpage>123456</fpage>–<lpage>123467</lpage>, <year>2023</year>, doi: 10.1109/ACCESS.2023.11008254.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>A.</given-names> <surname>Tweissi</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Al Etaiwi</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Al Eisawi</surname></string-name>, <article-title>"The Accuracy of AI-Based Automatic Proctoring in Online Exams,"</article-title> <source>The Electronic Journal of e-Learning</source>, vol. <volume>20</volume>, no. <issue>4</issue>, pp. <fpage>419</fpage>–<lpage>435</lpage>, <year>2022</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>