Campus Surveillance with Violence Insight and Suspect
                                Profiling: A Survey
                                Deepak N R1,†, Santosh Kumar Paital2,∗,†,Umme Kulsum3,†, Shaima Afreen4,† and
                                Shrey Verma5,†

                                1 Professor, Department of Computer Science and Engineering, HKBK College of Engineering, Bengaluru, India

                                2345 Department of Computer Science and Engineering, HKBK College of Engineering, Bengaluru, India


                                                Abstract
                                                Campus safety is a paramount concern in educational institutions worldwide, especially in
                                                combating the prevalence of campus violence. This comprehensive review utilizes advanced
                                                image processing and computer vision technologies to proactively tackle security challenges. The
                                                appraised surveillance system integrates sophisticated methods, employing Convolutional
                                                Neural Networks (CNNs), bidirectional LSTM models, and the Convolutional 3D (C3D) network
                                                architecture. Leveraging transfer learning from ImageNet and strategic data augmentation, the
                                                system aims to refine its capabilities for robust performance. Through the integration of
                                                advanced techniques in image analysis and data processing, the system seeks to enhance security
                                                measures while respecting individual privacy. By leveraging an integrated surveillance
                                                framework, the emphasis is on delivering timely alerts to security personnel and providing
                                                comprehensive behavioural profiles. This research significantly contributes to the progression of
                                                security systems within educational contexts, emphasizing the significance of employing state-
                                                of-the-art computational methodologies. By prioritizing the creation of safer educational
                                                environments on a global scale, this approach stands as a pivotal stride in fortifying campus
                                                security measures.

                                                Keywords
                                                Computer vision, Convolutional Neural Networks, Image processing, Violence, LSTM 1


                                1. Introduction
                                The growing concern for safety within educational institutions globally, driven by the need
                                to address security challenges such as violence and unauthorized access, has prompted a
                                crucial shift. Integration of cutting-edge surveillance technologies, leveraging
                                advancements in image processing and computer vision, has emerged as a pivotal strategy
                                to fortify security measures within these environments. This exploration delves deeply into
                                potential methodologies, such as Convolutional Neural Networks (CNNs), bidirectional
                                LSTM models, and the Convolutional 3D (C3D) network architecture, aimed at enhancing


                                Symposium on Computing & Intelligent Systems (SCI), May 10, 2024, New Delhi, INDIA
                                ∗ Corresponding author.
                                † These authors contributed equally.

                                   deepaknrgowda@gmail.com (Deepak N R); santoshpaital@gmail.com (S. K. Paital);
                                ummekulsum304162@gmail.com (U. Kulsum); shaimaafreen2001@gmail.com (S. Afreen);
                                shreyverma9452@gmail.com (S. Verma)
                                    0000-0002-2766-2990 (Deepak N R)
                                           © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
surveillance capabilities tailored specifically for security purposes. This investigation
deeply explores the aspects of campus surveillance technologies, aligning closely with the
primary objectives. These objectives encompass the development of a comprehensive
surveillance system. The focus is on investigating methodologies and technological
advancements essential for establishing a robust surveillance infrastructure, including
incident detection, suspect identification, real-time suspect monitoring, and an integrated
alert system. Moreover, the exploration aims to create an accurate facial recognition system.
This involves exploring diverse methods and advancements in unauthorized face detection
systems to strengthen security measures by implementing a precise system capable of
efficiently identifying potential threats within the campus environment.
    While prioritizing security measures, this study recognizes the delicate balance between
safety and fostering a nurturing environment. It aims not only to explore surveillance
technology capabilities but also to analyse their impact on the overall campus atmosphere.
The overarching goal of these objectives is to serve as a valuable resource fostering an
understanding of cutting-edge approaches in campus surveillance. Shedding light on their
effectiveness, challenges, and potential consequences aims to significantly contribute to
bolstering campus security measures and nurturing educational environments globally.
Furthermore, there is a strong emphasis on continuously evaluating and adapting
surveillance systems. Advocating for ongoing assessment and adaptation of surveillance
technologies ensures their efficacy in addressing emerging threats, adapting to evolving
landscapes, and meeting the changing needs of educational environments. This iterative
approach fosters a culture of enhancement, ensuring the durability and effectiveness of
campus surveillance technologies.

2. Background

In the realm of computer vision, the journey from raw visual data to actionable insights is
a meticulously orchestrated sequence of steps, each playing a pivotal role in extracting
valuable information. Initiating the process, the capture of data through cameras or
sensors marks the first step. Following this, crucial preprocessing steps come into play,
acting as the foundation for subsequent analysis. These preprocessing tasks encompass a
spectrum of operations, including but not limited to, noise reduction techniques to
eliminate unwanted distortions, fine-tuning brightness levels for optimal clarity, and
preparing images in a standardized format conducive to in-depth analysis. As the journey
progresses, the preprocessed visual data traverses into the segmentation phase, where
advanced techniques like edge detection partition the images into discernible regions or
objects. This segmentation process acts as a precursor to detailed analysis by effectively
isolating specific elements within the images, laying the groundwork for a deeper dive into
individual components.
    Advancing further, the process accentuates feature extraction, a critical phase where
the system meticulously identifies and isolates crucial characteristics such as shapes or
textures within the segmented regions. This intricate extraction process plays a pivotal
role in distilling the most pertinent details essential for subsequent analysis and
interpretation, setting the stage for insightful observations.
Figure 1: Basic Concept of Computer Vision with Image Processing.

   Finally, culminating in the recognition stage, the interpreted data, derived from the
extracted features, is employed to identify objects, faces, or patterns. Leveraging cutting-
edge machine learning or deep learning algorithms, computers adeptly interpret visual
information across multifaceted applications, spanning from healthcare to autonomous
vehicles and surveillance systems. Illustrating these sequential steps in Fig. 1. The Basic
Concept of Computer Vision with Image Processing provides a comprehensive
visualization of the intricate process by which computer vision meticulously extracts,
refines, and interprets insights from raw visual data, empowering its widespread
application across diverse fields and industries.

3. Literature review
Abbass and Kang (2023) [1] conducted a thorough investigation into violence detection in
surveillance videos using the UBI Fights dataset. Their research introduced six dis- tinct
architectures, incorporating the Convolutional Block Attention Module (CBAM) with
ConvLSTM2D or Conv2D and LSTM layers, as well as integrating CBAM with established
models like ResNet50, VGG16, or MobileNet for efficient spatio-temporal feature extraction.
Through rigorous evaluation and comparison with state-of-the-art methods on the UBI
Fights dataset, their work showcased the superior performance of these architectures.
Looking ahead, the authors plan to extend their approach to additional datasets, aiming for
a more generalized capability in violence detection, thus contributing to the advancement
of surveillance video analysis techniques.
    Aktı et al (2022) [2] introduced the Social Media Fight Images (SMFI) dataset, a unique
collection comprising samples gathered from media platforms and camera recordings.
Their research underscored the model’s impressive ability to accurately differentiate
between images depicting fighting and non-fighting actions, with a particular emphasis on
the model’s resilience even when excluding a substantial portion (60%) of the dataset.
Furthermore, experiments conducted on video-based fight recognition datasets
demonstrated the model’s effective classification using randomly selected frames. These
cumulative findings position the SMFI dataset as a notable contribution to the field,
demonstrating superior accuracy, especially on previously unexplored datasets, and
indicating its potential to advance models in the domain of fight recognition.
    Aldehim et al (2023) [3] presents a groundbreaking technique called TSODL VD. This
method aims to identify instances of violence, in surveillance videos, with precision and
efficiency. By incorporating the TSO protocol as a hyperparameter enhancer for the residual
DenseNet model, the researchers have improved the effectiveness of violence detection, in
the TSODL VD procedure. Through experiments conducted on a violence dataset, the study
demonstrates that this approach achieves accurate detection results, surpassing previous
state-of-the-art methods.
    Cao et al (2023) [4] contribute to the domain of supervised Video Anomaly Detection
(VAD), with a specific focus on the NWPU Campus dataset, recognized as the largest dataset
of its kind. This dataset stands out due to its emphasis on scene-dependent anomalies,
providing a tailored frame- work for Video Anomaly Anticipation (VAA). The researchers
propose an innovative scene-conditioned model adept at effectively addressing both Video
Anomaly Detection (VAD) and Video Anomaly Anticipation (VAA) by giving priority to
anomaly management across diverse scenes. Their commitment extends beyond short-
term VAA, underscoring their dedication to advancing the understanding and application of
anomaly detection and anticipation in video data, particularly emphasizing long-term
anticipation.
Garcia-Cobo and Sanmiguel (2023) [5] introduce a cutting edge architecture for violence
detection in surveillance videos, outperforming existing models in real-time operation
through careful parameter optimization. The method seamlessly integrates pose estimation
and dynamic temporal changes to identify violence instances autonomously. The authors
stress the crucial significance of implementing this unique combination technique to
improve overall system performance, highlighting the pivotal role of integration in violence
detection methodologies. This pioneering approach represents a significant advancement
in surveillance video analysis, offering a more effective and efficient solution to address
contemporary challenges in the field.
    Ghadekar et al (2023) [6] developed an advanced violence detection system capable of
accurately identifying instances of violence in both textual content and videos. Notable for
its commendable precision and adept avoidance of overfitting, this system emerges as a
highly valuable tool for integration into surveillance systems and social media platforms,
particularly where violence and harassment are prevalent. A distinctive feature of their
work is the deliberate incorporation of diverse files into the dataset, strategically refining
the system to capture various facets of violence. This enhancement significantly strengthens
the system’s versatility, establishing it as an effective means for proactively addressing
potential forms of violence before they escalate into serious incidents.
    Husz´ar et al (2023) [7] introduced two innovative architectures, FT and TL, designed
for the classification of violence in video clips by utilizing action recognition features from
the Kinetics 400 dataset. In the FT model, optimization is performed on X3D M parameters
trained on Kinetics 400, while the TL model extracts spatio features without modifying
these parameters and incorporates training multiple fully connected layers. Evaluation
results, including dataset validation, highlight the TL model’s superior generalization to
unseen scenarios compared to the FT model. However, a recognized limitation in the
evaluation method involves the use of non-overlapping four-second video segments, which
may not adequately capture violent events spanning two segments. To address this
limitation, the authors plan to implement strategies in future work, such as adjusting
segment sizes or incorporating overlapping segments, aiming to strike a balance between
the speed and accuracy of analysing violent events.
   Kang et al (2021) [8] introduce a cutting-edge violence detection system, featuring
attention modules that strategically target spatial and temporal dimensions to group frames
effectively. Supported by comprehensive experiments, the paper underscores the seamless
integration of these modules with efficient 2D CNN backbones, establishing their efficacy.
The authors highlight a significant accomplishment with the successful deployment of a
real-time violence recognition system in a resource-limited environment, showcasing the
practical viability of their methodology. Looking ahead, they articulate plans to fortify the
model’s robustness through data utilization and explore data augmentation techniques.
Furthermore, the researchers express their commitment to expanding their investigation
into action recognition tasks, aiming to apply their versatile approach across a range of
contexts.
   Mahareek et al (2023) [9] introduced an innovative model for detecting anomalies in
surveillance videos. This model seamlessly integrates 3DCNN and ConvLSTM architectures,
effectively addressing challenges in anomaly detection. The demonstrated performance
includes flawless 100% training accuracy across five datasets. The model exhibits high
recognition reliability, achieving evaluation scores of 98.5%, 99.2%, and 94.5%. When
compared to the standalone 3DCNN, the integrated 3DCNN+ConvLSTM configuration
consistently outperforms across all datasets. The authors underscore their dedication to
research, emphasizing the model’s proactive capabilities in anticipating anomalies in
surveillance videos, representing a significant advancement in the field.
   Perseghin and Foresti (2023) [10] introduce the School Violence Detection (SVD)
system, proposing an innovative method that utilizes a 2D Convolutional Neural Network
(CNN) for categorizing actions within educational settings. Their focus on cost-effectiveness
and efficiency involves the utilization of school data to alleviate computational demands,
resulting in an impressive 95% classification accuracy achieved through techniques like
web scraping and transfer learning. Despite these accomplishments, challenges persist,
particularly in addressing image noise and acquiring images of minors. To advance the SVD
system, potential improvements may involve analysing frame time series, integrating DSB
dataset data, and collaborating with experts in psychology and education. This dual
emphasis on technological refinement and inter-disciplinary collaboration underscores the
commitment of Erica Perseghin and Gian Luca Foresti to comprehensively address the
intricacies of school violence detection.
   Siddique et al (2022) [11] proposed an extensive framework designed to identify
violence in sensitive areas by analysing video streams from surveillance cameras through
computer vision and machine learning techniques. Their emphasis lies on the significant
role played by Violent Flows (ViF) Descriptors in enhancing the accuracy of violence
detection. The researchers conducted a thorough feasibility analysis and performance
validation of diverse machine learning techniques, discovering that the combination of ViF
with weighted averaging on Linear SVM, Cubic SVM, and Random Forest techniques
produce outstanding results in detecting violence within video streams. They also
recommend exploring the integration of cloud computing to expedite video stream
processing and suggest refining the framework by incorporating a face detection module
for identifying individuals involved in activities. In summary, Singh, Preethi, et al.’s
framework employs cutting-edge technologies to ensure reliable violence detection in
surveillance scenarios, addressing technical aspects and proposing practical enhancements
for improved real- world applicability.
   Singh et al (2020) [12] presents a groundbreaking methodology for anomaly detection
in CCTV footage, integrating abnormal videos to enhance accuracy with a dual-network
system and a labelled dataset. Their resulting anomaly detection model, validated on a
dataset featuring 12 real- world anomalies, achieves an impressive accuracy rate of 97.23%,
eliminating the need for laborious annotations in abnormal segments. Addressing
overfitting concerns, their Threat Recognition Model categorizes anomalies into thirteen
classes, employing techniques such as frame concatenation and categorical cross-entropy.
This literature review underscores the substantial contributions of Singha, Singh, and their
colleagues, emphasise the practical implications of their model and the importance of timely
implementation while considering hardware constraints for computational efficiency and
cost- effectiveness. Their work represents a significant advancement in the field of anomaly
detection in CCTV footage, providing valuable insights for both researchers and
practitioners.
   Ullah et al (2023) [13] introduced a system for violence detection in surveillance videos,
leveraging computer vision and AI techniques to enhance security. Their research, spanning
from 2011 to the present, centres on the integration of neural networks (NNs) and
emphasizes the significant contributions of artificial neural networks (ANNs) in advancing
violence detection within computer vision. The study highlights the broad applications of
these advancements, particularly in improving security and safety in urban environments
and large-scale industries. Overall, their work underscores the evolving landscape of
violence detection technologies and their potential impact on diverse sectors.
   Vieira et al (2022) [14] conducted a comprehensive study focusing on the analysis and
application of cost-effective techniques employing Convolutional Neural Networks (CNNs)
for automated event recognition and classification. Their experiments revealed the
effectiveness of specific architectural parameters, resulting in an impressive classification
accuracy of up to 92.05%. To enable a detailed comparison of performance across various
CNN models, the researchers devised a prototype for an intelligent monitoring system with
a budget-friendly implementation, deployed on a Raspberry Pi as an embedded platform,
achieving a real-time frame rate of up to 4.19 FPS. Going forward, the team aims to explore
the implications of implementing mobile CNN architectures on embedded platforms.
Furthermore, they plan to enrich their dataset by incorporating videos depicting non-
violent actions in diverse settings like malls, air- ports, subways, parks, and sports stadiums,
with the goal of significantly enhancing the models’ applicability and accuracy in event
recognition and classification.

Table 1
Comparison of Various Existing System with our System


 Ref. Author                     Violence      Face      Event     Live      Unknown
 no.                             detection   Identifi-   Place   Location      Face
                                              cation     Alert    Alert        Alert
 1     Abbass et al(2023)           Yes        No         No       No           No
 2     Aktı et al (2022)            Yes        No         No       No           No
 3     Aldehim et al (2023)         Yes        No         No       No           No
 4     Cao et al (2023)             Yes        No         No       No           No
 5     G. Cobo et al (2023)         Yes        No         No       No           No

 6     Ghadekar et al (2023)        Yes        Yes        No       No           No
 7     Husz´ar et al (2023)         Yes        No         No       No           No
 8     Kang et al (2021)            Yes        No         Yes      No           No
 9     Mahareek et al (2023)        Yes        No         No       No           No
 10    Perseghin et al(2023)        Yes        No         Yes      No           No
 11    Siddique et al (2022)        Yes        No         Yes      No           No
 12    Singh et al (2020)           Yes        No         No       No           No
 13    Ullah et al (2023)           Yes        No         No       No           No
 14    Vieira et al (2022)          Yes        No         No       No           No
 15    Ye et al (2021)              Yes        No         No       No           No


   Ye et al (2021) [15] introduced an inventive method for campus violence detection by
combining video sequence analysis with speech emotion evaluation. Through simulated
scenarios, they utilized video samples from surveillance cameras representing both violent
and non-violent situations, alongside speech samples from three databases for
comprehensive testing. The study showcased an impressive recognition accuracy of 97%,
outperforming existing methods when applied to the amalgamated video and emotional
databases. A noteworthy contribution surfaced in the development of a fusion algorithm,
showcasing a significant 10.79% enhancement in accuracy compared to previous fusion
rules. In their future research agenda, Liang Ye and Tong Liu outlined plans to explore
skeleton-based activity recognition methods for campus violence detection, aiming to
compare them with established body-based approaches. The manuscript also proposed
integrating emotion recognition to augment activity recognition and vice versa, indicating
a promising avenue for continued research in this field.
    The Table 1: Comparison of Various Existing System with our System presents a
comprehensive snapshot of key components within security systems, encompassing
violence detection, face identification, event place alert, live location alert, and unknown
face identification. All existing systems, as exemplified in Table 1, have made notable strides
in the realm of violence detection, showcasing diverse methodologies. These systems
contribute valuable insights into the development of sophisticated algorithms and
techniques for accurately identifying and responding to violent incidents. It takes Ghadekar
et al (2023) [6] the lead in addressing face identification, shedding light on advancements
in facial recognition technology. Moreover, event place alert systems are explored in Kang
et al (2021) [8], Perseghin and Foresti (2023) [10] and Siddique et al (2022) [11],
highlighting the significance of targeted threat response in specific locations. However,
despite the progress in these areas, current existing systems exhibit a gap in the crucial
domains of live location alert and unknown face identification. This underscores the need
for innovative solutions, prompting our proposed objectives to pioneer real- time live
location alerts and robust unknown face identification capabilities, thereby enhancing the
overall efficacy of security measures in diverse contexts.

4. Proposed Work
Our proposed approach to address the gap identified in the literature review by devel- oping
a surveillance system, for campuses. This system focuses on preventing entry and detecting
activities through real-time analysis of CCTV camera footage. Firstly, we continuously
extract frames from the CCTV feeds, allowing us to analyse the data in depth. For identifying
entry, we employ object detection techniques to detect individuals entering the campus
without authorization. Additionally, we utilize facial recognition algorithms to identify
authorized personnel and promptly alert the system with information about when and
where the unauthorized entry occurred.
    Simultaneously, our methodology extends to analysing frames for detecting violence
using advanced computer vision algorithms and anomaly detection models. When violent
activities are detected, a series of steps are triggered for identification and recognition.
Facial recognition algorithms play a role in identifying individuals involved in acts while
tracking mechanisms monitor their movements across frames, creating a timeline of the
incident. Real-time alerts are then activated, providing notifications to security personnel.
It is crucial to note that these alerts contain information, about both the location where the
violence took place and where suspected individuals are currently located, greatly
enhancing the system’s responsiveness.
    To improve our system continuously, we integrate machine learning algorithms
ConvLSTM, CNN Abbass and Kang (2023) [1] that enhance the accuracy of detecting entries
and violent activities. This involves updating our database of personnel and refining
recognition models through ongoing analysis. We prioritize privacy and ethics by
implementing measures to comply with standards, such, as anonymizing data and
restricting access to information. Our methodology also emphasizes documentation and
reporting generating incident reports maintaining audit logs and taking an approach to
continuously enhancing campus surveillance. This comprehensive methodology represents
an advancement in surveillance effectively addressing the identified gap and establishing a
proactive approach, to campus security.
    The strategy to overhaul campus safety involves creating a sophisticated surveillance
system (depicted in Fig. 2. Proposed Methodology Flowchart) that utilizes advanced image
processing and computer vision technologies. This theoretical analysis responds to the
urgent need to address campus violence in educational institutions worldwide. The
proposed surveillance system incorporates a fusion of sophisticated methods including
Convolutional Neural Networks (CNNs) Abbass and Kang (2023) [1] Kang et al (2021) [8]
Perseghin and Foresti (2023) [10] Vieira et al (2022) [14], bidirectional LSTM models,
Convolutional 3D (C3D) networks Mahareek et al (2023), Campus Block based Location
with GPS, Haar Cascade for face detection, and LBPH algorithm for person recognition.
LSTMs Abbass and Kang (2023) [1] are a type of recurrent neural network (RNN) designed
to process and analyse sequences of data, effectively capturing long-term dependencies. In
the context of campus safety, LSTMs can be utilized for behavioural analysis and pattern
recognition within video sequences.
    Unlike standard RNNs, LSTMs Abbass and Kang (2023) possess a more sophisticated
memory cell structure, allowing them to retain and selectively forget information over
extended periods. In this surveillance system, bidirectional LSTM Abbass and Kang (2023)
[1] Mahareek et al (2023) [9] models are employed to understand and analyse the temporal
patterns in video data collected from CCTV cameras across the campus. This model excels
in comprehending sequences of frames, identifying behavioural patterns, and detecting
anomalous activities that may signify potential threats or violent behaviour. By leveraging
LSTM’s ability to capture temporal dependencies, it aids in recognizing suspicious
behavioural sequences, thus contributing to the violence detection and suspect
identification processes. CNNs Abbass and Kang (2023) [1] Kang et al (2021) [8] Perseghin
and Foresti (2023) [10] Vieira et al (2022) [14], are specialized neural networks primarily
used for image processing tasks, such as object recognition and feature extraction. They
consist of multiple layers, including convolutional layers, pooling layers, and fully connected
layers, enabling them to learn hierarchical representations of visual data. In the context of
campus safety, CNNs Abbass and Kang (2023) [1] Kang et al (2021) [8] Perseghin and
Foresti (2023) [10] Vieira et al (2022) [14], play a crucial role in the violence detection
phase. The surveillance system employs Convolutional Neural Networks to analyse the
frames extracted from the video data. These networks utilize convolutional layers to extract
meaningful spatial features from each frame, recognizing patterns indicative of violent
behaviour. The integration of transfer learning from ImageNet and strategic data
augmentation techniques further refines CNN’s ability to identify violence-related cues
accurately.
    By integrating both LSTM Abbass and Kang (2023) [1], and CNN Abbass and Kang (2023)
[1] Kang et al (2021) [8] Perseghin and Foresti (2023) [10] Vieira et al (2022) [14],
architectures, the surveillance system can effectively process and interpret video data in a
holistic manner. The LSTM Abbass and Kang (2023) [1] models contribute to understanding
temporal dynamics and behavioural patterns, while CNNs Abbass and Kang (2023) [1] Kang
et al (2021) [8] Perseghin and Foresti (2023) [10] Vieira et al (2022) [14], excel in extracting
spatial features and identifying violent behaviour within individual frames. Their combined
functionality enhances the system’s capability for violence detection, suspect identification,
and real-time alert generation, contributing significantly to campus safety measures. The
methodology commences with the acquisition of data through CCTV cameras installed
across the campus.


Figure 1: Proposed Methodology Flowchart.

   The collected video data undergoes a frame extraction process, subsequently subjected
to violence detection algorithms via sophisticated image processing techniques. Upon
detecting violence, the system triggers suspect identification processes employing Haar
Cascade for face detection and the LBPH algorithm for person recognition. Upon successful
suspect identification, the system generates real-time alerts notifying security personnel
about the suspect presence in the particular camera feed. Incorporating the IPStack
algorithm facilitates GPS Location data retrieval, contributing to comprehensive suspect
monitoring. Iterative refinement of the system involves continual data enhancement and
model optimization. Emphasis is placed on seamless integration with existing surveillance
infrastructure to ensure efficient real-time suspect tracking and campus-wide security
reinforcement.
   The design methodology underscores the system’s commitment to delivering timely
alerts to security personnel and constructing comprehensive behavioural profiles. It
navigates a delicate balance between heightened security measures and safeguarding
individual privacy rights. This approach represents a significant stride in employing state-
of-the-art computational methodologies to fortify campus security measures globally.
   The culmination of this methodology manifests in the development of a sophisticated
surveillance framework capable of swiftly detecting, identifying, and alerting security
personnel to potential threats within educational settings. By prioritizing the creation of
safer educational environments, this approach stands at the forefront of enhancing campus
safety on a global scale.

5. Discussion
Exploring methodologies for detecting violence on college campuses through computer
vision and image processing techniques reveals notable progress. However, as we pivot
from conventional approaches, a distinct gap emerges, warranting attention in our project
survey paper. Firstly, prior studies predominantly focused on anomaly detection,
employing algorithms like 3DCNN Mahareek et al (2023) [9], LSTM Abbass and Kang (2023)
[1], and CNNs Abbass and Kang (2023) [1] Kang et al (2021) [8] Perseghin and Foresti
(2023) [10] Vieira et al (2022) [14] on embedded platforms. While effective in identifying
actions, these methods often lacked the ability to recognize individuals involved in real-time
during such incidents. Our enhancement involves the introduction of real-time surveillance-
based person recognition, advancing beyond mere activity detection to immediate
identification of individuals, significantly enhancing system responsiveness to security
incidents. Secondly, existing alert systems issued alerts about detected anomalies without
providing information about the violent occurrence places or the current whereabouts of
suspects, hindering effective responses. Our proposed alerting system addresses this
limitation, offering precise details about where violence happened and the current location
of suspects. This capability empowers security staff to respond quickly and precisely,
reducing the time it takes to address situations. The key distinction in present approaches
lies in the transition from a system primarily focused on detection to one combining real-
time person recognition and advanced alerting capabilities. Past methods successfully
identified anomalies, but the lack of real-time person recognition limited intelligence during
security incidents. The advanced alerting system bridges this gap by providing crucial
information for a fast and targeted response, thereby improving the overall effectiveness of
surveillance systems.

6. Conclusion
The incorporation of real-time person recognition enhances the surveillance system,
ensuring the swift identification of individuals involved in incidents. The proposed alerting
system, providing details on the event location and suspects’ where-abouts, facilitates
prompt response and intervention. These advancements bridge the gap between detection
and action, offering a comprehensive approach to campus security. The shift from anomaly
detection to live person recognition reflects addressing security challenges and adapting to
evolving environments. This initiative not only aids in identifying violence but also
establishes a platform for evolving intelligent surveillance systems. The integration of real-
time person recognition and advanced alerting capabilities underscores our focus on
creating a responsive and proactive security infrastructure for campus environments.
References
[1] Abbass, M. A. B., Kang, H. S. (2023). Violence Detection Enhancement by Involving
     Convolutional Block Attention Modules into Various Deep Learning Architectures:
     Comprehensive Case Study for UBI-Fights Dataset. IEEE Access.
[2] Aktı, S ̧ ., Ofli, F., Imran, M., Ekenel, H. K. (2022). Fight detection from still images in the
     wild. In Proceedings of the IEEE/CVF winter conference on applications of computer
     vision (pp. 550-559).
[3] Aldehim, G., Asiri, M. M., Aljebreen, M., Mohamed, A., Assiri, M., Ibrahim, S. S. (2023).
     Tuna Swarm Algorithm with Deep Learning Enabled Violence Detection in Smart Video
     Surveillance Systems. IEEE Access.
[4] Cao, C., Lu, Y., Wang, P., Zhang, Y. (2023). A New Comprehensive Benchmark for Semi-
     supervised Video Anomaly Detection and Anticipation. In Proceedings of the IEEE/CVF
     Conference on Computer Vision and Pattern Recognition (pp. 20392-20401).
[5] Garcia-Cobo, G., SanMiguel, J. C. (2023). Human skeletons and change detection for
     efficient violence detection in surveillance videos. Computer Vision and Image
     Understanding, 233, 103739.
[6] Ghadekar, P., Agrawal, K., Bhosale, A., Gadi, T., Deore, D., Qazi, R. (2023). A Hybrid CRNN
     Model for Multi-Class Violence Detection in Text and Video. In ITM Web of Conferences
     (Vol. 53). EDP Sciences.
[7] Husz á r, V. D., Adhikarla, V. K., N ́egyesi, I., Krasznay, C. (2023). Toward Fast and
     Accurate Violence Detection for Automated Video Surveillance Applications. IEEE
     Access, 11, 18772-18793.
[8] Kang, M. S., Park, R. H., Park, H. M. (2021). Efficient spatio-temporal modeling methods
     for real-time violence recognition. IEEE Access, 9, 76270-76285.
[9] Mahareek, E. A., El-Sayed, E. K., El-Desouky, N. M., El-Dahshan, K. A. (2023). Detecting
     anomalies in security cameras with 3DCNN and ConvLSTM.
[10] Perseghin, E., Foresti, G. L. (2023). A Shallow System Prototype for Violent Action
     Detection in Italian Public Schools. Information, 14(4), 240.
[11] Singh, K., Preethi, K. Y., Sai, K. V., Modi, C. N. (2018, December). Designing an efficient
     framework for violence detection in sensitive areas using computer vision and machine
     learning techniques. In 2018 Tenth International Conference on Advanced Computing
     (ICoAC) (pp. 74-79). IEEE.
[12] Singh, V., Singh, S., Gupta, P. (2020). Real-time anomaly recognition through CCTV using
     neural networks. Procedia Computer Science, 173, 254-263.
[13] Ullah, F. U. M., Obaidat, M. S., Ullah, A., Muhammad, K., Hijji, M., Baik, S. W. (2023). A
     comprehensive review on vision-based violence detection in surveillance videos. ACM
     Computing Surveys, 55(10), 1-44.
[14] Vieira, J. C., Sartori, A., Stefenon, S. F., Perez, F. L., De Jesus, G. S., Leithardt, V. R. Q. (2022).
     Low-cost CNN for automatic violence recognition on embedded system. IEEE Access,
     10, 25190-25202.
[15] Ye, L., Liu, T., Han, T., Ferdinando, H., Sepp ̈anen, T., Alasaarela, E. (2021). Campus
     violence detection based on artificial intelligent interpretation of surveillance video
     sequences. Remote Sensing, 13(4), 628.