<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Gourav Kalra);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Support Vector Machine-Based Segmentation for Accurate Crowd Density Detection in Urban Spaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gourav Kalra</string-name>
          <email>gkalra144@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajeev Yadav</string-name>
          <email>rajeevtpo@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Satish Kumar Alaria</string-name>
          <email>Satish.alaria@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>M. Tech. Scholar, Department of CSE, Arya College of Engineering</institution>
          ,
          <addr-line>Jaipur, Rajasthan</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Professor, Department of CSE, Arya College of Engineering</institution>
          ,
          <addr-line>Jaipur, Rajasthan</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Computer Instructor, Education Department, Government of Rajasthan</institution>
          ,
          <addr-line>Rajasthan</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1929</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Estimating crowd density has become increasingly important in fields such as public safety, event management, and urban planning. Accurate detection of crowd density supports informed decision-making and helps ensure safety in crowded areas. This study proposes a novel method for crowd density detection using segmentation and classification based on a Support Vector Machine (SVM). The method involves two key steps: crowd segmentation and density categorization. During segmentation, image processing techniques such as background removal and region-based segmentation extract crowd regions from input images or video frames. These segmented regions are then classified by an SVM model, a classifier well suited to complex, high-dimensional feature data. The model is trained on a diverse dataset containing images with varying crowd densities. The approach captures crucial spatial and contextual information, and extensive testing on various datasets has demonstrated its accuracy and resilience in dynamic crowd scenarios. The proposed SVM-based method can be implemented in real time, making it valuable for applications that require quick decisions. This technique offers a reliable and efficient solution for crowd density detection, with significant implications for event management, public safety, and urban planning in congested environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Crowd density detection</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>crowd segmentation</kwd>
        <kwd>image processing</kwd>
        <kwd>real-time detection</kwd>
        <kwd>region-based segmentation</kwd>
        <kwd>urban planning</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The world has undergone rapid urbanization over the past two decades, leading to a significant
increase in city populations. As cities become more crowded, the need for effective surveillance
systems has grown, particularly to monitor people's movements and behaviors in public spaces,
ensuring the safety and security of individuals and their possessions. Surveillance has become an
integral part of maintaining public safety, with both public and private entities worldwide regularly
employing video cameras for this purpose. However, traditional surveillance systems heavily rely on
human operators, whose effectiveness can vary depending on their alertness and the available
manpower. Given these limitations, modern surveillance is transitioning towards smart systems
equipped with advanced technologies like intelligent video analysis, which enable automated
decision-making without continuous human intervention.</p>
      <p>Smart surveillance systems can be broadly categorized into two types: visual-based and multimodal.
Visual-based systems utilize computer vision algorithms to process video data from cameras and
drones in real time, offering solutions like facial recognition and license plate identification. On the
other hand, multimodal systems integrate various data sources, including motion and audio sensors,
alongside video data to provide comprehensive real-time insights. Companies like IBM and Intel have
pioneered technologies that can detect traffic incidents, optimize routes, and even identify
crime-related events using these advanced surveillance systems.</p>
      <p>In today's world, smart surveillance plays a critical role, particularly in monitoring crowds. This is especially relevant during large public gatherings, where the potential for disasters, accidents, or criminal activity increases. Effective crowd control is vital in settings such as airports, concert venues, and religious gatherings. As crime, terrorism, and natural disasters rise, smart surveillance systems must rely on robust algorithms to manage and predict crowd behavior.</p>
      <p>The analysis of crowd behavior is a key focus of this section. It begins by defining different types of crowds, highlighting their characteristics and behaviors in various contexts. A closer look at collective crowd behavior from a psychological standpoint follows, offering insight into how crowds react in specific situations. The discussion then turns to the challenges of analyzing crowd behavior from video footage, including the complexities of cognitive modeling for crowd behavior analysis. Together, these parts set out the motivation for this research and the primary contributions of the proposed approach. Active crowds, for example, may exhibit behaviors ranging from aggression to panic or expressive actions, such as cheering at a concert or participating in religious events.</p>
      <p>Analyzing crowd behavior is essential for smart surveillance systems, as it helps authorities understand crowd dynamics, develop control measures, and prevent crowd-related disasters. The behavior of a crowd is often shaped by the context in which it forms. For instance, in a shopping district people might move peacefully alongside one another, while in a stadium fans may express intense emotions in response to the game. These varying behaviors call for smart surveillance systems capable of monitoring and analyzing different crowd scenarios in real time.</p>
      <p>Because it depends on context and setting, crowd behavior is inherently complex. Monitoring and understanding collective behavior in both regular and emergency situations is challenging, particularly when individuals are difficult to identify in dense crowds. Over time, psychologists and sociologists have proposed numerous theories to explain crowd behavior. One of the earliest and most popular is Le Bon's Group Mind Theory, which suggests that crowd members lose their individual identity and are easily influenced by a leader. Freud's theories support the notion that individuals in a crowd open their unconscious minds, yet maintain control over their actions. McPhail's Predisposition hypothesis posits that aggressive behavior in crowds is influenced by individual dispositions toward antisocial behavior. In contrast, the Emergent-Norm hypothesis suggests that crowds consist of people with common interests, leading to distinctive behavior patterns. These collective behaviors can often become impulsive, unpredictable, and volatile. Understanding them is crucial for developing smart surveillance systems that can anticipate and prevent crowd-related issues. Such systems must account for the social and psychological components of group behavior, including how crowd members focus their attention on a common cause, exchange ideas rapidly, and form homogeneous groups based on shared beliefs and behaviors.</p>
      <p>Machine learning, particularly the Support Vector Machine (SVM), is a key technology in crowd behavior analysis. An SVM learns decision boundaries that separate the classes present in the input feature space, enabling the classification of various crowd behaviors. Deep learning, especially Convolutional Neural Networks (CNNs), is another powerful tool for crowd behavior research. CNNs are loosely inspired by the organization of neurons in the human visual cortex and process input data hierarchically. Long Short-Term Memory (LSTM) networks, recurrent networks designed to retain information over long sequences, are also used to analyze and predict crowd behavior from past events. These models learn from past examples, which makes them effective at predicting crowd behaviors and detecting anomalies.</p>
      <p>This research is motivated by the need for smart surveillance systems that can detect crowd anomalies, evaluate behaviors in real time, and provide timely alerts. The recent pandemic has also highlighted the importance of monitoring crowd behavior to ensure public safety, especially for enforcing social distancing and detecting free-standing conversation groups. By combining video, audio, and other sensor data, this study aims to develop a comprehensive crowd behavior analysis system that can operate effectively in a variety of challenging scenarios.</p>
      <p>In conclusion, the introduction of cognitive modeling and AI technologies into surveillance systems
offers the potential to greatly improve crowd management, enhancing public safety and preventing
disasters in crowded settings.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>Crowd behavior evaluation through computer vision has been explored in many studies, each contributing to a broader understanding of how anomalies and movement patterns in large groups can be detected and analyzed. A review of these works highlights both the advances in this domain and the gaps that future research must address. For instance, [1] proposed a framework for video event identification that proved essential for high-level video indexing and retrieval. The framework addressed challenges such as skewed data distributions and loose video structure, and it automatically determined thresholds that are typically set by hand in conventional Association Rule Mining (ARM) techniques. Reducing manual intervention in this way was a critical step toward fully autonomous video content analysis.</p>
      <p>In [2], the authors introduced the Trajectory Segmentation and Multi-Instance Learning (TRASMIL) framework, which allows precise and adaptable local anomaly detection. This three-step method was found to outperform existing techniques at identifying trajectories with local abnormalities, and it emphasized the importance of trajectory-based anomaly detection for accurately understanding crowd movement and behavior. Similarly, [3] proposed a semantic video segmentation method that relies on One-Class Classification (OCC) to identify events through frame-by-frame processing. That work highlighted the effectiveness of OCC for unsupervised event detection, particularly through Temporal Self-Similarity Maps (TSSMs), which were evaluated on a publicly available thermal video dataset. Using OCC for unsupervised event detection opened new avenues for handling video data with minimal prior knowledge of the scene.</p>
      <p>A dynamic time interval segmentation technique has been proposed to improve item anomaly detection; it dynamically validates the length of the time interval and groups successive attack ratings [4]. However, as observed in [5], the robustness of anomaly detection methods has received limited attention in terms of accuracy and consistency, a gap that future research must address. Meanwhile, [6] proposed an unsupervised method for scene analysis and anomaly detection in traffic video recorded by stationary security cameras. Using local Hierarchical Dirichlet Process (HDP) models, Kaltsa et al. achieved improved accuracy at lower computational cost, underlining the need for efficient solutions when processing large amounts of traffic video.</p>
      <p>Other researchers have approached the problem from a probabilistic standpoint. The work in [7] proposed a probabilistic framework for identifying local spatiotemporal anomalies. The framework refines decision-making by deriving optimal decision rules from score functions based on nearest-neighbor distances, and it emphasizes the importance of spatiotemporal scale in identifying anomalies accurately. In [8], spatiotemporal anomalies were detected using scalable aggregation and geolocated text visualization: a cluster analysis technique discovers anomalies automatically, and the findings are presented on a global map. That work demonstrated how scalable visualization can help analysts categorize and evaluate event candidates on a global scale.</p>
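      <p>The nearest-neighbor distance scores mentioned for [7] can be illustrated with a generic sketch. The Python code below scores each query sample by its mean Euclidean distance to the <italic>k</italic> closest training samples, so samples far from all normal data receive large scores. This is an assumed, simplified formulation for illustration only: it is not the decision procedure of [7], and the helper name <monospace>knn_anomaly_scores</monospace>, the default <monospace>k = 3</monospace>, and the toy data are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def knn_anomaly_scores(train, queries, k=3):
    """Mean Euclidean distance from each query to its k nearest training
    samples; larger scores suggest more anomalous samples (hypothetical helper)."""
    scores = []
    for query in queries:
        distances = sorted(math.dist(query, sample) for sample in train)
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Usage: four "normal" points and two queries, one of them far away.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
scores = knn_anomaly_scores(normal, [(0.5, 0.5), (10.0, 10.0)])
```
      </preformat>
      <p>A threshold on such scores would then separate normal from anomalous samples; how that threshold is chosen is the part a full framework has to decide.</p>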
      <p>The work in [9] visualized social media data with a visual analytics technique that lets users extract significant topics from a chosen collection of messages. By applying Latent Dirichlet Allocation (LDA) and visualizing topic time series, analysts could better understand abnormal events by identifying peaks and outliers in the data. In [10], a probabilistic methodology placed temporal and spatial constraints on video volumes to identify abnormal video configurations. Because it avoids motion estimation and background removal, the approach proved particularly efficient for detecting rare events in video.</p>
      <p>In a related development, [11] proposed an anomaly detection method that incorporates both spatial and temporal context. It introduced a region-based descriptor called Motion Context, which proved more reliable than statistical models when training data are scarce, and it used compact random projections to speed up the search, further improving efficiency. The work in [12] used a spatiotemporal Laplacian eigenmap to model crowd behavior and detect anomalies; by identifying both local and global anomalies, it showed how modeling regular crowd behavior supports accurate detection of abnormal crowd behavior.</p>
      <p>Taking a different approach, [13] developed a Structural Context Descriptor (SCD) to describe crowd individuals, drawing on the potential energy function of particles from solid-state physics. The SCD method used the 3-D Discrete Cosine Transform (DCT) to compute fluctuations in the crowd SCD and pinpointed anomalies through these variations. The work in [15] focused on anomaly detection in complex crowd settings using a hierarchical activity-pattern discovery framework. It factored in both local and global spatiotemporal context, defining an anomaly energy function that quantifies the abnormality of motion patterns; this method is particularly useful for detecting abnormal activity in densely packed crowds [16].</p>
      <p>Continuing with anomaly detection in video monitoring, [17] proposed an unsupervised statistical learning framework for monitoring crowded environments. The method relies on clustering and sparse coding to learn global and local activity patterns and uses multi-scale analysis to localize anomalies precisely. The work in [18] advanced these techniques with a novel crowd video anomaly detection method based on spatiotemporal texture analysis. Designed for real-time applications, the approach simplified the machine learning procedure and demonstrated improved flexibility and efficiency compared with existing systems.</p>
      <p>The work in [19] proposed a spatiotemporal architecture for anomaly detection that combines spatial feature representations with the temporal changes in those features; it proved effective for detecting anomalies in videos of crowded scenes. In [20], an intrusion detection technique detects disturbances of normal behavior that may signal intentional or unintentional attacks. That work explored both supervised and unsupervised anomaly detection, emphasizing the importance of detecting disruptions in normal behavior patterns.</p>
      <p>The approach in [21] uses a reliable anomaly-degree measure to increase the separability between anomaly pixels and background pixels. It divides pixels into potential anomaly regions and background regions and then learns discriminative information, highlighting the importance of feature extraction for accurate anomaly detection. The work in [22] took a fresh approach based on the difference of convex functions algorithm, building a hidden Markov anomaly detector that extends the One-Class SVM and shows improved performance across various datasets.</p>
      <p>The work in [23] proposed a sparse reconstruction-based method for detecting aberrant behavior that combines low-level visual features with causality analysis; by analyzing individual and group behaviors, it detects abnormal interactions in multi-object settings. In [24], image classification performance was improved with convolutional neural network (CNN) ensembles, which outperformed both single CNN models and regular perceptrons at detecting abnormalities.</p>
      <p>The work in [25] proposed an unsupervised Fully Convolutional Network (FCN) for anomaly detection in videos. It relies on temporal data and cascaded outlier detection, which lowers computational complexity and improves both speed and accuracy. In [26], a machine learning-based approach detects fraudulent traffic in Modbus and Transmission Control Protocol (TCP) connections; using SVM, Random Forest, K-NN, and K-means clustering, it achieved effective anomaly detection in an industrial scenario.</p>
      <p>The work in [27] applied deep learning to behavior detection, using a bag of visual words and the Agglomerative Information Bottleneck technique to compress the vocabulary and reduce feature dimensionality; its sparse representation increased detection precision for deviant behavior. In [28], deep learning was applied to social multimedia to detect suspect flows, and the method was tested on a large-scale Carnegie Mellon University (CMU) dataset. The work in [29] used the Inception-V3 neural network for feature extraction and classification and compared its performance with traditional models such as K-nearest neighbor, random forest, and SVM, while [30] focused on maximizing the area under the ROC curve for hierarchical abnormal behavior detection, offering a semi-supervised approach that eliminates the need for manual labeling.</p>
      <p>The literature on crowd behavior analysis demonstrates the continuous evolution of methods
aimed at enhancing surveillance through anomaly detection. From trajectory-based techniques to deep
learning and probabilistic models, researchers have developed increasingly sophisticated approaches
to ensure real-time, accurate detection of abnormal behavior in crowds. These advancements have laid
the groundwork for further research into the robustness and scalability of anomaly detection methods,
while also identifying key areas for future exploration, such as improving computational efficiency
and addressing issues like occlusion and multi-camera data integration.</p>
    </sec>
    <sec id="sec-3">
      <title>Mathematical Modeling &amp; Proposed Methodology</title>
      <p>In image processing, feature extraction is pivotal for tasks such as pattern recognition, face detection, and image classification, and it forms the backbone of image retrieval systems. Features fall into two broad categories: general features, such as color, texture, and shape, which describe the overall content of an image; and domain-specific features, such as those used for object detection or human face recognition, which require specialized knowledge and fine-tuning. The efficiency of image annotation frameworks depends on how well semantic concepts can be represented through low-level image features, which form the foundation of multimedia information retrieval, object recognition, and image annotation. In both Content-Based Image Retrieval (CBIR) and Automatic Image Annotation (AIA), key image features such as color, texture, and shape are used to extract meaningful information. CBIR focuses primarily on the visual aspects of an image, whereas AIA incorporates high-level concepts that better reflect the image content, addressing the challenge of locating images in large datasets.</p>
      <p>This research therefore integrates low-level features with high-level semantic concepts to improve image retrieval, focusing on texture and shape as the central features for efficient image annotation. Feature extraction is a dimensionality-reduction process that transforms the image into a feature set representing its high-level characteristics. Condensing the image data into a feature vector lets the system identify patterns within an image quickly and accurately. Computational efficiency requires a robust feature extraction stage, and combining low-level features with high-level semantic concepts improves retrieval accuracy. The proposed system therefore uses fused feature extraction: texture and shape features are combined to provide more accurate image information than any single feature would, improving retrieval accuracy and reducing system complexity. Specifically, the Haralick and Tamura texture features are fused with shape features, which significantly improves image retrieval performance and reduces processing time.</p>
      <p>Low-level features such as color and texture represent the visual aspects of an image, while high-level features correspond to semantic keywords or concepts. A semantic gap arises when only low-level features are considered. To overcome this issue, AIA systems incorporate semantic concepts derived from the visual content, enabling more accurate retrieval of relevant images.</p>
      <p>Pre-processing is crucial for pattern recognition and image classification, as it improves the quality of the input images by removing noise, resizing, and adjusting image features. In this research, the images are normalized by rescaling them to 128 × 128 pixels, which ensures uniformity across datasets and improves computational efficiency. Converting color images to grayscale further reduces their inherent complexity and facilitates edge detection and pixel-based processing. Segmentation in this research is edge-based, relying on intensity differences in the image content. Edge detectors such as the Sobel, Prewitt, and Canny operators identify object boundaries by detecting intensity contrasts; the Canny operator in particular is favored for its ability to produce sharp, fine edges.</p>
      <p>The performance of the segmentation techniques is evaluated with the Root Mean Square Error (RMSE), the Signal-to-Noise Ratio (SNR), and the Peak Signal-to-Noise Ratio (PSNR). RMSE measures the average difference between the original and the segmented image, so a higher value indicates a greater difference. SNR quantifies the noise present in an image, with higher values representing cleaner, less noisy images. PSNR is commonly used to measure the quality of edge detection between the original and the segmented image, with higher values indicating better segmentation accuracy; it depends on <inline-formula><tex-math>R</tex-math></inline-formula>, the maximum possible pixel value of the image.
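The performance evaluation results indicate that the Canny operator outperforms the other edge detection techniques in terms of RMSE, SNR, and PSNR.</p>
      <p>This section provides the mathematical expressions used in the proposed methodology, covering image pre-processing, feature extraction, classification, and evaluation, and explains the role of each in the overall image annotation and retrieval system. To normalize the size of images for consistent processing, each image is rescaled. If the original image has dimensions <inline-formula><tex-math>W \times H</tex-math></inline-formula> (width <inline-formula><tex-math>W</tex-math></inline-formula> and height <inline-formula><tex-math>H</tex-math></inline-formula>) and is to be resized to a fixed size <inline-formula><tex-math>w_0 \times h_0</tex-math></inline-formula>, the rescaling factors <inline-formula><tex-math>s_x</tex-math></inline-formula> and <inline-formula><tex-math>s_y</tex-math></inline-formula> in the <inline-formula><tex-math>x</tex-math></inline-formula> and <inline-formula><tex-math>y</tex-math></inline-formula> directions are
        <disp-formula id="eq1"><label>(1)</label><tex-math>s_x = \frac{w_0}{W}, \qquad s_y = \frac{h_0}{H}</tex-math></disp-formula>
        This ensures the image is resized uniformly for further processing.</p>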
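      <p>The rescaling factors can be applied with a simple nearest-neighbor resize. The Python sketch below is illustrative only: it assumes a grayscale image stored as a list of pixel rows, the helper names <monospace>rescale_factors</monospace> and <monospace>resize_nearest</monospace> are hypothetical, and nearest-neighbor sampling is an assumed choice, since the text does not state which interpolation method is used.</p>
      <preformat preformat-type="code">
```python
def rescale_factors(width, height, target_w=128, target_h=128):
    """Return (s_x, s_y) = (w0 / W, h0 / H); 128 x 128 is the size used in the text."""
    return target_w / width, target_h / height


def resize_nearest(image, target_w=128, target_h=128):
    """Nearest-neighbor resize of a 2-D list of pixel rows (hypothetical helper)."""
    height, width = len(image), len(image[0])
    s_x, s_y = rescale_factors(width, height, target_w, target_h)
    resized = []
    for y in range(target_h):
        # Map the output row back to a source row, clamped to the image bounds.
        src_y = min(int(y / s_y), height - 1)
        resized.append([image[src_y][min(int(x / s_x), width - 1)]
                        for x in range(target_w)])
    return resized


# Usage: upscale a 2 x 2 image to 4 x 4.
small = [[1, 2], [3, 4]]
big = resize_nearest(small, 4, 4)
```
      </preformat>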
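      <p>To convert a color image to a grayscale image, a weighted sum of the red, green, and blue (RGB) components is used:
        <disp-formula id="eq2"><label>(2)</label><tex-math>I_{\mathrm{gray}} = 0.2989 \cdot R + 0.5870 \cdot G + 0.1140 \cdot B</tex-math></disp-formula>
        where <inline-formula><tex-math>R</tex-math></inline-formula>, <inline-formula><tex-math>G</tex-math></inline-formula>, and <inline-formula><tex-math>B</tex-math></inline-formula> are the intensities of the red, green, and blue components of the image, respectively. The weights account for the different contributions of each color channel to perceived brightness.</p>
      <p>Thresholding is a simple segmentation technique that separates objects from the background by converting an image into a binary format. Given a threshold value <inline-formula><tex-math>T</tex-math></inline-formula>, the binary image <inline-formula><tex-math>I_{\mathrm{binary}}(x, y)</tex-math></inline-formula> is computed as
        <disp-formula id="eq3"><label>(3)</label><tex-math>I_{\mathrm{binary}}(x, y) = \begin{cases} 1 &amp; \text{if } I(x, y) &gt; T \\ 0 &amp; \text{if } I(x, y) \le T \end{cases}</tex-math></disp-formula></p>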
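      <p>The grayscale conversion and thresholding steps translate directly into code. The Python sketch below applies the weighted channel sum to every pixel and then thresholds the result; the list-of-rows image layout, the function names, and the threshold of 127 used in the usage line are assumptions for illustration.</p>
      <preformat preformat-type="code">
```python
def to_grayscale(rgb_image):
    """Weighted sum 0.2989 R + 0.5870 G + 0.1140 B for every (R, G, B) pixel."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb_image]


def threshold(gray_image, t):
    """Binary image: 1 where the intensity exceeds t, otherwise 0."""
    return [[1 if value > t else 0 for value in row] for row in gray_image]


# Usage on a tiny 1 x 2 RGB image with an assumed threshold of 127.
rgb = [[(255, 255, 255), (10, 20, 30)]]
binary = threshold(to_grayscale(rgb), 127)
```
      </preformat>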
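      <p>Here <inline-formula><tex-math>I(x, y)</tex-math></inline-formula> represents the intensity of the pixel at location <inline-formula><tex-math>(x, y)</tex-math></inline-formula>. Canny edge detection uses gradients to detect edges. The gradient magnitude <inline-formula><tex-math>G</tex-math></inline-formula> at each pixel is calculated from the partial derivatives <inline-formula><tex-math>G_x</tex-math></inline-formula> and <inline-formula><tex-math>G_y</tex-math></inline-formula> in the <inline-formula><tex-math>x</tex-math></inline-formula>- and <inline-formula><tex-math>y</tex-math></inline-formula>-directions:
        <disp-formula id="eq4"><label>(4)</label><tex-math>G = \sqrt{G_x^2 + G_y^2}</tex-math></disp-formula>
        The direction of the edge <inline-formula><tex-math>\theta</tex-math></inline-formula> is calculated as
        <disp-formula id="eq5"><label>(5)</label><tex-math>\theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)</tex-math></disp-formula></p>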
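      <p>The gradient magnitude and direction can be evaluated at a single pixel as in the Python sketch below. It assumes 3 × 3 Sobel kernels as the estimate of <inline-formula><tex-math>G_x</tex-math></inline-formula> and <inline-formula><tex-math>G_y</tex-math></inline-formula> (the text does not fix the derivative operator used inside Canny), and it uses <monospace>math.atan2</monospace>, a quadrant-aware form of the inverse tangent. The helper name <monospace>gradient_at</monospace> is hypothetical, and the later Canny stages are not shown.</p>
      <preformat preformat-type="code">
```python
import math

# Assumed derivative operator: 3 x 3 Sobel kernels (rows indexed by dy, columns by dx).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_at(image, x, y):
    """Return (G, theta) at interior pixel (x, y) of a 2-D intensity list."""
    gx = gy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            value = image[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * value
            gy += SOBEL_Y[dy + 1][dx + 1] * value
    magnitude = math.hypot(gx, gy)   # G = sqrt(Gx^2 + Gy^2)
    direction = math.atan2(gy, gx)   # quadrant-aware tan^-1(Gy / Gx), in radians
    return magnitude, direction


# Usage: a vertical step edge gives a purely horizontal gradient.
step = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
g, theta = gradient_at(step, 1, 1)
```
      </preformat>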
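      <sec id="sec-3-1">
        <title>Edge Refinement and Texture Features</title>
        <p>After the gradient magnitude and direction have been calculated, non-maximum suppression and double thresholding are applied to finalize the edge map.</p>
        <p>The gray-level co-occurrence matrix (GLCM) is a statistical measure used to describe texture features. For two pixels separated by a distance <inline-formula><tex-math>\delta</tex-math></inline-formula> in a specific direction <inline-formula><tex-math>\theta</tex-math></inline-formula>, the GLCM element <inline-formula><tex-math>P(i, j)</tex-math></inline-formula> is defined as
          <disp-formula id="eq6"><label>(6)</label><tex-math>P(i, j) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \mathbf{1}\left\{ I(x, y) = i \ \text{and}\ I(x + \Delta x, y + \Delta y) = j \right\}</tex-math></disp-formula>
          where <inline-formula><tex-math>\mathbf{1}\{\cdot\}</tex-math></inline-formula> equals 1 if its condition holds and 0 otherwise, <inline-formula><tex-math>M \times N</tex-math></inline-formula> is the image size, <inline-formula><tex-math>I(x, y)</tex-math></inline-formula> is the intensity of the pixel at <inline-formula><tex-math>(x, y)</tex-math></inline-formula>, <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula> represent gray-level values, and <inline-formula><tex-math>(\Delta x, \Delta y)</tex-math></inline-formula> is the pixel offset corresponding to <inline-formula><tex-math>\delta</tex-math></inline-formula> and <inline-formula><tex-math>\theta</tex-math></inline-formula>.</p>
        <p>Contrast, a texture feature that describes the intensity contrast between a pixel and its neighbor over the whole image, is computed as
          <disp-formula id="eq7"><label>(7)</label><tex-math>\mathrm{Contrast} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - j)^2 \cdot P(i, j)</tex-math></disp-formula>
          where <inline-formula><tex-math>P(i, j)</tex-math></inline-formula> is the GLCM element corresponding to the co-occurrence of gray levels <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula>, and <inline-formula><tex-math>L</tex-math></inline-formula> is the number of gray levels. Entropy measures the randomness or complexity of the texture and is given by
          <disp-formula id="eq8"><label>(8)</label><tex-math>\mathrm{Entropy} = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} P(i, j) \cdot \log P(i, j)</tex-math></disp-formula>
          This value indicates the level of disorder or unpredictability in the texture of the image.</p>
        <p>Coarseness measures the roughness of a texture: large differences in pixel intensities indicate coarser textures. The coarseness feature is calculated as
          <disp-formula id="eq9"><label>(9)</label><tex-math>\mathrm{Coarseness} = \arg\max_{k} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left| I(x + 2^{k}, y) - I(x, y) \right|</tex-math></disp-formula>
          where <inline-formula><tex-math>I(x, y)</tex-math></inline-formula> is the intensity at pixel <inline-formula><tex-math>(x, y)</tex-math></inline-formula> and <inline-formula><tex-math>k</tex-math></inline-formula> is the scale that maximizes the intensity difference.</p>
        <sec id="sec-3-1-1">
          <title>Classification and Evaluation</title>
          <p>In Support Vector Machines (SVM), the goal is to find a hyperplane that separates data points of different classes. For a linear SVM, the decision boundary is given by
            <disp-formula id="eq10"><label>(10)</label><tex-math>\mathbf{w} \cdot \mathbf{x} + b = 0</tex-math></disp-formula>
            where <inline-formula><tex-math>\mathbf{w}</tex-math></inline-formula> is the weight vector, <inline-formula><tex-math>\mathbf{x}</tex-math></inline-formula> is the input feature vector, and <inline-formula><tex-math>b</tex-math></inline-formula> is the bias term. The hyperplane is chosen to maximize the margin between the two classes. The margin <inline-formula><tex-math>\gamma</tex-math></inline-formula> is the width of the band between the parallel hyperplanes that pass through the closest data points of each class (each lies at distance <inline-formula><tex-math>1/\lVert \mathbf{w} \rVert</tex-math></inline-formula> from the decision boundary), and is defined as
            <disp-formula id="eq11"><label>(11)</label><tex-math>\gamma = \frac{2}{\lVert \mathbf{w} \rVert}</tex-math></disp-formula>
            The objective is to maximize <inline-formula><tex-math>\gamma</tex-math></inline-formula>, which is equivalent to minimizing <inline-formula><tex-math>\lVert \mathbf{w} \rVert^2</tex-math></inline-formula>. For data that are not linearly separable, kernel functions map the input space into a higher-dimensional space. The polynomial kernel is given by
            <disp-formula id="eq12"><label>(12)</label><tex-math>K(\mathbf{x}_i, \mathbf{x}_j) = \left(\mathbf{x}_i \cdot \mathbf{x}_j + 1\right)^{d}</tex-math></disp-formula>
            where <inline-formula><tex-math>d</tex-math></inline-formula> is the degree of the polynomial, and <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{x}_j</tex-math></inline-formula> are input vectors.</p>
          <p>RMSE measures the difference between the original and predicted values and is often used to evaluate edge detection. It is computed as
            <disp-formula id="eq13"><label>(13)</label><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(I_i - \hat{I}_i\right)^2}</tex-math></disp-formula>
            where <inline-formula><tex-math>I_i</tex-math></inline-formula> is the original image, <inline-formula><tex-math>\hat{I}_i</tex-math></inline-formula> is the processed (e.g., edge-detected) image, and <inline-formula><tex-math>n</tex-math></inline-formula> is the total number of pixels.</p>
          <p>PSNR is used to measure the quality of an image after compression or transformation. It is defined as
            <disp-formula id="eq14"><label>(14)</label><tex-math>\mathrm{PSNR} = 10 \log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right)</tex-math></disp-formula>
            where <inline-formula><tex-math>R</tex-math></inline-formula> is the maximum pixel value (e.g., 255 for 8-bit images) and MSE is the mean squared error between the original and processed image, that is, the square of the RMSE.</p>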
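          <p>The RMSE and PSNR definitions can be checked numerically with the Python sketch below. It assumes two equally sized images stored as lists of pixel rows and an 8-bit range (<monospace>max_value = 255</monospace>); the function names are hypothetical, and returning infinity for identical images is a common convention rather than something stated in the text.</p>
          <preformat preformat-type="code">
```python
import math


def mse(original, processed):
    """Mean squared error over all pixels of two equally sized 2-D lists."""
    squared = [(a - b) ** 2
               for row_a, row_b in zip(original, processed)
               for a, b in zip(row_a, row_b)]
    return sum(squared) / len(squared)


def rmse(original, processed):
    """Square root of the mean squared error."""
    return math.sqrt(mse(original, processed))


def psnr(original, processed, max_value=255):
    """PSNR in dB with R = max_value (255 for 8-bit images); infinite if identical."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


# Usage: a single pixel off by 2 in a 2 x 2 image.
a = [[0, 0], [0, 0]]
b = [[0, 0], [0, 2]]
```
          </preformat>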
          <p>These mathematical expressions provide a foundation for understanding the components of the proposed image annotation and retrieval system, from feature extraction to classification and evaluation; each plays a role in the accuracy and efficiency of the overall system. The proposed methodology focuses on automatic image annotation using machine learning, specifically the Multi-Class Support Vector Machine (MCSVM) classifier. Automatic image annotation is a classification task in which an image is automatically labeled with semantic keywords based on its visual content. A standard SVM is a binary classifier, so it cannot directly handle the multi-class problems that are common in image annotation. MCSVM extends the binary SVM to multiple classes by training a classifier for each class and combining their outputs to classify new images.</p>
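          <p>A minimal sketch of the one-classifier-per-class (one-vs-rest) strategy described above, in Python. Each binary model is a linear SVM trained by full-batch subgradient descent on the regularized hinge loss; this training procedure, its hyperparameters (<monospace>lam</monospace>, <monospace>lr</monospace>, <monospace>epochs</monospace>), the helper names, and the toy two-dimensional features and class labels are all assumptions for illustration, not the authors' MCSVM implementation. A new sample is assigned to the class whose hyperplane <inline-formula><tex-math>\mathbf{w} \cdot \mathbf{x} + b</tex-math></inline-formula> gives the largest decision value.</p>
          <preformat preformat-type="code">
```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=1000):
    """Binary linear SVM (labels +1/-1) fitted by full-batch subgradient descent
    on lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if 1 > yi * score:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_value(model, x):
    """w . x + b for one binary model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest(X, labels, **svm_kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {cls: train_linear_svm(X, [1 if lab == cls else -1 for lab in labels],
                                  **svm_kwargs)
            for cls in sorted(set(labels))}


def predict(models, x):
    """Assign x to the class whose SVM gives the largest decision value."""
    return max(models, key=lambda cls: decision_value(models[cls], x))


# Usage with hypothetical 2-D feature vectors for three density classes.
X = [[0.0, 0.0], [0.5, 0.5], [4.0, 0.0], [4.5, 0.5], [0.0, 4.0], [0.5, 4.5]]
labels = ["low", "low", "medium", "medium", "high", "high"]
models = train_one_vs_rest(X, labels)
```
          </preformat>
          <p>One-vs-rest is only one way to build a multi-class SVM; one-vs-one voting is a common alternative, and a kernel such as the polynomial kernel could replace the dot product in the decision value.</p>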
          <p>The proposed system incorporates the Semantic Keyword Transfer (SKT) algorithm to bridge the
gap between low-level image features and high-level semantic concepts. Image classification involves
training a model to recognize patterns in labeled images and applying this model to classify new
images. Classification techniques such as Minimum Distance Classifier (MDC), K-Nearest Neighbor
(KNN), Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Decision Trees (DT)
are commonly used in image processing.</p>
          <p>The SVM classifier is particularly effective in high-dimensional data classification due to its ability
to create optimal class boundaries by maximizing the margin between classes. In the context of image
annotation, MCSVM is used to classify images with multiple objects or regions.</p>
          <p>The proposed methodology for automatic image annotation combines fused features (texture and
shape) with the MCSVM classifier and SKT algorithm. This approach bridges the semantic gap
between low-level image features and high-level semantic concepts, resulting in improved image
retrieval accuracy. The integration of Haralick and Tamura texture features with shape features
provides a comprehensive representation of image content, while the MCSVM classifier efficiently
handles multi-class image annotation tasks. The evaluation results demonstrate that the proposed
system outperforms existing methods in terms of retrieval accuracy, making it a promising solution
for automatic image annotation and retrieval tasks.</p>
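          <p>The text does not specify how the texture and shape features are fused. A simple and common choice, assumed here, is to standardize each feature to zero mean and unit variance so that no group dominates the SVM merely because of its numeric range, and then concatenate the vectors of each image. The function names and feature values below are hypothetical.</p>
          <preformat preformat-type="code">
```python
import statistics


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; dividing by 1 leaves it at 0.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(texture_rows, shape_rows):
    """Concatenate separately standardized texture and shape vectors per image."""
    texture = zscore_columns(texture_rows)
    shape = zscore_columns(shape_rows)
    return [t + s for t, s in zip(texture, shape)]


# Two hypothetical images: (contrast, coarseness) texture values and one shape value.
texture = [[1.0, 10.0], [3.0, 30.0]]
shape = [[100.0], [300.0]]
print(fuse_features(texture, shape))
```
          </preformat>
          <p>Without standardization, the shape value (in the hundreds) would dominate any distance or dot product computed by the classifier.</p>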
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and Analysis</title>
      <p>This research proposes and examines a simple algorithm to perform this crowd behavior analysis. Given an aerial image of a crowd, the algorithm segments the image into crowd and non-crowd regions. On a large scale, we expect a crowd to contain repetitive visual elements or textures that differ significantly from those of a non-crowd region. The proposed algorithm uses multiple Gabor filters to capture these textures and uses improved preprocessing and support vector machines to segment the image into two groups corresponding to crowd and non-crowd regions. The aim is thus to detect crowds of humans in still images by segmenting out the regions that the crowd occupies. The data set consists of 1200 aerial images of crowds taken from the internet, and each image is tagged with five properties. By testing the algorithm on images with varying properties, this research aims to choose a set of parameters that detects crowds well despite their diverse characteristics.</p>
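      <p>Once a linear SVM has been trained on labeled pixels, splitting an image into the two groups reduces to evaluating the sign of the SVM decision function at every pixel's feature vector. The sketch below shows only that labeling step, assuming a linear SVM; the weights, bias, and the 2 × 3 feature map are invented for illustration, and training and feature extraction are omitted.</p>
      <preformat preformat-type="code">
```python
def segment(feature_map, w, b):
    """Label each pixel 1 (crowd) if w . f + b is positive, else 0 (non-crowd).

    feature_map[r][c] is the feature vector of pixel (r, c), e.g. its smoothed
    Gabor filter responses; w and b come from a trained linear SVM.
    """
    return [
        [1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0 for f in row]
        for row in feature_map
    ]


def crowd_fraction(mask):
    """Share of pixels labeled as crowd."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


# Hypothetical 2 x 3 image with two texture-energy features per pixel.
features = [
    [(0.9, 0.8), (0.7, 0.9), (0.1, 0.2)],
    [(0.8, 0.6), (0.2, 0.1), (0.0, 0.1)],
]
mask = segment(features, w=(1.0, 1.0), b=-1.0)
print(mask)                  # [[1, 1, 0], [1, 0, 0]]
print(crowd_fraction(mask))  # 0.5
```
      </preformat>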
      <p>The ratio $\sigma_g/\lambda$ between the standard deviation $\sigma_g$ of the Gabor filter and its wavelength $\lambda$ determines the spatial frequency bandwidth and hence the number of parallel excitatory and inhibitory stripes in the Gabor filter. The half-response spatial frequency bandwidth $b$ (in octaves) is related to the ratio $\sigma_g/\lambda$ as follows:</p>
      <p>$$b = \log_2\left(\frac{\frac{\sigma_g}{\lambda}\pi + \sqrt{\frac{\ln 2}{2}}}{\frac{\sigma_g}{\lambda}\pi - \sqrt{\frac{\ln 2}{2}}}\right), \qquad \frac{\sigma_g}{\lambda} = \frac{1}{\pi}\sqrt{\frac{\ln 2}{2}}\,\frac{2^b + 1}{2^b - 1}.$$</p>
      <sec id="sec-4-1">
        <p>In order to capture the repetitive texture of a crowd from many perspectives, we use 6 orientations with an orientation separation angle of $\Delta\theta = 30^\circ$:</p>
        <p>$$\theta \in \{0^\circ, 30^\circ, 60^\circ, 90^\circ, 120^\circ, 150^\circ\}.$$</p>
        <p>We also use a range of wavelengths, evenly spaced in $\log_2$-space, ranging from some minimum wavelength to the radius of the image (or half its diagonal length). The choice of the minimum wavelength is adjusted when we apply the algorithm to some initial images. The general formula for the chosen wavelengths is</p>
        <p>$$\lambda_i = \lambda_{\min} \times 2^{i}, \quad i \in \mathbb{N}.$$</p>
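        <p>The bandwidth relation above can be checked numerically. This sketch evaluates $\sigma_g/\lambda$ for a given bandwidth, inverts the relation, and lists the six orientations; the function names are illustrative. Note that the inverse is only defined when $\pi\,\sigma_g/\lambda$ exceeds $\sqrt{\ln 2/2}$.</p>
        <preformat preformat-type="code">
```python
import math

# Constant sqrt(ln 2 / 2) that appears in the bandwidth relation.
K = math.sqrt(math.log(2) / 2)


def sigma_over_lambda(b):
    """Ratio sigma_g / lambda for a half-response bandwidth of b octaves."""
    return K / math.pi * (2 ** b + 1) / (2 ** b - 1)


def bandwidth(ratio):
    """Inverse relation: bandwidth b (octaves) for a given sigma_g / lambda.

    Only valid when ratio * pi exceeds K; otherwise the log argument is not positive.
    """
    x = ratio * math.pi
    return math.log2((x + K) / (x - K))


# Six orientations separated by 30 degrees.
orientations = [30 * k for k in range(6)]

ratio = sigma_over_lambda(1.0)
print(orientations)                # [0, 30, 60, 90, 120, 150]
print(round(ratio, 4))             # 0.5622
print(round(bandwidth(ratio), 6))  # 1.0
```
        </preformat>
        <p>For a bandwidth of one octave the ratio is about 0.56, i.e. the Gabor envelope's standard deviation is roughly half the wavelength.</p>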
      </sec>
      <sec id="sec-4-2">
        <p>For example, with $\lambda_{\min} = 2$ for a 288 × 512 image, a total of 42 Gabor filters would be used: 6 orientations × 7 wavelengths.</p>
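        <p>Each of the 42 (wavelength, orientation) pairs defines one filter. The sketch below assumes the common real, even-symmetric Gabor form $\exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma_g^2}\right)\cos\left(\frac{2\pi x'}{\lambda}\right)$ with rotated coordinates $(x', y')$, $\sigma_g \approx 0.5622\,\lambda$ (one-octave bandwidth), and an assumed spatial aspect ratio $\gamma = 0.5$. The seven wavelengths in the example are illustrative, and kernels are built on demand because the largest ones are big.</p>
        <preformat preformat-type="code">
```python
import math

SIGMA_RATIO = 0.5622  # sigma_g / lambda for a bandwidth of 1 octave


def gabor_kernel(wavelength, theta_deg, gamma=0.5):
    """Real (even-symmetric) Gabor kernel as a nested list; gamma is an assumed aspect ratio."""
    sigma = SIGMA_RATIO * wavelength
    half = math.ceil(3 * sigma)  # cover about +/- 3 standard deviations
    t = math.radians(theta_deg)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(t) + y * math.sin(t)   # rotated coordinates
            yr = -x * math.sin(t) + y * math.cos(t)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_bank(wavelengths, orientations=range(0, 180, 30)):
    """All (wavelength, orientation) pairs; kernels are built on demand."""
    return [(lam, th) for lam in wavelengths for th in orientations]


# Seven illustrative wavelengths (an assumption) times six orientations.
bank = filter_bank([2 * 2 ** i for i in range(7)])
print(len(bank))  # 42
k = gabor_kernel(2, 0)
print(len(k), k[len(k) // 2][len(k) // 2])  # 9 1.0
```
        </preformat>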
        <p>In this work we set the bandwidth $b$ to 1 octave by default. In that case, the bandwidth relation gives $\sigma_g \approx 0.56\,\lambda$. For each filtered image, we then apply a Gaussian smoothing function given by:</p>
        <p>$$G(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (20)$$</p>
        <p>where $\sigma$ is the standard deviation that determines the window size. The ratio $\sigma/\sigma_g$ (where $\sigma_g$ is the standard deviation parameter of the Gabor filter) is estimated and adjusted when we apply the algorithm to some initial images. We first test the algorithm with a minimum wavelength of $\lambda_{\min} = 3$ and a Gaussian-to-Gabor standard deviation ratio of $\sigma/\sigma_g = 3$. The resulting segmentation is shown in Figure 5.</p>
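        <p>A discrete version of the smoothing function in Eq. (20) can be sketched as follows, assuming truncation at about three standard deviations and $\sigma = (\sigma/\sigma_g) \times 0.5622\,\lambda$. Instead of the continuous factor $1/(2\pi\sigma^2)$, the samples are divided by their sum so that the truncated kernel sums to 1 (up to rounding) and smoothing preserves mean intensity. The example uses the first-test values $\lambda_{\min} = 3$ and $\sigma/\sigma_g = 3$.</p>
        <preformat preformat-type="code">
```python
import math

SIGMA_RATIO = 0.5622  # sigma_g / lambda for a 1-octave bandwidth


def gaussian_kernel(sigma):
    """Sampled 2-D Gaussian G(x, y), truncated at about 3 sigma and normalized by its sum."""
    half = math.ceil(3 * sigma)
    raw = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def smoothing_sigma(wavelength, ratio):
    """Gaussian sigma = ratio * sigma_g, with sigma_g = 0.5622 * lambda."""
    return ratio * SIGMA_RATIO * wavelength


g = gaussian_kernel(smoothing_sigma(3, 3))
print(len(g))  # 33 (a 33 x 33 window)
```
        </preformat>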
        <p>The algorithm does reasonably well on both pictures: in each, it pinpoints the regions where the crowds of people are. In the first image, it seems to slightly overestimate the size of each crowd on the left and right, but the crosswalk stripes do not seem to confuse the algorithm. On the second image, the algorithm does a slightly worse job: the shadow makes it overestimate the regions that the crowd occupies, and quite a few people are not captured as belonging to the crowd.</p>
        <sec id="sec-4-2-1">
          <table-wrap>
            <table>
              <thead>
                <tr>
                  <th>Type of Analysis</th>
                  <th>Performance Parameter</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>Single Level Scenario</td>
                  <td>F Score</td>
                </tr>
                <tr>
                  <td>Proposed Work: Segmentation and Classification, Multiple Scenario</td>
                  <td>Precision, Recall and F Score</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-4-2-2">
          <p>To lessen the algorithm’s overestimation and detect more people in a scattered crowd, we reduce both the minimum wavelength and the standard deviation ratio. The goal is for the algorithm to pick up smaller details in the picture and thus segment all the regions of the crowd more precisely.</p>
          <p>In the second trial, we change the minimum wavelength to 2 and the standard deviation ratio to 1.6, and the algorithm seems to improve on both images. In the first image, the overestimation is reduced, although the algorithm seems to mistake a tiny part of the crosswalk stripes for part of the crowd. In the second image, the algorithm no longer includes most of the shadow as part of the crowd, and only 1–2 people are not included as belonging to the crowd. As a result, we choose a minimum wavelength of 2 and a standard deviation ratio of 1.6 as the parameters for our algorithm, in addition to the other parameters described above.</p>
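          <p>The effect of the parameter change can be quantified at the finest scale. Assuming $\sigma_g \approx 0.5622\,\lambda$ and a smoothing window truncated at about three standard deviations (an assumption about the implementation), the sketch compares the Gaussian smoothing scale of the two trials.</p>
          <preformat preformat-type="code">
```python
import math

SIGMA_RATIO = 0.5622  # sigma_g / lambda for a 1-octave bandwidth


def finest_smoothing(lambda_min, ratio):
    """Gaussian sigma and window width (pixels) at the smallest wavelength."""
    sigma = ratio * SIGMA_RATIO * lambda_min
    width = 2 * math.ceil(3 * sigma) + 1  # window truncated at about 3 sigma
    return sigma, width


for name, lam_min, ratio in [("first trial", 3, 3.0), ("second trial", 2, 1.6)]:
    sigma, width = finest_smoothing(lam_min, ratio)
    print(f"{name}: sigma = {sigma:.2f} px, window = {width} x {width} px")
```
          </preformat>
          <p>Under these assumptions the finest smoothing window shrinks from about 33 × 33 to 13 × 13 pixels, which is consistent with the second setting picking up smaller details and overestimating crowd regions less.</p>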
          <p>There are some defects inherent in the MATLAB filtering used for the Gabor and Gaussian filters. In particular, it assumes that pixels outside the image have an intensity of 0, so the algorithm may not work well for pixels near the image border. This problem did not arise with the 16 images in this data set, but it may need to be addressed when applying the algorithm to more images in different circumstances. The program ran reasonably fast, taking from 20.839009 to 31.543316 seconds for each 288 × 512 image. However, the time adds up when all the images must be processed multiple times to test different parameters.</p>
          <p>Crowd image segmentation and detection play a significant role in various computer vision applications, including crowd monitoring, crowd behavior analysis, and public safety. This work studies the use of Gabor filters and a Support Vector Machine (SVM) for crowd image segmentation and detection: the Gabor filters extract discriminative texture features from crowd images, and the SVM is used as a classifier to distinguish between crowd and non-crowd regions. The results demonstrate the effectiveness of this approach in segmenting and detecting crowds in complex visual scenes. This research concludes by discussing the potential applications of crowd image segmentation and detection using Gabor filters and SVM in real-world scenarios.</p>
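          <p>The border problem can be illustrated in one dimension. With zero padding, smoothing a uniformly bright row darkens its ends, whereas replicating the nearest edge sample does not. In MATLAB, <monospace>imfilter</monospace> pads with zeros by default and also accepts a <monospace>'replicate'</monospace> boundary option; the pure-Python function below is only a hypothetical illustration of the two behaviours.</p>
          <preformat preformat-type="code">
```python
def smooth_1d(signal, kernel, mode="zero"):
    """Correlate a 1-D signal with an odd-length kernel.

    mode="zero" treats samples outside the signal as 0;
    mode="replicate" repeats the nearest edge sample instead.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if j in range(n):
                value = signal[j]
            elif mode == "replicate":
                value = signal[min(max(j, 0), n - 1)]  # clamp to the border
            else:
                value = 0.0  # zero padding
            acc += weight * value
        out.append(acc)
    return out


row = [10.0] * 6  # a uniformly bright image row
kernel = [0.25, 0.5, 0.25]
print(smooth_1d(row, kernel, "zero"))       # [7.5, 10.0, 10.0, 10.0, 10.0, 7.5]
print(smooth_1d(row, kernel, "replicate"))  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```
          </preformat>
          <p>For the large kernels used at long wavelengths, the darkened band near the border becomes correspondingly wider, which is why the effect matters more for some images than others.</p>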
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This research presents a novel approach to crowd behavior analysis that combines Gabor filters and Support Vector Machines (SVM) to detect and segment crowds in still images. The algorithm segments an image into crowd and non-crowd regions by identifying repetitive textures that differentiate the crowd from the background. Through the use of multiple Gabor filters, the method captures these textures at various orientations and scales, enhancing the detection of crowd-specific characteristics. The SVM classifier then assigns regions to the crowd or non-crowd class based on these features. The ability to detect crowds in public spaces is crucial for preventing congestion, ensuring safety, and enforcing social distancing measures. This research demonstrates that crowd segmentation is a vital preprocessing step for more complex tasks such as crowd density estimation and behavior analysis. The algorithm's robustness is tested on a dataset of 1200 aerial images with varying properties, including crowd density, background variation, and lighting conditions, and it detects crowds reliably. Despite some limitations, such as overestimation in regions affected by shadows, the proposed methodology improves the precision and accuracy of crowd detection. By adjusting key parameters such as the minimum wavelength and the standard deviation ratio, the algorithm's performance was optimized to provide precise crowd segmentation. This research highlights the potential for further advancements in crowd detection, with applications in public safety, event management, and urban planning, offering a foundation for real-time crowd analysis systems in diverse environments.
</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>