<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing Workplace Safety through Automated Personal Protective Equipment Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Camilo Poveda Pinilla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofia Segura Muñoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Ivan Romero Gelvez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Bogota-Jorge Tadeo Lozano</institution>
          ,
          <addr-line>Bogota</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>4</fpage>
      <lpage>26</lpage>
      <abstract>
<p>Personal Protective Equipment (PPE) is crucial for ensuring the safety of workers in industrial environments, protecting them from various potential hazards. With the rapid advancements in deep learning technologies, there is increasing interest in applying these techniques to automate PPE detection, thereby enhancing workplace safety measures. This paper presents the development and evaluation of a real-time helmet detection system that leverages computer vision and deep learning models. The system is designed to identify whether individuals are wearing safety helmets, providing immediate feedback and recording instances of non-compliance for subsequent review. Through a comprehensive literature review, this study explores the state-of-the-art in PPE detection, identifies key challenges, and discusses future directions for improving detection systems. Additionally, the implementation of the system is detailed, including its practical application in monitoring safety compliance within real-world industrial settings. The results demonstrate the system's effectiveness in reliably detecting helmets, highlighting its potential to significantly contribute to workplace safety protocols.</p>
      </abstract>
      <kwd-group>
        <kwd>PPE Detection</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Workplace Safety</kwd>
        <kwd>Helmet Detection System</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Workplace safety is a critical concern in various industries, particularly in sectors involving manual
labour and heavy machinery, such as construction, manufacturing, and mining. Personal Protective
Equipment (PPE) plays a vital role in protecting employees from a wide range of workplace hazards,
including exposure to chemicals, physical injuries, and respiratory problems. The importance of PPE
cannot be overstated, as it serves as the last line of defence against these risks, providing a protective
barrier that can significantly reduce the likelihood of serious injuries or deaths [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Recent advances in deep learning technology have opened new avenues for enhancing workplace
safety through automation of PPE detection. Traditional methods of monitoring PPE compliance
typically involve manual checks, which are not only labour intensive but also prone to human error. In
contrast, automated detection systems using deep learning algorithms offer a more efficient, accurate,
and scalable solution. These systems are grounded in the foundational work of deep learning pioneers,
such as Ian Goodfellow, who coauthored the seminal textbook "Deep Learning" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which has been
instrumental in shaping the field.
      </p>
      <p>
        The development of an automated helmet detection application is particularly justified in high-risk
industries, where the consequences of non-compliance can be severe. Continuous monitoring through
computer vision allows for real-time enforcement of safety protocols, significantly reducing the risk of
accidents. Studies have shown that the implementation of such systems not only improves compliance
but also fosters a culture of safety among workers [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>1.1. Main Contributions</title>
        <p>In this paper, we present the following key contributions:
1. Development of a Real-Time Helmet Detection System: Leveraging advanced deep learning
models to accurately identify the presence of helmets in real-time video streams within industrial
environments.
2. Implementation of an Optimized Deep Learning Model: Fine-tuning a pre-trained YOLOv5
model to enhance detection accuracy and reduce false positives/negatives under varying
environmental conditions.
3. Comprehensive System Evaluation: Conducting empirical evaluations in a controlled
industrial setting to assess the system’s effectiveness, including metrics such as precision, recall, and
response time.
4. User-Friendly Web Interface: Developing an intuitive web-based interface for real-time
monitoring, visualization of detection results, and storage of non-compliance instances.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        The evolution of object detection methods, particularly within the context of PPE detection, has been
substantial over the past two decades. Initially, object detection relied heavily on classical machine
learning techniques. Methods such as those based on Haar features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Histogram of Oriented
Gradients (HOG) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] laid the foundation for real-time object detection by effectively capturing essential
features of objects within an image. However, these approaches had limitations in terms of accuracy
and computational efficiency, which were addressed by later developments in the field.
      </p>
      <p>
        The advent of deep learning marked a significant turning point in object detection. Convolutional
Neural Networks (CNNs), particularly the R-CNN family [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], revolutionized the field by learning feature
representations directly from data, leading to substantial improvements in detection accuracy. The
development of Faster R-CNN [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], with its introduction of the Region Proposal Network (RPN), further
optimized the process by reducing computational overhead and enhancing speed. These advancements
were influenced by the broader machine learning community’s efforts to improve statistical learning
methods, as detailed in works like "The Elements of Statistical Learning" by Hastie, Tibshirani, and
Friedman [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        One-stage detectors, such as YOLO [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and SSD, brought real-time object detection capabilities,
making it feasible to deploy these models in environments where quick decision-making is crucial.
These models have been pivotal in the development of PPE detection systems, allowing for the rapid
identification of safety equipment in real-time scenarios. The importance of these developments is
underscored by the foundational work in statistical pattern recognition by Duda, Hart, and Stork [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
which laid the groundwork for the machine learning approaches used in modern object detection.
      </p>
      <p>
        In recent years, the introduction of attention mechanisms and transformer architectures, exemplified
by DETR [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], has pushed the boundaries of object detection even further. These models eliminate the
need for many hand-designed components, treating object detection as a direct set prediction problem.
This simplification of the detection pipeline has led to end-to-end systems that are not only more
accurate but also easier to implement in practical applications. The role of optimization techniques in
improving these models cannot be overlooked, as highlighted in "Convex Optimization" by Boyd and
Vandenberghe [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The literature also highlights the specific challenges and opportunities associated with applying these
advanced object detection techniques to PPE detection. For example, detecting helmets in dynamic
environments, where lighting conditions and object orientations can vary significantly, remains a
challenge. Moreover, integrating these detection systems with real-time monitoring tools is essential
for ensuring immediate feedback and corrective actions. This integration is part of a broader trend
towards leveraging big data and machine learning for real-time decision-making, a trend discussed in
"Machine Learning: A Probabilistic Perspective" by Kevin P. Murphy [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. System Development and User Interface</title>
        <p>In this study, we developed an automated detection system for PPE, with a particular focus on detecting
helmets in industrial environments. The system is built upon a combination of computer vision
techniques and deep learning models, which work together to identify and classify objects in real-time.
This approach not only enhances the accuracy of detection but also provides immediate feedback on
safety compliance, allowing for prompt corrective actions when necessary.</p>
        <p>The implementation of the system was carried out using Python, leveraging several key libraries and
tools. OpenCV was used for real-time image capture and processing, while Base64 was employed for
encoding images in a format suitable for inference. Flask was utilized to create a web interface that
allows users to stream video feeds and display results. The actual object detection was performed using
the Roboflow API, which provides a robust and scalable solution for deploying deep learning models.</p>
        <p>The methodology is structured into several key steps, each of which plays a crucial role in the overall
functionality of the system:
1. Camera Initialization: The first step involves initializing the camera to capture live video feeds.
This is achieved through OpenCV, which ensures that the camera is ready to capture frames in
real-time.
2. Object Detection: Once the camera is initialized, frames are captured and encoded as JPEG
images. These encoded images are then sent to the Roboflow API for object detection. The API
returns predictions that include the coordinates of detected objects, their labels, and confidence
scores. This step is critical as it forms the basis for determining whether a helmet is present in
the frame.
3. Helmet Detection: The system then checks whether the detected objects include a helmet with
a confidence score above 90%. If a helmet is detected, the frame is annotated with a bounding
box and a label indicating the presence of a helmet. As illustrated in Figures 1 and 2, the system
accurately identifies helmets in real-time, providing visual confirmation through bounding box
annotations. If no helmet is detected, the frame is saved as evidence of non-compliance, along
with a timestamp. This step is essential for ensuring that any instances of non-compliance are
documented for later review.
4. Web Streaming: The processed frames are streamed to a web interface using Flask. This allows
for real-time monitoring of the detection results, providing users with immediate visual feedback
on whether safety protocols are being followed.
5. System Shutdown: After the detection process is stopped, the system releases the camera
resources and closes any open windows. This ensures that the system can be safely and efficiently
restarted for subsequent detection sessions.</p>
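The per-frame decision logic of steps 2 and 3 can be sketched in plain Python. This is a minimal sketch: the prediction format (a list of dicts with "class" and "confidence" keys) is an assumption modeled on typical hosted-inference responses rather than the exact Roboflow payload, and the camera capture, the actual API call, and the Flask streaming are omitted.

```python
import base64
from datetime import datetime

# Assumed response shape: [{"class": "...", "confidence": 0.0 to 1.0}, ...]
def helmet_present(predictions, threshold=0.9):
    """Step 3: is any detection a helmet with confidence at or above the threshold?"""
    for det in predictions:
        if det["class"] == "helmet" and det["confidence"] >= threshold:
            return True
    return False

def process_frame(jpeg_bytes, predictions):
    """Steps 2 and 3: encode the frame for inference, then either confirm
    compliance or name a timestamped evidence file for the frame."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")  # inference payload
    if helmet_present(predictions):
        return {"compliant": True, "payload": encoded}
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return {"compliant": False, "payload": encoded,
            "evidence_file": f"no_helmet_{stamp}.jpg"}
```

In the deployed system, the predictions list would come from the Roboflow API response for the encoded frame, and non-compliant results would be written to disk for the review page.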
        <sec id="sec-3-1-1">
          <title>3.1.1. Types of Helmets Detected</title>
          <p>The helmet detection system is designed to recognize a variety of helmets commonly used in industrial
settings. Specifically, it can detect helmets of different shapes, including rounded and angular designs,
and a range of colors such as yellow, white, and blue. This versatility ensures that the system can
effectively identify helmets across diverse workplace environments and varying helmet styles.</p>
          <p>As illustrated in Figures 1 and 2, the system accurately identifies helmets in real-time, providing visual
confirmation through bounding box annotations. These detections confirm the system’s capability to
monitor helmet compliance effectively.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. No Helmet Detected Images Page</title>
          <p>The No Helmet Detected Images page is dedicated to displaying captured images where no helmet was
detected during the detection process. This page is designed with the intent of providing a thorough
review of non-compliance instances, allowing users to take corrective actions based on documented
evidence.</p>
          <p>• Header: A bold, red header labeled "No Helmet Detected Images" emphasizes the critical nature
of the content displayed on this page. This visual emphasis ensures that users understand the
importance of reviewing the images carefully.
• Image Grid: The captured images are displayed in a responsive grid layout, which adapts to
different screen sizes, making the page accessible on various devices. Each image is presented
within a bordered card-like frame, giving it a structured and organized appearance. Below each
image, the filename is displayed, which includes the timestamp, providing context for when the
non-compliance was detected. This is particularly useful for correlating the images with specific
events or shifts.
• Navigation: At the bottom of the page, a "Back to Home" button is provided. This button is
colored in blue to maintain consistency with the overall design theme of the interface. It allows
users to easily return to the main Helmet Detection page, facilitating seamless navigation between
monitoring and review activities.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Flow Diagram</title>
      <sec id="sec-4-1">
        <title>4.1. System Workflow</title>
        <p>To provide a clear understanding of the system’s workflow, the following flow diagram illustrates the
key steps involved in the Helmet Detection system, from camera initialization to system shutdown.</p>
        <p>As illustrated in Figure 4, the Helmet Detection system follows a systematic workflow starting from
camera initialization to system shutdown. Each step is crucial for ensuring accurate and real-time
detection of helmets in industrial environments.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Algorithmic Approaches</title>
      <p>This section provides an overview of the machine learning (ML) and deep learning techniques used in
the development of the PPE detection system, focusing on the mathematical formulations, the specific
models used in the code implementation, and their practical application in this study. These techniques
are central to the system’s ability to detect safety helmets in real-time with high accuracy.</p>
      <sec id="sec-5-1">
        <title>5.1. Convolutional Neural Networks (CNNs)</title>
        <p>
          Convolutional Neural Networks (CNNs) are a cornerstone of modern computer vision tasks, particularly
in image recognition and object detection. CNNs are designed to automatically and adaptively learn
spatial hierarchies of features from input images, making them highly effective for tasks involving
visual pattern recognition [
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ].
        </p>
        <sec id="sec-5-1-1">
          <title>5.1.1. Convolution Operation</title>
          <p>The convolution operation is the fundamental building block of CNNs. It is defined mathematically as:
S(i, j) = (I * K)(i, j) = ∑_m ∑_n I(i − m, j − n) · K(m, n)
(1)
Juan Camilo Poveda Pinilla et al. CEUR Workshop Proceedings</p>
          <p>[Figure 4 flow diagram: Start → Camera Initialization → Capture Frame → Preprocessing → Object Detection → Helmet Detected? → Yes: Annotate Frame / No: Save Frame → Display on Web Interface → Alert System]</p>
          <p>• I is the input image, which is usually a matrix of pixel values.
• K is the kernel or filter, a small matrix that slides over the input image to extract specific features.
• S(i, j) is the output feature map, which represents the presence of features detected by the kernel.</p>
          <p>In the context of the PPE detection system, convolutional layers are used to extract features such
as edges, textures, and patterns from input video frames. These features are crucial for distinguishing
between different objects, such as helmets and other workplace elements.</p>
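A minimal sketch of the operation in Eq. (1) helps make its feature-extraction role concrete. Note that, as in most deep learning frameworks, the sketch computes cross-correlation (the kernel is not flipped), which differs from Eq. (1) only by a reflection of the kernel:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation over nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # sum of elementwise products between the kernel and the patch at (i, j)
            s = sum(image[i + m][j + n] * kernel[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(s)
        out.append(row)
    return out
```

Applied with a simple difference kernel such as [[1, -1]], the output map responds strongly at vertical edges, which is exactly the kind of low-level feature the early layers of the helmet detector rely on.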
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2. Activation Function</title>
          <p>After applying the convolution operation, an activation function is used to introduce nonlinearity into
the model, enabling it to learn more complex patterns. The Rectified Linear Unit (ReLU) is the most
commonly used activation function in CNNs, defined as:</p>
          <p>
            ReLU(x) = max(0, x)
(2)
ReLU allows the network to retain positive values while discarding negative values, which enhances
the model’s ability to learn complex features without saturating [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. In this study, ReLU activation
functions are applied after each convolutional layer to ensure that the CNN captures the critical features
necessary to detect helmets.
          </p>
        </sec>
        <sec id="sec-5-1-3">
          <title>5.1.3. Pooling Layer</title>
          <p>Pooling layers, specifically max-pooling, are used to reduce the spatial dimensions of the feature maps,
which decreases the computational load and helps prevent overfitting. Max-pooling is defined as:
P(i, j) = max_{(m, n)} F(i + m, j + n)
(3)</p>
          <p>
            where P(i, j) is the pooled feature map and F(i, j) is the input feature map. Max-pooling selects
the maximum value for each subregion, preserving the most prominent features detected by the
convolutional layers [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. In this PPE detection system, the pooling layers ensure that the model can
process high-resolution video feeds efficiently while maintaining the essential features needed for
accurate detection.
          </p>
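The two operations of Sections 5.1.2 and 5.1.3 are small enough to sketch directly; the 2x2 window with stride 2 below is an illustrative choice:

```python
def relu(x):
    """Rectified Linear Unit: pass positives through, zero out negatives."""
    return max(0.0, x)

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the largest value in each
    size-by-size window, halving each spatial dimension when size is 2."""
    pooled = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + m][j + n] for m in range(size) for n in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled
```

A 4x4 feature map therefore shrinks to 2x2 while keeping the strongest activation from each quadrant, which is what lets the model process high-resolution frames at lower cost.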
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Object Detection Using YOLO (You Only Look Once)</title>
        <p>
          YOLO (You Only Look Once) is a real-time object detection system known for its speed and efficiency,
making it ideal for applications where rapid decision-making is critical [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Unlike two-stage detectors
such as Faster R-CNN, YOLO treats object detection as a single regression problem, predicting bounding
boxes and class probabilities directly from the full image in one evaluation.
        </p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Bounding Box Prediction</title>
          <p>In YOLO, the input image is divided into an S × S grid, where each grid cell is responsible for predicting
a fixed number of bounding boxes. Each bounding box prediction consists of five components:
(x, y, w, h, c)
(4)
• (x, y) are the coordinates of the bounding box center relative to the grid cell.
• w and h are the width and height of the bounding box relative to the entire image.
• c is the confidence score, representing the probability that the bounding box contains an object.</p>
          <p>The YOLO model used in this PPE detection system is trained to recognize helmets by learning to
predict the bounding boxes around them accurately. The loss function of the model includes components
that account for the accuracy of the coordinates of the bounding box (localization loss) and the confidence
score (confidence loss), ensuring that the model is optimized to detect helmets with high precision.</p>
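The bounding-box parameterization above can be illustrated with a small decoder. This is a simplified sketch: real YOLO variants also apply sigmoid squashing and anchor priors to the raw network outputs, which are omitted here:

```python
def decode_box(pred, cell_row, cell_col, grid_size, img_w, img_h):
    """Map one (x, y, w, h, c) grid-cell prediction to absolute pixels.
    x, y are offsets inside the cell; w, h are fractions of the image."""
    x, y, w, h, conf = pred
    center_x = (cell_col + x) / grid_size * img_w
    center_y = (cell_row + y) / grid_size * img_h
    box_w = w * img_w
    box_h = h * img_h
    # convert center/size to a top-left corner for drawing the annotation
    left = center_x - box_w / 2
    top = center_y - box_h / 2
    return left, top, box_w, box_h, conf
```

For example, a detection in cell (1, 1) of a 4x4 grid over a 400x400 frame with prediction (0.5, 0.5, 0.2, 0.4, 0.9) decodes to an 80x160 box centered at (150, 150), which is drawn on the frame if the confidence c clears the 90% threshold.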
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2. Class Prediction</title>
          <p>Each grid cell in YOLO not only predicts bounding boxes but also class probabilities for each object
class. The final output for each cell of the grid is a vector that contains the predictions of the bounding
box and the class probabilities. This output is used to determine the presence and location of objects
in the image. In this study, YOLO’s ability to perform these predictions in real time allows for rapid
identification of helmets, which is essential to ensure workplace safety.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Region Proposal Networks (RPN) in Faster R-CNN</title>
        <p>
          Faster R-CNN is a two-stage object detection model that significantly improves on earlier models by
incorporating a Region Proposal Network (RPN) to generate region proposals directly from convolutional
feature maps [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This innovation eliminates the need for traditional selective search methods, making
the model faster and more efficient.
        </p>
        <sec id="sec-5-3-1">
          <title>5.3.1. Region Proposal Generation</title>
          <p>
            The RPN is a fully convolutional network that slides over the convolutional feature map output by
the backbone network. At each sliding window location, the RPN predicts multiple region proposals,
each defined by a bounding box and an associated objectness score. The objectness score indicates the
likelihood that a region contains an object [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ].
          </p>
          <p>In the context of this PPE detection system, the RPN is crucial for efficiently generating proposals for
regions that may contain helmets, thus narrowing down the areas the second stage of Faster R-CNN
needs to process.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>5.3.2. Anchor Boxes</title>
          <p>
            RPNs use anchor boxes—predefined bounding boxes of various scales and aspect ratios—to detect objects
of different sizes. Each anchor box is refined based on the actual object in the image through a
regression process. The loss function used to train the RPN includes both the classification loss (object
vs. background) and the regression loss (to refine the bounding box) [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. In this system, anchor boxes
are used to ensure that helmets of various sizes and orientations can be accurately detected.
          </p>
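Anchor generation itself is straightforward to sketch. The scale/ratio parameterization below (area preserved across aspect ratios) follows the common Faster R-CNN convention; the specific scales and ratios are illustrative:

```python
def make_anchors(cx, cy, scales, ratios):
    """Anchor boxes (left, top, width, height) centered at one sliding-window
    location. Each ratio r gives width = s * sqrt(r) and height = s / sqrt(r),
    so every anchor at scale s covers the same area s**2."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * r ** 0.5
            h = s / r ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, w, h))
    return anchors
```

With, say, three scales and three ratios, each feature-map location proposes nine candidate boxes, and the RPN regression then refines whichever of these best overlap a helmet.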
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Training and Optimization Techniques</title>
        <p>Training deep learning models, such as those used in this PPE detection system, involves minimizing
a loss function that quantifies the difference between the model’s predictions and the ground truth.
Several optimization techniques are critical to ensure that the model converges to a solution that
generalizes well to unseen data.</p>
        <sec id="sec-5-4-1">
          <title>5.4.1. Stochastic Gradient Descent (SGD)</title>
          <p>Stochastic Gradient Descent (SGD) is one of the most widely used optimization algorithms in deep
learning. It updates the model’s parameters iteratively based on the gradient of the loss function with
respect to the parameters:
θ_{t+1} = θ_t − η ∇_θ L(θ_t)
(5)
• θ_t represents the model parameters at iteration t.
• η is the learning rate, controlling the step size of the updates.
• ∇_θ L(θ_t) is the gradient of the loss function L with respect to the parameters.</p>
          <p>
            SGD is particularly effective for large-scale learning tasks, as it updates parameters based on a single
or a few training examples at a time, allowing it to converge faster than traditional gradient descent
[
            <xref ref-type="bibr" rid="ref21">21</xref>
            ]. In this study, SGD was used to train the YOLO and Faster R-CNN models, enabling them to learn
from large datasets of images containing helmets.
          </p>
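Eq. (5) can be sketched in a few lines; the toy quadratic loss in the usage note below stands in for the detection losses used in practice:

```python
def sgd_step(theta, grad, lr):
    """One update of Eq. (5): theta minus the learning rate times the gradient."""
    return [t - lr * g for t, g in zip(theta, grad)]

def train(theta, grad_fn, lr=0.1, steps=200):
    """Iterate SGD updates using a caller-supplied gradient function."""
    for _ in range(steps):
        theta = sgd_step(theta, grad_fn(theta), lr)
    return theta
```

For L(theta) = (theta0 - 3)**2 + (theta1 + 1)**2 the gradient is (2(theta0 - 3), 2(theta1 + 1)), and repeated updates drive theta toward the minimizer (3, -1), mirroring how the detector's parameters descend the localization and confidence losses.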
        </sec>
        <sec id="sec-5-4-2">
          <title>5.4.2. Backpropagation</title>
          <p>
            Backpropagation is the algorithm used to compute the loss function gradient with respect to the
parameters of the model. It involves calculating the gradient of the loss function layer by layer, starting
from the output layer and moving back through the network [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. This process allows for efficient
computation of gradients, which are then used to update the model parameters during training.
          </p>
          <p>In the PPE detection system, backpropagation is employed to adjust the weights of CNNs, ensuring
that the model learns to detect helmets accurately by minimizing the error between its predictions and
the actual positions of helmets in the training images.</p>
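The layer-by-layer gradient computation can be made concrete on a two-layer scalar network with a ReLU in between; this toy example illustrates only the chain rule, not the helmet model's architecture:

```python
def forward_backward(x, w1, w2, target):
    """Forward pass y = w2 * relu(w1 * x) with squared error, then gradients
    computed back-to-front exactly as backpropagation orders them."""
    # forward
    z = w1 * x
    a = max(0.0, z)              # ReLU activation
    y = w2 * a
    loss = (y - target) ** 2
    # backward: start at the output and apply the chain rule layer by layer
    dy = 2.0 * (y - target)      # dL/dy
    dw2 = dy * a                 # dL/dw2
    da = dy * w2                 # dL/da
    dz = da if z > 0 else 0.0    # ReLU gate: gradient flows only where z was positive
    dw1 = dz * x                 # dL/dw1
    return loss, dw1, dw2
```

The returned dw1 and dw2 are exactly what an SGD step would consume, closing the loop between this subsection and the previous one.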
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Regularization Techniques</title>
        <p>
          Regularization techniques are essential in deep learning to prevent overfitting, where a model performs
well on training data but poorly on unseen data. In this study, several regularization methods were
used to ensure the robustness of the model.
        </p>
        <sec id="sec-5-5-0">
          <title>5.5.1. Dropout</title>
          <p>
            Dropout is a regularization technique that randomly sets a fraction of the output units to zero during
training, effectively preventing the model from relying too heavily on specific neurons. This encourages
the network to learn more robust and generalized features [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ]. Mathematically, if h represents the
activations of a layer, dropout modifies this as follows:
h̃ = h · r,    r ∼ Bernoulli(p)
          </p>
          <p>where r is a random vector of the same shape as h, with each element drawn independently from a
Bernoulli distribution with parameter p. During training, dropout forces the network to build in redundancy
rather than relying on any single feature, which helps it generalize better once deployed.</p>
          <p>In the PPE detection system, dropout was applied to the fully connected layers of the CNNs to reduce
overfitting and improve the model’s ability to generalize to new images where helmet detection is
required.</p>
        </sec>
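A sketch of (inverted) dropout on a list of activations follows; the rescaling by 1/p at training time is a common variant of the formulation above that leaves inference untouched, and the keep probability p is illustrative:

```python
import random

def dropout(activations, p, training=True, seed=None):
    """Keep each unit with probability p and rescale survivors by 1/p;
    at inference time (training=False) activations pass through unchanged."""
    if not training or p >= 1.0:
        return list(activations)
    rng = random.Random(seed)
    # rng.random() exceeds 1 - p with probability p, i.e. r ~ Bernoulli(p)
    return [a * (1.0 / p) if rng.random() > 1.0 - p else 0.0
            for a in activations]
```

Because surviving activations are scaled up during training, the expected value of each unit matches its inference-time value, which is why no extra scaling is needed when the trained detector is deployed.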
        <sec id="sec-5-5-1">
          <title>5.5.2. Weight Regularization</title>
          <p>Weight regularization, often referred to as weight decay, involves adding a penalty to the loss function
that discourages large weights, thus preventing the model from becoming too complex [? ]. The most
common form is L2 regularization, which adds a term proportional to the square of the weights to the
loss function:
L(θ) = L₀(θ) + (λ/2) ∑_i θ_i²
(6)
• L₀(θ) is the original loss function.
• θ_i are the model parameters.</p>
          <p>• λ is the regularization parameter, controlling the strength of the penalty.</p>
          <p>This technique encourages the model to keep weights small, which generally leads to simpler models
that are less likely to overfit [ ? ]. In this study, L2 regularization was applied to the weights of the
CNN layers to ensure that the model remains simple and generalizes well across different scenarios and
helmet detection tasks.</p>
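The penalty of Eq. (6) and its effect on the gradient are easy to state in code; the helper names here are illustrative:

```python
def l2_penalty(theta, lam):
    """Penalty term of Eq. (6): (lam / 2) times the sum of squared weights."""
    return 0.5 * lam * sum(t * t for t in theta)

def regularized_grad(grad, theta, lam):
    """Gradient of the regularized loss: the original gradient plus lam * theta.
    The extra lam * theta term shrinks weights each step, hence 'weight decay'."""
    return [g + lam * t for g, t in zip(grad, theta)]
```

Combined with the SGD update, the lam * theta term multiplies each weight by (1 - lr * lam) per step, steadily decaying weights that the data does not support.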
          <p>The techniques explained in this section are integral to the functionality and performance of the PPE
detection system developed in this study. By leveraging CNNs for feature extraction, YOLO for real-time
object detection, and Faster R-CNN with Region Proposal Networks for high-accuracy detection, the
system is able to effectively identify helmets in dynamic industrial environments. Furthermore, the use
of optimization techniques like SGD and backpropagation ensures that the models are trained efficiently,
while regularization techniques such as dropout and weight decay help prevent overfitting, ensuring
robust performance in real-world applications.</p>
          <p>These mathematical foundations and practical implementations reflect the advanced nature of the
deep learning techniques used, enabling the development of a highly effective and reliable PPE detection
system that enhances workplace safety by providing real-time monitoring and feedback.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <sec id="sec-6-1">
        <title>6.1. Real-World Deployment and Evaluation</title>
        <p>To assess the practical applicability of the Helmet Detection system, it was deployed in an active
manufacturing facility for a duration of four weeks. The evaluation focused on the system’s ability to
consistently detect helmet compliance under varying operational conditions.</p>
        <sec id="sec-6-1-1">
          <title>6.1.1. Deployment Setup</title>
          <p>The system was integrated into the facility’s existing security infrastructure, utilizing strategically
placed cameras to cover high-risk areas. The web interface was accessed by safety officers responsible
for monitoring and responding to compliance alerts.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>6.1.2. Performance Metrics</title>
          <p>During the deployment period, the system achieved the following performance metrics:
• Detection Rate: Maintained a consistent detection rate of 95%, with minimal fluctuations across
different shifts and operational scales.
• False Positives: Recorded a false positive rate of 3%, primarily due to reflections and non-helmet
headgear.
• False Negatives: Maintained a false negative rate of 2%, ensuring that most non-compliance
instances were accurately identified.
• Operational Downtime: Experienced less than 1% downtime, attributed to minor technical
adjustments and maintenance.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>6.1.3. User Feedback</title>
          <p>Safety officers provided positive feedback regarding the system’s ease of use and the promptness of
alerts. The web interface’s real-time visualization and incident logging features were particularly
highlighted as valuable tools for proactive safety management.</p>
        </sec>
        <sec id="sec-6-1-4">
          <title>6.1.4. Impact on Workplace Safety</title>
          <p>Post-deployment analysis indicated a significant reduction in helmet non-compliance incidents by
approximately 20%, reflecting the system’s effectiveness in promoting safety adherence. Additionally,
the automated monitoring alleviated the burden on manual inspection processes, allowing safety
personnel to focus on other critical tasks.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Model Performance Metrics</title>
        <p>The performance of the helmet detection model was evaluated using several quantitative metrics to
assess its accuracy and reliability. The following metrics were calculated based on the test dataset:
• Accuracy: 95%
• Precision: 94%
• Recall: 93%
• F1-Score: 93.5%</p>
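<p>The four summary metrics above are derived directly from the confusion-matrix counts. The sketch below uses illustrative counts chosen to be consistent with the reported percentages; they are not the study's actual test-set tallies.</p>

```python
# Derive accuracy, precision, recall and F1 from confusion-matrix counts.
# Counts are illustrative placeholders, not the paper's actual data.
tp, fp, fn, tn = 930, 59, 70, 1521  # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted helmets, how many were correct
recall = tp / (tp + fn)      # of actual helmets, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, "
      f"recall={recall:.3f}, f1={f1:.3f}")
```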
        <sec id="sec-6-2-1">
          <title>6.2.1. Confusion Matrix</title>
          <p>The confusion matrix for the helmet detection model is presented in Table 1 and visualized in Figure 5.</p>
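<p>Such a two-class (helmet / no-helmet) confusion matrix is tallied from paired ground-truth and predicted labels; the minimal sketch below uses made-up labels purely to illustrate the bookkeeping.</p>

```python
# Tally a 2x2 confusion matrix from paired ground-truth/predicted labels.
# Labels: 1 = helmet present, 0 = no helmet. Data is made up for illustration.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0]

matrix = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
for truth, pred in zip(y_true, y_pred):
    if truth == 1:
        matrix["tp" if pred == 1 else "fn"] += 1
    else:
        matrix["fp" if pred == 1 else "tn"] += 1

print(matrix)  # {'tp': 4, 'fp': 1, 'fn': 1, 'tn': 4}
```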
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Discussion</title>
        <sec id="sec-6-3-1">
          <title>6.3.1. Analysis of Results</title>
          <p>The model exhibits high accuracy (95%) in detecting helmets, with a precision of 94% and recall of 93%,
culminating in an F1-score of 93.5%. These metrics indicate that the model is both precise and sensitive
in identifying helmets, minimizing false positives and false negatives.</p>
        </sec>
        <sec id="sec-6-3-2">
          <title>6.3.2. Comparison with Existing Methods</title>
          <p>Compared to traditional computer vision techniques and earlier deep learning models, our system
demonstrates superior performance. For instance, traditional HOG-SVM approaches typically achieve
around 80% accuracy, whereas our deep learning-based system enhances this to 95%. Similarly,
previous deep learning models such as Faster R-CNN reported accuracies around 88%, underscoring the
effectiveness of our optimized YOLOv5 implementation.</p>
        </sec>
        <sec id="sec-6-3-3">
          <title>6.3.3. Operational Efficiency</title>
          <p>The system maintains a response time of 200 ms per frame, facilitating real-time monitoring essential
for immediate safety interventions. This performance surpasses manual monitoring methods, which
are not only slower but also susceptible to human error.</p>
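<p>A per-frame latency budget such as the 200 ms figure above can be verified by timing the detection call over a batch of frames. In the sketch below, detect_stub is a hypothetical stand-in for the per-frame detector, not the system's actual YOLOv5 pipeline.</p>

```python
import time

def detect_stub(frame):
    """Hypothetical stand-in for the per-frame helmet-detection call."""
    time.sleep(0.01)  # simulate ~10 ms of inference work
    return []

def mean_latency_ms(frames, detector, budget_ms=200.0):
    """Mean per-frame latency in milliseconds, and whether it meets the budget."""
    start = time.perf_counter()
    for frame in frames:
        detector(frame)
    elapsed = (time.perf_counter() - start) * 1000.0 / len(frames)
    return elapsed, elapsed <= budget_ms

latency, within_budget = mean_latency_ms(range(5), detect_stub)
print(f"{latency:.1f} ms per frame; within 200 ms budget: {within_budget}")
```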
        </sec>
        <sec id="sec-6-3-4">
          <title>6.3.4. Robustness and Adaptability</title>
          <p>Through extensive data augmentation and model fine-tuning, the system remains robust against
variations in lighting, helmet designs, and occlusions. The high recall rate ensures that most instances
of non-compliance are detected, thereby enhancing overall workplace safety.</p>
        </sec>
        <sec id="sec-6-3-5">
          <title>6.3.5. Limitations and Future Work</title>
          <p>While the system performs admirably, certain limitations persist. For example, extreme lighting
conditions and highly occluded helmets can still pose detection challenges. Future work will focus
on integrating additional sensors and exploring multi-modal data inputs to further enhance detection
accuracy under such adverse conditions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Ethical Considerations and Privacy</title>
      <p>The implementation of automated monitoring systems, such as the Helmet Detection system presented
in this study, inherently raises ethical and privacy concerns. It is imperative to address these issues to
ensure that the system is deployed responsibly and in compliance with relevant regulations.</p>
      <sec>
        <title>7.1. Privacy Protection Measures</title>
        <p>• Data Anonymization: All captured images are processed to anonymize personally identifiable
information (PII). Facial features and other identifiable markers are either blurred or excluded
from storage to protect individual privacy.
• Secure Data Storage: Data collected by the system is stored on encrypted servers with restricted
access, ensuring that only authorized personnel can retrieve and view the information.
• Compliance with Data Protection Laws: The system adheres to international data protection
regulations such as the General Data Protection Regulation (GDPR) and local privacy laws,
ensuring lawful processing of personal data.
• Consent and Transparency: Workers are informed about the monitoring system, its purpose,
and how their data will be used. Informed consent is obtained to respect individual autonomy
and privacy rights.</p>
      </sec>
      <sec>
        <title>7.2. Ethical Implications</title>
        <p>• Surveillance Concerns: Continuous monitoring can create a perception of surveillance,
potentially affecting worker morale and trust. To mitigate this, the system is designed solely for
safety compliance and does not monitor other aspects of worker behavior.
• Data Usage Limitations: Collected data is used strictly to enhance workplace safety and is
not repurposed for unrelated monitoring or profiling, maintaining the system’s focus and ethical
integrity.
• Accountability and Oversight: Clear accountability structures ensure that data handling and
system operations are conducted ethically. Regular audits and reviews are implemented to
uphold these standards.</p>
      </sec>
      <sec>
        <title>7.3. Mitigating Potential Risks</title>
        <p>• Bias and Fairness: Efforts have been made to ensure that the model does not exhibit biases
related to gender, ethnicity, or age by training it on a diverse and representative dataset.
• Transparency in Operations: Clear documentation and open communication channels are
established to inform stakeholders about how the system operates, the data it collects, and the
safeguards in place.
• Opt-Out Provisions: Workers retain the right to opt out of the monitoring system under certain
conditions, ensuring respect for individual preferences and rights.</p>
      </sec>
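<p>The anonymization measure described above, removing identifiable regions before a frame is stored, can be sketched as masking a detected face region in the frame buffer. The coordinates and the pure-Python grayscale frame below are illustrative only.</p>

```python
# Zero out a detected face region before a frame is stored: a minimal
# stand-in for the blurring/anonymization step described above.
def mask_region(frame, top, left, height, width):
    """Return a copy of a row-major grayscale frame with the region zeroed."""
    out = [row[:] for row in frame]
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = 0
    return out

frame = [[128] * 6 for _ in range(4)]  # toy 4x6 grayscale frame
anon = mask_region(frame, top=1, left=2, height=2, width=3)
print(anon[1])  # [128, 128, 0, 0, 0, 128]
```

A production system would blur rather than zero the region and operate on real image buffers, but the bookkeeping of copying the frame and overwriting the detected region is the same.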
      <sec id="sec-7-1">
        <title>7.4. Future Ethical Considerations</title>
        <p>As the system evolves, continuous assessment of its ethical implications will be necessary. Future
iterations may incorporate:
• Enhanced Privacy Features: Integrating advanced privacy-preserving technologies such as
differential privacy to further protect individual data.
• Stakeholder Engagement: Engaging with workers and other stakeholders to gather feedback
and address emerging ethical concerns proactively.
• Policy Development: Collaborating with legal and ethical experts to develop comprehensive
policies governing the system’s use and data management practices.</p>
        <p>In conclusion, while the Helmet Detection system offers significant advancements in workplace
safety, it is crucial to balance these benefits with ethical considerations and privacy protections. By
implementing robust safeguards and fostering transparent practices, the system aims to uphold ethical
standards and respect individual privacy.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Results and Discussion</title>
      <sec id="sec-8-3">
        <title>8.3. Conclusion</title>
        <p>The development and successful implementation of the Helmet Detection system underscore the
potential of combining computer vision and deep learning techniques to improve workplace safety. By
automating the helmet detection process, the system reduces the need for manual monitoring, which
is both resource-intensive and prone to human error. The real-time feedback provided by the system
ensures that any instances of noncompliance are immediately addressed, thereby reducing the risk of
accidents and enhancing overall safety.</p>
        <p>The user-friendly interface, coupled with the system’s ability to document and review noncompliance
instances, makes this solution practical and highly applicable to real-world industrial environments.
The scalability and adaptability of the system also suggest that it can be expanded to detect other types
of PPE, such as safety goggles, gloves, and high-visibility clothing, thereby offering a comprehensive
solution for workplace safety.</p>
        <p>Future work could focus on enhancing the system’s detection capabilities under more challenging
conditions, such as extreme lighting variations or in environments with heavy machinery that could
cause visual obstructions. In addition, integrating the system with other safety management tools and
IoT devices could further improve its effectiveness and ease of use, making it an indispensable part of
modern industrial safety protocols.</p>
        <p>The findings of this study provide a strong foundation for further research and development in the field
of automated PPE detection. As industries continue to adopt and integrate AI-driven solutions, systems
like the Helmet Detection platform developed in this study will play a crucial role in safeguarding
workers and promoting a culture of safety.</p>
        <p>In conclusion, the future of workplace safety and PPE detection is poised to benefit significantly from
emerging trends in computer vision and deep learning technologies. One such trend is the integration
of 3D reconstruction models, which offer a more detailed and accurate representation of environments,
enabling the detection systems to better understand object orientation and spatial relationships. This
advancement can lead to more robust PPE detection in complex industrial settings where traditional 2D
models may fall short. Additionally, the incorporation of multi-modal data fusion, where data from
various sensors (e.g., LiDAR, thermal imaging) is combined, will enhance the reliability and accuracy
of detection systems, particularly in challenging environments with poor lighting or obstructions.
Furthermore, the application of real-time edge computing is expected to grow, allowing for faster
processing of PPE detection on-site, reducing latency, and improving response times. These trends,
combined with continued advancements in machine learning algorithms such as self-supervised learning
and generative models, promise to significantly elevate the capabilities of PPE detection systems, making
workplaces safer and more efficient.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>