=Paper= {{Paper |id=Vol-3762/526 |storemode=property |title=An integrated intelligent surveillance system for Industrial areas |pdfUrl=https://ceur-ws.org/Vol-3762/526.pdf |volume=Vol-3762 |authors=Francesco Camastra,Angelo Ciaramella,Angelo Casolaro,Pasquale De Trino,Alessio Ferone,Giovanni Hauber,Gennaro Iannuzzo,Vincenzo Mariano Scarrica,Antonio Junior Spoleto,Antonino Staiano,Maria Concetta Vitale |dblpUrl=https://dblp.org/rec/conf/ital-ia/CamastraCCTFHIS24 }} ==An integrated intelligent surveillance system for Industrial areas== https://ceur-ws.org/Vol-3762/526.pdf
An integrated intelligent surveillance system for Industrial areas

Francesco Camastra, Angelo Ciaramella, Angelo Casolaro, Pasquale De Trino, Alessio Ferone, Giovanni Hauber, Gennaro Iannuzzo, Vincenzo Mariano Scarrica, Antonio Junior Spoleto, Antonino Staiano∗ and Maria Concetta Vitale

Department of Science and Technology, Parthenope University of Naples, Centro Direzionale Isola C4, Naples, 80143, Italy

Abstract
This paper presents the design and implementation of a software prototype developed by Parthenope University of Naples for the SE4I project (Smart Energy Efficiency & Environment for Industry), funded under the "Progetti di ricerca industriale e sviluppo sperimentale" programme (PNR 2015-2020). The prototype leverages advanced computer vision techniques based on deep learning architectures to address industrial security and monitoring needs. Specifically, the prototype provides three key functionalities: (1) personnel and vehicle identification: the system recognizes authorized personnel and vehicle license plates within video streams captured in restricted industrial areas; (2) anomaly detection: the software detects various anomalies in video feeds, including falls of personnel in monitored zones and unattended objects left in unauthorized areas; (3) smart parking management: the prototype identifies vacant parking spaces within camera-monitored zones, enabling efficient parking management. These functionalities are integrated into the software prototype, and its performance has been thoroughly evaluated.

Keywords
Plate Detection, Face Detection, Fall Detection, Parking Detection
1. Introduction

The SE4I project aims to improve safety within a designated industrial area by implementing a real-time video monitoring system. This system uses strategically placed smart poles equipped with RGB cameras to capture video streams. The project focuses on three key functionalities (see Fig. 1): (a) authorized access control: the system recognizes individuals and vehicle license plates, ensuring that only authorized personnel and vehicles can enter the area through controlled access points with barriers. Upon arrival, an employee's car triggers the system. The camera mounted on the smart pole captures the RGB video stream of the scene, using AI to identify the license plate and the driver's face. The combined recognition of license plate and driver verifies that both the vehicle and the driver are authorized: if recognition is successful, access is granted; if not, access is denied; (b) anomaly detection: this use case focuses on detecting abnormal behavior or events in the video streams captured by the pole-mounted cameras. These anomalies can range from environmental violations, such as illegal dumping of waste, to personnel safety issues, e.g., persons' falls and unattended objects left in restricted areas. An RGB video stream from a smart pole camera continuously feeds the scene, which may include both facility personnel and outsiders, to a hardware component equipped with AI modules. The intelligent module analyzes the video to identify unusual elements. Upon detection, an alert is sent to a central control station that specifies the type and location of the event. This allows immediate assistance to personnel experiencing medical emergencies or accidents and swift intervention in case of suspicious activity; (c) smart parking management: this use case addresses the detection and management of parking availability within the vast industrial area. Due to its size, automated and intelligent parking lot monitoring is crucial. The system informs users about free parking spaces as they approach designated parking areas. In the context of performing learning tasks from video streams in surveillance and security applications, the state of the art is represented by computer vision techniques based on deep learning [1, 2]. In the subsequent sections, we discuss the proposed solutions for each of the aforementioned tasks.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
Email: francesco.camastra@uniparthenope.it (F. Camastra); angelo.ciaramella@uniparthenope.it (A. Ciaramella); angelo.casolaro001@studenti.uniparthenope.it (A. Casolaro); pasquale.detrino001@studenti.uniparthenope.it (P. D. Trino); alessio.ferone@uniparthenope.it (A. Ferone); giovanni.hauber@studenti.uniparthenope.it (G. Hauber); gennaro.iannuzzo001@studenti.uniparthenope.it (G. Iannuzzo); vincenzomariano.scarrica001@studenti.uniparthenope.it (V. M. Scarrica); antoniojunior.spoleto001@studenti.uniparthenope.it (A. J. Spoleto); antonino.staiano@uniparthenope.it (A. Staiano); mariaconcetta.vitale001@studenti.uniparthenope.it (M. C. Vitale)
ORCID: 0000-0003-4439-7583 (F. Camastra); 0000-0001-5592-7995 (A. Ciaramella); 0000-0002-7577-6765 (A. Casolaro); 0009-0003-0680-4501 (P. D. Trino); 0000-0002-4883-0164 (A. Ferone); 0009-0007-0137-3182 (G. Hauber); 0009-0003-5962-8302 (G. Iannuzzo); 0009-0008-4640-2693 (V. M. Scarrica); 0009-0007-4037-7821 (A. J. Spoleto); 0000-0002-4708-5860 (A. Staiano); 0000-0002-5538-9952 (M. C. Vitale)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Figure 1: Project tasks: (a) people/vehicles recognition; (b) fall detection; (c) anomaly detection; (d) parking lot handling.

2. Plate Detection

The objective is to recognize vehicles and their license plates from a surveillance video feed and retrieve the associated alphanumeric sequence. This sequence is subsequently utilized for additional vehicle recognition within the system.

2.1. Challenges

Automatic license plate recognition faces several hurdles: (a) variable lighting: extreme brightness, low light, and shadows can significantly reduce plate visibility; systems address this with techniques like adaptive thresholding and contrast enhancement; (b) car position: vehicles approach cameras at various angles and distances, so sophisticated algorithms are required for accurate plate localization and perspective correction to account for these variations; (c) occlusions: objects like bumpers, dirt, or even other vehicles can partially or fully obscure the plate; robust object detection and diverse training data are crucial to overcome these occlusions; (d) font diversity: license plate formats and fonts differ significantly across countries and even regions; training models on a wide variety of datasets is essential for generalization across different plate styles.

2.2. Methods

The proposed approach consists of three key steps: vehicle detection, license plate (LP) detection, and optical character recognition (OCR), as shown in Figure 2. In the first step, the system detects vehicles in the scene using a dedicated module; to balance computation time and performance, we chose YoloV4 [4]. For the classification problem, we treated the network as a closed system, consolidating the outputs specifically related to vehicles such as cars, buses, and motorcycles, while ignoring outputs related to other classes. Within each detected vehicle region, the Warped Planar Object Detection Network (WPOD-NET) [3] is used to search for license plates. WPOD-NET performs affine transformations to rectify the LP area to resemble a frontal view; its design, responsible for warping the license plate into a rectangular shape, was influenced by insights from YOLO, SSD [5], and Spatial Transformer Networks (STN) [6]. These detections are then passed to an OCR network for accurate character recognition and extraction. Finally, in our OCR module, we used Tesseract [7], a fine-tuned optical character recognition engine trained on our license plate character dataset. Tesseract's advantage over a simple CNN lies in its recurrent neural network (RNN) architecture [8], which takes into account the sequential nature of the characters on a license plate. This allows for accurate recognition, as the RNN captures the contextual dependencies between characters. Tesseract's extensive training on diverse datasets makes it robust, handling different font styles, sizes, and noise levels commonly found in license plate images.

Figure 2: The proposed pipeline at work.

2.3. Execution

The modules are designed for real-time execution in an embedded system environment, given the strict time constraints imposed by vehicle identification. Fig. 3 illustrates the time required for the execution steps.

Figure 3: Execution time of the pipeline.
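The three-stage flow (vehicle detection, LP detection and rectification, OCR) can be sketched as follows. This is an illustrative skeleton, not the project's code: `detect_vehicles`, `detect_plate`, and `read_plate` are hypothetical stand-ins for the YoloV4, WPOD-NET, and Tesseract modules, stubbed here with toy data structures.

```python
# Illustrative skeleton of the three-stage plate recognition pipeline.
# The stage functions are hypothetical stand-ins for the real YoloV4,
# WPOD-NET, and Tesseract modules described in the text.

VEHICLE_CLASSES = {"car", "bus", "motorcycle"}  # the "closed system" classes

def detect_vehicles(frame):
    """Stage 1 (YoloV4 stand-in): return vehicle crops, ignoring
    detections whose class falls outside the vehicle set."""
    return [det["crop"] for det in frame["detections"]
            if det["class"] in VEHICLE_CLASSES]

def detect_plate(vehicle_crop):
    """Stage 2 (WPOD-NET stand-in): locate the plate region and
    rectify it toward a frontal view; None if no plate is found."""
    return vehicle_crop.get("plate_region")

def read_plate(plate_region):
    """Stage 3 (OCR stand-in for the fine-tuned Tesseract engine)."""
    return plate_region["text"]

def recognize_plates(frame):
    """Full pipeline: vehicles -> plates -> alphanumeric sequences."""
    plates = []
    for crop in detect_vehicles(frame):
        region = detect_plate(crop)
        if region is not None:
            plates.append(read_plate(region))
    return plates

# Toy frame: one car plus one pedestrian detection, which is ignored.
frame = {"detections": [
    {"class": "car", "crop": {"plate_region": {"text": "AB123CD"}}},
    {"class": "person", "crop": {}},
]}
print(recognize_plates(frame))  # ['AB123CD']
```

Only class-filtered vehicle crops ever reach the plate detector, which mirrors the closed-system treatment of the detection network described above.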
3. Face Recognition

Our research team has developed a framework for secure access control in the industrial area as a smart city environment. The framework leverages surveillance cameras positioned at entry points to industrial areas. It aims to match detected faces with the license plates of associated vehicles. This ensures that the driver corresponds to the registered vehicle owner, improving access control security.

3.1. Challenges

The framework faces different issues, particularly with on-board processing. In addition, reflective surfaces and occlusions caused by sunlight on vehicle windshields can hinder facial recognition.

3.2. Methods

The face recognition process is divided into two main steps: face localization and face classification. In the localization step, faces are accurately localized within an image by the Multi-task Cascaded Convolutional Networks (MTCNN) algorithm [9]. Its cascade structure consists of three stages: Proposal Network (P-Net), Refinement Network (R-Net), and Output Network (O-Net). By simultaneously performing multiple tasks such as face detection, bounding box regression, and facial landmark localization, MTCNN ensures thorough and accurate face identification. In particular, it excels at detecting faces at different scales and orientations while maintaining impressive computational efficiency, making it ideal for real-time applications.

For face classification, an extremely effective approach combines two different algorithms. The first performs face alignment with an ensemble of regression trees [10]: the ensemble predicts the positions of facial landmarks directly from image data, bypassing traditional optimization methods. The second uses FaceNet [11], which efficiently maps a face into a continuous embedding space, i.e., converts it into a 128-feature embedding vector. This vector is then matched to a face in the database using a one-shot approach (see Fig. 4 for an example of a qualitative result on a test image).

Figure 4: Visual results on a customized test.

4. Anomaly Detection

Anomaly detection involves the identification of unusual items, events, or observations that are significantly different from the norm or expected behavior, or that indicate unusual conditions. The activities conducted in the SE4I project focused on identifying waste dumping where access is prohibited, and on fall detection.

4.1. Challenges

Anomaly recognition in video streams presents a significant challenge: identifying rare and short-lived events that deviate from the norm. These anomalies often occur for just a few seconds, making them difficult for humans to detect and nearly impossible to capture in a single, universal model. The vast number of possible anomaly types, locations, and contexts makes defining a comprehensive model impractical: it would require an enormous amount of data and manual effort. A more effective approach is to train models that can differentiate between normal and abnormal activity, regardless of the specific anomaly type. This approach leverages the fact that normal behavior typically occurs far more frequently than anomalies.

4.2. Methods

We used a reconstruction-based method, where a model is trained to learn the normal patterns of the data so as to reconstruct them when new frames are presented. The model is designed to extract the spatiotemporal structure of the video stream to accurately learn the pattern of normality for a scene without anomalies [12] (without the abnormal situation highlighted in red in Figure 5). During testing, the model provides information about the most anomalous areas of the video by comparing input frames with reconstructed frames: a relatively low score indicates a normal scene, while a high score indicates the presence of anomalies. The goal is to create an end-to-end model capable of learning the spatio-temporal patterns of the analyzed data and predicting when an event is anomalous compared to the learned normality. The model used for this unsupervised analysis is called CLSTM-AE, which stands for Long Short-Term Memory Convolutional-Transpose Convolutional Autoencoder; the architecture is a special type of autoencoder [13]. This approach enables real-time anomaly detection by identifying events that deviate significantly from the learned representation of normal video data.
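The reconstruction-error scoring just described can be sketched as follows. This is an illustrative sketch, not the authors' CLSTM-AE implementation: the reconstructed frames would come from the trained autoencoder, and the threshold is a hypothetical calibration value.

```python
import numpy as np

def anomaly_scores(frames, reconstructed):
    """Per-frame reconstruction error: mean squared difference
    between each input frame and its reconstruction."""
    frames = np.asarray(frames, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    # Average the squared pixel error over every axis except time (axis 0).
    return ((frames - reconstructed) ** 2).mean(axis=tuple(range(1, frames.ndim)))

def flag_anomalies(frames, reconstructed, threshold):
    """Mark frames whose reconstruction error exceeds the threshold."""
    return anomaly_scores(frames, reconstructed) > threshold

# Toy example: three tiny 2x2 grayscale "frames"; the autoencoder
# reproduces the first two well but fails on the third (an anomaly).
inputs = np.array([[[0.1, 0.2], [0.3, 0.4]],
                   [[0.1, 0.2], [0.3, 0.4]],
                   [[0.9, 0.9], [0.9, 0.9]]])
recon  = np.array([[[0.1, 0.2], [0.3, 0.4]],
                   [[0.1, 0.2], [0.3, 0.5]],
                   [[0.1, 0.1], [0.1, 0.1]]])
flags = flag_anomalies(inputs, recon, threshold=0.05)
print(list(flags))  # [False, False, True]
```

Normal frames yield near-zero error, while the poorly reconstructed third frame exceeds the threshold, matching the low-score/high-score behavior described in the text.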
The trained model compares the reconstructed video frames with the original input. For normal events, the reconstructed frames closely resemble the originals, with minimal differences in pixel values. However, when anomalies occur, the network's reconstruction becomes less accurate; this is reflected in blurry or distorted frames compared to the originals. By analyzing this reconstruction error (the difference between original and reconstructed frames), one can identify anomalies in real time.

Figure 5: Anomaly detection in a parking lot.

4.3. Fall Detection

Falls, especially outdoors where help might be delayed, can lead to serious injuries. Traditional fall detection systems often rely on wearable sensors, which can be inconvenient or impractical. The SE4I project addresses this challenge with a camera-based fall detection system using an LSTM autoencoder. This system leverages anomaly detection techniques within a computer vision framework: it essentially learns what "normal" movement looks like and identifies deviations from this norm as potential falls. This approach offers several advantages: no need for wearable sensors, camera-based detection that works with existing surveillance infrastructure, and real-time alerts enabling faster response times.

4.3.1. Challenges

Traditional fall detection systems typically rely on wearable sensors or specialized depth cameras. These methods can be intrusive to users and costly to deploy on a large scale. On the other hand, relying solely on human observation through video footage is an option; however, this approach is labor-intensive and requires continuous monitoring.

4.3.2. Methods

Our method addresses the previous issues by enabling fall detection with a simpler setup, a standard RGB camera, eliminating the need for specialized equipment, and AI-powered detection that uses a single AI module running on a GPU to analyze the video stream for instances of falls. This approach eliminates the need for wearable devices and reduces the reliance on human intervention. Detection is to be performed in pedestrian areas and parks, so a dataset was created to fit this particular environment and to train the model on data representing the final context. The training dataset is illustrative of all the normal poses that people take while walking in places like pedestrian areas and parks. The scenes were therefore captured with a fixed camera about 3 meters above the ground, pointing across an open space and covering walking, standing poses, and running events in all directions, with or without obstacles/occlusions. Training a model with only "normal events" is important because, in nature, abnormal events (falls) are very rare and therefore expensive to acquire. The data preprocessing pipeline uses the OpenPose framework [14] to extract skeleton keypoints from each frame. From each skeleton, irrelevant keypoints are removed as they are considered noisy, and skeletons with significant missing keypoints are filtered out. Keypoint coordinates are then normalized using min/max normalization and discretized into coarser bins to provide numerical stability for the training phase. Finally, the data are shaped into time windows using sequences of 75 skeletal frames. Such windows form the basic unit on which the AI model operates. Since the video stream is captured at 25 FPS, working with 75-frame windows means analyzing human behavior over 3-second actions. An overlap of 25 frames between consecutive windows is also included to maintain continuity between the windows themselves. The model is based on an LSTM autoencoder [15, 16]. The execution time when running on a consumer GPU allows for real-time performance (see Fig. 6).

Figure 6: Pipeline components execution times.

Once the model has learned normal human behavior patterns, it can be used to reconstruct time windows. Reconstruction and input data are then compared: if the reconstruction error exceeds a certain threshold and deviates significantly from normal data, the input is flagged as a fall event.
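The windowing scheme above (75-frame windows at 25 FPS with a 25-frame overlap) can be sketched as follows. This is an illustrative sketch under the paper's stated parameters; the keypoint count and array shapes are hypothetical, and the project's actual preprocessing code is not reproduced here.

```python
import numpy as np

def minmax_normalize(coords):
    """Min/max-normalize keypoint coordinates to [0, 1] per sequence."""
    lo, hi = coords.min(), coords.max()
    return (coords - lo) / (hi - lo) if hi > lo else np.zeros_like(coords)

def make_windows(frames, window=75, overlap=25):
    """Slice a sequence of per-frame skeletons into overlapping time
    windows: 75 frames at 25 FPS = 3-second actions, with a 25-frame
    overlap between consecutive windows for continuity."""
    step = window - overlap  # 50-frame stride between window starts
    return np.stack([frames[i:i + window]
                     for i in range(0, len(frames) - window + 1, step)])

# Toy stream: 175 frames, each with 18 keypoints x 2 coordinates
# (18 is a hypothetical keypoint count for illustration).
stream = minmax_normalize(np.random.rand(175, 18, 2))
windows = make_windows(stream)
print(windows.shape)  # (3, 75, 18, 2): windows start at frames 0, 50, 100
```

Each resulting window is one unit fed to the LSTM autoencoder; at inference time, the reconstruction error of a window is compared against the calibrated threshold to flag falls.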
Overall, the results highlight the effectiveness of using learned temporal skeletal patterns for robust anomaly detection in the context of outdoor fall detection.

5. Parking Detection

Parking detection for SE4I requires the development of an automatic system that searches for free parking spaces in one of the parking areas within the industrial area and provides information to drivers who have requested a parking space. The Parking Guide and Information (PGI) system [17] has been adopted as a solution for the parking detection task using a monitoring system. The proposed PGI system consists of two main parts: the former is based on a deep learning instance segmentation model that detects all available free spaces in a parking lot; the latter is a client-server architecture that automatically guides drivers to the closest parking lot with the highest number of available spaces.

5.1. Challenges

Parking lot detection systems using video surveillance face several difficulties: (a) the impact of weather, e.g., low visibility caused by fog, rain, and snow can significantly decrease the accuracy of these systems, and harsh weather conditions can obscure parking lot boundaries in the video feed; (b) diverse parking lot data, i.e., training robust parking detection models requires a large dataset with a wide variety of scenarios, including variations in parking space layouts, weather conditions, camera angles, obstructions, parking lot types (e.g., open-air, multi-story), and lighting conditions (day/night); (c) real-time processing: for practical applications the system needs to operate in real time, which necessitates developing a light parking detection model that can run efficiently on available hardware.

5.2. Methods

This work conceived a model for parking lot detection using an instance segmentation approach. Yolact++ [18], an extension of Yolact [19], was trained with successful results on a novel dataset appropriately designed for this task. The dataset consists of 1395 images and 23600 manually annotated parking lots, and it was built using a web-scraping approach. The images, taken from public-access cameras, were selected to represent a variety of conditions (weather and lighting) and features (different camera angles, occlusions, shadows, presence of people or animals, camera heights, satellite imagery in 2D and 3D, different types of lines and colors, and different backgrounds).

Parking lot detection and car detection are performed simultaneously to classify parking lots as occupied or free on the basis of the IoU between the parking lot and car masks detected by the Yolact++ module: for IoU values greater than an IoU threshold, the system classifies a parking lot as busy; otherwise, as free. The Yolact++ architecture is based on the RetinaNet architecture [20], using pre-trained ResNet-101 stages. In addition, Yolact++ introduces three improvements over the base model: a Fast Mask Re-Scoring Network stage, Deformable Convolutions with Intervals, and an optimized prediction head. The selection of the Yolact++ architecture for the parking lot detection problem was motivated by the runtime requirements and the accuracy achieved by this instance segmentation model.

A client-server system called PGI has been developed. The clients include drivers, administrators, and machine learning systems. Drivers can search for parking lots, while administrators can add, remove, and monitor parking lots. System operations are performed on the server side, which is built using PHP and MySQL for database storage. Clients connect to the server through a server interface using a Java Android app. The app provides various functionalities, such as guiding drivers to the nearest parking lot with available spaces using GPS, and monitoring areas using the Google StreetView API. The system presents favorable results, with low loss values and acceptable mAP for both the box and the mask, determined using a 0.5 IoU threshold (see Table 1 and Fig. 7).

Table 1: Results on Yolact++ after fine-tuning and testing on the custom dataset.

Metrics                        Average Values
Box Localization Loss          2.027
Class Confidence Loss          1.604
Mask Loss                      3.185
Semantic Segmentation Loss     0.125
I Loss                         0.116
Total Loss                     7.058
mAP@0.50 Box                   80.5
mAP@0.50 Mask                  76.62

6. Integration and Infrastructure

The five intelligent modules of the SE4I project, plate detection, face detection, anomaly detection, fall detection, and parking detection, are part of a larger system powered by a peer-to-peer network of NVIDIA Jetson Xavier devices mounted on multifunctional light poles. This setup ensures efficient and real-time processing of the data collected by the surveillance cameras, as the computation is performed in the field and each device shares data and JSON output with devices on other poles using a ZMQ publisher/subscriber pattern.
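A minimal sketch of such JSON sharing with pyzmq's publisher/subscriber pattern is shown below; the topic name, endpoint, and message fields are hypothetical, not taken from the SE4I implementation, and both endpoints run in one process purely for demonstration.

```python
import json
import time
import zmq

TOPIC = b"se4i.events"  # hypothetical topic name
ctx = zmq.Context.instance()

# Publisher side: a pole device broadcasting a detection event as JSON.
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://se4i-demo")  # in-process endpoint for the sketch;
                                # real poles would use a tcp:// address

# Subscriber side: another pole device listening for events.
sub = ctx.socket(zmq.SUB)
sub.connect("inproc://se4i-demo")
sub.setsockopt(zmq.SUBSCRIBE, TOPIC)
sub.setsockopt(zmq.RCVTIMEO, 2000)  # fail fast instead of blocking forever

time.sleep(0.2)  # let the subscription propagate (PUB/SUB slow-joiner)

event = {"module": "fall_detection", "pole_id": 7, "score": 0.93}
pub.send_multipart([TOPIC, json.dumps(event).encode()])

topic, payload = sub.recv_multipart()
received = json.loads(payload.decode())
```

Topic-prefixed multipart messages let each device subscribe only to the event streams it needs, which suits the decentralized, in-the-field computation described above.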
Therefore, a dedicated module manages the surveillance camera stream and associated metadata, such as brightness, frame rate,

Figure 7: Visual results on test images. Masks are applied to the lots, each with a different color to better distinguish each instance. The associated probability score is printed on each mask.

References

[4] A. Bochkovskiy, C. Wang, H. M. Liao, Yolov4: Optimal speed and accuracy of object detection, CoRR abs/2004.10934 (2020). arXiv:2004.10934.
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, SSD: Single shot multibox detector, in: ECCV 2016, Springer Intl. Pub., 2016, pp. 21–37.
[6] M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu, Spatial transformer networks, CoRR abs/1506.02025 (2015).
[7] R. Smith, An overview of the Tesseract OCR engine, in: ICDAR 2007, volume 2, 2007, pp. 629–633.
[8] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[9] K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters 23 (2016) 1499–1503.
[10] V. Kazemi, J. Sullivan, One millisecond face alignment with an ensemble of regression trees, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[11] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: 2015 IEEE Conference on Computer Vision
contrast, etc. All modules are containerized as Docker                  and Pattern Recognition (CVPR), 2015, pp. 815–823.
solutions, allowing for flexible portability, easy installa-       [12] M. Hasan, J. Choi, J. Neumann, A. K. Roy-
tion, resilience, and scalable performance. The whole                   Chowdhury, L. S. Davis, Learning temporal regular-
system is based on Python and C++ programming lan-                      ity in video sequences, 2016. arXiv:1604.04574 .
guages and PyTorch, OpenCV, OpenPose, Onvif libraries              [13] Y. S. Chong, Y. H. Tay, Abnormal event detection
are used. This infrastructure guarantees the real-time                  in videos using spatiotemporal autoencoder, 2017.
requirements and the privacy of the video-monitored                     arXiv:1701.01546 .
areas.                                                             [14] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, Y. Sheikh,
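The publisher/subscriber exchange between poles can be sketched with the pyzmq bindings; the topic name, port, and payload fields below are illustrative assumptions, not the actual SE4I message format.

```python
import json
import time

import zmq  # assumes the pyzmq package is installed

ctx = zmq.Context.instance()

# Subscriber side (a neighbouring pole): connect and filter on a topic.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"detections")  # topic name is hypothetical

# Publisher side (this pole): bind and broadcast a JSON detection result.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5556")
time.sleep(0.3)  # let the subscription propagate (ZeroMQ "slow joiner")

payload = {"module": "plate_detection", "plate": "AB123CD"}
pub.send_multipart([b"detections", json.dumps(payload).encode()])

# The subscriber receives the topic frame and the JSON body.
topic, body = sub.recv_multipart()
message = json.loads(body)
print(topic, message["plate"])

pub.close()
sub.close()
```

Because PUB sockets silently drop messages for subscribers that have not yet connected, a short delay (or a proper handshake) after `bind` is needed before the first publish; in a long-running deployment this is a non-issue, since subscriptions persist across messages.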
Acknowledgments

This research was conducted as part of the Smart Energy
Efficiency & Environment for Industry (SE4I) project,
CUP I66G18000230005, funded by "Progetti di ricerca
industriale e lo Sviluppo sperimentale nelle 12 aree di
specializzazione individuate nel PNR 2015-2020, di cui al
D.D. del 13 luglio 2017 n. 1735".

References

[1] A. Ferone, A. Maratea, Adaptive quick reduct for feature drift detection, Algorithms 14 (2021).
[2] A. Maratea, A. Ferone, Deep neural networks and explainable machine learning, in: WILF 2018, volume 11291 LNAI, 2019, pp. 253–256.
[3] S. Montazzolli, C. Jung, License plate detection and recognition in unconstrained scenarios, in: ECCV 2018, Springer Intl. Pub., 2018, pp. 593–609.
[4] A. Bochkovskiy, C. Wang, H. M. Liao, YOLOv4: Optimal speed and accuracy of object detection, CoRR abs/2004.10934 (2020). arXiv:2004.10934.
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, SSD: Single shot multibox detector, in: ECCV 2016, Springer Intl. Pub., 2016, pp. 21–37.
[6] M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu, Spatial transformer networks, CoRR abs/1506.02025 (2015).
[7] R. Smith, An overview of the Tesseract OCR engine, in: ICDAR 2007, volume 2, 2007, pp. 629–633.
[8] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[9] K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters 23 (2016) 1499–1503.
[10] V. Kazemi, J. Sullivan, One millisecond face alignment with an ensemble of regression trees, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[11] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A unified embedding for face recognition and clustering, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
[12] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, L. S. Davis, Learning temporal regularity in video sequences, 2016. arXiv:1604.04574.
[13] Y. S. Chong, Y. H. Tay, Abnormal event detection in videos using spatiotemporal autoencoder, 2017. arXiv:1701.01546.
[14] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, Y. Sheikh, OpenPose: Realtime multi-person 2D pose estimation using part affinity fields, 2019. doi:10.1109/TPAMI.2019.2929257. arXiv:1812.08008.
[15] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997).
[16] M. A. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal 37 (1991) 233–243.
[17] D. Acharya, W. Yan, K. Khoshelham, Real-time image-based parking occupancy detection using deep learning, Research@Locate 4 (2018) 33–40.
[18] C. Zhou, YOLACT++: Better Real-Time Instance Segmentation, University of California, Davis, 2020.
[19] D. Bolya, C. Zhou, F. Xiao, Y. J. Lee, YOLACT: Real-time instance segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9157–9166.
[20] T. Lin, P. Goyal, R. B. Girshick, K. He, P. Dollár, Focal loss for dense object detection, CoRR abs/1708.02002 (2017). URL: http://arxiv.org/abs/1708.02002. arXiv:1708.02002.