                                Cutting edge video analytics solutions: from the research to
                                the market
                                Mattia Marseglia1,† , Domenico Rocco1,† , Stefano Saldutti1,† and Bruno Vento1,†
                                1
                                    A.I. Tech srl - www.aitech.vision, Piazza Vittorio Emanuele 10, Penta(SA), 84123, Italy


                                                 Abstract
                                                 A.I. Tech was born as a spinoff company of the University of Salerno and designs and develops cutting edge video analytics
                                                 solutions based on deep learning, able to run on board of smart cameras and/or on devices with limited resource capabilities.
                                                 A.I. Tech solutions are designed to serve various vertical markets: retail, business intelligence, security and safety, smart
                                                 parking, smart city and smart roads. In this paper we present all these solutions, which are the products of years of research
                                                 transferred to the market.

                                                 Keywords
                                                 A.I. Tech, video analytics, cutting edge, computer vision



1. Company presentation

A.I. Tech designs and develops cutting edge video analytics solutions based on the most advanced artificial intelligence and deep learning algorithms, also running directly on board of smart cameras and therefore optimized for low-performance hardware. A.I. Tech has partnerships with world leaders in their reference fields, including (the list is not exhaustive) NVIDIA, Panasonic, Samsung, Hanwha Techwin, Mobotix, Axis, Hikvision and Dahua. In particular, Hanwha Techwin, Panasonic and Mobotix resell the video analytics solutions from A.I. Tech on a global scale. In 2017 A.I. Tech was selected among the Top25 international companies in the field of Artificial Intelligence by CIO Applications Magazine. In 2018 it entered the Top10 Most Innovative AI Solution Providers. Its technology was selected among the finalists of the Benchmark Innovation Award in 2018, 2019, 2020, 2021 and 2022. In 2018 it won the award in the Business Intelligence category with the AI-RETAIL video analytics solution. In 2020 A.I. Tech won the Corporate LiveWire award in the "Most Innovative in Video Analytics" category. In the same year its solutions were finalists in the Security and Fire Excellence Award, for the AI-CROWD-DEEP product (in the Security Software Product Innovation of the Year category) and for the WOW project (in the Security Project of the Year category). The AI-TRAFFIC solution for traffic monitoring is also the winner of the IoMOBILITY AWARD 2020, in the Mobility Analytics category. Corporate LiveWire awarded A.I. Tech the "Innovation & Excellence Awards" for the year 2022, renewing the award also for 2023 and considering the company the most innovative in the field of "AI Technology".
   The activities that A.I. Tech carries out, with a highly technological and scientific content, require specialized skills in the fields of Artificial Intelligence, Artificial Vision and Embedded Systems. For this reason, the company has a very close collaboration with the Department of Information and Electrical Engineering and Applied Mathematics (DIEM) of the University of Salerno. In particular, there is also an agreement for the activation of company internships, as well as scientific collaborations for the coming years. These activities allow the transfer of the scientific skills of the DIEM research group in the fields of Artificial Vision and Artificial Intelligence, with a consequent technological transfer of research products that takes the form of a series of cutting edge artificial intelligence products, commercially available at an international level.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
† These authors contributed equally.
mattia.marseglia@aitech.vision (M. Marseglia); domenico.rocco@aitech.vision (D. Rocco); stefano.saldutti@aitech.vision (S. Saldutti); br1.vento@gmail.com (B. Vento)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Overview of the solutions

Most of the deep learning based systems available nowadays in the market are realized on top of off-the-shelf detectors. However, designing software solutions engineered to be as accurate as the state of the art without the computational burden typically required by deep neural networks is definitely more challenging. Realizing computationally inexpensive solutions is a mandatory requirement in several real-world applications where the system is expected to process hundreds of video streams simultaneously in real time while keeping an affordable cost; smart cities are a noteworthy example. Moreover, in different contexts the processing is required to be performed on the edge due to environmental constraints; the video analytics application therefore has to run on




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
board of smart cameras [1], with very limited hardware resources.
   Within this context, a common design choice of all the A.I. Tech applications is to preserve an accuracy comparable with state-of-the-art detectors and classifiers based on heavy neural networks, while achieving the lowest hardware requirements together with the highest processing throughput. Thanks to this, A.I. Tech plugins are able to run directly on board of a huge number of different smart cameras providing open platforms to specific partners (and in particular on board of specific models of the following camera manufacturers: Androvideo, Axis, Bosch, Dahua, Hanwha Techwin, Hikvision, Mobotix, Panasonic, Topview, Vivotek). A.I. Tech is thus the video analytics vendor supporting the highest number of camera platforms in the world.

3. Video analytics products

In this section we describe 12 video analytics solutions currently available on the market.
   AI-BIO 1 performs face analysis with the purpose of extracting soft-biometric features like age, gender and emotion [2, 3, 4]. The application has a multitask architecture based on multiple deep neural networks engineered to be executed on board of embedded platforms and smart cameras. It can be used both for business intelligence and for digital signage applications [5]. In particular, in the latter case, the aim is to personalize advertisement contents on a monitor by taking into account the soft-biometric features extracted from the face of the person who is watching the monitor. An example is shown in Figure 1a.
   AI-CROWDCOUNTING 2 is a video analytics application tailored to estimate, for statistical or alerting purposes, the crowd density within specific very crowded areas of interest. Powered by a deep learning model and boosted by a distinctive training strategy [6], the system is not only able to detect people fully visible in the scene, but also to identify those that are heavily occluded, thanks to a point-based head detection algorithm. This makes the application particularly suited for very crowded environments, such as stadiums, concerts or trade fairs. Figure 1b shows an example of the solution in action.
   AI-CROWD-DEEP 3 is the video analytics solution for people monitoring. Thanks to the combination of a proprietary deep learning based detector, a multi-object tracker [7] and a calibration mechanism, it is capable of: (i) estimating the number of people inside an area; (ii) generating an alarm in case of overcrowding or when a gathering is detected; (iii) generating an alarm if two or more persons are not respecting the social distances for a given amount of time; (iv) counting the people that cross virtual lines; (v) counting the number of pedestrians crossing one area and arriving in another, building the origin-destination matrix. An example of the solution in action is shown in Figure 1c.
   AI-FIREPLUS 4 is the solution focused on the early detection of fires. It combines the analysis of movement and appearance with a deep neural network to detect the presence of flame or smoke within an area under monitoring [8], and it can operate in both indoor and outdoor environments. The main benefit of this application is that it does not require thermal or thermographic sensors, but traditional optic ones instead. An example is shown in Figure 1d.
   AI-INTRUSION 5 is the video analytics solution for the detection of intruders (people or vehicles). It is capable of detecting: (i) intrusions or loitering within an area of interest framed by the camera; (ii) the crossing of a virtual line; (iii) the crossing of multiple crossing lines (not necessarily parallel) in sequence. In addition to the size and the aspect ratio of the object, it uses a deep neural network to filter objects according to their class. An example is reported in Figure 1e.
   AI-LOST 6 is the video analysis application designed to detect removed or abandoned objects in restricted environments where constant surveillance cannot be guaranteed [9]. The application can use a deep neural network to recognize garbage or, alternatively, baggage. An example is reported in Figure 1f.
   AI-LPR is the solution for license plate detection and recognition. Unlike other products available in the market, it is fully based on deep learning for both plate detection and license character recognition. An example of the product is shown in Figure 1g.
   AI-PARKING 7 is designed to monitor both indoor and outdoor parking, so as to verify whether a parking spot is free or occupied. Unlike other solutions based on vehicle detection, this is a very effective application requiring that only a part of the vehicle be visible to monitor a spot. An example of AI-PARKING in action is available in Figure 1h.
   AI-PEOPLE-DEEP 8 is the solution that exploits a deep neural network to count the people framed by a camera positioned in zenithal view. Inspired by [10], the application is designed to work both indoors and outdoors, where it is possible to ensure that the illumination conditions are controlled. An example is reported in Figure 1i.
   AI-PPE 9 is designed to detect people wearing personal

1 https://www.youtube.com/watch?v=awze1fHoQEE
2 https://youtu.be/h0qDXkZkObU?si=Su6gStufv9NbUrK9
3 https://www.youtube.com/watch?v=BiCyon1KZco
4 https://www.youtube.com/watch?v=U1SwnESua0g
5 https://www.youtube.com/watch?v=3kUUOcofVow
6 https://www.youtube.com/watch?v=gq24PrW6UwQ
7 https://www.youtube.com/watch?v=VDQ82Di4fZs
8 https://www.youtube.com/watch?v=x6N5g4Fs6_U
9 https://www.youtube.com/watch?v=-fz25HYcFLo
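The virtual-line counting performed by AI-CROWD-DEEP (point iv above) can be illustrated with a minimal sketch: a crossing is registered when the segment joining two consecutive tracked positions of a person intersects the user-defined virtual line. The `LineCounter` class and the geometry below are a hypothetical illustration, not A.I. Tech's actual implementation.

```python
# Minimal sketch of bidirectional virtual-line counting.

def side(p, a, b):
    """Sign of the cross product: which side of line a-b point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def segments_intersect(p1, p2, a, b):
    """True if segment p1-p2 properly crosses segment a-b."""
    d1, d2 = side(p1, a, b), side(p2, a, b)
    d3, d4 = side(a, p1, p2), side(b, p1, p2)
    return d1 * d2 < 0 and d3 * d4 < 0

class LineCounter:
    def __init__(self, a, b):
        self.a, self.b = a, b  # endpoints of the virtual line
        self.in_count = 0      # crossings from the negative to the positive side
        self.out_count = 0     # crossings in the opposite direction

    def update(self, prev_pos, cur_pos):
        """Feed two consecutive tracked positions of the same person."""
        if segments_intersect(prev_pos, cur_pos, self.a, self.b):
            if side(prev_pos, self.a, self.b) < 0:
                self.in_count += 1
            else:
                self.out_count += 1

counter = LineCounter((0, 5), (10, 5))      # horizontal line y = 5
counter.update((3, 2), (3, 8))              # one track moving upward
counter.update((7, 9), (7, 1))              # another moving downward
print(counter.in_count, counter.out_count)  # 1 1
```

The same per-track crossing events, keyed by origin and destination zones instead of a single line, are what would accumulate into the origin-destination matrix mentioned above.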
(a) AI-BIO            (b) AI-CROWDCOUNTING   (c) AI-CROWD-DEEP
(d) AI-FIREPLUS       (e) AI-INTRUSION       (f) AI-LOST
(g) AI-LPR            (h) AI-PARKING         (i) AI-PEOPLE-DEEP
(j) AI-PPE            (k) AI-RAIL            (l) AI-SPILL
(m) AI-TRAFFIC-DEEP   (n) AI-VIOLATION       (o) AI-WEATHER
Figure 1: Some examples of A.I. Tech video analytic plugins in action. Fig. 1a AI-BIO: for each person, the rectangle around
the face is shown in pink or in blue, depending on the gender of the person; moreover, the figure shows all the soft-biometric
features extracted by the software: the emotion and the age. Fig. 1b AI-CROWDCOUNTING: for each detected person,
the application draws a red point, showing in real-time the number of people present in the region of interest. Fig. 1c
AI-CROWD-DEEP: the yellow area highlights the region where the analysis is performed. The dotted white-and-red bounding
box emphasizes a cluster of people that are not respecting the social distances. Fig. 1d AI-FIREPLUS: in green the area of
interest, while the red box calls attention to the detected flame. In the black grid, the detected smoke is highlighted in red. Fig.
1e AI-INTRUSION: the intrusion area is the red polygon and the multiple crossing lines are the numbered red lines below. In
the example, a person has been detected in the intrusion area. The P at the top left of the bounding box indicates that the
object is a person (rather than V for vehicle). Fig. 1f AI-LOST: the area of interest is the polygon in blue. The red bounding box
with the G string indicates that the detected object is garbage (instead of B for baggage). Fig. 1g AI-LPR: in green we can see
the license plate numbers recognized by the application. Fig. 1h AI-PARKING: the red boxes highlight occupied spots, while
green boxes those that are free. Fig. 1i AI-PEOPLE-DEEP: a red bounding box is drawn when a person crosses the virtual line.
Fig. 1j AI-PPE: for each detected person, the application draws a bounding box and a string indicating the recognized tool (W
for no ppe, WH for only helmet, WV for only vest and WHV for both helmet and vest). Fig. 1k AI-RAIL: a red bounding box is
drawn around detected objects if they are within a restricted area when the barrier blocks the road. Fig. 1l AI-SPILL: in the
scene a red bounding box is drawn around a person fallen within the area of interest. Fig. 1m AI-TRAFFIC-DEEP: the area
of interest where the evaluation is performed is in violet. A three dimensional bounding box is associated to each vehicle,
together with the three dimensions of each object (width, length, height), expressed in meters; the speed (s), expressed in
km/h; the category of the vehicle (Car in the example). Fig. 1n AI-VIOLATION: the status of the traffic light is shown in the
box on the side (green in the example), the area where the analysis is performed is in violet, and the application allows the
user to draw the stopping line (red line). Fig. 1o AI-WEATHER: a sensor is placed near the road for monitoring, and another sensor
covering the entire image is utilized for classifying weather conditions. After the observation time within the sensors has
passed, the classification outputs are displayed.
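The red-light violation logic summarized in the caption of Fig. 1n (a vehicle crossing the stop line while the light is red) can be sketched as a simple per-track check. The `light_state` string, the track fields and the event dictionary below are hypothetical illustrations, not the actual A.I. Tech interface.

```python
# Minimal sketch of red-light violation detection: an event is raised when a
# tracked vehicle moves past the stop line while the light classifier says red.

STOP_LINE_Y = 400  # image row of the stop line (hypothetical calibration)

def crossed_stop_line(prev_y, cur_y, line_y=STOP_LINE_Y):
    """The vehicle front moved from above the line to below it this frame."""
    return prev_y < line_y <= cur_y

def check_violation(track, light_state):
    """Return a violation event dict, or None if no violation occurred."""
    if light_state != "red":
        return None
    if crossed_stop_line(track["prev_y"], track["cur_y"]):
        return {
            "vehicle_id": track["id"],
            "type": track["type"],        # motorcycle, bicycle, car or truck
            "speed_kmh": track["speed_kmh"],
        }
    return None

event = check_violation(
    {"id": 7, "type": "car", "speed_kmh": 38.5, "prev_y": 390, "cur_y": 405},
    light_state="red",
)
print(event)  # {'vehicle_id': 7, 'type': 'car', 'speed_kmh': 38.5}
```

In the real product the notification would carry the additional evidence (frames, plate, average speed) needed to decide whether a fine applies, as described in the AI-VIOLATION paragraph below.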
protective equipment (PPE). The application is based on the architecture described in [11]. The PPE combinations that the application is able to detect are: "Helmet", "Vest" and "Helmet and Vest". This solution can be used both in access control systems and for the surveillance of construction sites or places where works are in progress. In the first case, the use of the product is meant to verify that a worker is wearing the specified PPE, in order to authorize him to enter a work area. In the second, the product can be used for continuous monitoring of a work area with the aim of verifying that workers are wearing all the PPE required. An example of the product is reported in Figure 1j.
   AI-RAIL 10 is a video analysis application designed for enhancing railway safety. It combines traditional computer vision techniques with deep neural networks to identify and analyze the behavior of vehicles, pedestrians and obstacles within sensitive areas such as level crossings or railway lines. The analysis can be activated depending on the barrier status, which can be obtained either from an external signal or through neural networks integrated into the system. An example is shown in Figure 1k.
   AI-SPILL 11 is designed to monitor a person walking in an unsupervised area and detect if the person falls, raising an alarm if that happens. The analysis is performed using a mathematical model of the behavior of a person moving in the scenario of interest, especially walking and falling dynamics. An advanced neural network, trained with thousands of samples of fallen people and optimized for running on board the camera, is then used to confirm the initial outcome of that model. An example is reported in Figure 1l.
   AI-TRAFFIC-DEEP 12 is the video analysis solution for road monitoring, for both statistical and alerting purposes. Technically speaking, the application is based on a deep learning based vehicle and people detector [12] followed by a multi-object tracking module [7] and an advanced 3D scene reconstruction stage. It is capable of: (i) counting and classifying vehicles among cars, motorcycles and trucks; (ii) estimating the average speed and the color of each detected vehicle; (iii) evaluating the density of vehicles on a road branch and raising an alarm if congestion is detected; (iv) detecting vehicles travelling in the wrong direction or stopped in forbidden areas; (v) detecting the presence of pedestrians on the road; (vi) counting the number of vehicles and pedestrians crossing one area and arriving in another, building the origin-destination matrix; (vii) detecting lane changes and abnormal maneuvers (such as U-turns in prohibited areas) made by vehicles, based on the crossing of a set of user-configured virtual lines. An example is reported in Figure 1m.
   AI-VIOLATION 13 is a vertical solution able to detect traffic light violations (see Fig. 1n), namely the presence of vehicles crossing the stopping line while the traffic light is red. It is based on the above mentioned vehicle detector and a classifier that allows surveillance cameras (which are commonly installed over the city) to read the traffic light status without the need to install external devices. The state of a traffic light includes the color of the active traffic light circle and whether it is blinking or not. In particular, the application can identify vehicles crossing the stop line while the traffic light status is red and send a notification to report the violation. This notification also contains information about the vehicle, such as its type (motorcycle, bicycle, car or truck), the estimated average speed and all the information necessary to decide whether a fine applies.
   AI-WEATHER 14 is an innovative application that uses deep neural networks to monitor weather and road conditions. It can recognize a wide range of weather states, including sunny, cloudy, rainy, snowy and foggy, as well as road surface conditions, which can vary between dry, non-dry and flooded. This application is designed to operate effectively in outdoor environments and requires visibility of both the road surface and the sky at the same time (see Fig. 1o). AI-WEATHER offers a variety of useful alerts to users, including periodic updates on weather and road conditions, as well as instant notifications when the status of one of the sensors changes.

10 https://youtu.be/cDh1epks3x0?si=TCZlm8QJOG_FJ6bk
11 https://www.youtube.com/watch?v=pCFBnWC8uPQ
12 https://www.youtube.com/watch?v=6yQS6n_nTcI
13 https://www.youtube.com/watch?v=gAVEHPCckbE
14 https://www.youtube.com/watch?v=_gn-odtuWJo

References

 [1] V. Carletti, A. Greco, A. Saggese, M. Vento, An effective real time gender recognition system for smart cameras, J. Ambient Intell. Humaniz. Comput. 11 (2020) 2407–2419.
 [2] A. Greco, A. Saggese, M. Vento, V. Vigilante, A convolutional neural network for gender recognition optimizing the accuracy/speed tradeoff, IEEE Access 8 (2020) 130771–130781. doi:10.1109/ACCESS.2020.3008793.
 [3] A. Greco, A. Saggese, M. Vento, V. Vigilante, Gender recognition in the wild: a robustness evaluation over corrupted images 12 (2021).
 [4] A. Greco, A. Saggese, M. Vento, V. Vigilante, Effective training of convolutional neural networks for age estimation based on knowledge distillation, Neural Comput. Appl. (2021).
 [5] A. Greco, A. Saggese, M. Vento, Digital signage by real-time gender recognition from face images, in:
     2020 IEEE International Workshop on Metrology
     for Industry 4.0 IoT, 2020, pp. 309–313.
 [6] L. Fotia, G. Percannella, A. Saggese, M. Vento,
     Highly crowd detection and counting based on
     curriculum learning, in: International Confer-
     ence on Computer Analysis of Images and Patterns,
     Springer, 2023, pp. 13–22.
 [7] P. Foggia, G. Percannella, A. Saggese, M. Vento,
     Real-time tracking of single people and groups si-
     multaneously by contextual graph-based reasoning
     dealing complex occlusions, in: 2013 IEEE Inter-
     national Workshop on Performance Evaluation of
     Tracking and Surveillance (PETS), IEEE, 2013.
 [8] P. Foggia, A. Saggese, M. Vento, Real-time fire
     detection for video-surveillance applications using
     a combination of experts based on color, shape, and
     motion, IEEE Transactions on Circuits and Systems
     for Video Technology 25 (2015) 1545–1556. doi:10.
     1109/TCSVT.2015.2392531.
 [9] P. Foggia, A. Greco, A. Saggese, M. Vento, A method
     for detecting long term left baggage based on heat
     map., in: VISAPP (2), 2015, pp. 385–391.
[10] A. Greco, A. Saggese, B. Vento, A robust and effi-
     cient overhead people counting system for retail
     applications, in: International Conference on Im-
     age Analysis and Processing, Springer, 2022, pp.
     139–150.
[11] A. Greco, S. Saldutti, B. Vento, Fast and effective de-
     tection of personal protective equipment on smart
     cameras, in: International Conference on Pattern
     Recognition, Springer, 2022, pp. 95–108.
[12] A. Greco, A. Saggese, M. Vento, V. Vigilante, Vehi-
     cles detection for smart roads applications on board
     of smart cameras: A comparative analysis, IEEE
     Trans. Intell. Transp. Syst. (2021) 1–13.