<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of traffic anomalies for a safety system of smart city</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rifkat Minnikhanov</string-name>
          <email>its.center.kzn@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Dagaeva</string-name>
          <email>dagaevam@rambler.ru</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Anikin</string-name>
          <email>anikinigor777@mail.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tikhon Bolshakov</string-name>
          <email>bear_333@mail.ru</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alisa Makhmutova</string-name>
          <email>phd.makhmutova@griat.kai.ru</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kamil Mingulov</string-name>
          <email>kamil_mingulov@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eastern Graphics GmbH</institution>
          ,
          <addr-line>Ilmenau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kazan National Research Technical University named after A.N. Tupolev - KAI</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kazan National Research Technical University named after A.N. Tupolev - KAI</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>"Road Safety" State Company</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>337</fpage>
      <lpage>342</lpage>
      <abstract>
        <p>Sustainable development of a modern smart city depends on the safety of its citizens and efficient management of resources. These objectives require an instant response to incidents and abnormal situations, which in turn requires intelligent information processing and data analytics. The necessary information can be collected in different ways: from sensors, navigation systems, users, and surveillance cameras. A video monitoring system is a particularly valuable source of such information: cameras cover most of the city and can be used efficiently to find anomalies. A video monitoring system requires non-stop viewing, analysis of the current situation, and identification of anomalies, which can be provided without human intervention by modern applications based on machine learning and computer vision techniques. In this paper, we use both computer vision and machine learning methods for traffic anomaly detection and classification in real time. As a result of this work, we suggest an approach for detecting vehicle/pedestrian violations of legal trajectories, which we tested on real-time video in the city of Kazan.</p>
      </abstract>
      <kwd-group>
        <kwd>smart city</kwd>
        <kwd>video monitoring systems</kwd>
        <kwd>machine learning</kwd>
        <kwd>object detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>The smart city concept was created as an answer to the issues of modern large cities. It represents interconnected systems of information and communication technologies that simplify the management of internal urban processes and make the lives of residents more comfortable and safer. The concept of the smart city is based on the principles of optimizing the transport system and of sustainable consumption of energy and other resources. It also involves the simplification of everyday processes (paying bills online, quickly finding a parking space, etc.), citizens' safety, comfort, and active participation in urban life.</p>
      <p>
        Optimization of the transport system is the first task in developing a smart city, because large cities have always suffered from the problem of building safe road systems with efficient traffic flow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][2]. Non-optimized traffic flow raises a whole list of economic, social, and environmental issues: traffic delays, jams, and accidents on the road, fuel consumption, and pollution. Therefore, one of the most important goals of a modern smart city is to provide effective traffic management, which can be done with an intelligent transport system (ITS) and its applications. This requires continuous collection of up-to-date information about the road situation as well as its constant monitoring. We can collect relevant data from different sources, which can be categorized into a few groups:
 roadway data, collected by different IoT devices (sensors), active or passive in nature [4];
 vehicle-based data, collected by technologies such as electronic toll tags and radio navigation satellite services (global positioning systems (GPS), GALILEO, GLONASS, etc.), combined with cell-phone-based Bluetooth and Wi-Fi [5][6];
 traveler-based data, voluntarily provided by drivers who use mobile communications and applications;
 wide-area data, obtained from system networks, space-based radars, or Geographic Information Systems (GIS);
 indirect data from external systems. For example, emergency management information systems (EMIS) do not directly store information about the road environment, but can store data about incidents in the city. These incidents cause traffic jams and impede traffic in the city. Data mining, forecasting, and analysis of EMIS data can reduce response time, which will scale down or even prevent traffic jams [7].
      </p>
      <p>The use of video surveillance cameras for collecting roadway information has become popular in recent years due to their coverage of the city and their efficiency in solving the problems described above. Video processing offers opportunities to meet the challenges of the smart city despite its shortcomings, such as dependence on environmental factors (rain, fog, brightness, etc.) and loss of accuracy. First, the installation of the supporting tools (cameras, wires, etc.) can be performed without any additional work on the roadway, in contrast, for example, to the installation of sensors. The second advantage is the price of the devices and the cost of their maintenance. The additional information received from the roads is a third and significant advantage. With the growth of computational capabilities in the past few years, computer vision and deep learning methods can be applied to detect traffic characteristics, providing efficient automatic statistical monitoring of the roads [8].</p>
      <p>All the described possibilities and advantages lead to the idea that video surveillance systems may be considered a solution to ITS tasks for the smart city. Although anomaly detection on video has traditionally involved a human, the task of detecting object anomalies can be fully automated with the help of artificial intelligence (AI), such as neural networks and computer vision algorithms. The main goal of this paper is to give an overview of existing anomaly detection approaches in the deep learning field in terms of their applicability to detecting anomalies on video from the surveillance cameras of the city of Kazan.</p>
    </sec>
    <sec id="sec-2">
      <title>II. ANOMALY DETECTION APPROACHES IN THE TRANSPORT ENVIRONMENT</title>
      <p>Incident control and anomaly management imply an understanding of what constitutes normal behavior. Any deviation from the norm is considered an anomaly. The definition depends on several factors: the field of activity, the type of data being processed and its features, and external conditions. It is important to identify the parameters of normal behavior for different fields of activity (industry, road infrastructure, security) in order to understand what should be detected as an anomaly.</p>
      <p>In a transport environment, normal behavior depends on various parameters (road condition, participants' behavior, weather). Therefore, it is a challenging task to identify what can be considered an anomaly and what cannot.</p>
      <p>
        Surveillance cameras provide visual control of a given area, which allows one to constantly check the zone of interest and identify different events or changes: law enforcement, control over abandoned objects, crowd behavior [
        <xref ref-type="bibr" rid="ref8">9, 10</xref>
        ]. The zone of interest for surveillance cameras is the road and the adjacent territory, and in this paper we consider the following anomalies on the road:
 a vehicle/pedestrian violates a legal trajectory;
 traffic congestion (including traffic flow deceleration/density increase on a road section).
      </p>
      <sec id="sec-2-1">
        <title>A. Trajectory-based Anomaly Detection</title>
        <p>
          Trajectory formation is a complex and diverse task. However, this area has been receiving the attention of the computer vision community for the last few years [
          <xref ref-type="bibr" rid="ref9">11</xref>
          ]. Besides the transport area, trajectories are used in suspicious activity detection [
          <xref ref-type="bibr" rid="ref10">12</xref>
          ], sports video analysis [
          <xref ref-type="bibr" rid="ref11">13</xref>
          ], video summarization [
          <xref ref-type="bibr" rid="ref12">14</xref>
          ], and synopsis generation [
          <xref ref-type="bibr" rid="ref13">15</xref>
          ]. An object's trajectory is the captured motion of a moving object or, simply speaking, its path. For many years, researchers have faced the problem of forming an accurate path. Objects are constantly in motion, and video captures many slightly different frames every second. Processing each frame of the video could significantly slow down further analysis. It is also important to understand that the processing systems on the cameras themselves have poor computing capabilities. Video monitoring system developers face a trade-off between processing speed and accuracy of the result.
        </p>
        <p>There are two possible approaches to trajectory-based anomaly detection. The first, often used in video analysis systems on the cameras themselves, is to define normal behavior or a set of rules, highlight areas of interest, and track all deviations. The second approach is based on unsupervised learning: the system is given the opportunity to learn on a large amount of data, determine the norms automatically, and then detect anomalies. The second approach is less reliable for critical systems, where accuracy and confidence are extremely important, but it can reveal nontrivial patterns that have not been described before. Another valuable advantage of the second approach is its scalability and adaptation to constantly changing conditions (camera position, road conditions), which can significantly reduce manual work.</p>
        <p>In this paper, we focus on the second approach: unsupervised learning algorithms for anomaly detection. Incident response cannot be provided offline; all steps of the algorithm should be processed in, or close to, real time. Therefore, a trade-off between accuracy and processing speed, with priority given to the latter, should be kept in mind.</p>
        <p>First, the video should be received and pre-processed. Usually, video pre-processing is applied to individual frames. At this stage, noise reduction algorithms, color rendering, and contrast enhancement can be applied. After this step, we can apply object detection and tracking techniques in order to extract information about the entities in the frame and their movement. Depending on the field of activity, the video may contain objects that carry no useful information (often they can be attributed to the background: buildings, trees, etc.). Objects that are not of interest for the specific problem should be excluded from consideration at this step. In the next step, we must extract the trajectories of the objects of interest. This step is important and must be done regardless of how the learning process is organized (by rules or with unsupervised learning). Examples of trajectory visualization and outlier detection on video frames are shown in Fig. 1.</p>
        <p>In the next step, the identified outliers should be registered as anomalies, because their trajectories stand out from the normal ones. Detection of anomalies requires a comparison of the current trajectory of a moving object with the legitimate one. Therefore, before we can compare trajectories, we must identify the rules or patterns, which can be obtained with machine learning techniques. This is a hard process due to the difficulty of defining the boundary between normal and incorrect behavior of vehicles, especially on different lanes of the same road. Finally, after the rules have been modeled, we can identify the anomalies.</p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Traffic Congestions Detection</title>
        <p>
          Previously, we mentioned that normal behavior must be identified. At first glance, it might seem easy to define congestion. However, according to Downs, there is no universally accepted definition of traffic congestion [
          <xref ref-type="bibr" rid="ref14">16</xref>
          ]. The numerous definitions can be categorized into three groups according to the parameters measured: demand-capacity related, delay/travel time related, and cost related [
          <xref ref-type="bibr" rid="ref15">17</xref>
          ]. Depending on the chosen definition of traffic congestion, different measurement metrics can be used:
 Speed. The average speed on any section of a road can be used to infer the state of traffic at the present point in time. This can be done by comparing the present speed of traffic to the off-peak period speed.
        </p>
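        <p>As a minimal illustration of the speed metric, the present speed can be compared to the off-peak speed as a ratio (the threshold values below are hypothetical, not taken from the paper):</p>

```python
def congestion_index(current_speed, off_peak_speed):
    """Ratio of present speed to off-peak (free-flow) speed; lower means
    more congested."""
    return current_speed / off_peak_speed

def congestion_level(current_speed, off_peak_speed):
    # Illustrative thresholds only; a real deployment would calibrate them.
    r = congestion_index(current_speed, off_peak_speed)
    if r >= 0.75:
        return "free flow"
    if r >= 0.5:
        return "moderate"
    return "congested"
```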
        <p>
          Travel time and delay. Congestion is travel time or delay in excess of that normally incurred under light or free-flow travel conditions [
          <xref ref-type="bibr" rid="ref16">18</xref>
          ]. Unacceptable congestion is travel time or delay longer than the accepted norm. This norm may vary depending on geographical location, time of day, etc.
        </p>
        <p>
          Level of service measures. The level of service (LOS) has been one of the most popular measures of traffic congestion. It was adopted in the 1985 Highway Capacity Manual [
          <xref ref-type="bibr" rid="ref17">19</xref>
          ]. LOS is subdivided into six classes ranging from A to F, which are categorized according to the vehicle-to-capacity ratio.
        </p>
        <p>In the scope of this work, four approaches to congestion detection were proposed:
 measure the time of presence of vehicles in the frame and compare it with historical data;
 measure the velocity of vehicles and compare it with historical data;
 count vehicles using a virtual detecting line and compare the number of passing vehicles with the traffic capacity of the road calculated according to the regulations of the Russian Federation;
 measure the movement in a frame and compare it with calibration data.</p>
        <p>In both anomaly detection tasks, trajectory-based and traffic congestion, we need to process incoming video frames and perform image processing for object detection and tracking.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>III. IMAGE PROCESSING</title>
      <p>At the pre-processing step, segmentation can be applied to subdivide an image into nonoverlapping regions for object detection. When the segmentation process is complete, features describing the selected regions have to be found. We can consider texture, color, and shape as examples of such features. Each of the regions represented in the image can be described by properties of these features, such as the length, area, or width of the shape. Segments are classified into meaningful classes by the extracted features. For example, in a road image the classes may be cars, trucks, buses, pedestrians, and so on. The problems of scene segmentation and object classification can be solved by expert systems, semantic networks, and neural network systems. Segmentation and classification together make up the process called object detection. During the object tracking process, we can build an object's trajectory.</p>
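      <p>As a toy illustration of region features such as area and shape extent, the following sketch computes simple descriptors from a binary segmentation mask (illustrative only; a real system would add richer texture and color features):</p>

```python
import numpy as np

def shape_features(mask):
    """Simple region descriptors from a binary mask: pixel area and the
    width/height/aspect ratio of the bounding box."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    width = int(xs.max() - xs.min() + 1)
    height = int(ys.max() - ys.min() + 1)
    return {"area": area, "width": width, "height": height,
            "aspect": width / height}
```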
      <sec id="sec-3-1">
        <title>A. Object Detection</title>
        <p>Object detection implies the identification of an object's location in the frame. This is a difficult task, because objects can be classified into categories such as vehicles and people, and their appearance in the frame varies. Variations arise not only from changes in illumination and viewpoint, but also from nonrigid deformations and intraclass variability in shape and other visual properties. For example, people wear different clothes and take a variety of poses, while cars come in various shapes and colors. Fig. 2 shows a taxonomy of object detection approaches in remote sensing images, but these approaches can be generalized to object detection as a whole.</p>
        <p>The input of the classifier is a set of regions (sliding windows or object proposals) with their corresponding feature representations, and the output is their predicted labels. Feature extraction, feature fusion, dimension reduction, and classifier training play the most important roles in the performance of object detection.</p>
        <p>
          Traditional techniques from statistical pattern recognition, such as the Bayesian discriminant and Parzen windows, were popular until the beginning of the 1990s. Since then, neural networks have increasingly been used as an alternative to classic pattern classifiers and clustering techniques [
          <xref ref-type="bibr" rid="ref18">20</xref>
          ]. Since 2012, after the work of Krizhevsky et al. [
          <xref ref-type="bibr" rid="ref18">20</xref>
          ], the majority of state-of-the-art object segmentation and classification has been performed by convolutional neural networks (CNNs). CNNs are successfully used for complex object detection [
          <xref ref-type="bibr" rid="ref19">21</xref>
          ]. These neural networks outperform other architectures due to data size reduction at the convolution and pooling steps, which allows the complexity of the neural network to be increased. There are four main operations in a CNN: convolution, non-linearity (ReLU), pooling or subsampling, and classification (provided by a fully connected layer).
        </p>
        <p>
          CNNs for object detection differ in how they search for extracted features in the image. There are region-based methods (R-CNN [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ], Fast R-CNN [
          <xref ref-type="bibr" rid="ref21">23</xref>
          ], Faster R-CNN [
          <xref ref-type="bibr" rid="ref22">24</xref>
          ]), the Single Shot MultiBox Detector (SSD) [
          <xref ref-type="bibr" rid="ref23">25</xref>
          ], and the You Only Look Once (YOLO) [
          <xref ref-type="bibr" rid="ref24">26</xref>
          ] method, which predict bounding boxes and probabilities for each region. In the YOLO network, the image is split into an S × S grid. Each of the resulting cells predicts B bounding boxes and confidences. Each cell also predicts class probabilities. The bounding boxes are then combined with the classes.
        </p>
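        <p>The grid decoding described above can be illustrated with a toy sketch (not an actual YOLO implementation; the tensor layout and confidence threshold are assumed for illustration):</p>

```python
import numpy as np

def decode_yolo(pred, conf_thresh=0.5):
    """Toy YOLO-style decoding: pred has shape (S, S, B, 5 + C) with
    (x, y, w, h, conf, class scores...) per box, where (x, y) is the box
    center relative to its cell.  Returns (box, score, class) triples."""
    S = pred.shape[0]
    out = []
    for i in range(S):
        for j in range(S):
            for b in pred[i, j]:
                x, y, w, h, conf = b[:5]
                cls_probs = b[5:]
                cls = int(np.argmax(cls_probs))
                score = conf * cls_probs[cls]   # class-specific confidence
                if score > conf_thresh:
                    # convert cell-relative center to image-relative [0, 1]
                    out.append((((j + x) / S, (i + y) / S, w, h),
                                float(score), cls))
    return out
```

        <p>A real detector would follow this with non-maximum suppression to merge overlapping boxes.</p>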
      </sec>
      <sec id="sec-3-2">
        <title>B. Object Tracking</title>
        <p>
          Tracking is a necessary step for extracting an object's trajectory through the video scene [
          <xref ref-type="bibr" rid="ref25 ref26">27, 28</xref>
          ]. The process of object tracking can be subdivided into three consecutive steps: moving object detection, object classification, and inter-frame tracking. In the first step, moving objects are separated from the static background. In the second step, object classification, the moving objects identified in the previous step are assigned to classes based on their features. Finally, in the tracking step, the classified objects are identified in subsequent frames. Neural networks have significantly simplified the tracking task by performing the first two steps. Therefore, we take a closer look at the third step: the tracking algorithms.
        </p>
        <p>
          Modern trackers are not ideal. It is difficult for them to process frames with challenging environmental factors such as occlusion (when an object closer to the camera overlaps the object behind it), background clutter (the road has a color similar to the vehicle), motion blur (the target region is blurred due to the motion of the vehicles or the camera), etc. Modern algorithms have to be able to at least partially handle these difficulties. The Boosting tracker [
          <xref ref-type="bibr" rid="ref27">29</xref>
          ] is based on an online AdaBoost algorithm. The initial bounding box of an object is considered the positive example of the object, and the rest is treated as background. The algorithm is old and outperformed by many modern algorithms. The Multiple Instance Learning (MIL) tracker [
          <xref ref-type="bibr" rid="ref28">30</xref>
          ] is based on an approach similar to Boosting. The difference is that this algorithm generates multiple hypotheses in the neighborhood of the object's center. Together, all these hypotheses with the original bounding box are put into a labeled 'bag', each containing many instances. If the prediction of the main bounding box is not well centered, there is a high probability that the positive 'bag' will contain a better prediction. The tracker does not recover from full occlusion. The Kernelized Correlation Filters (KCF) tracker [
          <xref ref-type="bibr" rid="ref29">31</xref>
          ] is based on the ideas of MIL and Boosting. The fact that the positive bag in MIL contains boxes with large overlap allows the complexity of the correlation calculation to be reduced from O(n^2) to O(n log n). This tracker is both faster and more accurate than MIL and reports tracking failure better. It does not recover from full occlusion. The Minimum Output Sum of Squared Error (MOSSE) tracker [
          <xref ref-type="bibr" rid="ref30">32</xref>
          ] uses adaptive correlation for object tracking, which produces stable correlation filters when initialized from a single frame. The MOSSE tracker is robust to variations in lighting, scale, pose, and non-rigid deformations. It also detects occlusion based on the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears. We compared such qualitative properties as the smoothness of object tracking, the ability to work with different types of objects, and the handling of occlusions. After numerous tests, we settled on the KCF tracking algorithm. This method not only tracks objects correctly, but also does not produce as many errors as other algorithms. Its advantage is that it can work with images of low quality, resolution, and frame rate.
        </p>
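        <p>The core idea behind MOSSE, building a correlation filter in the Fourier domain from a single frame, can be sketched as follows (a bare-bones illustration without the online update, preprocessing, or windowing of the real tracker):</p>

```python
import numpy as np

def gaussian_peak(h, w, cy, cx, sigma=2.0):
    """Desired correlation response: a Gaussian centered on the target."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

class MiniMosse:
    """Minimal single-frame MOSSE-style correlation filter."""
    def __init__(self, patch, eps=1e-3):
        h, w = patch.shape
        g = gaussian_peak(h, w, h // 2, w // 2)
        F = np.fft.fft2(patch)
        G = np.fft.fft2(g)
        # Closed-form filter: maps the training patch to the Gaussian peak.
        self.H = (G * np.conj(F)) / (F * np.conj(F) + eps)

    def respond(self, patch):
        """Correlation response map; its peak gives the predicted target
        location in the new patch."""
        return np.real(np.fft.ifft2(self.H * np.fft.fft2(patch)))
```

        <p>Because correlation in the Fourier domain is element-wise, each update costs only a pair of FFTs, which is what makes MOSSE so fast.</p>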
      </sec>
      <sec id="sec-3-3">
        <title>C. Pattern Extraction</title>
        <p>We need to extract the frequent trajectories of road objects as patterns, expressed in the form of curves. Since our goal was to highlight the trajectories of road-related objects only, we removed all objects not belonging to the vehicle and people classes, as well as all objects with a recognition probability of less than 50%. For the remaining objects, we normalized their coordinates and passed them to the trackers. At the same time, we assigned a color to each object for identification. Then, using the multitrack object, we visualized the trajectories of moving objects in the following way: the coordinates of the center of the previously tracked object are remembered and connected to the center of the next tracker position. After all frames have been processed, we combine them back into a video. Each object gets its own color for drawing its trajectory. After obtaining all trajectories, we can divide them into categories by object type, build groups of trajectories, and track anomalies (represented in Fig. 3). Over a long training period, we obtain a full collection of many similar approximated trajectories, which must be revised to identify the norm. The Euclidean distance between the points of object paths was chosen for this. If the Euclidean distance is less than the epsilon neighborhood, we check whether the object deviates from this trajectory as it continues to move.</p>
        <p>If the object's final path does not deviate from the trajectory, we remove this path from the list of reference trajectories. If the object's starting place does not fit any reference path, or the final trajectory is too far from the existing ones, then the trajectory is added to the list for further analysis. The same method was chosen for anomaly identification.</p>
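        <p>The reference-list maintenance described above can be sketched as follows (a simplification that keys only on the starting point; the eps threshold is an assumed value):</p>

```python
import numpy as np

def update_references(references, new_traj, eps=15.0):
    """Add a finished trajectory to the reference list only if its start
    point is not within eps of any existing reference's start, mirroring
    the 'does not fit any reference path' rule (a simplification)."""
    start = np.asarray(new_traj[0], dtype=float)
    for ref in references:
        if np.linalg.norm(start - np.asarray(ref[0], dtype=float)) < eps:
            return references          # already covered by a reference
    return references + [new_traj]
```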
        <p>Fig. 3. Trajectories of objects: a) separately; b) on the video frame.</p>
        <p>
          The other unsupervised method for pattern extraction is based on the DBSCAN algorithm [
          <xref ref-type="bibr" rid="ref31">33</xref>
          ], with which we group objects for the purposes of congestion identification, lane detection, etc. While the trajectory comparison method can be used in real-time mode, this method is more suitable for offline checks. Fig. 4 shows the clusters of vehicles formed by their first appearance. The main advantage of the DBSCAN algorithm is that it allows clusters of arbitrary shape to be built.
        </p>
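        <p>A compact version of the DBSCAN idea, density-based clustering of the vehicles' first-appearance points, can be sketched as follows (a toy implementation for illustration, not a production library version):</p>

```python
import numpy as np

def dbscan(points, eps=10.0, min_pts=3):
    """Tiny DBSCAN for 2-D points; returns a label per point, -1 for noise."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    labels = np.full(n, -1)
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    neighbors = [np.nonzero(dist[i] <= eps)[0] for i in range(n)]
    cluster = 0
    for i in range(n):
        # seed a new cluster only from an unlabeled core point
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # core point: expand
                    stack.extend(neighbors[j])
        cluster += 1
    return labels
```

        <p>Because membership is defined by density reachability rather than by distance to a centroid, the clusters can take arbitrary shapes, which matches the advantage noted above.</p>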
        <p>We used data from intersections in the city of Kazan for the trajectory-based anomaly detection task and data from Moscow roadside cameras for congestion evaluation.</p>
        <p>First, we evaluated the object detection algorithms. As mentioned previously, our priority was speed and accuracy. After numerous tests, represented in Fig. 5-6, we chose the YOLO method, which provided the best results on both parameters.</p>
        <p>Fig. 7-8 show an automatically detected anomaly: a deviation from the reference trajectory in the form of a road accident.</p>
        <p>For traffic congestion, we tried several approaches. To test the performance of the trackers, a sequence of 600 frames was prepared. The trackers were initialized with a bounding box around a black car, and the sequence is terminated if the tracker fails to update. The resolution of the video is 1280x720 pixels. In the evaluation of the tracker performance test in Table 1, anything below 48-60 frames per second was labeled as Slow. The KCF and MOSSE trackers were picked as the most applicable for this task: KCF offers higher tracking accuracy, while MOSSE offers pure speed.</p>
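        <p>The frames-per-second labeling used for Table 1 can be reproduced with a small timing harness (illustrative; <italic>update</italic> stands for any tracker's per-frame update function, and the 48 fps cutoff mirrors the threshold in the text):</p>

```python
import time

def benchmark_fps(update, frames, slow_below=48.0):
    """Time a tracker's update() over a frame sequence; terminate on the
    first failed update, then label trackers below the cutoff as Slow."""
    start = time.perf_counter()
    n = 0
    for frame in frames:
        if not update(frame):      # tracker failed: terminate sequence
            break
        n += 1
    elapsed = time.perf_counter() - start
    fps = n / elapsed if elapsed > 0 else 0.0
    return fps, ("Slow" if fps < slow_below else "OK")
```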
        <p>The approach based on time of presence was found to be impractical, as it requires ideal detection and tracking algorithms. Neither the algorithms of the YOLO family nor any other modern algorithm has a detection rate that would make this approach work; on tracking failures, the time of presence cannot be calculated correctly. Another problem with this approach is the performance decrease with a growing number of trackers. The speed-based approach is also impractical. Furthermore, measuring distances in pixels produces data incomparable to real distances: without manual calibration of a camera, it is impossible to obtain a mapping from camera coordinates to real-world ones. The vehicle-count-based approach gives better results than the previous two, although in scenes with congestion, vehicle detection, as well as tracking, is unstable due to the high level of occlusion. Fig. 9 shows congestion detection. Unfortunately, this approach requires a high level of user interaction for initialization.</p>
        <p>As a result of this work, an approach for the detection of congestion and trajectory-based anomalies was developed.</p>
        <p>The developed approach is able to recognize independent reference trajectories for certain classes of objects with an unsupervised learning algorithm, and to identify anomalies if the spatial trajectory of an object violates them. In the future, we plan to expand the types of identified anomalies, as well as to test the system in real time. There are various road rules for road lanes in the city of Kazan. For example, there are special lanes for public transport and large-sized vehicles. Drivers often violate the travel ban on these lanes and drive along them. We can also think about creating rules for each lane of the road, which is a very challenging task from both the unsupervised learning and the video processing perspectives.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wiseman</surname>
          </string-name>
          , “
          <article-title>Real-time monitoring of traffic congestions</article-title>
          ,”
          <source>IEEE International Conference on Electro Information Technology</source>
          , pp.
          <fpage>501</fpage>
          -
          <lpage>505</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zavitsas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kaparias</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.G.H.</given-names>
            <surname>Bell</surname>
          </string-name>
          , “
          <article-title>Transport problems in cities</article-title>
          ” [Online]. URL: https://trimis.ec.europa.eu/sites/default/files/project/documents/20120402_173932_45110_D%201.1%20-%20Transport%20problems%20in%20cities%20-%20v3.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>F.J.</given-names>
            <surname>Kelly</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , “
          <article-title>Transport solutions for cleaner air</article-title>
          ,” Science, vol.
          <volume>352</volume>
          , no.
          <issue>6288</issue>
          , pp.
          <fpage>934</fpage>
          -
          <lpage>936</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>D'Andrea</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          , “
          <article-title>Detection of traffic congestion and incidents from GPS trace analysis</article-title>
          ,
          <source>” Expert Syst. Appl.</source>
          , vol.
          <volume>73</volume>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>Transp. Eng.</source>
          , vol.
          <volume>133</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>165</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Dagaeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garaeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Anikin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Makhmutova</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Minnikhanov</surname>
          </string-name>
          , “
          <article-title>Big spatio-temporal data mining for emergency management information systems</article-title>
          ,
          <source>” IET Intelligent Transport Systems</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1649</fpage>
          -
          <lpage>1657</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sohn</surname>
          </string-name>
          , “
          <article-title>Image-Based Learning to Measure Traffic Density Using a Deep Convolutional Neural Network,”</article-title>
          <source>IEEE Transactions On Intelligent Transportation Systems</source>
          , vol.
          <volume>19</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>1670</fpage>
          -
          <lpage>1675</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7a">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Nedzvedz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ye</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ablameyko</surname>
          </string-name>
          , “
          <article-title>Crowd Abnormal Behaviour Identification Based on Integral Optical Flow in Video Surveillance Systems</article-title>
          ,
          <source>” Informatica, Lith. Acad. Sci.</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>232</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nedzvedz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Nedzvedz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lv</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ablameyko</surname>
          </string-name>
          , “
          <article-title>Traffic extreme situations detection in video sequences based on integral optical flow</article-title>
          ,
          <source>” Computer Optics</source>
          , vol.
          <volume>43</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>652</lpage>
          ,
          <year>2019</year>
          . DOI: 10.18287/2412-6179-2019-43-4-647-652.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.P.</given-names>
            <surname>Dogra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kar</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.P.</given-names>
            <surname>Roy</surname>
          </string-name>
          , “
          <article-title>Trajectory-Based Surveillance Analysis: A Survey,”</article-title>
          <source>IEEE Trans. Circuits Syst. Video Techn.</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>1985</fpage>
          -
          <lpage>1997</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Xiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Gong</surname>
          </string-name>
          , “
          <article-title>Video behavior profiling for anomaly detection</article-title>
          ,
          <source>” IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>30</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>893</fpage>
          -
          <lpage>908</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Rehman</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Saba</surname>
          </string-name>
          , “
          <article-title>Features extraction for soccer video semantic analysis: current achievements and remaining issues</article-title>
          ,
          <source>” Artificial Intelligence Review</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wen</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          , “
          <article-title>Person-based video summarization by tracking and clustering temporal face sequences</article-title>
          ,
          <source>” US Patent</source>
          9,514,353,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pritch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rav-Acha</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Peleg</surname>
          </string-name>
          , “
          <article-title>Nonchronological video synopsis and indexing”</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>30</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1971</fpage>
          -
          <lpage>1984</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Downs</surname>
          </string-name>
          , “
          <article-title>Still stuck in traffic: coping with peak-hour traffic congestion</article-title>
          ,” Washington, D.C.: Brookings Institution Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Aftabuzzaman</surname>
          </string-name>
          , “
          <article-title>Measuring traffic congestion: a critical review</article-title>
          ,
          <source>” Australas. Transp. Res. FORUM</source>
          , vol.
          <volume>30</volume>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>H.S.</given-names>
            <surname>Levinson</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.J.</given-names>
            <surname>Lomax</surname>
          </string-name>
          , “
          <article-title>Developing a Travel Time Congestion Index,”</article-title>
          <source>Transp. Res. Rec. J. Transp. Res. Board</source>
          , vol.
          <volume>1564</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Egmont-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ridder</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Handels</surname>
          </string-name>
          , “
          <article-title>Image processing with neural networks - a review</article-title>
          ,
          <source>” Pattern Recognit.</source>
          , vol.
          <volume>35</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>2279</fpage>
          -
          <lpage>2301</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , “
          <article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>
          ,
          <source>” Advances in Neural Information Processing Systems</source>
          , vol.
          <volume>25</volume>
          , pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>R.P.</given-names>
            <surname>Bohush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.Y.</given-names>
            <surname>Zakharava</surname>
          </string-name>
          , “
          <article-title>Person tracking algorithm based on convolutional neural network for indoor video surveillance</article-title>
          ,”
          <source>Computer Optics</source>
          , vol.
          <volume>44</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>116</lpage>
          ,
          <year>2020</year>
          . DOI: 10.18287/2412-6179-CO-565.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          , “
          <article-title>Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation</article-title>
          ,
          <source>” Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>587</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , “
          <article-title>Fast R-CNN</article-title>
          ,
          <source>” Proc. of the IEEE International Conference on Computer Vision</source>
          , pp.
          <fpage>1440</fpage>
          -
          <lpage>1448</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          , “
          <article-title>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks</article-title>
          ,
          <source>” IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>1137</fpage>
          -
          <lpage>1149</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Anguelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Reed</surname>
          </string-name>
          , “
          <article-title>SSD: Single shot multibox detector</article-title>
          ,
          <source>” European Conf. on Computer Vision. Lecture Notes in Computer Science</source>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          , “
          <article-title>You Only Look Once: Unified, Real-Time Object Detection</article-title>
          ,
          <source>” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pawar</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhulekar</surname>
          </string-name>
          , “
          <article-title>Moving object tracking using optical flow and motion vector estimation</article-title>
          ,”
          <source>4th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Chavez-Garcia</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Aycard</surname>
          </string-name>
          , “
          <article-title>Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking,”</article-title>
          <source>IEEE Trans. Intell. Transp. Syst.</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>525</fpage>
          -
          <lpage>534</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Grabner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grabner</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bischof</surname>
          </string-name>
          , “
          <article-title>Real-time tracking via online boosting</article-title>
          ,
          <source>” Proc. BMVC</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Babenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          , “
          <article-title>Visual Tracking with Online Multiple Instance Learning</article-title>
          ,
          <source>” IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>J.F.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caseiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martins</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Batista</surname>
          </string-name>
          , “
          <article-title>High-Speed Tracking with Kernelized Correlation Filters,”</article-title>
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>37</volume>
          , no.
          <issue>3</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Bolme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Beveridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.A.</given-names>
            <surname>Draper</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.M.</given-names>
            <surname>Lui</surname>
          </string-name>
          , “
          <article-title>Visual object tracking using adaptive correlation filters,”</article-title>
          <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>2544</fpage>
          -
          <lpage>2550</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Ester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sander</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          , “
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          ,
          <source>” Proceedings of the Second International Conference on Knowledge Discovery and Data Mining</source>
          , AAAI Press, pp.
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>