Visual and Radar Sensor Fusion for Perimeter Protection
and Homeland Security on Edge
Danny Buchman a,b, Michail Drozdov a,c, Aušra Mackutė-Varoneckienė d,e and
Tomas Krilavičius d,e
a JVC Sonderus, Vilnius, Lithuania
b Seraphim Optronics Ltd., Yokne’am Illit, Israel
c Geozondas Ltd., Vilnius, Lithuania
d Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
e Baltic Institute of Advanced Technology, Vilnius, Lithuania



                                          Abstract
                                          In border and perimeter protection it is common to combine radar technology with pan-tilt (PT) cameras to achieve
                                          terrain dominance. The typical solution uses both sources: most of the threats are detected by the radar, while the camera
                                          is used to inspect the motion reported by the radar. Such a solution depends heavily on radar performance and is not
                                          effective in scenarios where the radar cannot monitor the movement of all targets. In this work, inputs from the camera
                                          and the radar are used in close integration to increase detection probability and to reduce false alarms. Two alternative
                                          methods of radar and visual data fusion are proposed, the data structures and processing algorithms are defined, and
                                          results of experimental validation of both proposed methods are presented.

                                          Keywords
                                          Sensor fusion, Radar, Video motion detection, Perimeter protection, Kalman filter


IVUS 2020: Information Society and University Studies, 23 April 2020, KTU Santaka Valley, Kaunas, Lithuania
danny.buchman@seraphimas-hls.com (D. Buchman); michail.drozdov@seraphimas-hls.com (M. Drozdov);
ausra.mackute-varoneckiene@bpti.lt (A. Mackutė-Varoneckienė); tomas.krilavicius@bpti.lt (T. Krilavičius)


1. Introduction

Sensor fusion is a large research topic. Its goal is to combine multiple data sources into joint data, which allows improving processes or calculations compared to single-source data usage.

A tracking solution using radar as the only source of data suffers from unreliable detection, or even absence of detection, when dealing with mostly tangential trajectories of observed objects. An attempt is made to lessen this problem by adding a camera as a second source of data and combining radar tracking with video motion detection (VMD), while keeping a common target state for detections from both sources. It is also expected that fusion can reduce the false detection rate, since validation of tracks can be more reliable when two sources of information are used in a redundant fusion scheme [1].

This work focuses on research related to the practical application of fusion between radar and video. Two main methods of fusion, namely data fusion and tracks fusion, are defined in this context, with the first covering such important parts of the system as data association and state updates, and the second being a more modular and distributed alternative.

Methods based on Kalman family filters [2, 3, 4] are common when dealing with data-level fusion, because they keep the process model independent from the observation structure [5] while working with uncertain data. In the case of several sensors, such filters allow incorporating new data into the model as it becomes available [3], [6].

Due to the properties of the Kalman filter (KF), the state update of the described dynamic process is required to be linear. It is common practice to use Cartesian coordinates to describe the object state when dealing with mostly linear movement. When trying to fuse camera and radar data, two issues are quite apparent:

    1. Both radar and camera acquire data in Polar coordinates.
    2. While a full 3D Cartesian representation can be reconstructed from the radar data, this is not true for the camera without several assumptions on the geometry of the setup.

The first problem can be solved in several ways. The usual practice is to keep the target state in Cartesian coordinates [2], [7] while measuring in Polar coordinates and transforming the data before the update (converted measurements [8], [9], [10]). The covariance matrix, which is used in the update and estimation, gets biased if transformed directly. There exist many solutions for the linearization of the space near the estimated point to get proper values for the covariance matrix [3], [11] (the Extended Kalman filter and the Unscented Kalman filter, to name a few). After the transformation, the state can be updated by the normal Kalman filter formulas.

The second issue is not addressed by these solutions: if the Kalman filter is used while keeping the target state in Cartesian coordinates, the camera changes from a very precise sensor (az and el angles) to a very imprecise one (x, y, z), because the distance to the object is used in the Polar to Cartesian transform for any direction and is not directly measured by the camera. There are numerous approaches to get at least some estimation of distance from direct camera measurements:

    1. Use radar detections as a base and map camera detections to radar, improving the angular resolution [12], [13].
    2. Use homography¹ estimation methods for camera calibration in the lab.
    3. Use a corner reflector or another strongly reflective object to map radar and camera detections precisely into 3D [14], [15].
    4. Use many assumptions² on the positioning of detections relative to the optical axis (the ground is a straight plane, camera position and orientation are known, targets are always on the ground, etc.) [16], [17].
    5. Use machine learning (ML) techniques in cases when the targets are specified (e.g., detecting the image size of cars whose physical size is known) [13, 18].
    6. Use the movement of the camera and additional features (lane lines) to acquire distance [19], [20], [21].

¹ A geometrical relation between two images of the same planar surface, described by a transformation matrix.
² The camera is moving by a known pattern, the terrain is flat, camera position and angle to the surface are known, etc.

For scenarios arising in perimeter protection or homeland security, method 6) is not applicable or is too costly in the sense of performance. Usually there are no predefined markers, and the system is stationary (no translational movement). All other approaches can be explored. The ideal solution, however, would be to use the camera only in its strongest domain, to augment the information received by radar, instead of increasing inaccuracies in one dimension while decreasing them in others. One such potential solution is to keep the state of the Kalman filter in Polar coordinates. This is not unknown in the literature, as bearing-only tracking often uses Modified Polar Coordinates (MPC) [8], [22] or Modified Spherical Coordinates (MSC) [23].

The following issues are explored before finalizing the data model for fusion:

    1. The impact of the choice of coordinates (Cartesian or Polar) in the linear Kalman filter, for different movement patterns.
    2. The improvement (if any) of the accuracy of state estimation in Polar coordinates, if the camera is also used in the Kalman filter updates.
    3. The detection of distance to objects from camera measurements, and the effect of adding this result to the Kalman filter measurement model.

As a consequence, some of the following sections (Section 2, Section 5 and Section 6) are split into two main parts: preliminary experiments (simulation) and methods validation. It should be clear, however, that chronologically all preliminary tests were performed and their results were analyzed before the fusion models were finalized and the fusion methods implemented.


2. Data Description

Figure 1: Coordinate systems used in experiments. The unit is pointed in the Z direction.

The coordinates used in this research are defined as shown in Fig. 1.
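For reference, the following small Python sketch shows the conversion between Polar measurements and the x-z Cartesian frame of Fig. 1, which both representations used in the following sections rely on; the sign convention for the azimuth is an assumption.

    import math

    def polar_to_cartesian(r, az):
        """Convert a radar measurement (range r, azimuth az) to the x-z plane of Fig. 1,
        assuming the unit points along the Z axis and az is measured from that axis."""
        return r * math.sin(az), r * math.cos(az)   # (x, z)

    def cartesian_to_polar(x, z):
        """Inverse transform, used e.g. when comparing a Cartesian state with Polar measurements."""
        return math.hypot(x, z), math.atan2(x, z)   # (r, az)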
Figure 2: The setup of test trajectories for experiments



2.1. Simulation Data Description

In this part of the research, the definition of custom trajectories for any number of targets, as well as radar and video motion detection (VMD) noise models, were implemented. The following simulation parameters can be defined for the radar:

    1. Range noise standard deviation
    2. Velocity noise standard deviation
    3. Detection angle noise standard deviation
    4. False detection rate (per simulation area element)
    5. Detection view angle
    6. Update rate

The following simulation parameters can be defined for the camera:

    1. VMD detection box noise (in pixels)
    2. Camera resolution
    3. Angular field of view
    4. VMD false positive rate and area
    5. Update rate

Detections for VMD and radar are simulated and subsequently registered without knowledge of the ground truth or of the noise model.

The Kalman filter state is defined by [x v_x a_x z v_z a_z]^T in the case of Cartesian coordinates and by [r v_r a_r az d_az dd_az]^T in Polar coordinates, where

    1. v_x and v_z are velocities in the x and z dimensions respectively,
    2. a_x and a_z are the corresponding accelerations,
    3. v_r is the range change rate (a_r the corresponding range acceleration),
    4. az is the azimuth angle,
    5. d_az and dd_az are the rate of change of the azimuth and the rate of change of d_az (angular acceleration).

Measurements of the radar and the camera are simulated using a Gaussian noise model. For the camera, different accuracies of VMD were tested, ranging from 1 pixel to 5 pixels.
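As an illustration of the simulation noise model described above, the following Python sketch generates one noisy radar and one noisy VMD measurement for a ground-truth point; the parameter names and default values are illustrative assumptions, not the exact configuration used in the experiments.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_radar(x, z, vx, vz, sigma_r=0.5, sigma_v=0.1, sigma_az=0.01):
        """Simulate one radar detection (range, radial velocity, azimuth) with Gaussian noise."""
        r = np.hypot(x, z)
        az = np.arctan2(x, z)                      # azimuth measured from the Z (boresight) axis
        vr = (x * vx + z * vz) / r                 # radial (Doppler) velocity component
        return (r + rng.normal(0, sigma_r),
                vr + rng.normal(0, sigma_v),
                az + rng.normal(0, sigma_az))

    def simulate_vmd(x, z, hfov=np.deg2rad(60), width=1920, sigma_px=2.0):
        """Simulate the horizontal pixel position of a VMD bounding-box centre."""
        az = np.arctan2(x, z)
        xp = az * width / hfov + width / 2         # linear pinhole approximation, cf. (12)-(13)
        return xp + rng.normal(0, sigma_px)        # 1-5 px noise levels were tested

    # Example: a target 80 m away, slightly off boresight, moving tangentially
    print(simulate_radar(x=10.0, z=80.0, vx=1.5, vz=0.0))
    print(simulate_vmd(x=10.0, z=80.0))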
Measurements are filtered, and the mean square error (MSE) of the estimated positions with respect to the actual trajectories is calculated for comparison.

Fig. 2 presents the test trajectories Trajectory 1 (T1), Trajectory 2 (T2) and Trajectory 3 (T3), which were used for evaluation; the camera and the radar are at the point (0, 0) of the x-z Cartesian coordinate system.

2.2. Data for Evaluation of Fusion Methods

In the case of track fusion, the inputs for the fusion module are

    1. a list of VMD outputs as bounding boxes [x y w h] and the ID of the VMD track,
    2. a list of radar tracker outputs as [x z v_x v_z] and the ID of the radar track.

In the case of data fusion, the inputs are

    1. a list of bounding boxes [x y w h] received directly from the "blob" stage of the VMD pipeline,
    2. a list of radar targets [r v_r az].

The main difference between the sources of data for the two approaches is that in the case of data fusion there are many false detections, which have not yet been filtered out by the VMD or radar tracker methods respectively.


3. Data Fusion Methods

An overview of different sensor fusion classifications can be found in [1] for further reading. Here we employ the terminology used by Castanedo in that paper.

Possible schemes of sensor fusion in the given application are limited to redundant schemes (the other possibilities being complementary and cooperative), based on the relations of the sensors used in the system: both types of sensors measure the same state of objects at the same time in the observed area.

Based on the idea of modularity of the system, it is clear that radar/camera fusion should not be a decision-making module. While detections are performed by the fusion module, the final decision is affected by additional rule-based filters and by the recognition module. The goal of the fusion module is to minimize the false alarm rate (FAR) of the system and to provide inputs for the decision-making modules. It can be said, then, that the fusion module approaches can be limited to two (data in - feature out (DAI-FEO) and feature in - feature out (FEI-FEO)), contrary to approaches which provide data or decisions as outputs (data in - data out (DAI-DAO), feature in - decision out (FEI-DEO), decision in - decision out (DEI-DEO)).

Four types of fusion architectures are defined in [1]:

    1. centralized architecture,
    2. decentralized architecture,
    3. distributed architecture,
    4. hierarchical architecture.

The centralized architecture (data from all sources is processed in a single module) is expected to be theoretically optimal in the case of proper synchronization of the data sources and sufficient bandwidth for data transfer. It can suffer, however, from the lack of distribution of bandwidth and processing when these resources are limited for a given task. Alternatives solving this issue are the decentralized architecture (fusion nodes incorporate raw data in different order and composition) and the distributed architecture (fusion nodes receive single-sensor data and provide features to be fused). In our view, a decentralized architecture would introduce unnecessary complexity and implementation difficulty, so the distributed architecture is considered the only alternative to the centralized architecture.


4. Proposed Fusion Methods

Based on the discussion in the previous section and on the results of the preliminary evaluations (described in Section 6), two main fusion approaches are established:

    1. data fusion,
    2. tracks fusion.

Data fusion is a centralized mixed-input (DAI-FEO and FEI-FEO) method, accepting raw outputs of the radar and intermediate results of VMD (bounding boxes of detected blobs).

Tracks fusion is a distributed FEI-FEO method, accepting features (tracks) from the VMD and radar tracker modules.

Both should solve the defined problem, but the pros and cons of the approaches differ. Fusion of tracks is performed after the data from both sources has been processed and a track has been detected in both sets of data. Then both tracks are matched (track-to-track association) to better reflect the behavior of the object being tracked. In the case where one of the sources does not return a track while the other does, different policies can be used to favor a reduced false detection rate over extended tracking duration, or vice versa. Compared with data fusion, tracks fusion is easier to debug and tune, since the separate modules can be tested and tuned faster due to the overall reduced complexity. Data fusion works on a lower level than tracks fusion. It has the benefit of incorporating updates from both sources of data into a common state update, converging to the true state of the object being tracked faster.
The role of policies in the case where one of the sources misses detections is reduced, since both sources of raw data can be treated similarly (both update the Kalman filter state). There is no need to tune policies for different cases (no recent radar detection, no recent camera detection, higher than average mismatch between sources, etc.); this is handled by the common update. The downside of data fusion compared to track fusion is longer debugging and tuning. If the conditions of the experiments/use cases or the equipment change, this could add delivery time overhead.

4.1. Data Fusion

The general strategy for data-level fusion can be summarized as follows:

    1. Use radar detections as a base for track validation.
    2. Any VMD detection can create a persistent track, but without radar data it will be validated only after a relatively long track age is reached.
    3. Any track with both recent VMD and radar detections has a higher probability of being validated as a real track; as a consequence, it is validated at a lower track age.

Such choices are proposed due to the relative ease of radar data validation: if the movement of a potential target in the area under test contradicts the Doppler velocity reported by the radar, such a target can be quickly invalidated. There is no similar process for VMD.

The full track state representation consists of the following parts:

    1. The track state vector at time t (based on the definitions in Section 2), a 6x1 vector

                X_t = [x  v_x  a_x  z  v_z  a_z]^T ,                            (1)

       or its counterpart in Polar coordinates. In the case of a constant velocity model, a 4x1 vector

                X_t = [x  v_x  z  v_z]^T .                                      (2)

    2. The track covariance matrix P_t: a 6x6 matrix, or a 4x4 matrix if accelerations are removed, describing the amount of variance in the data and the inter-state dependencies (covariances).
    3. The track age: the time elapsed from the moment of track creation by VMD or radar. It is updated if either a VMD or a radar detection can be associated with the current state vector.
    4. The track innovation error: a value which is compared to predefined threshold values for track validation and removal. The track innovation error (squared Mahalanobis distance) is calculated from the state measurement residual (innovation) and the residual (innovation) covariance. It is one of the main criteria for positive detection.
    5. The track update timestamp for VMD.
    6. The track update timestamp for radar.
    7. The object size. It can be calculated if both VMD and radar detected the object.
    8. The object visual distance state: a 2x1 vector (distance, rate of change of distance). It is estimated from the camera and can be used if the positioning of the device is known. It is highly unstable due to partial obstructions and is therefore kept separate from the track state vector. It can be used by higher-level decision-making modules.
    9. The object visual distance covariance matrix: a 2x2 matrix.
    10. The total duration of detection for VMD. It is one of the main criteria for positive detection.
    11. The total duration of detection for radar. It is one of the main criteria for positive detection.

The fusion scheme consists of data association, state update, and management of tracks. Data association (point to track) in the first prototype is implemented as simple Nearest Neighbour (NN) estimation in state space, favoring simplicity and speed. A Joint Probabilistic Data Association (JPDA) based association method [24] is used as an alternative in later versions. In NN-based data association, each measurement is compared against the estimated state X_{t|t-1} at time t by calculating a Mahalanobis distance based metric (discussed further below). Prefiltering of potential associations can be performed using simple heuristics such as 2D distance.
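As a compact illustration of the track record described above, the following Python sketch defines a container holding the listed fields; the names and types are illustrative assumptions rather than the exact structures used in the implementation.

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class FusedTrack:
        """Track record combining radar and VMD information (illustrative field names)."""
        state: np.ndarray                 # X_t, 6x1 (or 4x1 without accelerations), cf. (1)-(2)
        covariance: np.ndarray            # P_t, 6x6 (or 4x4)
        age: float = 0.0                  # time since creation, updated on associated detections
        innovation_error: float = 0.0     # squared Mahalanobis distance, cf. (16)
        last_vmd_update: float = 0.0      # timestamp of the last VMD update
        last_radar_update: float = 0.0    # timestamp of the last radar update
        object_size: float = 0.0          # available when both sensors detect the object
        visual_distance: np.ndarray = field(default_factory=lambda: np.zeros(2))   # [d, d_rate]
        visual_distance_cov: np.ndarray = field(default_factory=lambda: np.eye(2))
        vmd_duration: float = 0.0         # total VMD detection duration
        radar_duration: float = 0.0       # total radar detection duration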
A linear KF is used for state estimation and update. The estimation step is

                X_{t|t-1} = A X_{t-1|t-1} ,                                     (3)

where A is the process matrix, defined for a state with accelerations as

            A = [ 1  dt  dt^2   0   0    0
                  0   1   dt    0   0    0
                  0   0   1     0   0    0
                  0   0   0     1  dt  dt^2
                  0   0   0     0   1   dt
                  0   0   0     0   0    1 ] ,                                  (4)

and for a state without accelerations as

            A = [ 1  dt  0   0
                  0   1  0   0
                  0   0  1  dt
                  0   0  0   1 ] .                                              (5)

Here X_{t-1|t-1} denotes the posterior state from the previous update and dt is the time passed since the last update. Further on, all the calculations are described for the state without accelerations to shorten the notation, and the state vector dimensionality m is used to describe the sizes of the matrices. The predicted covariance matrix is calculated as

                P_{t|t-1} = A P_{t-1|t-1} A^T + Q ,                             (6)

where Q is a square m x m matrix defining the process noise.

The next step of the processing is a mixed state update and data association step. The innovation is calculated for each pair of a track and a prefiltered measurement, to later allow optimal NN data association:

                Y_t = Z_t - H X_{t|t-1} ,                                       (7)

where H is the k x m measurement matrix and Z_t is the measurement vector of size k x 1. The size k of the measurement vector and the measurement matrix depends on the type of sensor and on the coordinate systems used for the measurements and the state. k can be understood as the number of parameters measured by a specific sensor, and H as the mapping between the state and the measured parameters. If the transformation from sensor coordinates to state coordinates is linear, it can be performed by H directly. In our experiment using the Polar representation of the target state space,

                X_t = [r  v_r  az  d_az]^T .                                    (8)

The data vector of the radar and the corresponding measurement matrix are

                Z_t = [r  v_r  az]^T ,                                          (9)

                H = [ 1  0  0  0
                      0  1  0  0
                      0  0  1  0 ] ,                                            (10)

which simplifies the calculations. In the case of a camera (VMD) update without knowledge of the geometrical setup,

                Z_t = [x_p] ,                                                   (11)

where x_p is simply the x position of the center of the bounding box in screen coordinates. The measurement matrix is then

                H = [ 0  0  w/HFOV  0  w/2 ] ,                                  (12)

and the state vector needs to be augmented with an additional element 1 as the last row to allow the matrix multiplication in (7). In the above formula, w is the width of the video in pixels and HFOV is the horizontal field of view. The single-element matrix Y_t can also be calculated by directly using the az element of the state vector as

                Y_t = [ x_p - (az w / HFOV + w/2) ] .                           (13)

Next, the innovation covariance is calculated,

                S_t = H_t P_{t|t-1} H_t^T + R ,                                 (14)

where R is a k x k diagonal measurement error matrix. It is defined based on the parameters of the sensors used; the error ranges from the datasheet of a sensor are a good starting point. The Kalman gain is

                K_t = P_{t|t-1} H^T S_t^{-1} ,                                  (15)

where the inverse of S_t is taken. The innovation error, which is not used directly in the Kalman filter update but is the main criterion for data association, is

                ε = Y_t^T S_t^{-1} Y_t .                                        (16)

Lastly, if the best match for a track-measurement pair is found, the Kalman filter update step is finished by calculating the posterior state and covariance:

                X_{t|t} = X_{t|t-1} + K_t Y_t ,                                 (17)

                P_{t|t} = (I - K_t H) P_{t|t-1} ,                               (18)

where I is the m x m identity matrix. The best match between a track and a measurement is found by comparing ε for each prefiltered pair. It is possible to fail to find a matching pair if all ε are larger than some predefined threshold τ.
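To make the prediction, association and update steps of equations (3)-(18) concrete, the following Python sketch implements a minimal linear Kalman filter for the Polar state [r v_r az d_az]^T with the radar measurement model (9)-(10); the matrix values, noise parameters and thresholds are illustrative assumptions, not the values used in the system.

    import numpy as np

    def predict(X, P, dt, q=0.1):
        """Prediction step, eq. (3) and (6), constant-velocity Polar state [r, v_r, az, d_az]."""
        A = np.array([[1, dt, 0,  0],
                      [0,  1, 0,  0],
                      [0,  0, 1, dt],
                      [0,  0, 0,  1]], dtype=float)
        Q = q * np.eye(4)                          # simple process-noise assumption
        return A @ X, A @ P @ A.T + Q

    def update(X_pred, P_pred, Z, H, R):
        """Association metric and update, eq. (7), (14)-(18). Returns (X, P, epsilon)."""
        Y = Z - H @ X_pred                         # innovation, eq. (7)
        S = H @ P_pred @ H.T + R                   # innovation covariance, eq. (14)
        S_inv = np.linalg.inv(S)
        eps = float(Y @ S_inv @ Y)                 # squared Mahalanobis distance, eq. (16)
        K = P_pred @ H.T @ S_inv                   # Kalman gain, eq. (15)
        X = X_pred + K @ Y                         # posterior state, eq. (17)
        P = (np.eye(len(X_pred)) - K @ H) @ P_pred # posterior covariance, eq. (18)
        return X, P, eps

    # Radar measurement model [r, v_r, az], eq. (9)-(10)
    H_radar = np.array([[1, 0, 0, 0],
                        [0, 1, 0, 0],
                        [0, 0, 1, 0]], dtype=float)
    R_radar = np.diag([0.5**2, 0.1**2, 0.01**2])   # assumed sensor accuracies

    X, P = np.array([80.0, -1.0, 0.12, 0.0]), np.eye(4)
    X_pred, P_pred = predict(X, P, dt=0.1)
    Z = np.array([79.8, -1.1, 0.121])
    X, P, eps = update(X_pred, P_pred, Z, H_radar, R_radar)
    # the track-measurement pair with the smallest eps (below a threshold tau) is accepted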
In such a case, the new track state for a track without a measurement is

                X_{t|t} = X_{t|t-1} ,                                           (19)

                P_{t|t} = P_{t|t-1} .                                           (20)

Track merging is performed if (similarly to (16))

                D = (X1_t - X2_t)^T P_match^{-1} (X1_t - X2_t) < D_thresh ,     (21)

where X1_t and X2_t are the states of the two tracks to be compared, P_match is a diagonal matrix that can be treated as the typical variances of the track state elements (with larger values of P_match more tracks will be matched), and D_thresh is the threshold for matching.

With the main parts of the data association and track update discussed, the global view of data fusion can be defined (see Algorithm 1).

    /* merging of existing tracks */
    for each track t in memory do
        for each track t2 in memory except t do
            apply (21);
            if the condition of (21) is met then
                merge t and t2;
                remove t2 from memory;
            end
        end
    end
    /* updating of tracks by measurement */
    if there is a new measurement from any source then
        for each track t in memory do
            estimate the prior state (3) and covariance (6);
            prefilter the list of possible measurements;
            for each measurement with small enough ε do
                if the measurement was already used then
                    check for more precise updates in earlier tracks;
                    if no better update is found then
                        update the Kalman state (17), (18);
                        push the updated state to the stack of potential updates;
                        mark the measurement as used;
                    end
                else
                    update the Kalman state (17), (18);
                    push the updated state to the stack of potential updates;
                    mark the measurement as used;
                end
            end
        end
        for each track t in memory do
            apply the best update from the stack of updated states;
        end
    end
    /* manage unmatched tracks */
    for each unmatched track do
        increase the track innovation error;
        if the innovation error reaches the threshold then
            remove the track from memory;
        end
    end
    /* create potential tracks */
    for each unmatched measurement do
        if it satisfies the movement conditions then
            create a track with an initial state and initial covariance;
        end
    end
    Algorithm 1: Data fusion algorithm

During the step of managing unmatched tracks, the innovation error of a track (normally calculated as in (16)) is increased. In the current implementation the following empirically found formula is used:

                ε = ε · max(1 + √dt / 3,  1 + 6 dt / T_tr) ,                    (22)

where T_tr is the age of the track. The idea is to increase ε faster for new tracks than for older tracks, as the track age usually shows how reliable the current track is.

The age of a track is increased if there is a VMD or radar detection. If the detection came from the previous measurement (contrary to measurements which came from both sources on the same processing step), the age is increased by the time difference between the current and previous measurements. If a track is picked up after an absence of matching measurements, the time delta is added based on the type of update (frame update time for VMD or frame update time for radar). Tracks without current matching measurements do not update their age value.

For a VMD-only state (no recent radar detection), the previous angular movement is used together with the size to create a view-space gate for measurement-to-track matching. It is part of the measurement prefiltering mentioned earlier. If the angular velocity is not yet initialized (the track age is low), the maximum possible rate of movement, based on a typical size and velocity ratio, is used to create the view-space gate.

For a track having a recent radar detection, or an overall high radar detection duration, the exact estimated position is calculated, and new detections of VMD and radar are projected onto a common space to use spatial gating.

Tracks are created/initialized with every moving object detection. For a radar detection, the movement condition is a non-zero Doppler velocity by default, or it can be set as one of the many algorithm parameters.
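A minimal Python sketch of the handling of unmatched tracks in Algorithm 1, using the empirical inflation formula (22); the threshold value is an illustrative assumption, and the tracks are assumed to carry the illustrative fields of the FusedTrack sketch given earlier.

    import math

    def inflate_innovation_error(eps, dt, track_age):
        """Inflate the innovation error of an unmatched track, cf. the empirical formula (22)."""
        return eps * max(1 + math.sqrt(dt) / 3, 1 + 6 * dt / track_age)

    def manage_unmatched(tracks, dt, eps_threshold=25.0):
        """Drop tracks whose inflated innovation error exceeds an (assumed) threshold."""
        kept = []
        for tr in tracks:
            tr.innovation_error = inflate_innovation_error(
                tr.innovation_error, dt, max(tr.age, 1e-3))   # guard against zero age
            if tr.innovation_error < eps_threshold:
                kept.append(tr)
        return kept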
All VMD detections are treated as moving by definition. After creation, a track is in a non-validated state. Tracks are considered validated after a predefined age, which depends on the following properties:

    1. The existence or absence of a recent VMD/radar detection. Radar detections (without a VMD detection) have a higher impact on the validation threshold than the other way around.
    2. The track innovation error.
    3. The total previous duration of detections for radar and VMD.
    4. The trajectory type: mostly tangential movement with radar-only detections should be verified for a longer duration.
    5. Velocity thresholds.

Tracks are removed from the list of potential tracks if:

    1. The track innovation error grows too large. It is calculated based on new measurements and is also incremented after an absence of radar detections (22).
    2. The track was created by a VMD detection and the visual-only innovation error grows too large. It is calculated from angular measurements only and updated similarly to (22).

4.2. Distance Estimation from a Camera

Figure 3: Distance estimation from camera

The distance from camera measurements is calculated based on the assumptions that:

    1. the target is not blocked by other objects,
    2. the height and elevation of the camera are known,
    3. the detection is of a ground-based target.

The main features used for the calculations are presented in Fig. 3. The algorithm is:

    1. Calculate the angle to the base of the target, based on the bottom edge of the VMD detection and knowledge of the VFOV of the camera, as

                γ = h VFOV / n_v ,                                              (23)

       where h is the projection of the position where the target touches the ground in the camera view, and n_v is the vertical resolution (the number of pixels along the vertical axis of the image).

    2. Subtract this angle from the angle formed by the straight-up direction and the camera "looking" direction:

                β' = β - γ .                                                    (24)

    3. Calculate the distance, knowing one side of the triangle (the height of the camera) and the angle between this side and the hypotenuse:

                D = H tan β' .                                                  (25)

There are other ways to estimate the distance to the target from the camera only. For example, if the target is identified, the relative size of the target in the image could signal the distance. That requires, however, a feedback loop between the fusion module and the recognition module.
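A minimal Python sketch of the three steps (23)-(25) above; the camera parameters (field of view, height, boresight angle) and the pixel reference for h are illustrative assumptions that depend on the actual setup.

    import math

    def distance_from_camera(h_px, n_v=1080, vfov_deg=40.0,
                             camera_height_m=6.0, beta_deg=92.0):
        """Estimate the ground distance to a target from its VMD bounding box.

        h_px     : pixel offset of the point where the target touches the ground, used as h in (23)
        beta_deg : angle between the straight-up direction and the camera boresight
        """
        gamma = h_px * math.radians(vfov_deg) / n_v        # eq. (23)
        beta_prime = math.radians(beta_deg) - gamma        # eq. (24)
        return camera_height_m * math.tan(beta_prime)      # eq. (25)

    # Example: a detection whose ground contact point is 300 px from the reference row
    print(round(distance_from_camera(300), 1))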
4.3. Tracks Fusion

The general strategy for tracks fusion can be summarized as follows:

    1. Both sources can create fusion tracks with the other component not present, until a match is found at a later stage.
    2. Input tracks are not verified against the Kalman state estimate (no data association), because this step is already done in the tracker module.
    3. VMD and radar tracker tracks can be merged if the merging requirements are met. From that moment on, the fusion track has both components.
    4. A track can be split if it is detected that the visual and radar data diverge too much. Track splitting/merging is performed at each data update step.

In addition to the track structure discussed in the previous section, the following fields are defined for the tracking state:

    1. Fused track ID (different from the components).
    2. visualSeparated: a boolean value which shows that the track recently had a visual component but lost it due to divergence of the visual and radar data.
    3. VMD track ID. With new data updates, the degree of matching between the fused track components is first checked against the stored best previous matches.
    4. Radar tracker track ID. Same as above.

Figure 4: Time alignment of radar and VMD tracks for matching

The track fusion algorithm is overviewed in Algorithm 2. The operation of mismatch calculation, used on many occasions in the algorithm description, can be explained by looking at Fig. 4. First, measurement timestamps are collected for all entries of both types of tracks. Then the estimates used for matching are calculated by interpolation (or extrapolation, at the edges). The average azimuth mismatch is used as the matching parameter. An early exit from the matching function is possible if the mismatch grows beyond a predefined value.

    /* updating tracks structures */
    if there is a new radar tracker frame then
        for each track in the frame do
            if the ID of the track can be found among the already existing tracks then
                append a new measurement;
            else
                create a new list of track entries with the new ID;
            end
        end
    end
    if there is a new frame of VMD tracks then
        proceed the same way as for radar tracks;
    end
    /* matching of tracks */
    for each fused track do
        calculate the mismatch of radar and video;
        for all non-fused tracks of both types do
            calculate the mismatch with the appropriate (radar vs. VMD) track;
            if a better match is found then
                assign the new component to the fused track;
                release the previous component as non-fused;
            end
        end
    end
    /* generation of tracks */
    for each combination of a radar and a VMD track do
        calculate the mismatch;
        if the mismatch is small enough then
            create a new fused track with the matched components;
        end
    end
    /* destruction of tracks */
    calculate the time out of tracks for deletion;
    delete all tracks with a higher time out than allowed;
    /* tracks state update */
    for each fused track do
        if any of the track components received updates then
            perform a full Kalman filter update;
        else
            update the state as an estimation only (17), (18);
        end
    end
    Algorithm 2: The track fusion algorithm

The age of a track is updated as in data fusion, with each new radar tracker output considered as a new radar measurement with a time step equal to the radar update duration. The track time out is increased if no measurements were added to the track. This step is the same for the component tracks and for the fused track. The time out for deletion, mentioned in Algorithm 2, is calculated based on the current number of tracks. It is defined as 3 s if the number of tracks is less than N_max, the maximum number of tracks. If, on the other hand, the number of tracks is higher, the allowed time out reduces:

                T_timeout = 3 (2 N_max - N_cur) / N_max .                       (26)

This assures that all tracks are cleared if N_cur reaches 2 N_max. If the number of tracks for some reason grows beyond 2 N_max, all tracks are cleared.

Tracks are created/initialized with every moving object detection from radar and with every VMD detection. After creation, a VMD track is in a non-validated state, but a track created directly from radar tracker data is in a validated state. A fused track having both components can be split if the VMD measurements diverge too much from the tracker output. visualSeparated is set to true in this case. This is done to prevent the reacquisition of the same VMD track with a high mismatch factor. The VMD-only part of such a split inherits the range data and all data relevant to the durations of detections and the age of the track. It can be matched again later, after visualSeparated expires. The split tracks are invalidated for a short duration (less than a second) by setting their innovation error parameter to a high value and gradually reducing it after new measurements.
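A small Python sketch of the time-aligned azimuth mismatch used for track-to-track matching (Fig. 4) and of the track time out rule (26); the early-exit value is an illustrative assumption.

    import numpy as np

    def azimuth_mismatch(radar_t, radar_az, vmd_t, vmd_az, early_exit=0.2):
        """Average absolute azimuth difference after interpolating the radar track
        onto the VMD timestamps (cf. Fig. 4); returns early if the mismatch grows too large."""
        interp_az = np.interp(vmd_t, radar_t, radar_az)   # linear interpolation (end values held at edges)
        total = 0.0
        for i, (a, b) in enumerate(zip(interp_az, vmd_az), start=1):
            total += abs(a - b)
            if total / i > early_exit:                    # early exit of the matching function
                return float('inf')
        return total / len(vmd_az)

    def track_timeout(n_cur, n_max):
        """Allowed track time out in seconds, eq. (26)."""
        if n_cur < n_max:
            return 3.0
        return 3.0 * (2 * n_max - n_cur) / n_max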
Figure 5: Area of testing for fusion evaluation. The position of the equipment is marked as 0 distance; the distance from the equipment to one of the points of the trajectory is shown.


5. Experimental Setup

The area selected for the experiments is shown in Fig. 5. The selected area allows testing the performance of tracking at a large enough distance (more than 100 m), with parts of the trajectories being almost exactly tangential while moving around the edge of the stadium. A small amount of moving background (cars, people, trees) allows more control over the experiments.

Three experiments were performed:

    1. A person moving clockwise around the edge of the stadium without stopping.
    2. A person moving counter-clockwise around the edge of the stadium, stopping, then proceeding.
    3. A person moving clockwise around the edge of the stadium, then changing the moving pattern from mostly tangential to radial.

An example video frame with a detection displayed is presented in Fig. 6. Data were acquired using an NXP i.MX6 SoM based embedded system with a quad-core 1.2 GHz processor. Video recording and radar data acquisition were performed simultaneously. Time synchronization was assured by knowing the starting time of the video and the frame rate, and by storing radar raw data or radar tracks with exact timestamps. Although the discussed algorithms were not running at the time of these measurements, close to real-time performance was achieved later using the same hardware.


6. Results of Experimental Investigation

The experimental investigation was subdivided into two stages. First, preliminary experiments were performed to select the state model and the update strategy for fused tracks. During the second stage, the tracks fusion and the data fusion approaches were validated.

6.1. Comparison of Cartesian Coordinates and Polar Coordinates for KF State Representation

The impact of the selected system of coordinates on performance is presented in panels (a)-(c) of Fig. 7. KF state representation in Cartesian or Polar coordinates produces very close results, and a visual separation of the results is hardly noticeable. Table 1 presents the results of the models in which Polar or Cartesian coordinates are used for the KF state representation.
Figure 6: Example frame of the first sequence

Table 1
Comparison of the performance of models in which Polar or Cartesian coordinates are used for KF state representation, using MSE

   Trajectory   Measured error   KF error Cartesian   KF error Polar
       T1          2.4647             0.85995            0.87358
       T2          2.2787             0.62444            0.58846
       T3          2.7926             0.60758            0.63428

Since the results are very similar, it can be concluded that there is no significant difference between the KF state representations. To obtain the error for each model, 10 simulations were performed for each and the mean MSE was calculated.

6.2. Accuracy of Filtering with Camera Data Added

The resulting performance of the models with and without adding the camera to the state update, represented through MSE, is shown in Table 2. It can be observed that the KF state updated using the camera output represents the ground truth more accurately than the one updated by radar data only. As before, the mean MSE is calculated by running the simulation 10 times for each trajectory.

Table 2
Comparison of the performance of models with/without adding the camera to the state, using MSE

   Trajectory   Measured error   KF error Polar   Fusion error
       T1          2.4647           0.87358         0.72844
       T2          2.2787           0.58846         0.51873
       T3          2.7926           0.63428         0.58261

6.3. Accuracy of Filtering with Distance Calculation from Camera Data

The results of the performance evaluation of the fusion model and the fusion-with-distance model, using MSE statistics, are presented in Table 3. The best minimum values, as well as the mean values of MSE, show that the fusion model with distance evaluation outperforms the model without distance evaluation.

An example of a typical simulation run with the different models evaluated is shown in Fig. 8.

6.4. Fusion Methods Evaluation

The main metrics for the evaluation of the two fusion approaches were the object count accuracy (OCA) and the FAR.
Figure 7: Comparison of the performance of models in which Polar or Cartesian coordinates are used for KF state representation: (a) test trajectory T1, (b) test trajectory T2, (c) test trajectory T3.

Table 3
Fusion error and fusion with distance estimation error comparison using MSE

   Trajectory                            MAX       MIN     MEAN      STD
               Measured error           3.0179   2.2735   2.6342   0.2677
       T1      KF error                 0.8749   0.4311   0.6654   0.1456
               Fusion error             0.8065   0.3300   0.5491   0.1471
               Fusion with DIST error   0.8438   0.2631   0.5346   0.1720
               Measured error           2.8525   1.9410   2.4447   0.3272
       T2      KF error                 1.1779   0.4676   0.7667   0.2336
               Fusion error             0.9825   0.3526   0.5915   0.2170
               Fusion with DIST error   0.9841   0.3333   0.5543   0.2203
               Measured error           3.0052   1.5744   2.3672   0.4143
       T3      KF error                 0.9347   0.3607   0.6654   0.1919
               Fusion error             0.7708   0.2087   0.5219   0.1843
               Fusion with DIST error   0.6282   0.1687   0.4243   0.1536

OCA is defined as

                OCA_t(P_t^G, P_t^D) = min(M_t^G, M_t^D) / ((M_t^G + M_t^D) / 2) ,          (27)

where P_t^G and P_t^D are the sets of ground truth points and detected points in measurement frame t respectively, and M_t^G and M_t^D are the numbers of ground truth and detected instances respectively. The overall OCA is defined as the average OCA over all frames of measurements.
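A small Python sketch computing the per-frame OCA of (27), the overall OCA, and the false alarm rate FAR as defined in (28) below; the convention for empty frames and the example counts are illustrative assumptions.

    def oca_frame(n_ground_truth, n_detected):
        """Per-frame object count accuracy, eq. (27)."""
        if n_ground_truth + n_detected == 0:
            return 1.0                               # assumed convention for empty frames
        return min(n_ground_truth, n_detected) / ((n_ground_truth + n_detected) / 2)

    def overall_oca(gt_counts, det_counts):
        """Overall OCA: the average per-frame OCA over all measurement frames."""
        values = [oca_frame(g, d) for g, d in zip(gt_counts, det_counts)]
        return sum(values) / len(values)

    def far(frames_with_false_tracks, total_frames):
        """False alarm rate, eq. (28): frames containing any false track over all frames."""
        return frames_with_false_tracks / total_frames

    # Example over five frames
    print(overall_oca([1, 1, 2, 1, 0], [1, 2, 2, 1, 0]))
    print(far(frames_with_false_tracks=1, total_frames=5))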
Figure 8: Typical results of simulation run: (a) – measured positions (T3 trajectory), (b) position estimation squared error



Table 4
Data fusion and tracks fusion evaluation results

            Radar tracker        Data fusion         Tracks fusion
Sequence    mean OCA    FAR      mean OCA    FAR     mean OCA    FAR
1st         0.5034      0        0.9935      0       0.9839      0
2nd         0.2771      0        0.8909      0       0.8832      0
3rd         0.907       0.0135   0.9800      0       0.9360      0

FAR is defined as the number of frames containing false tracks divided by the total number of observed frames:

    \mathrm{FAR} = \frac{N_\tau}{N},        (28)

where N_\tau is the number of frames with false tracks and N is the total number of frames. Any false track appearing in a frame makes that frame count as a false positive.
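For completeness, a minimal sketch of Eq. (28), assuming a per-frame boolean that indicates whether any false track was present in that frame; the function name is illustrative and not from the paper.

```python
def false_alarm_rate(false_track_flags) -> float:
    """FAR from Eq. (28): the fraction of observed frames that contain
    at least one false track (N_tau / N)."""
    flags = list(false_track_flags)
    return sum(1 for frame_has_false_track in flags if frame_has_false_track) / len(flags)

# Illustrative usage: one boolean per measurement frame.
# false_alarm_rate([False, False, True, False])  # -> 0.25
```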
The focus of the experimental investigation was the elimination of false detections and the reduction of the missed detection rate. Progress towards both goals can be monitored using the selected metrics [25, 26]. Rather conservative track validation policies were chosen for both versions of fusion to highlight the possibility of avoiding false alarms while still keeping the detection rate (indirectly shown by OCA) high enough for all practical purposes. Evaluation results are presented in Table 4. The best results are obtained by Data fusion, with Tracks fusion close behind.

7. Conclusions

   1. It was observed experimentally that radar-only tracking suffers from many missed detections if the target trajectory is close to tangential.
   2. While the radar-only tracker performs without false alarms during the first two sequences, the third sequence demonstrates that changes in target direction can cause false tracks to appear.
   3. Both issues mentioned above can be solved with either of the two radar and camera fusion approaches, as the evaluation results show: OCA increased drastically in both cases compared to radar-only tracking.
   4. Data fusion offers slightly better performance, reflected by higher OCA values. In practice, this means faster track validation and more robust tracking when detections are missed by either VMD or the radar.
   5. The addition of distance measurements from the camera did not prove to be a stable method for track matching or state updates in practice. Although simulation suggested an accuracy improvement, real measurements were highly unstable when this approach was used.

8. Acknowledgments

Research is partially funded by Lithuanian Research Council Project Nr. 09.3.3-ESFA-V-711-01-0001, and partially funded by Lithuanian Business Support Agency Project Nr. 01.2.1-LVPA-T-848-01-0002.

References

 [1] F. Castanedo, A review of data fusion techniques, The Scientific World Journal 2013 (2013).




 [2] C. Napoli, E. Tramontana, M. Wozniak, Enhancing environmental surveillance against organised crime with radial basis neural networks, in: 2015 IEEE Symposium Series on Computational Intelligence, IEEE, 2015, pp. 1476–1483.
 [3] Y. Bar-Shalom, Multitarget-multisensor tracking: advanced applications, Artech House, Boston, MA, 1990.
 [4] Y. Bar-Shalom, X.-R. Li, Multitarget-multisensor tracking: principles and techniques, volume 19, YBs, Storrs, CT, 1995.
 [5] D. Willner, C. Chang, K. Dunn, Kalman filter algorithms for a multi-sensor system, in: 1976 IEEE Conference on Decision and Control including the 15th Symposium on Adaptive Processes, IEEE, 1976, pp. 570–574.
 [6] N. Kaempchen, K. Dietmayer, Data synchronization strategies for multi-sensor fusion, in: Proceedings of the IEEE Conference on Intelligent Transportation Systems, volume 85, 2003, pp. 1–9.
 [7] E. A. Wan, R. Van Der Merwe, S. Haykin, The unscented kalman filter, Kalman filtering and neural networks 5 (2001) 221–280.
 [8] D. Laneuville, C. Jauffret, Recursive bearings-only tma via unscented kalman filter: Cartesian vs. modified polar coordinates, in: 2008 IEEE Aerospace Conference, IEEE, 2008, pp. 1–11.
 [9] J. Lian-Meng, P. Quan, F. Xiao-Xue, Y. Feng, A robust converted measurement kalman filter for target tracking, in: Proceedings of the 31st Chinese Control Conference, IEEE, 2012, pp. 3754–3758.
[10] S. V. Bordonaro, Converted measurement trackers for systems with nonlinear measurement functions, Ph.D. thesis, 2015.
[11] S. Blackman, R. Popoli, Design and analysis of modern tracking systems, Artech House, Norwood, MA, 1999.
[12] D. Y. Kim, M. Jeon, Data fusion of radar and image measurements for multi-object tracking via kalman filtering, Information Sciences 278 (2014) 641–652.
[13] X. Wu, J. Ren, Y. Wu, J. Shao, Study on target tracking based on vision and radar sensor fusion, in: WCX World Congress Experience, SAE International, 2018.
[14] G. Alessandretti, A. Broggi, P. Cerri, Vehicle and guard rail detection using radar and vision data fusion, IEEE Transactions on Intelligent Transportation Systems 8 (2007) 95–105.
[15] S. Sugimoto, H. Tateda, H. Takahashi, M. Okutomi, Obstacle detection using millimeter-wave radar and its visualization on image sequence, in: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), volume 3, IEEE, 2004, pp. 342–345.
[16] G. P. Stein, O. Mano, A. Shashua, Vision-based acc with a single camera: bounds on range and range rate accuracy, in: IEEE IV2003 Intelligent Vehicles Symposium, Proceedings (Cat. No. 03TH8683), IEEE, 2003, pp. 120–125.
[17] F. Liu, J. Sparbert, C. Stiller, Immpda vehicle tracking system using asynchronous sensor fusion of radar and vision, in: 2008 IEEE Intelligent Vehicles Symposium, IEEE, 2008, pp. 168–173.
[18] F. Beritelli, G. Capizzi, G. L. Sciuto, C. Napoli, F. Scaglione, Rainfall estimation based on the intensity of the received signal in a lte/4g mobile terminal by using a probabilistic neural network, IEEE Access 6 (2018) 30865–30873.
[19] A. Sole, et al., Solid or not solid: Vision for radar target validation, in: IEEE Intelligent Vehicles Symposium, 2004, IEEE, 2004, pp. 819–824.
[20] C. Kreucher, S. Lakshmanan, K. Kluge, A driver warning system based on the lois lane detection algorithm, in: Proceedings of IEEE International Conference on Intelligent Vehicles, volume 1, Stuttgart, Germany, 1998, pp. 17–22.
[21] R. Deriche, O. Faugeras, Tracking line segments, Image and Vision Computing 8 (1990) 261–270.
[22] S. D. Gupta, J. Y. Yu, M. Mallick, M. Coates, M. Morelande, Comparison of angle-only filtering algorithms in 3d using ekf, ukf, pf, pff, and ensemble kf, in: 2015 18th International Conference on Information Fusion (Fusion), IEEE, 2015, pp. 1649–1656.
[23] D. V. Stallard, Angle-only tracking filter in modified spherical coordinates, Journal of Guidance, Control, and Dynamics 14 (1991) 694–696.
[24] T. Fortmann, Y. Bar-Shalom, M. Scheffe, Sonar tracking of multiple targets using joint probabilistic data association, IEEE Journal of Oceanic Engineering 8 (1983) 173–184.
[25] F. Beritelli, G. Capizzi, G. Lo Sciuto, C. Napoli, M. Woźniak, A novel training method to preserve generalization of rbpnn classifiers applied to ecg signals diagnosis, Neural Networks 108 (2018) 331–338.
[26] G. Capizzi, G. Lo Sciuto, C. Napoli, D. Polap, M. Wozniak, Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bioinspired reinforcement learning, IEEE Transactions on Fuzzy Systems 28 (2020) 1178–1189.


