Visual and Radar Sensor Fusion for Perimeter Protection
and Homeland Security on Edge
Danny Buchman a,b, Michail Drozdov a,c, Aušra Mackutė-Varoneckienė d,e and
Tomas Krilavičius d,e
a JVC Sonderus, Vilnius, Lithuania
b Seraphim Optronics Ltd., Yokne’am Illit, Israel
c Geozondas Ltd., Vilnius, Lithuania
d Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
e Baltic Institute of Advanced Technology, Vilnius, Lithuania



                                          Abstract
                                          In border and perimeter protection it is common to combine radar technology with pan-tilt (PT) cameras to achieve
                                          terrain dominance. The typical solution uses both sources: most of the threats are detected by the radar, while the camera
                                          is used to inspect the motion reported by the radar. Such a solution depends heavily on radar performance and is not
                                          effective in scenarios where the radar cannot monitor the movement of all targets. In this work, inputs from the camera
                                          and the radar are used in close integration to increase detection probability and to reduce false alarms. Two alternative
                                          methods of radar and visual data fusion are proposed, the data structures and processing algorithms are defined, and
                                          results of experimental validation of both proposed methods are presented.

                                          Keywords
                                          Sensor fusion, Radar, Video motion detection, Perimeter protection, Kalman filter


IVUS 2020: Information Society and University Studies, 23 April 2020, KTU Santaka Valley, Kaunas, Lithuania
danny.buchman@seraphimas-hls.com (D. Buchman); michail.drozdov@seraphimas-hls.com (M. Drozdov);
ausra.mackute-varoneckiene@bpti.lt (A. Mackutė-Varoneckienė); tomas.krilavicius@bpti.lt (T. Krilavičius)


1. Introduction

Sensor fusion is a large research topic. Its goal is to combine multiple data sources into joint data, which allows improving processes or calculations compared to single-source data usage.

A tracking solution using radar as the only source of data suffers from unreliable detection, or even absence of detection, when dealing with mostly tangential trajectories of observed objects. An attempt is made to lessen this problem by adding a camera as a second source of data and combining radar tracking with video motion detection (VMD), while keeping a common target state for detections from both sources. It is also expected that fusion can reduce the false detection rate, since validation of tracks can be more reliable when two sources of information are used in a redundant fusion scheme [1].

This work focuses on research related to the practical application of fusion between radar and video. Two main methods of fusion, namely data fusion and tracks fusion, are defined in this context, with the first covering such important parts of the system as data association and state updates, and the second being a more modular and distributed alternative.

Methods based on Kalman family filters [2, 3, 4] are common when dealing with data-level fusion, because they keep the process model independent from the observation structure [5] while working with uncertain data. In the case of several sensors, such filters allow incorporating new data into the model as it becomes available [3], [6].

Due to the properties of the Kalman filter (KF), the state update of the described dynamic process is required to be linear. It is common practice to use Cartesian coordinates to describe the object state when dealing with mostly linear movement. When trying to fuse camera and radar data, two issues are quite apparent:

    1. Both radar and camera acquire data in Polar coordinates.
    2. While a full 3D Cartesian representation can be reconstructed from the radar data, this is not true for the camera without several assumptions on the geometry of the setup.

The first problem can be solved in several ways. The usual practice is to keep the target state in Cartesian coordinates [2], [7] while measuring in Polar coordinates and transforming the data before the update (converted measurements [8], [9], [10]). The covariance matrix, which is used in the update and estimation, gets biased if transformed directly. There exist many solutions for the linearization of the space near the estimated point to get proper values for the covariance matrix [3], [11] (the Extended Kalman filter and the Unscented Kalman filter, to name a few). After the transformation, the state can be updated by the normal Kalman filter formulas.

The second issue is not addressed by these solutions: if the Kalman filter is used while keeping the target state in Cartesian coordinates, the camera changes from a very precise sensor (az and el angles) to a very imprecise one (x, y, z), because the distance to the object is used in the Polar to Cartesian transform for any direction and is not directly measured by the camera. There are numerous approaches to get at least some estimation of distance from direct camera measurements:

    1. Use radar detections as a base and map camera detections to radar, improving the angular resolution [12], [13].
    2. Use homography¹ estimation methods for camera calibration in the lab.
    3. Use a corner reflector or another strongly reflective object to map radar and camera detections precisely into 3D [14], [15].
    4. Use many assumptions² on the positioning of detections relative to the optical axis (the ground is a straight plane, camera position and orientation are known, targets are always on the ground, etc.) [16], [17].
    5. Use machine learning (ML) techniques in cases when the targets are specified (e.g., detecting the image size of cars whose physical size is known) [13, 18].
    6. Use the movement of the camera and additional features (lane lines) to acquire distance [19], [20], [21].

¹ A geometrical relation between two images of the same planar surface, described by a transformation matrix.
² The camera is moving by a known pattern, the terrain is flat, camera position and angle to the surface are known, etc.

For scenarios arising in perimeter protection or homeland security, method 6) is not applicable or is too costly in the sense of performance. Usually there are no predefined markers, and the system is stationary (no translational movement). All other approaches can be explored. The ideal solution, however, would be to use the camera only in its strongest domain, to augment the information received by radar, instead of increasing inaccuracies in one dimension while decreasing them in others. One such potential solution is to keep the state of the Kalman filter in Polar coordinates. This is not unknown in the literature, as bearing-only tracking often uses Modified Polar Coordinates (MPC) [8], [22] or Modified Spherical Coordinates (MSC) [23].

The following issues are explored before finalizing the data model for fusion:

    1. The impact of the choice of coordinates (Cartesian or Polar) in the linear Kalman filter, for different movement patterns.
    2. The improvement (if any) of the accuracy of state estimation in Polar coordinates, if the camera is also used in the Kalman filter updates.
    3. The detection of distance to objects from camera measurements, and the effect of adding this result to the Kalman filter measurement model.

As a consequence, some of the following sections (Section 2, Section 5 and Section 6) are split into two main parts: preliminary experiments (simulation) and methods validation. It should be clear, however, that chronologically all preliminary tests were performed and their results were analyzed before the fusion models were finalized and the fusion methods implemented.


2. Data Description

Figure 1: Coordinate systems used in experiments. The unit is pointed in the Z direction.

The coordinates used in this research are defined as shown in Fig. 1.
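For reference, the following small Python sketch shows the conversion between Polar measurements and the x-z Cartesian frame of Fig. 1, which both representations used in the following sections rely on; the sign convention for the azimuth is an assumption.

    import math

    def polar_to_cartesian(r, az):
        """Convert a radar measurement (range r, azimuth az) to the x-z plane of Fig. 1,
        assuming the unit points along the Z axis and az is measured from that axis."""
        return r * math.sin(az), r * math.cos(az)   # (x, z)

    def cartesian_to_polar(x, z):
        """Inverse transform, used e.g. when comparing a Cartesian state with Polar measurements."""
        return math.hypot(x, z), math.atan2(x, z)   # (r, az)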
Figure 2: The setup of test trajectories for experiments



2.1. Simulation Data Description

In this part of the research, the definition of custom trajectories for any number of targets, as well as radar and video motion detection (VMD) noise models, were implemented. The following simulation parameters can be defined for the radar:

    1. Range noise standard deviation
    2. Velocity noise standard deviation
    3. Detection angle noise standard deviation
    4. False detection rate (per simulation area element)
    5. Detection view angle
    6. Update rate

The following simulation parameters can be defined for the camera:

    1. VMD detection box noise (in pixels)
    2. Camera resolution
    3. Angular field of view
    4. VMD false positive rate and area
    5. Update rate

Detections for VMD and radar are simulated and subsequently registered without knowledge of the ground truth or of the noise model.

The Kalman filter state is defined by [x v_x a_x z v_z a_z]^T in the case of Cartesian coordinates and by [r v_r a_r az d_az dd_az]^T in Polar coordinates, where

    1. v_x and v_z are velocities in the x and z dimensions respectively,
    2. a_x and a_z are the corresponding accelerations,
    3. v_r is the range change rate (a_r the corresponding range acceleration),
    4. az is the azimuth angle,
    5. d_az and dd_az are the rate of change of the azimuth and the rate of change of d_az (angular acceleration).

Measurements of the radar and the camera are simulated using a Gaussian noise model. For the camera, different accuracies of VMD were tested, ranging from 1 pixel to 5 pixels.
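As an illustration of the simulation noise model described above, the following Python sketch generates one noisy radar and one noisy VMD measurement for a ground-truth point; the parameter names and default values are illustrative assumptions, not the exact configuration used in the experiments.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_radar(x, z, vx, vz, sigma_r=0.5, sigma_v=0.1, sigma_az=0.01):
        """Simulate one radar detection (range, radial velocity, azimuth) with Gaussian noise."""
        r = np.hypot(x, z)
        az = np.arctan2(x, z)                      # azimuth measured from the Z (boresight) axis
        vr = (x * vx + z * vz) / r                 # radial (Doppler) velocity component
        return (r + rng.normal(0, sigma_r),
                vr + rng.normal(0, sigma_v),
                az + rng.normal(0, sigma_az))

    def simulate_vmd(x, z, hfov=np.deg2rad(60), width=1920, sigma_px=2.0):
        """Simulate the horizontal pixel position of a VMD bounding-box centre."""
        az = np.arctan2(x, z)
        xp = az * width / hfov + width / 2         # linear pinhole approximation, cf. (12)-(13)
        return xp + rng.normal(0, sigma_px)        # 1-5 px noise levels were tested

    # Example: a target 80 m away, slightly off boresight, moving tangentially
    print(simulate_radar(x=10.0, z=80.0, vx=1.5, vz=0.0))
    print(simulate_vmd(x=10.0, z=80.0))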
Measurements are filtered, and the mean square error (MSE) of the estimated positions with respect to the actual trajectories is calculated for comparison.

Fig. 2 presents the test trajectories Trajectory 1 (T1), Trajectory 2 (T2) and Trajectory 3 (T3), which were used for evaluation; the camera and the radar are at the point (0, 0) of the x-z Cartesian coordinate system.

2.2. Data for Evaluation of Fusion Methods

In the case of track fusion, the inputs for the fusion module are

    1. a list of VMD outputs as bounding boxes [x y w h] and the ID of the VMD track,
    2. a list of radar tracker outputs as [x z v_x v_z] and the ID of the radar track.

In the case of data fusion, the inputs are

    1. a list of bounding boxes [x y w h] received directly from the "blob" stage of the VMD pipeline,
    2. a list of radar targets [r v_r az].

The main difference between the sources of data for the two approaches is that in the case of data fusion there are many false detections, which have not yet been filtered out by the VMD or radar tracker methods respectively.


3. Data Fusion Methods

An overview of different sensor fusion classifications can be found in [1] for further reading. Here we employ the terminology used by Castanedo in that paper.

Possible schemes of sensor fusion in the given application are limited to redundant schemes (the other possibilities being complementary and cooperative), based on the relations of the sensors used in the system: both types of sensors measure the same state of objects at the same time in the observed area.

Based on the idea of modularity of the system, it is clear that radar/camera fusion should not be a decision-making module. While detections are performed by the fusion module, the final decision is affected by additional rule-based filters and by the recognition module. The goal of the fusion module is to minimize the false alarm rate (FAR) of the system and to provide inputs for the decision-making modules. It can be said, then, that the fusion module approaches can be limited to two (data in - feature out (DAI-FEO) and feature in - feature out (FEI-FEO)), contrary to approaches which provide data or decisions as outputs (data in - data out (DAI-DAO), feature in - decision out (FEI-DEO), decision in - decision out (DEI-DEO)).

Four types of fusion architectures are defined in [1]:

    1. centralized architecture,
    2. decentralized architecture,
    3. distributed architecture,
    4. hierarchical architecture.

The centralized architecture (data from all sources is processed in a single module) is expected to be theoretically optimal in the case of proper synchronization of the data sources and sufficient bandwidth for data transfer. It can suffer, however, from the lack of distribution of bandwidth and processing when these resources are limited for a given task. Alternatives solving this issue are the decentralized architecture (fusion nodes incorporate raw data in different order and composition) and the distributed architecture (fusion nodes receive single-sensor data and provide features to be fused). In our view, a decentralized architecture would introduce unnecessary complexity and implementation difficulty, so the distributed architecture is considered the only alternative to the centralized architecture.


4. Proposed Fusion Methods

Based on the discussion in the previous section and on the results of the preliminary evaluations (described in Section 6), two main fusion approaches are established:

    1. data fusion,
    2. tracks fusion.

Data fusion is a centralized mixed-input (DAI-FEO and FEI-FEO) method, accepting raw outputs of the radar and intermediate results of VMD (bounding boxes of detected blobs).

Tracks fusion is a distributed FEI-FEO method, accepting features (tracks) from the VMD and radar tracker modules.

Both should solve the defined problem, but the pros and cons of the approaches differ. Fusion of tracks is performed after the data from both sources has been processed and a track has been detected in both sets of data. Then both tracks are matched (track-to-track association) to better reflect the behavior of the object being tracked. In the case where one of the sources does not return a track while the other does, different policies can be used to favor a reduced false detection rate over extended tracking duration, or vice versa. Compared with data fusion, tracks fusion is easier to debug and tune, since the separate modules can be tested and tuned faster due to the overall reduced complexity. Data fusion works on a lower level than tracks fusion. It has the benefit of incorporating updates from both sources of data into a common state update, converging to the true state of the object being tracked faster.
The role of policies in the case where one of the sources misses detections is reduced, since both sources of raw data can be treated similarly (both update the Kalman filter state). There is no need to tune policies for different cases (no recent radar detection, no recent camera detection, higher than average mismatch between sources, etc.); this is handled by the common update. The downside of data fusion compared to track fusion is longer debugging and tuning. If the conditions of the experiments/use cases or the equipment change, this could add delivery time overhead.

4.1. Data Fusion

The general strategy for data-level fusion can be summarized as follows:

    1. Use radar detections as a base for track validation.
    2. Any VMD detection can create a persistent track, but without radar data it will be validated only after a relatively long track age is reached.
    3. Any track with both recent VMD and radar detections has a higher probability of being validated as a real track; as a consequence, it is validated at a lower track age.

Such choices are proposed due to the relative ease of radar data validation: if the movement of a potential target in the area under test contradicts the Doppler velocity reported by the radar, such a target can be quickly invalidated. There is no similar process for VMD.

The full track state representation consists of the following parts:

    1. The track state vector at time t (based on the definitions in Section 2), a 6x1 vector

                X_t = [x  v_x  a_x  z  v_z  a_z]^T ,                            (1)

       or its counterpart in Polar coordinates. In the case of a constant velocity model, a 4x1 vector

                X_t = [x  v_x  z  v_z]^T .                                      (2)

    2. The track covariance matrix P_t: a 6x6 matrix, or a 4x4 matrix if accelerations are removed, describing the amount of variance in the data and the inter-state dependencies (covariances).
    3. The track age: the time elapsed from the moment of track creation by VMD or radar. It is updated if either a VMD or a radar detection can be associated with the current state vector.
    4. The track innovation error: a value which is compared to predefined threshold values for track validation and removal. The track innovation error (squared Mahalanobis distance) is calculated from the state measurement residual (innovation) and the residual (innovation) covariance. It is one of the main criteria for positive detection.
    5. The track update timestamp for VMD.
    6. The track update timestamp for radar.
    7. The object size. It can be calculated if both VMD and radar detected the object.
    8. The object visual distance state: a 2x1 vector (distance, rate of change of distance). It is estimated from the camera and can be used if the positioning of the device is known. It is highly unstable due to partial obstructions and is therefore kept separate from the track state vector. It can be used by higher-level decision-making modules.
    9. The object visual distance covariance matrix: a 2x2 matrix.
    10. The total duration of detection for VMD. It is one of the main criteria for positive detection.
    11. The total duration of detection for radar. It is one of the main criteria for positive detection.

The fusion scheme consists of data association, state update, and management of tracks. Data association (point to track) in the first prototype is implemented as simple Nearest Neighbour (NN) estimation in state space, favoring simplicity and speed. A Joint Probabilistic Data Association (JPDA) based association method [24] is used as an alternative in later versions. In NN-based data association, each measurement is compared against the estimated state X_{t|t-1} at time t by calculating a Mahalanobis distance based metric (discussed further below). Prefiltering of potential associations can be performed using simple heuristics such as 2D distance.
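As a compact illustration of the track record described above, the following Python sketch defines a container holding the listed fields; the names and types are illustrative assumptions rather than the exact structures used in the implementation.

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class FusedTrack:
        """Track record combining radar and VMD information (illustrative field names)."""
        state: np.ndarray                 # X_t, 6x1 (or 4x1 without accelerations), cf. (1)-(2)
        covariance: np.ndarray            # P_t, 6x6 (or 4x4)
        age: float = 0.0                  # time since creation, updated on associated detections
        innovation_error: float = 0.0     # squared Mahalanobis distance, cf. (16)
        last_vmd_update: float = 0.0      # timestamp of the last VMD update
        last_radar_update: float = 0.0    # timestamp of the last radar update
        object_size: float = 0.0          # available when both sensors detect the object
        visual_distance: np.ndarray = field(default_factory=lambda: np.zeros(2))   # [d, d_rate]
        visual_distance_cov: np.ndarray = field(default_factory=lambda: np.eye(2))
        vmd_duration: float = 0.0         # total VMD detection duration
        radar_duration: float = 0.0       # total radar detection duration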
A linear KF is used for state estimation and update. The estimation step is

                X_{t|t-1} = A X_{t-1|t-1} ,                                     (3)

where A is the process matrix, defined for a state with accelerations as

            A = [ 1  dt  dt^2   0   0    0
                  0   1   dt    0   0    0
                  0   0   1     0   0    0
                  0   0   0     1  dt  dt^2
                  0   0   0     0   1   dt
                  0   0   0     0   0    1 ] ,                                  (4)

and for a state without accelerations as

            A = [ 1  dt  0   0
                  0   1  0   0
                  0   0  1  dt
                  0   0  0   1 ] .                                              (5)

Here X_{t-1|t-1} denotes the posterior state from the previous update and dt is the time passed since the last update. Further on, all the calculations are described for the state without accelerations to shorten the notation, and the state vector dimensionality m is used to describe the sizes of the matrices. The predicted covariance matrix is calculated as

                P_{t|t-1} = A P_{t-1|t-1} A^T + Q ,                             (6)

where Q is a square m x m matrix defining the process noise.

The next step of the processing is a mixed state update and data association step. The innovation is calculated for each pair of a track and a prefiltered measurement, to later allow optimal NN data association:

                Y_t = Z_t - H X_{t|t-1} ,                                       (7)

where H is the k x m measurement matrix and Z_t is the measurement vector of size k x 1. The size k of the measurement vector and the measurement matrix depends on the type of sensor and on the coordinate systems used for the measurements and the state. k can be understood as the number of parameters measured by a specific sensor, and H as the mapping between the state and the measured parameters. If the transformation from sensor coordinates to state coordinates is linear, it can be performed by H directly. In our experiment using the Polar representation of the target state space,

                X_t = [r  v_r  az  d_az]^T .                                    (8)

The data vector of the radar and the corresponding measurement matrix are

                Z_t = [r  v_r  az]^T ,                                          (9)

                H = [ 1  0  0  0
                      0  1  0  0
                      0  0  1  0 ] ,                                            (10)

which simplifies the calculations. In the case of a camera (VMD) update without knowledge of the geometrical setup,

                Z_t = [x_p] ,                                                   (11)

where x_p is simply the x position of the center of the bounding box in screen coordinates. The measurement matrix is then

                H = [ 0  0  w/HFOV  0  w/2 ] ,                                  (12)

and the state vector needs to be augmented with an additional element 1 as the last row to allow the matrix multiplication in (7). In the above formula, w is the width of the video in pixels and HFOV is the horizontal field of view. The single-element matrix Y_t can also be calculated by directly using the az element of the state vector as

                Y_t = [ x_p - (az w / HFOV + w/2) ] .                           (13)

Next, the innovation covariance is calculated,

                S_t = H_t P_{t|t-1} H_t^T + R ,                                 (14)

where R is a k x k diagonal measurement error matrix. It is defined based on the parameters of the sensors used; the error ranges from the datasheet of a sensor are a good starting point. The Kalman gain is

                K_t = P_{t|t-1} H^T S_t^{-1} ,                                  (15)

where the inverse of S_t is taken. The innovation error, which is not used directly in the Kalman filter update but is the main criterion for data association, is

                ε = Y_t^T S_t^{-1} Y_t .                                        (16)

Lastly, if the best match for a track-measurement pair is found, the Kalman filter update step is finished by calculating the posterior state and covariance:

                X_{t|t} = X_{t|t-1} + K_t Y_t ,                                 (17)

                P_{t|t} = (I - K_t H) P_{t|t-1} ,                               (18)

where I is the m x m identity matrix. The best match between a track and a measurement is found by comparing ε for each prefiltered pair. It is possible to fail to find a matching pair if all ε are larger than some predefined threshold τ.
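To make the prediction, association and update steps of equations (3)-(18) concrete, the following Python sketch implements a minimal linear Kalman filter for the Polar state [r v_r az d_az]^T with the radar measurement model (9)-(10); the matrix values, noise parameters and thresholds are illustrative assumptions, not the values used in the system.

    import numpy as np

    def predict(X, P, dt, q=0.1):
        """Prediction step, eq. (3) and (6), constant-velocity Polar state [r, v_r, az, d_az]."""
        A = np.array([[1, dt, 0,  0],
                      [0,  1, 0,  0],
                      [0,  0, 1, dt],
                      [0,  0, 0,  1]], dtype=float)
        Q = q * np.eye(4)                          # simple process-noise assumption
        return A @ X, A @ P @ A.T + Q

    def update(X_pred, P_pred, Z, H, R):
        """Association metric and update, eq. (7), (14)-(18). Returns (X, P, epsilon)."""
        Y = Z - H @ X_pred                         # innovation, eq. (7)
        S = H @ P_pred @ H.T + R                   # innovation covariance, eq. (14)
        S_inv = np.linalg.inv(S)
        eps = float(Y @ S_inv @ Y)                 # squared Mahalanobis distance, eq. (16)
        K = P_pred @ H.T @ S_inv                   # Kalman gain, eq. (15)
        X = X_pred + K @ Y                         # posterior state, eq. (17)
        P = (np.eye(len(X_pred)) - K @ H) @ P_pred # posterior covariance, eq. (18)
        return X, P, eps

    # Radar measurement model [r, v_r, az], eq. (9)-(10)
    H_radar = np.array([[1, 0, 0, 0],
                        [0, 1, 0, 0],
                        [0, 0, 1, 0]], dtype=float)
    R_radar = np.diag([0.5**2, 0.1**2, 0.01**2])   # assumed sensor accuracies

    X, P = np.array([80.0, -1.0, 0.12, 0.0]), np.eye(4)
    X_pred, P_pred = predict(X, P, dt=0.1)
    Z = np.array([79.8, -1.1, 0.121])
    X, P, eps = update(X_pred, P_pred, Z, H_radar, R_radar)
    # the track-measurement pair with the smallest eps (below a threshold tau) is accepted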
In such a case, the new track state for a track without a measurement is

                X_{t|t} = X_{t|t-1} ,                                           (19)

                P_{t|t} = P_{t|t-1} .                                           (20)

Track merging is performed if (similarly to (16))

                D = (X1_t - X2_t)^T P_match^{-1} (X1_t - X2_t) < D_thresh ,     (21)

where X1_t and X2_t are the states of the two tracks to be compared, P_match is a diagonal matrix that can be treated as the typical variances of the track state elements (with larger values of P_match more tracks will be matched), and D_thresh is the threshold for matching.

With the main parts of the data association and track update discussed, the global view of data fusion can be defined (see Algorithm 1).

    /* merging of existing tracks */
    for each track t in memory do
        for each track t2 in memory except t do
            apply (21);
            if the condition of (21) is met then
                merge t and t2;
                remove t2 from memory;
            end
        end
    end
    /* updating of tracks by measurement */
    if there is a new measurement from any source then
        for each track t in memory do
            estimate the prior state (3) and covariance (6);
            prefilter the list of possible measurements;
            for each measurement with small enough ε do
                if the measurement was already used then
                    check for more precise updates in earlier tracks;
                    if no better update is found then
                        update the Kalman state (17), (18);
                        push the updated state to the stack of potential updates;
                        mark the measurement as used;
                    end
                else
                    update the Kalman state (17), (18);
                    push the updated state to the stack of potential updates;
                    mark the measurement as used;
                end
            end
        end
        for each track t in memory do
            apply the best update from the stack of updated states;
        end
    end
    /* manage unmatched tracks */
    for each unmatched track do
        increase the track innovation error;
        if the innovation error reaches the threshold then
            remove the track from memory;
        end
    end
    /* create potential tracks */
    for each unmatched measurement do
        if it satisfies the movement conditions then
            create a track with an initial state and initial covariance;
        end
    end
    Algorithm 1: Data fusion algorithm

During the step of managing unmatched tracks, the innovation error of a track (normally calculated as in (16)) is increased. In the current implementation the following empirically found formula is used:

                ε = ε · max(1 + √dt / 3,  1 + 6 dt / T_tr) ,                    (22)

where T_tr is the age of the track. The idea is to increase ε faster for new tracks than for older tracks, as the track age usually shows how reliable the current track is.

The age of a track is increased if there is a VMD or radar detection. If the detection came from the previous measurement (contrary to measurements which came from both sources on the same processing step), the age is increased by the time difference between the current and previous measurements. If a track is picked up after an absence of matching measurements, the time delta is added based on the type of update (frame update time for VMD or frame update time for radar). Tracks without current matching measurements do not update their age value.

For a VMD-only state (no recent radar detection), the previous angular movement is used together with the size to create a view-space gate for measurement-to-track matching. It is part of the measurement prefiltering mentioned earlier. If the angular velocity is not yet initialized (the track age is low), the maximum possible rate of movement, based on a typical size and velocity ratio, is used to create the view-space gate.

For a track having a recent radar detection, or an overall high radar detection duration, the exact estimated position is calculated, and new detections of VMD and radar are projected onto a common space to use spatial gating.

Tracks are created/initialized with every moving object detection. For a radar detection, the movement condition is a non-zero Doppler velocity by default, or it can be set as one of the many algorithm parameters.
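A minimal Python sketch of the handling of unmatched tracks in Algorithm 1, using the empirical inflation formula (22); the threshold value is an illustrative assumption, and the tracks are assumed to carry the illustrative fields of the FusedTrack sketch given earlier.

    import math

    def inflate_innovation_error(eps, dt, track_age):
        """Inflate the innovation error of an unmatched track, cf. the empirical formula (22)."""
        return eps * max(1 + math.sqrt(dt) / 3, 1 + 6 * dt / track_age)

    def manage_unmatched(tracks, dt, eps_threshold=25.0):
        """Drop tracks whose inflated innovation error exceeds an (assumed) threshold."""
        kept = []
        for tr in tracks:
            tr.innovation_error = inflate_innovation_error(
                tr.innovation_error, dt, max(tr.age, 1e-3))   # guard against zero age
            if tr.innovation_error < eps_threshold:
                kept.append(tr)
        return kept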
All VMD detections are treated as moving by definition. After creation, a track is in a non-validated state. Tracks are considered validated after a predefined age, which depends on the following properties:

    1. The existence or absence of a recent VMD/radar detection. Radar detections (without a VMD detection) have a higher impact on the validation threshold than the other way around.
    2. The track innovation error.
    3. The total previous duration of detections for radar and VMD.
    4. The trajectory type: mostly tangential movement with radar-only detections should be verified for a longer duration.
    5. Velocity thresholds.

Tracks are removed from the list of potential tracks if:

    1. The track innovation error grows too large. It is calculated based on new measurements and is also incremented after an absence of radar detections (22).
    2. The track was created by a VMD detection and the visual-only innovation error grows too large. It is calculated from angular measurements only and updated similarly to (22).

4.2. Distance Estimation from a Camera

Figure 3: Distance estimation from camera

The distance from camera measurements is calculated based on the assumptions that:

    1. the target is not blocked by other objects,
    2. the height and elevation of the camera are known,
    3. the detection is of a ground-based target.

The main features used for the calculations are presented in Fig. 3. The algorithm is:

    1. Calculate the angle to the base of the target, based on the bottom edge of the VMD detection and knowledge of the VFOV of the camera, as

                γ = h VFOV / n_v ,                                              (23)

       where h is the projection of the position where the target touches the ground in the camera view, and n_v is the vertical resolution (the number of pixels along the vertical axis of the image).

    2. Subtract this angle from the angle formed by the straight-up direction and the camera "looking" direction:

                β' = β - γ .                                                    (24)

    3. Calculate the distance, knowing one side of the triangle (the height of the camera) and the angle between this side and the hypotenuse:

                D = H tan β' .                                                  (25)

There are other ways to estimate the distance to the target from the camera only. For example, if the target is identified, the relative size of the target in the image could signal the distance. That requires, however, a feedback loop between the fusion module and the recognition module.
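A minimal Python sketch of the three steps (23)-(25) above; the camera parameters (field of view, height, boresight angle) and the pixel reference for h are illustrative assumptions that depend on the actual setup.

    import math

    def distance_from_camera(h_px, n_v=1080, vfov_deg=40.0,
                             camera_height_m=6.0, beta_deg=92.0):
        """Estimate the ground distance to a target from its VMD bounding box.

        h_px     : pixel offset of the point where the target touches the ground, used as h in (23)
        beta_deg : angle between the straight-up direction and the camera boresight
        """
        gamma = h_px * math.radians(vfov_deg) / n_v        # eq. (23)
        beta_prime = math.radians(beta_deg) - gamma        # eq. (24)
        return camera_height_m * math.tan(beta_prime)      # eq. (25)

    # Example: a detection whose ground contact point is 300 px from the reference row
    print(round(distance_from_camera(300), 1))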
4.3. Tracks Fusion

The general strategy for tracks fusion can be summarized as follows:

    1. Both sources can create fusion tracks with the other component not present, until a match is found at a later stage.
    2. Input tracks are not verified against the Kalman state estimate (no data association), because this step is already done in the tracker module.
    3. VMD and radar tracker tracks can be merged if the merging requirements are met. From that moment on, the fusion track has both components.
    4. A track can be split if it is detected that the visual and radar data diverge too much. Track splitting/merging is performed at each data update step.

In addition to the track structure discussed in the previous section, the following fields are defined for the tracking state:

    1. Fused track ID (different from the components).
    2. visualSeparated: a boolean value which shows that the track recently had a visual component but lost it due to divergence of the visual and radar data.
    3. VMD track ID. With new data updates, the degree of matching between the fused track components is first checked against the stored best previous matches.
    4. Radar tracker track ID. Same as above.

Figure 4: Time alignment of radar and VMD tracks for matching

The track fusion algorithm is overviewed in Algorithm 2. The operation of mismatch calculation, used on many occasions in the algorithm description, can be explained by looking at Fig. 4. First, measurement timestamps are collected for all entries of both types of tracks. Then the estimates used for matching are calculated by interpolation (or extrapolation, at the edges). The average azimuth mismatch is used as the matching parameter. An early exit from the matching function is possible if the mismatch grows beyond a predefined value.

    /* updating tracks structures */
    if there is a new radar tracker frame then
        for each track in the frame do
            if the ID of the track can be found among the already existing tracks then
                append a new measurement;
            else
                create a new list of track entries with the new ID;
            end
        end
    end
    if there is a new frame of VMD tracks then
        proceed the same way as for radar tracks;
    end
    /* matching of tracks */
    for each fused track do
        calculate the mismatch of radar and video;
        for all non-fused tracks of both types do
            calculate the mismatch with the appropriate (radar vs. VMD) track;
            if a better match is found then
                assign the new component to the fused track;
                release the previous component as non-fused;
            end
        end
    end
    /* generation of tracks */
    for each combination of a radar and a VMD track do
        calculate the mismatch;
        if the mismatch is small enough then
            create a new fused track with the matched components;
        end
    end
    /* destruction of tracks */
    calculate the time out of tracks for deletion;
    delete all tracks with a higher time out than allowed;
    /* tracks state update */
    for each fused track do
        if any of the track components received updates then
            perform a full Kalman filter update;
        else
            update the state as an estimation only (17), (18);
        end
    end
    Algorithm 2: The track fusion algorithm

The age of a track is updated as in data fusion, with each new radar tracker output considered as a new radar measurement with a time step equal to the radar update duration. The track time out is increased if no measurements were added to the track. This step is the same for the component tracks and for the fused track. The time out for deletion, mentioned in Algorithm 2, is calculated based on the current number of tracks. It is defined as 3 s if the number of tracks is less than N_max, the maximum number of tracks. If, on the other hand, the number of tracks is higher, the allowed time out reduces:

                T_timeout = 3 (2 N_max - N_cur) / N_max .                       (26)

This assures that all tracks are cleared if N_cur reaches 2 N_max. If the number of tracks for some reason grows beyond 2 N_max, all tracks are cleared.

Tracks are created/initialized with every moving object detection from radar and with every VMD detection. After creation, a VMD track is in a non-validated state, but a track created directly from radar tracker data is in a validated state. A fused track having both components can be split if the VMD measurements diverge too much from the tracker output. visualSeparated is set to true in this case. This is done to prevent the reacquisition of the same VMD track with a high mismatch factor. The VMD-only part of such a split inherits the range data and all data relevant to the durations of detections and the age of the track. It can be matched again later, after visualSeparated expires. The split tracks are invalidated for a short duration (less than a second) by setting their innovation error parameter to a high value and gradually reducing it after new measurements.
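A small Python sketch of the time-aligned azimuth mismatch used for track-to-track matching (Fig. 4) and of the track time out rule (26); the early-exit value is an illustrative assumption.

    import numpy as np

    def azimuth_mismatch(radar_t, radar_az, vmd_t, vmd_az, early_exit=0.2):
        """Average absolute azimuth difference after interpolating the radar track
        onto the VMD timestamps (cf. Fig. 4); returns early if the mismatch grows too large."""
        interp_az = np.interp(vmd_t, radar_t, radar_az)   # linear interpolation (end values held at edges)
        total = 0.0
        for i, (a, b) in enumerate(zip(interp_az, vmd_az), start=1):
            total += abs(a - b)
            if total / i > early_exit:                    # early exit of the matching function
                return float('inf')
        return total / len(vmd_az)

    def track_timeout(n_cur, n_max):
        """Allowed track time out in seconds, eq. (26)."""
        if n_cur < n_max:
            return 3.0
        return 3.0 * (2 * n_max - n_cur) / n_max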
Figure 5: Area of testing for fusion evaluation. The position of the equipment is marked as 0 distance; the distance from the equipment to one of the points of the trajectory is shown.


5. Experimental Setup

The area selected for the experiments is shown in Fig. 5. The selected area allows testing the performance of tracking at a large enough distance (more than 100 m), with parts of the trajectories being almost exactly tangential while moving around the edge of the stadium. A small amount of moving background (cars, people, trees) allows more control over the experiments.

Three experiments were performed:

    1. A person moving clockwise around the edge of the stadium without stopping.
    2. A person moving counter-clockwise around the edge of the stadium, stopping, then proceeding.
    3. A person moving clockwise around the edge of the stadium, then changing the moving pattern from mostly tangential to radial.

An example video frame with a detection displayed is presented in Fig. 6. Data were acquired using an NXP i.MX6 SoM based embedded system with a quad-core 1.2 GHz processor. Video recording and radar data acquisition were performed simultaneously. Time synchronization was assured by knowing the starting time of the video and the frame rate, and by storing radar raw data or radar tracks with exact timestamps. Although the discussed algorithms were not running at the time of these measurements, close to real-time performance was achieved later using the same hardware.


6. Results of Experimental Investigation

The experimental investigation was subdivided into two stages. First, preliminary experiments were performed to select the state model and the update strategy for fused tracks. During the second stage, the tracks fusion and the data fusion approaches were validated.

6.1. Comparison of Cartesian Coordinates and Polar Coordinates for KF State Representation

The impact of the selected system of coordinates on performance is presented in panels (a)-(c) of Fig. 7. KF state representation in Cartesian or Polar coordinates produces very close results, and a visual separation of the results is hardly noticeable. Table 1 presents the results of the models in which Polar or Cartesian coordinates are used for the KF state representation.
Figure 6: Example frame of the first sequence

Table 1
Comparison of the performance of models in which Polar or Cartesian coordinates are used for KF state representation, using MSE

   Trajectory   Measured error   KF error Cartesian   KF error Polar
       T1          2.4647             0.85995            0.87358
       T2          2.2787             0.62444            0.58846
       T3          2.7926             0.60758            0.63428

Since the results are very similar, it can be concluded that there is no significant difference between the KF state representations. To obtain the error for each model, 10 simulations were performed for each and the mean MSE was calculated.

6.2. Accuracy of Filtering with Camera Data Added

The resulting performance of the models with and without adding the camera to the state update, represented through MSE, is shown in Table 2. It can be observed that the KF state updated using the camera output represents the ground truth more accurately than the one updated by radar data only. As before, the mean MSE is calculated by running the simulation 10 times for each trajectory.

Table 2
Comparison of the performance of models with/without adding the camera to the state, using MSE

   Trajectory   Measured error   KF error Polar   Fusion error
       T1          2.4647           0.87358         0.72844
       T2          2.2787           0.58846         0.51873
       T3          2.7926           0.63428         0.58261

6.3. Accuracy of Filtering with Distance Calculation from Camera Data

The results of the performance evaluation of the fusion model and the fusion-with-distance model, using MSE statistics, are presented in Table 3. The best minimum values, as well as the mean values of MSE, show that the fusion model with distance evaluation outperforms the model without distance evaluation.

An example of a typical simulation run with the different models evaluated is shown in Fig. 8.

6.4. Fusion Methods Evaluation

The main metrics for the evaluation of the two fusion approaches were the object count accuracy (OCA) and the FAR.
Figure 7: Comparison of the performance of models in which Polar or Cartesian coordinates are used for KF state representation: (a) test trajectory T1, (b) test trajectory T2, (c) test trajectory T3.

Table 3
Fusion error and fusion with distance estimation error comparison using MSE

   Trajectory                            MAX       MIN     MEAN      STD
               Measured error           3.0179   2.2735   2.6342   0.2677
       T1      KF error                 0.8749   0.4311   0.6654   0.1456
               Fusion error             0.8065   0.3300   0.5491   0.1471
               Fusion with DIST error   0.8438   0.2631   0.5346   0.1720
               Measured error           2.8525   1.9410   2.4447   0.3272
       T2      KF error                 1.1779   0.4676   0.7667   0.2336
               Fusion error             0.9825   0.3526   0.5915   0.2170
               Fusion with DIST error   0.9841   0.3333   0.5543   0.2203
               Measured error           3.0052   1.5744   2.3672   0.4143
       T3      KF error                 0.9347   0.3607   0.6654   0.1919
               Fusion error             0.7708   0.2087   0.5219   0.1843
               Fusion with DIST error   0.6282   0.1687   0.4243   0.1536

OCA is defined as

                OCA_t(P_t^G, P_t^D) = min(M_t^G, M_t^D) / ((M_t^G + M_t^D) / 2) ,          (27)

where P_t^G and P_t^D are the sets of ground truth points and detected points in measurement frame t respectively, and M_t^G and M_t^D are the numbers of ground truth and detected instances respectively. The overall OCA is defined as the average OCA over all frames of measurements.
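A small Python sketch computing the per-frame OCA of (27), the overall OCA, and the false alarm rate FAR as defined in (28) below; the convention for empty frames and the example counts are illustrative assumptions.

    def oca_frame(n_ground_truth, n_detected):
        """Per-frame object count accuracy, eq. (27)."""
        if n_ground_truth + n_detected == 0:
            return 1.0                               # assumed convention for empty frames
        return min(n_ground_truth, n_detected) / ((n_ground_truth + n_detected) / 2)

    def overall_oca(gt_counts, det_counts):
        """Overall OCA: the average per-frame OCA over all measurement frames."""
        values = [oca_frame(g, d) for g, d in zip(gt_counts, det_counts)]
        return sum(values) / len(values)

    def far(frames_with_false_tracks, total_frames):
        """False alarm rate, eq. (28): frames containing any false track over all frames."""
        return frames_with_false_tracks / total_frames

    # Example over five frames
    print(overall_oca([1, 1, 2, 1, 0], [1, 2, 2, 1, 0]))
    print(far(frames_with_false_tracks=1, total_frames=5))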
Figure 8: Typical results of simulation run: (a) – measured positions (T3 trajectory), (b) position estimation squared error



Table 4
Data fusion and tracks fusion evaluation results

            Radar tracker        Data fusion         Tracks fusion
Sequence    mean OCA    FAR      mean OCA    FAR     mean OCA    FAR
1st         0.5034      0        0.9935      0       0.9839      0
2nd         0.2771      0        0.8909      0       0.8832      0
3rd         0.907       0.0135   0.9800      0       0.9360      0

FAR is defined as the number of frames containing false tracks divided by the total number of observed frames:

    \mathrm{FAR} = \frac{N_\tau}{N},        (28)

where N_\tau is the number of frames with false tracks and N is the total number of frames. Any false track appearing in a frame makes that frame count as a false positive.
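For completeness, a minimal sketch of Eq. (28), assuming a per-frame boolean that indicates whether any false track was present in that frame; the function name is illustrative and not from the paper.

```python
def false_alarm_rate(false_track_flags) -> float:
    """FAR from Eq. (28): the fraction of observed frames that contain
    at least one false track (N_tau / N)."""
    flags = list(false_track_flags)
    return sum(1 for frame_has_false_track in flags if frame_has_false_track) / len(flags)

# Illustrative usage: one boolean per measurement frame.
# false_alarm_rate([False, False, True, False])  # -> 0.25
```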
The focus of the experimental investigation was the elimination of false detections and the reduction of the missed detection rate. Progress towards both goals can be monitored using the selected metrics [25, 26]. Rather conservative track validation policies were chosen for both versions of fusion to highlight the possibility of avoiding false alarms while still keeping the detection rate (indirectly shown by OCA) high enough for all practical purposes. Evaluation results are presented in Table 4. The best results are obtained by Data fusion, with Tracks fusion close behind.

7. Conclusions

   1. It was observed experimentally that radar-only tracking suffers from many missed detections if the target trajectory is close to tangential.
   2. While the radar-only tracker performs without false alarms during the first two sequences, the third sequence demonstrates that changes in target direction can cause false tracks to appear.
   3. Both issues mentioned above can be solved with either of the two radar and camera fusion approaches, as the evaluation results show: OCA increased drastically in both cases compared to radar-only tracking.
   4. Data fusion offers slightly better performance, reflected by higher OCA values. In practice, this means faster track validation and more robust tracking when detections are missed by either VMD or the radar.
   5. The addition of distance measurements from the camera did not prove to be a stable method for track matching or state updates in practice. Although simulation suggested an accuracy improvement, real measurements were highly unstable when this approach was used.

8. Acknowledgments

Research is partially funded by Lithuanian Research Council Project Nr. 09.3.3-ESFA-V-711-01-0001, and partially funded by Lithuanian Business Support Agency Project Nr. 01.2.1-LVPA-T-848-01-0002.

References

 [1] F. Castanedo, A review of data fusion techniques, The Scientific World Journal 2013 (2013).




 [2] C. Napoli, E. Tramontana, M. Wozniak, Enhancing environmental surveillance against organised crime with radial basis neural networks, in: 2015 IEEE Symposium Series on Computational Intelligence, IEEE, 2015, pp. 1476–1483.
 [3] Y. Bar-Shalom, Multitarget-multisensor tracking: advanced applications, Artech House, Boston, MA, 1990.
 [4] Y. Bar-Shalom, X.-R. Li, Multitarget-multisensor tracking: principles and techniques, volume 19, YBs, Storrs, CT, 1995.
 [5] D. Willner, C. Chang, K. Dunn, Kalman filter algorithms for a multi-sensor system, in: 1976 IEEE Conference on Decision and Control including the 15th Symposium on Adaptive Processes, IEEE, 1976, pp. 570–574.
 [6] N. Kaempchen, K. Dietmayer, Data synchronization strategies for multi-sensor fusion, in: Proceedings of the IEEE Conference on Intelligent Transportation Systems, volume 85, 2003, pp. 1–9.
 [7] E. A. Wan, R. Van Der Merwe, S. Haykin, The unscented kalman filter, Kalman filtering and neural networks 5 (2001) 221–280.
 [8] D. Laneuville, C. Jauffret, Recursive bearings-only tma via unscented kalman filter: Cartesian vs. modified polar coordinates, in: 2008 IEEE Aerospace Conference, IEEE, 2008, pp. 1–11.
 [9] J. Lian-Meng, P. Quan, F. Xiao-Xue, Y. Feng, A robust converted measurement kalman filter for target tracking, in: Proceedings of the 31st Chinese Control Conference, IEEE, 2012, pp. 3754–3758.
[10] S. V. Bordonaro, Converted measurement trackers for systems with nonlinear measurement functions, Ph.D. thesis, 2015.
[11] S. Blackman, R. Popoli, Design and analysis of modern tracking systems, Artech House, Norwood, MA, 1999.
[12] D. Y. Kim, M. Jeon, Data fusion of radar and image measurements for multi-object tracking via kalman filtering, Information Sciences 278 (2014) 641–652.
[13] X. Wu, J. Ren, Y. Wu, J. Shao, Study on target tracking based on vision and radar sensor fusion, in: WCX World Congress Experience, SAE International, 2018.
[14] G. Alessandretti, A. Broggi, P. Cerri, Vehicle and guard rail detection using radar and vision data fusion, IEEE Transactions on Intelligent Transportation Systems 8 (2007) 95–105.
[15] S. Sugimoto, H. Tateda, H. Takahashi, M. Okutomi, Obstacle detection using millimeter-wave radar and its visualization on image sequence, in: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), volume 3, IEEE, 2004, pp. 342–345.
[16] G. P. Stein, O. Mano, A. Shashua, Vision-based acc with a single camera: bounds on range and range rate accuracy, in: IEEE IV2003 Intelligent Vehicles Symposium, Proceedings (Cat. No. 03TH8683), IEEE, 2003, pp. 120–125.
[17] F. Liu, J. Sparbert, C. Stiller, Immpda vehicle tracking system using asynchronous sensor fusion of radar and vision, in: 2008 IEEE Intelligent Vehicles Symposium, IEEE, 2008, pp. 168–173.
[18] F. Beritelli, G. Capizzi, G. L. Sciuto, C. Napoli, F. Scaglione, Rainfall estimation based on the intensity of the received signal in a lte/4g mobile terminal by using a probabilistic neural network, IEEE Access 6 (2018) 30865–30873.
[19] A. Sole, et al., Solid or not solid: Vision for radar target validation, in: IEEE Intelligent Vehicles Symposium, 2004, IEEE, 2004, pp. 819–824.
[20] C. Kreucher, S. Lakshmanan, K. Kluge, A driver warning system based on the lois lane detection algorithm, in: Proceedings of IEEE International Conference on Intelligent Vehicles, volume 1, Stuttgart, Germany, 1998, pp. 17–22.
[21] R. Deriche, O. Faugeras, Tracking line segments, Image and Vision Computing 8 (1990) 261–270.
[22] S. D. Gupta, J. Y. Yu, M. Mallick, M. Coates, M. Morelande, Comparison of angle-only filtering algorithms in 3d using ekf, ukf, pf, pff, and ensemble kf, in: 2015 18th International Conference on Information Fusion (Fusion), IEEE, 2015, pp. 1649–1656.
[23] D. V. Stallard, Angle-only tracking filter in modified spherical coordinates, Journal of Guidance, Control, and Dynamics 14 (1991) 694–696.
[24] T. Fortmann, Y. Bar-Shalom, M. Scheffe, Sonar tracking of multiple targets using joint probabilistic data association, IEEE Journal of Oceanic Engineering 8 (1983) 173–184.
[25] F. Beritelli, G. Capizzi, G. Lo Sciuto, C. Napoli, M. Woźniak, A novel training method to preserve generalization of rbpnn classifiers applied to ecg signals diagnosis, Neural Networks 108 (2018) 331–338.
[26] G. Capizzi, G. Lo Sciuto, C. Napoli, D. Polap, M. Wozniak, Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bioinspired reinforcement learning, IEEE Transactions on Fuzzy Systems 28 (2020) 1178–1189.


