Vehicle Tracking at Urban Intersections Using Dense Stereo

Alexander BARTH¹, David PFEIFFER and Uwe FRANKE
Daimler AG, Group Research and Advanced Engineering, Sindelfingen, Germany

Abstract. A new approach for vehicle tracking at urban intersections based on stereo vision is proposed. Objects are represented as rigid 3D point clouds and tracked by means of extended Kalman filtering. In this contribution we combine the advantages of a generic feature-based 3D point cloud model with vehicle-specific geometric and kinematic constraints to estimate the pose and motion state of oncoming vehicles at intersections. Real-time dense stereo disparity maps provide new opportunities for reconstructing the 3D driving scene. An efficient and compact Stixel World representation is computed that segments the scene into drivable freespace and obstacles on the ground. Based on this data we derive the silhouette of an object in the image and constrain its pose in space during turning maneuvers. The system has been successfully tested on various real-world scenarios and runs in real-time on VGA images in our demonstration car.

Keywords. Vehicle Tracking, Driver Assistance Systems, Dense Stereo Vision, Kalman Filtering

Introduction

Detecting and tracking other traffic participants at urban intersections has attracted special attention in the intelligent vehicle domain due to the large number of accidents still occurring day-to-day. Monitoring moving objects at such accident hotspots with stationary cameras, typically from an elevated position, has been addressed by many researchers in the past, e.g. [1,2,3].

Previous work on vision-based vehicle tracking from a moving platform mainly concentrates on highway scenarios. However, precise information on the behavior of the oncoming and cross traffic at intersections provides a fundamental basis for future driver assistance and safety applications.

In general, one can distinguish between geometric and feature-based vehicle tracking approaches. Geometric approaches try to fit a geometric model, e.g. a cuboid [4,5] or more sophisticated vehicle models [3], to the given sensor data. Such approaches perform well as long as the model is a sufficient approximation of the real object and the data is reliable.

¹ Corresponding Author: Alexander Barth, Daimler AG, GR/PAP, HPC 050/G024, Sindelfingen, Germany; E-mail: alexander.barth@daimler.com.

Feature-based methods, for example [6,7,8,9], model an object by a set of characteristic features, e.g. gray value or color statistics, edges, corners, etc. These statistics can be determined online and are usually more flexible than geometric models. A drawback of such methods is that, without any geometric constraints, detected objects may be incomplete, i.e., parts of the object that are not covered by a feature are missing, or features belonging to two different physical objects are merged, i.e., the separation between two close objects can fail.

In [9], we have proposed a feature-based vehicle tracking approach that simultaneously estimates the pose and motion parameters of a rigid point cloud representing the vehicle's shape. Object points are detected and grouped based on the Gestalt principle of common fate, i.e., points with common motion are likely to belong to the same object. The system has been successfully applied to predict the driving path of oncoming vehicles in country road scenes.
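To make the common-fate idea concrete, the following minimal sketch groups 3D points whose estimated velocity vectors (and positions) agree within given tolerances. It is an illustration only, not the grouping procedure of [9]; the function name and thresholds are our own placeholders.

```python
import numpy as np

def group_by_common_fate(positions, velocities, v_tol=1.0, d_tol=3.0):
    """Greedy grouping of 3D points with similar velocity vectors
    (Gestalt principle of common fate). Thresholds are illustrative."""
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    n = len(positions)
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        for j in range(i + 1, n):
            if labels[j] >= 0:
                continue
            same_motion = np.linalg.norm(velocities[i] - velocities[j]) < v_tol
            close_by = np.linalg.norm(positions[i] - positions[j]) < d_tol
            if same_motion and close_by:
                labels[j] = next_label
        next_label += 1
    return labels
```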
In this contribution, we extend this feature-based point cloud model with a geometric model to overcome two fundamental assumptions made in the original approach. First, it is no longer required that the vehicle dimension can be reconstructed sufficiently well from the observed point cloud. Second, a good estimate of the vehicle's center rear axle position is essential to predict the object pose during highly dynamic turn maneuvers; however, it cannot be observed from motion as long as the vehicle is moving straight or not moving at all. The geometric model allows adding direct measurements of the center rear axle position based on an object's image silhouette and dense stereo disparity maps.

To be able to process the large amount of dense stereo data in real-time, an efficient Stixel World representation, which has been recently proposed by Badino et al. [10], is used. This representation models both the drivable freespace and its boundaries, corresponding to obstacles above the ground.

In Section 1, we first briefly summarize our feature-based vehicle tracking approach and then extend this model in Section 2 by geometric constraints. Section 3 gives a short overview of the stixel representation, which is used in Sections 4 and 5 for an initial pose refinement and to derive additional filter measurements, respectively. Experimental results on real-world scenes are shown and discussed in Section 6.

1. Feature-Based Vehicle Tracking Approach

The object model consists of a state vector x, including pose and motion parameters of an observed object, and a rigid 3D point cloud Θ:

$OBJ = \{x, \Theta\}.$  (1)

The main idea is to estimate x based on the 3D displacement of the rigid point cloud in a stereo image sequence using an extended Kalman filter [11]. Fig. 1(a) gives an overview of the general system.

Formally, vehicles are modeled as a rigid body whose pose relative to the ego vehicle is defined by the transformation of a local object coordinate system with respect to the ego coordinate system attached to the ego vehicle. The object pose can be fully described by an (arbitrary) reference point on the object, $P_{ref}$, representing the object origin, and the Euler angle ψ, indicating the rotation around the height axis (see Fig. 1(b)).

Figure 1. (a) System overview. (b) Object coordinate systems and pose parameters.

Incorporating vehicle-specific characteristics, it is further assumed that the Z-axis of the object coordinate system is ideally aligned with the longitudinal axis of the vehicle, i.e., ψ corresponds to the moving direction. Lateral movements are restricted to circular path motion based on a simplified bicycle motion model. This motion model is parametrized by the velocity v and acceleration v̇ in the moving direction as well as the yaw rate ψ̇, i.e., the change of orientation. The model further requires the object origin to be located at the center rear axle of the vehicle. We will denote this characteristic point as the rotation point, $P_{rot}$, in the following. Its position relative to the reference point is typically not known at initialization and has to be estimated.

Pose and motion parameters are summarized in the following state vector:

$x = [\,\underbrace{{}^{e}X_{ref},\ {}^{e}Z_{ref},\ {}^{o}X_{rot},\ {}^{o}Z_{rot},\ \psi}_{\text{pose}},\ \underbrace{v,\ \dot{\psi},\ \dot{v}}_{\text{motion}}\,]^{T},$  (2)

with reference point ${}^{e}P_{ref} = [{}^{e}X_{ref}, 0, {}^{e}Z_{ref}]^{T}$ in ego coordinates and rotation point ${}^{o}P_{rot} = [{}^{o}X_{rot}, 0, {}^{o}Z_{rot}]^{T}$ in object coordinates.
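To illustrate the kind of state prediction such a circular-path motion model implies, the following sketch propagates a planar pose under constant yaw rate and constant acceleration over one time step. It is a generic constant-turn-rate approximation using our own naming and axis conventions (X lateral, Z longitudinal), not necessarily the exact process model used in the filter.

```python
import numpy as np

def predict_pose(X, Z, psi, v, psi_dot, v_dot, dt):
    """Propagate position (X, Z) and heading psi along a circular arc,
    assuming constant yaw rate psi_dot and acceleration v_dot over dt."""
    v_mean = v + 0.5 * v_dot * dt            # average speed over the interval
    if abs(psi_dot) < 1e-6:                  # straight-motion limit
        X += v_mean * dt * np.sin(psi)
        Z += v_mean * dt * np.cos(psi)
    else:
        R = v_mean / psi_dot                 # signed turn radius
        X += R * (np.cos(psi) - np.cos(psi + psi_dot * dt))
        Z += R * (np.sin(psi + psi_dot * dt) - np.sin(psi))
    return X, Z, psi + psi_dot * dt, v + v_dot * dt
```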
For simplicity, a planar ground is assumed, i.e., ${}^{e}Y_{ref} = {}^{o}Y_{rot} = 0$ independent of the lateral and longitudinal position. This model can be easily extended by incorporating height information on the 3D geometry of the ground, e.g., using a height-varying road model as proposed in [12]. The ego-motion is estimated and compensated, before the filter prediction step, using the method proposed in [13].

The object dimension is not explicitly modeled. Instead, it is assumed that the object's shape is sufficiently represented by a set of 3D points. Each point ${}^{o}P_{m} = [{}^{o}X_{m},\ {}^{o}Y_{m},\ {}^{o}Z_{m}]^{T} \in \Theta$, $1 \le m \le M$, has a fixed position within the object coordinate system and can be observed in terms of an image coordinate $\langle u_{m}(t), v_{m}(t) \rangle$ and stereo disparity $d_{m}(t)$ at time t. These point measurements build the measurement vector $z_{p}$ with

$z_{p}(t) = [u_{1}(t), v_{1}(t), d_{1}(t), \ldots, u_{M}(t), v_{M}(t), d_{M}(t)]^{T}.$  (3)

Figure 2. (a) Object point cloud at initialization and after 10 time steps. The initial point cloud, spread over 6 m in longitudinal direction, is successively refined. (b) Example 6D-Vision motion field used for object detection. The color encodes lateral velocity (red fast, green slow). Points at the front of vehicle no. 1 show larger lateral velocities compared to points at the rear, since the vehicle is turning.

The projection of each point onto the image plane is tracked using a feature tracker, e.g. the well-known KLT tracker [14], to be able to reassign measurements of the same 3D point over a sequence of images. The nonlinear measurement model directly follows from the transformation between object and camera coordinates and the well-known projection equations of a finite perspective camera.

Since the exact position of a given object point is typically not known at initialization, it has to be estimated from the noisy measurements. For real-time applicability, the problem of motion estimation is separated from the problem of shape reconstruction. Thus, instead of estimating shape and motion simultaneously by integrating Θ into x, the point positions are refined outside the Kalman filter.

For each object point ${}^{o}P_{m}$, observed several times in terms of ${}^{o}\tilde{P}_{m}(t)$, a maximum likelihood estimate, assuming uncorrelated measurements and zero-mean Gaussian measurement noise [15], is given by

${}^{o}P_{m}(t+1) = \left( \sum_{j=t_{m}}^{t} C_{m}^{-1}(j) \right)^{-1} \sum_{j=t_{m}}^{t} C_{m}^{-1}(j)\, {}^{o}\tilde{P}_{m}(j),$  (4)

where $C_{m}(t)$ denotes the 3 × 3 covariance matrix of ${}^{o}\tilde{P}_{m}(t)$, and $t_{m}$ the discrete time step at which point ${}^{o}P_{m}$ has been added to the model. An example of an initial noisy object point cloud and the same point cloud refined over 10 time steps is given in Figure 2(a).

Moving objects are detected from the 3D motion field of a number of feature points distributed over the whole image (see Figure 2(b)). The motion vectors are estimated based on the principle of 6D-Vision, i.e., the pointwise fusion of depth and motion by Kalman filtering [16].

The vehicle tracking is initialized from a cluster of points moving in the same direction with equal velocity. The average velocity vector gives the moving direction (initial orientation) and the initial vehicle speed.
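As a minimal illustration of Eq. (4), the sketch below computes the information-weighted mean of all observations of a single object point in batch form; the actual system presumably accumulates the two sums recursively per time step. Function and variable names are our own.

```python
import numpy as np

def refine_point(measurements, covariances):
    """Maximum likelihood estimate of a 3D object point from repeated noisy
    observations (Eq. 4): an information-weighted mean, assuming uncorrelated,
    zero-mean Gaussian measurement noise."""
    info_sum = np.zeros((3, 3))     # sum of inverse covariances C_m^{-1}(j)
    weighted_sum = np.zeros(3)      # sum of C_m^{-1}(j) * measurement
    for P, C in zip(measurements, covariances):
        C_inv = np.linalg.inv(C)
        info_sum += C_inv
        weighted_sum += C_inv @ np.asarray(P, dtype=float)
    # Solve info_sum * P_hat = weighted_sum for the refined point position.
    return np.linalg.solve(info_sum, weighted_sum)
```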
Both the reference point and the rotation point are initialized at the centroid of the initial point cloud.

Figure 3. Geometric box model with the rotation point at the center rear axle, located at a constant distance ρ from the vehicle's rear. The object corners as well as the side centers have a fixed position with respect to the rotation point, describable in terms of w, h, l, and ρ. (a) Perspective view. (b) Bird's eye view.

2. Geometric Object Model

We now extend the object model in Eq. (1) by a cuboid D, yielding

$OBJ = \{x, \Theta, D\}$  (5)

with $D = [w, l, h]^{T}$, approximating the object dimension in terms of width w, length l, and height h, independent of the currently observed point cloud.

This has several advantages. A cuboid covers a certain region in space or on the image plane that can be used to associate new points with an object. Decoupling the object dimension from the point cloud is extremely helpful if parts of the point cloud are occluded or lost by the feature tracker. In addition, restricting the cuboid dimension to the expected size of road vehicles allows for rejecting points with similar motion patterns belonging to different, close-by objects in dense traffic scenes.

The cuboid model is used to attach a basic semantic meaning to the vehicle sides (front, rear, left, right) or characteristic points, e.g. the front left corner or the center of the right side. The objective is to align the sides of the cuboid with the physical sides of the vehicle. As can be seen in Fig. 3, all points on the cuboid sides have a fixed position with respect to a virtual vehicle coordinate system with the origin at the rotation point. The Z-distance ρ between the rotation point and the rear side is a given (vehicle-specific) constant. In practice, ρ = 1 m is a good approximation for most road vehicles.

Accordingly, if the dimension of the object is known correctly, it is straightforward to describe the rotation point relative to a given object side or corner. This means that all corners or sides observable by a sensor can be used to constrain the position of the rotation point. Based on this idea, we introduce a second vector $z_{rot}$ of K direct measurements for the rotation point, with $z_{rot} = [\,{}^{o}\tilde{P}_{rot,1}, \ldots, {}^{o}\tilde{P}_{rot,K}\,]^{T}$, and concatenate this vector with the point measurement vector in Eq. (3) to the total measurement vector z with

$z = \begin{bmatrix} z_{p} \\ z_{rot} \end{bmatrix}.$  (6)

Figure 4. (a) Dense stereo results overlaid on the image of an urban traffic situation. The colors encode the distance, red means close, green represents far. Note that SGM delivers measurements even for most pixels on the road. (b) Freespace obtained from an evaluation of the stereo occupancy grid. (c) Stixel representation for this situation; the colors encode the lateral distance to the expected driving corridor shown in blue.

The additional measurements are intended to stabilize the rotation point position and to prevent the filter from wrongly compensating ambiguities, caused by error-prone point positions in the object model, by shifting the rotation point outside the object during rotational movements. This problem will be addressed in detail in the experimental results (see Section 6). The unknown object dimension is updated outside the Kalman filter by low-pass filtering in this approach, whenever dimensional measurements are available.
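The sketch below illustrates the two update mechanisms just described: deriving a direct rotation point measurement from one observed cuboid corner via the known offsets of Fig. 3, and the slow low-pass update of a cuboid dimension. The object-frame convention (rotation point at the origin, left side at X = −w/2, rear side at Z = −ρ) and the smoothing factor are our own assumptions, not values from the paper.

```python
import numpy as np

def rotation_point_from_corner(corner_xz, corner_id, w, l, rho=1.0):
    """Direct measurement of the rotation point (center rear axle) from one
    observed cuboid corner, given width w, length l and rear-axle offset rho
    (about 1 m for most road vehicles). Assumed frame: X lateral, Z
    longitudinal, left side at X = -w/2, rear side at Z = -rho."""
    offsets = {
        "rear_left":   np.array([+w / 2.0, +rho]),
        "rear_right":  np.array([-w / 2.0, +rho]),
        "front_left":  np.array([+w / 2.0, rho - l]),
        "front_right": np.array([-w / 2.0, rho - l]),
    }
    return np.asarray(corner_xz, dtype=float) + offsets[corner_id]

def update_dimension(current, measured, alpha=0.1):
    """Slow low-pass update of a cuboid dimension outside the Kalman filter."""
    return (1.0 - alpha) * current + alpha * measured
```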
At this point the extended model is independent of where the actual measurements for the rotation point or cuboid dimensions come from. In the following, we present an example realization based on dense stereo data.

3. Stixel-based Scene Representation

Real-time implementations of dense stereo algorithms, such as Semi-Global Matching (SGM) [17], on dedicated hardware [18] provide significantly more information on the 3D environment compared to sparse stereo methods. The gain in information and precision allows for improved scene reconstruction and object modeling.

However, more information also means more data to process. Thus, efficient data representations are beneficial to ease further interpretation. We use a compact yet powerful scene representation that has been recently proposed by Badino et al. [10] for the intelligent vehicle domain. It is based on the fact that traffic scenes typically consist of a relatively planar free space which is limited by 3D obstacles that are nearly perpendicular to the ground.

The so-called Stixel World represents the 3D scene by a set of rectangular sticks, named "stixels", as shown in Fig. 4(c). Each stixel is defined by its 3D position relative to the camera and stands vertically on the ground, having a certain height. Each stixel limits the free space and approximates the object boundaries.

As illustrated in Fig. 4, the Stixel World is created by the following steps: First, we compute a dense disparity image using SGM. Fig. 4(a) shows that SGM is able to model object boundaries precisely. In addition, the smoothness constraint used in the algorithm leads to smooth estimates in low-contrast regions, as can be seen, for example, on the street and the untextured parts of the vehicles and buildings.

In the second step, a stochastic occupancy grid is generated from the stereo disparities using the method presented in [19]. An occupancy grid is a two-dimensional array which models occupancy evidence of the environment. Only those 3D measurements lying above the road are registered as obstacles in the occupancy grid. From this grid the freespace shown in Fig. 4(b) is computed. The dynamic programming approach used here turns out to be highly robust with respect to disparity noise.

Each free space point of the polygon in Fig. 4(b) indicates not only the interruption of the free space but also the base point of a potential obstacle located at that position. Following the considered image column upward, the disparity is nearly constant but jumps to a smaller value above the object. This allows us to compute the upper boundary (i.e. the height) of the obstacles in a second pass of dynamic programming.

Given the base and top point of an obstacle in a certain image column, all object disparities are averaged using a robust estimator to determine the distance of this stixel. This averaging significantly reduces the disparity noise and improves the depth accuracy. In practice we use a stixel width of 3–7 pixels, which further improves the results. The final result is depicted in Fig. 4(c).

The sketched stixel representation features the following properties:

• Compactness: A significant reduction of the data volume is offered. If, for example, the width of the stixels is set to 5 pixels, a scene from a VGA image can be represented by 640/5 = 128 stixels instead of roughly 300,000 disparities.
• Completeness: The geometrical information contained in this representation is sufficient for many recognition tasks in driver assistance.
• Robustness: Outliers in the data have minimal or no impact on the resulting representation.

It is straightforward to cluster groups of stixels into objects based on depth discontinuities. Throughout this article it is assumed that vehicles at intersections are sufficiently isolated, i.e., the left and right end of a stixel cluster as well as all stixels in between correspond to a single object.

4. Initial Pose Refinement

The motion-based object detection method initializes objects according to the initial point cloud and with an expected dimension of typical road vehicles. As can be seen in Fig. 5(b), the white box, indicating the projection of the initial object hypothesis onto the image plane, is not accurately aligned with the object boundaries. At the same time, the corresponding stixel cluster provides a very good segmentation, as visualized in Fig. 5(a). Therefore, the motion-based object detection and initialization method is extended by an initial pose refinement step. The objective is to compute an improved initial vehicle pose that is consistent with the stixel data and that places the rotation point closer to the actual center rear axle.

The idea is as follows: The image columns of the left and right stixel of the corresponding stixel cluster, denoted as $u_{l}$ and $u_{r}$ respectively, define the viewing range and constrain the expected vehicle position in lateral direction (see Fig. 5(c)). At the same time, the disparity values $d_{l}$ and $d_{r}$ of the boundary stixels introduce constraints on the distance.

Depending on the given object pose hypothesis $\langle \hat{x}, \hat{D} \rangle$, the left-most and right-most stixels are directly linked to corresponding object corners. The inner stixels cannot be assigned to a concrete point on the visible object sides as easily, although they also provide valuable information on depth.

Figure 5. Six constraints on depth and lateral position are derived from a stixel cluster and assigned to four characteristic object points that depend on visibility properties of the pose prior. The initial pose is iteratively refined by a maximum likelihood estimation. (a) Stixel cluster. (b) Refinement result. (c) Geometric constraints.

Due to perspective, one or two vehicle sides are visible in the image at one time. Thus, we divide the inner stixels based on the expected number of image columns covered by each visible side. The median disparity $d_{s_{i}}$ over all stixels assigned to a given object side $s_{i}$, $i \in \{1, 2\}$, is taken as an additional depth constraint on the center of that side. Since the median is more robust to outliers compared to the mean, inaccuracies in assigning the inner stixels to object sides are acceptable. Formally, we can summarize the constraints in a vector c as

$c = [u_{l}, u_{r}, d_{l}, d_{r}, d_{s_{1}}, d_{s_{2}}]^{T}.$  (7)

Now, $f_{\langle \hat{x}, \hat{D} \rangle}$, with

$f_{\langle \hat{x}, \hat{D} \rangle}(y) = c,$  (8)

defines the functional model between the constraints and the parameters y to be refined:

$y = [{}^{o}X_{rot},\ {}^{o}Z_{rot},\ \psi,\ \Omega(\hat{x})]^{T}.$  (9)

Here, $\Omega(\hat{x})$ denotes the size of the object side presumably covered by more stixels based on the pose prior $\hat{x}$, i.e., only one dimension, width or length, is estimated at a time. With increasing stereo uncertainty at larger distances (e.g. > 40 m for a 0.3 m stereo baseline), the parameter vector is reduced to contain only the rotation point position, since reliable size measurements cannot be obtained and the object motion gives a much better estimate of the orientation.
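The following sketch assembles the constraint vector of Eq. (7) from a stixel cluster: the boundary stixels contribute the lateral and depth constraints, and the inner stixels are split between the two expected visible sides before taking the median disparity of each group. The splitting rule (a single split column derived from the pose prior) and all names are our own simplifications for illustration.

```python
import numpy as np

def stixel_constraints(columns, disparities, split_col):
    """Assemble c = [u_l, u_r, d_l, d_r, d_s1, d_s2]^T (Eq. 7) from a stixel
    cluster. 'columns' and 'disparities' are arrays sorted by image column;
    'split_col' is the column where the two visible sides are expected to
    meet, derived from the pose prior. Illustrative sketch only."""
    columns = np.asarray(columns, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    u_l, u_r = columns[0], columns[-1]            # boundary stixel columns
    d_l, d_r = disparities[0], disparities[-1]    # boundary stixel disparities
    side1 = disparities[columns <= split_col]     # stixels assigned to side 1
    side2 = disparities[columns > split_col]      # stixels assigned to side 2
    d_s1 = np.median(side1) if side1.size else d_l
    d_s2 = np.median(side2) if side2.size else d_r
    return np.array([u_l, u_r, d_l, d_r, d_s1, d_s2])
```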
The parameters are estimated using a maximum likelihood estimation [15]. Since $f_{\langle \hat{x}, \hat{D} \rangle}$ is nonlinear, it has to be linearized at $y_{0}$, derived from the pose prior. Then, the parameter updates $\Delta y$, with $y \approx y_{0} + \Delta y$, are computed as

$\Delta y = \left( A^{T} \Sigma_{cc}^{-1} A \right)^{-1} A^{T} \Sigma_{cc}^{-1} \left( c - f_{\langle \hat{x}, \hat{D} \rangle}(y_{0}) \right),$  (10)

where the matrix $A$ denotes the Jacobian $\left. \partial f_{\langle \hat{x}, \hat{D} \rangle} / \partial y \right|_{y_{0}}$, and $\Sigma_{cc}$ the covariance matrix of the constraints.

This estimation procedure is iterated a few times (typically three iterations are sufficient). If the updates converge to approximately zero, the initial object pose is updated by the refined pose. Otherwise it is left unchanged to prevent a degradation. Fig. 5(b) shows the result of the refinement step for the poor initialization example discussed before. As can be seen, the orange box approximates the object much better and the rotation point has been moved from the centroid of the point cloud towards the rear.

Note that the refinement approach proposed above assumes an object to be completely visible in the image. Special handling of partial occlusions is outside the scope of this article.

5. Stixel Measurements

The pose refinement method proposed above is used not only at initialization, but also to yield direct measurements of the rotation point during the tracking phase. The measurement vector $z_{rot}$ (see Sec. 2) is set to $z_{rot} = [{}^{o}X_{rot}, {}^{o}Z_{rot}]^{T}$. The Kalman filter state prediction is used as pose prior for computing the rotation point measurements.

The measurement noise can be derived from the estimation procedure, since $\Sigma_{\Delta y \Delta y} = \left( A^{T} \Sigma_{cc}^{-1} A \right)^{-1}$ gives the covariance of the parameter updates. The final covariance matrix is given by error propagation. The simultaneously estimated size $\Omega(\hat{x})$ is used to update the corresponding cuboid dimension, width or length, by a slow low-pass filter outside the Kalman filter. In addition, height measurements are easily obtained from the stixel cluster.

6. Experimental Results

The proposed system has been tested on various real-world intersection scenes. The tracking results of an example sequence are superimposed in Fig. 6. The bounding box indicates the object pose, and the carpet on the ground indicates the predicted driving path based on the current motion state, assuming constant yaw rate and constant acceleration. Optical flow vectors are also visualized.

Figure 6. Example tracking results of an intersection scene with the estimated pose and predicted driving path superimposed. A second oncoming object is detected as soon as it starts moving (right image).

Figure 7. Additional rotation point measurements prevent the Kalman filter from minimizing the measurement error of the observed noisy object points by shifting the rotation point to a position outside the object. (a) Estimated trajectories in ego coordinates. (b) Drift of P_rot w.r.t. the initial pose.

The object is successfully tracked through the maneuver until it leaves the visual field of the camera. The estimated trajectory of this sequence (black solid line) and the corresponding object poses (green solid boxes) are shown in Fig. 7(a) from a bird's eye view. Without additional rotation point measurements, the filtering fails in this sequence, as indicated by the second trajectory in this figure (red dashed boxes).
The Kalman filter minimizes the residual between predicted and measured object points. If the filter is allowed to change the rotation point without any geometric constraints, it is possible that the rotation point drifts away from the object point cloud, as occurs in this example (see Fig. 7(b)). A reliable prediction of the turn maneuver is then no longer possible; thus, the object track is rejected. This demonstrates the importance of the geometric constraints on the rotation point.

Further example results are depicted in Fig. 8. The Kalman filter state estimates of the yaw rate and velocity of the first-row sequence are shown in Fig. 9. The velocity profile shows a typical acceleration behavior at turn maneuvers. The driver first slightly reduces the velocity and then accelerates after the maximum turn rate is reached.

The system stably runs at 80 ms cycle time on 640 × 480 images, including 40 ms for the freespace and stixel computation, 25 ms for feature tracking and ego-motion computation, as well as 2–3 ms for tracking and pose refinement of a single object, on a current quad-core processor, without exploiting massively parallel computing, e.g., on the graphics card.

Figure 8. Example results with bounding box and predicted driving path superimposed (first column), stixel clusters and SGM stereo (second column), and resulting trajectories from bird's eye view (third column).

7. Conclusion

We have presented a hybrid vehicle tracking approach that combines a feature-based point cloud model with geometric constraints for the application of tracking turning vehicles at intersections.

The Stixel World provides a powerful and efficient representation for the precise localization and segmentation of object boundaries. This information has been used for an initial pose refinement step and to derive additional measurements that constrain the position of the rotation point to the object's lateral center during filtering. The experimental results have shown a significant improvement in the tracking of oncoming vehicles at intersections.

The realization based on dense stereo stixels has demonstrated the practical usage of the generic measurement model. Alternative image-based methods for extracting the object silhouette in the image or other range sensors, such as lidar scanners, could be integrated accordingly. Further investigations will include the direct integration of the single stixels into the object shape model and will address the problems arising in extremely dense traffic scenes, which have been excluded in this contribution.

Figure 9. Motion state estimation results: (a) estimated yaw rate, (b) estimated velocity. The precise yaw rate and velocity estimates of the oncoming traffic provide useful information for future driver assistance systems.

References

[1] H. Veeraraghavan and N. Papanikolopoulos, "Combining multiple tracking modalities for vehicle tracking at traffic intersections," in IEEE Conf. on Robotics and Automation, 2004.
[2] S. Atev, H. Arumugam, O. Masoud, R. Janardan, and N.
Papanikolopoulos, "A vision-based approach to collision prediction at traffic intersections," IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 4, pp. 416–423, Dec. 2005.
[3] A. Ottlik and H. H. Nagel, "Initialization of model-based vehicle tracking in video sequences of inner-city intersections," Int. J. Comput. Vision, vol. 80, no. 2, pp. 211–225, 2008.
[4] R. Danescu, S. Nedevschi, M. Meinecke, and T. Graf, "Stereovision based vehicle tracking in urban traffic environments," in IEEE Conference on Intelligent Transportation Systems, 2007, pp. 400–404.
[5] B. Barrois, S. Hristova, C. Woehler, F. Kummert, and C. Hermes, "3D pose estimation of vehicles using a stereo camera," in IEEE Intelligent Vehicles Symposium, 2009.
[6] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, "A real-time computer vision system for measuring traffic parameters," in Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997, pp. 495–501.
[7] T. Dang, C. Hoffmann, and C. Stiller, "Fusing optical flow and stereo disparity for object tracking," in IEEE 5th International Conference on Intelligent Transportation Systems, 2002, pp. 112–117.
[8] B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool, "Dynamic 3D scene analysis from a moving vehicle," in Computer Vision and Pattern Recognition, CVPR. IEEE Conference on, 2007, pp. 1–8.
[9] A. Barth and U. Franke, "Where will the oncoming vehicle be the next second?" in IEEE Intelligent Vehicles Symposium, 2008.
[10] H. Badino, U. Franke, and D. Pfeiffer, "The stixel world - a compact medium level representation of the 3D world," in DAGM Symposium, Jena, Germany, September 2009.
[11] Y. Bar-Shalom, X. Rong Li, and T. Kirubarajan, Estimation with Applications To Tracking and Navigation. John Wiley & Sons, Inc., 2001.
[12] A. Wedel, U. Franke, H. Badino, and D. Cremers, "B-spline modeling of road surfaces for freespace estimation," in IEEE Intelligent Vehicles Symposium, 2008, pp. 828–833.
[13] H. Badino, "A robust approach for ego-motion estimation using a mobile stereo platform," in 1st Intern. Workshop on Complex Motion (IWCM04), Guenzburg, Germany, October 12–14, 2004.
[14] C. Tomasi and T. Kanade, "Detection and tracking of point features," Carnegie Mellon University, Tech. Rep. CMU-CS-91-132, April 1991.
[15] C. McGlone, Ed., Manual of Photogrammetry, 5th ed. Amer. Soc. Photogrammetry, 2004.
[16] U. Franke, C. Rabe, H. Badino, and S. Gehrig, "6D-vision: Fusion of stereo and motion for robust environment perception," in 27th DAGM Symposium, 2005, pp. 216–223.
[17] H. Hirschmüller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in Computer Vision and Pattern Recognition, CVPR, vol. 2, June 2005, pp. 807–814.
[18] S. Gehrig, F. Eberli, and T. Meyer, "A real-time low-power stereo engine using semi-global matching," in International Conference on Computer Vision Systems, ICVS, 2009.
[19] H. Badino, T. Vaudrey, U. Franke, and R. Mester, "Stereo-based free space computation in complex traffic scenarios," in IEEE Southwest Symposium on Image Analysis and Interpretation, 2008.