<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Vehicle Tracking at Urban Intersections Using Dense Stereo</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander BARTH</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David PFEIFFER</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uwe FRANKE</string-name>
        </contrib>
      </contrib-group>
      <aff>Daimler AG, Group Research and Advanced Engineering, Sindelfingen, Germany</aff>
      <abstract>
        <p>A new approach for vehicle tracking at urban intersections based on stereo vision is proposed. Objects are represented as rigid 3D point clouds and tracked by means of extended Kalman filtering. In this contribution, we combine the advantages of a generic feature-based 3D point cloud model with vehicle-specific geometric and kinematic constraints to estimate the pose and motion state of oncoming vehicles at intersections. Real-time dense stereo disparity maps provide new opportunities for reconstructing the 3D driving scene. An efficient and compact Stixel World representation is computed that segments the scene into drivable freespace and obstacles on the ground. Based on these data, we derive the silhouette of an object in the image and constrain its pose in space during turning maneuvers. The system has been successfully tested on various real-world scenarios and runs in real time on VGA images in our demonstration car.</p>
      </abstract>
      <kwd-group>
        <kwd>Vehicle Tracking</kwd>
        <kwd>Driver Assistance Systems</kwd>
        <kwd>Dense Stereo Vision</kwd>
        <kwd>Kalman filtering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Detecting and tracking other traffic participants at urban intersections has attracted
special attention in the intelligent vehicle domain due to the large number of accidents still
occurring there every day. Monitoring moving objects at such accident hotspots with
stationary cameras, typically from an elevated position, has been addressed by many researchers
in the past, e.g. [1,2,3].</p>
      <p>Previous work on vision-based vehicle tracking from a moving platform mainly
concentrates on highway scenarios. However, precise information on the behavior of the
oncoming and cross traffic at intersections provides a fundamental basis for future driver
assistance and safety applications.</p>
      <p>In general, one can distinguish between geometric and feature-based vehicle
tracking approaches. Geometric approaches try to fit a geometric model, e.g. a cuboid [4,5]
or more sophisticated vehicle models [3], to the given sensor data. Such approaches
perform well as long as the model is a sufficient approximation of the real object, and the
data is reliable.</p>
      <p>Feature-based methods, as for example [6,7,8,9], model an object by a set of
characteristic features, e.g. gray value or color statistics, edges, corners, etc. These features
can be determined online and are usually more flexible than geometric models. A
drawback of such methods is that, without any geometric constraints, detected objects may
be incomplete, i.e., parts of the object that are not covered by a feature are missing, or
features belonging to two different physical objects are merged, i.e., the separation between
two close objects can fail.</p>
      <p>In [9], we have proposed a feature-based vehicle tracking approach that
simultaneously estimates the pose and motion parameters of a rigid point cloud representing
the vehicle’s shape. Object points are detected and grouped based on the Gestalt principle of
common fate, i.e., points with common motion are likely to belong to the same object. The
system has been successfully applied to predict the driving path of oncoming vehicles in
country road scenes.</p>
      <p>In this contribution, we extend this feature-based point cloud model by a
geometric model to overcome two fundamental assumptions made in the original approach.
First, it is no longer required that the vehicle dimensions can be reconstructed sufficiently
well from the observed point cloud. Second, a good estimate of the vehicle’s center rear
axle is essential to predict the object pose during highly dynamic turn maneuvers; however, it
cannot be observed from motion as long as the vehicle is moving straight or not moving
at all. The geometric model allows for adding direct measurements of the center rear
axle position based on an object’s image silhouette and dense stereo disparity maps.</p>
      <p>To be able to process the large amount of dense stereo data in real time, an efficient
Stixel World representation, which has recently been proposed by Badino et al. [10], is
used. This representation models both the drivable freespace and its boundaries,
corresponding to obstacles on the ground.</p>
      <p>In Section 1, we first briefly summarize our feature-based vehicle tracking
approach and then extend this model in Section 2 by geometric constraints. Section 3 gives
a short overview of the stixel representation, which is used in Sections 4 and 5 for an initial
pose refinement and to derive additional filter measurements, respectively. Experimental
results on real-world scenes are shown and discussed in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Feature-Based Vehicle Tracking Approach</title>
      <p>The object model consists of a state vector x, including the pose and motion parameters of
an observed object, and a rigid 3D point cloud \mathcal{P}:</p>
      <p>OBJ := \{x, \mathcal{P}\} \quad (1)</p>
      <p>The main idea is to estimate x based on the 3D displacement of the rigid point
cloud in a stereo image sequence using an extended Kalman filter [11]. Fig. 1(a) gives
an overview of the general system.</p>
      <p>Formally, vehicles are modeled as rigid bodies, whose pose relative to the ego vehicle
is defined by the transformation of a local object coordinate system with respect to the
ego coordinate system attached to the ego vehicle. The object pose can be fully described
by an (arbitrary) reference point on the object, P_{ref}, representing the object origin, and
the Euler angle \psi, indicating the rotation around the height axis (see Fig. 1(b)).</p>
      <p>[Figure 1. (a) System overview: depth from stereo and motion from optical flow yield 3D displacements that are fused with the object model and the motion model in an extended Kalman filter, estimating object pose and motion. (b) Pose definition: reference point P_{ref} = (X, 0, Z)^T and rotation point P_{rot} with orientation \psi in the ego coordinate system with origin (0, 0, 0)^T at the ego vehicle.]</p>
      <p>Incorporating vehicle-specific characteristics, it is further assumed that the Z-axis
of the object coordinate system is ideally aligned with the longitudinal axis of the
vehicle, i.e., corresponds to the moving direction. Lateral movements are restricted to
circular path motion based on a simplified bicycle motion model. This motion model is
parametrized by the velocity v and acceleration \dot{v} in the moving direction as well as the yaw
rate \dot{\psi}, i.e., the change of orientation. The model further requires the object origin to
be located at the center rear axle of the vehicle. We will denote this characteristic point
as the rotation point, P_{rot}, in the following. Its position relative to the reference point is
typically not known at initialization and has to be estimated.</p>
      <p>Pose and motion parameters are summarized in the following state vector:</p>
      <p>x = [\underbrace{{}^e X_{ref}, {}^e Z_{ref}, {}^o X_{rot}, {}^o Z_{rot}, \psi}_{pose}, \underbrace{v, \dot{\psi}, \dot{v}}_{motion}]^T \quad (2)</p>
      <p>with reference point {}^e P_{ref} = [{}^e X_{ref}, 0, {}^e Z_{ref}]^T in ego coordinates and rotation
point {}^o P_{rot} = [{}^o X_{rot}, 0, {}^o Z_{rot}]^T in object coordinates. For simplicity, a planar ground
is assumed, i.e., {}^e Y_{ref} = {}^o Y_{rot} = 0 independent of the lateral and longitudinal
position. This model can easily be extended by incorporating height information on the 3D
geometry of the ground, e.g., using a height-varying road model as proposed in [12].
The ego-motion is estimated and compensated, before the filter prediction step, using the
method proposed in [13].</p>
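      <p>To make the motion model concrete, the following minimal sketch shows one possible discrete-time prediction step for the state in Eq. (2) under the circular path model. It is an illustration only, not the authors’ implementation; the exact arc integration and the assumption of constant rates over the cycle time dt are ours.</p>
      <preformat>
import numpy as np

def predict_state(x, dt):
    """One prediction step for the state of Eq. (2).

    x = [X_ref, Z_ref, X_rot, Z_rot, psi, v, psi_dot, v_dot]
    The rotation point is given in object coordinates and stays
    fixed on the rigid body, so only pose and motion change.
    """
    X, Z, Xr, Zr, psi, v, psi_dot, v_dot = x
    psi_new = psi + psi_dot * dt
    v_new = v + v_dot * dt
    if abs(psi_dot) > 1e-6:
        # Exact integration along a circular arc of radius v / psi_dot,
        # with heading psi measured against the ego Z-axis.
        X_new = X + v / psi_dot * (np.cos(psi) - np.cos(psi_new))
        Z_new = Z + v / psi_dot * (np.sin(psi_new) - np.sin(psi))
    else:
        # Straight-line motion for negligible yaw rate.
        X_new = X + v * dt * np.sin(psi)
        Z_new = Z + v * dt * np.cos(psi)
    return np.array([X_new, Z_new, Xr, Zr, psi_new, v_new, psi_dot, v_dot])
      </preformat>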
      <p>The object dimension is not explicitly modeled. Instead, it is assumed that the
object’s shape is sufficiently represented by a set of 3D points. Each point {}^o P_m =
[{}^o X_m, {}^o Y_m, {}^o Z_m]^T \in \mathcal{P}, 1 \le m \le M, has a fixed position within the object coordinate
system and can be observed in terms of an image coordinate (u_m(t), v_m(t)) and stereo
disparity d_m(t) at time t. These point measurements build the measurement vector z_p
with</p>
      <p>z_p(t) = [u_1(t), v_1(t), d_1(t), \ldots, u_M(t), v_M(t), d_M(t)]^T \quad (3)</p>
      <p>The projection of each point onto the image plane is tracked using a feature tracker,
e.g. the well-known KLT-tracker [14], to be able to reassign measurements of the same
3D point over a sequence of images. The nonlinear measurement model directly follows
from the transformation between object and camera coordinates and the well-known
projection equations of a finite perspective camera.</p>
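      <p>For illustration, the following sketch composes the object-to-ego transformation with the projection of a finite perspective camera to predict one point measurement. The pinhole parameters f_u, f_v, u_0, v_0 and the baseline b are assumed, and the camera frame is assumed to coincide with the ego frame, which is a simplification.</p>
      <preformat>
import numpy as np

def project_object_point(P_obj, P_ref, psi, fu, fv, u0, v0, b):
    """Predict (u, v, d) for one object point, given the object pose.

    P_obj : point in object coordinates [X, Y, Z]
    P_ref : object origin in ego coordinates
    psi   : rotation around the height (Y) axis
    b     : stereo baseline in meters
    """
    # Rotation of the object coordinate system around the Y-axis.
    R = np.array([[ np.cos(psi), 0.0, np.sin(psi)],
                  [ 0.0,         1.0, 0.0        ],
                  [-np.sin(psi), 0.0, np.cos(psi)]])
    X, Y, Z = R @ np.asarray(P_obj) + np.asarray(P_ref)
    u = u0 + fu * X / Z      # finite perspective projection
    v = v0 + fv * Y / Z
    d = fu * b / Z           # stereo disparity
    return np.array([u, v, d])
      </preformat>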
      <p>Since the exact position of a given object point is typically not known at
initialization, it has to be estimated from the noisy measurements. For real-time applicability, the
problem of motion estimation is separated from the problem of shape reconstruction.
Thus, instead of estimating shape and motion simultaneously by integrating the point cloud
into x, the point positions are refined outside the Kalman filter.</p>
      <p>
        For each object point {}^o P_m, observed several times in terms of {}^o \tilde{P}_m(t), a
maximum likelihood estimate, assuming uncorrelated measurements and zero-mean
Gaussian measurement noise [
        <xref ref-type="bibr" rid="ref16">15</xref>
        ], is given by
      </p>
      <p>{}^o \hat{P}_m(t) = \Big( \sum_{j=t_m}^{t} C_m^{-1}(j) \Big)^{-1} \sum_{j=t_m}^{t} C_m^{-1}(j)\, {}^o \tilde{P}_m(j) \quad (4)</p>
      <p>where C_m(t) denotes the 3×3 covariance matrix of {}^o \tilde{P}_m(t), and t_m the discrete
time step at which point {}^o P_m has been added to the model. An example of an initial noisy object
point cloud and the same point cloud refined over 10 time steps is given in Figure 2(a).</p>
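      <p>Equation (4) is a standard information-form fusion of all observations of a point; a minimal sketch:</p>
      <preformat>
import numpy as np

def refine_point(observations, covariances):
    """Maximum likelihood estimate of a point position, Eq. (4).

    observations : list of noisy 3D measurements oP~_m(j)
    covariances  : list of the corresponding 3x3 matrices C_m(j)
    """
    info = np.zeros((3, 3))   # accumulated inverse covariances
    vec = np.zeros(3)         # accumulated weighted measurements
    for P, C in zip(observations, covariances):
        C_inv = np.linalg.inv(C)
        info += C_inv
        vec += C_inv @ np.asarray(P)
    # (sum C^-1)^-1 * (sum C^-1 * P~), solved without explicit inverse
    return np.linalg.solve(info, vec)
      </preformat>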
      <p>Moving objects are detected from the 3D motion field of a number of feature points,
distributed over the whole image (see Figure 2(b)). The motion vectors are estimated
based on the principle of 6D-Vision, i.e., the pointwise fusion of depth and motion by
Kalman filtering [16].</p>
      <p>The vehicle tracking is initialized from a cluster of points moving in the same
direction with equal velocity. The average velocity vector gives the moving direction (initial
orientation) and the initial vehicle speed. Both the reference point and the rotation point
are initialized at the centroid of the initial point cloud.</p>
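      <p>A sketch of this initialization, assuming each detected feature provides a 3D position and a 3D velocity vector from the 6D-Vision fusion (the array layout is hypothetical):</p>
      <preformat>
import numpy as np

def init_track(positions, velocities):
    """Initialize the state of Eq. (2) from a cluster of moving points.

    positions  : (N, 3) array of point positions in ego coordinates
    velocities : (N, 3) array of estimated 3D velocity vectors
    """
    centroid = positions.mean(axis=0)
    v_mean = velocities.mean(axis=0)
    speed = np.linalg.norm(v_mean)
    # Initial orientation: heading of the mean motion vector,
    # measured against the ego Z-axis (rotation about the Y-axis).
    psi = np.arctan2(v_mean[0], v_mean[2])
    # Reference and rotation point both start at the centroid, so the
    # rotation point offset in object coordinates is initially zero.
    return np.array([centroid[0], centroid[2],  # eX_ref, eZ_ref
                     0.0, 0.0,                  # oX_rot, oZ_rot
                     psi, speed, 0.0, 0.0])     # psi, v, psi_dot, v_dot
      </preformat>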
      <p>[Figure 3. Cuboid model with width w, length l, and height h; the rotation point P_{rot} lies on the longitudinal axis at distance \rho from the rear side. (a) Perspective view. (b) Bird’s eye view.]</p>
    </sec>
    <sec id="sec-3">
      <title>2. Geometric Object Model</title>
      <p>We now extend the object model in Eq. (1) by a cuboid D, yielding</p>
      <p>OBJ := \{x, \mathcal{P}, D\}</p>
      <p>with D = [w, l, h]^T, approximating the object dimension in terms of width w, length
l, and height h, independent of the currently observed point cloud.</p>
      <p>This has several advantages. A cuboid covers a certain region in space or on the
image plane that can be used to associate new points to an object. Decoupling the
object dimension from the point cloud is extremely helpful if parts of the point cloud are
occluded or lost by the feature tracker. In addition, restricting the cuboid dimension to
the expected size of road vehicles allows for rejecting points with similar motion pattern
belonging to different, close-by objects in dense traffic scenes.</p>
      <p>The cuboid model is used to attach basic semantic meaning to the vehicle sides
(front, rear, left, right) and to characteristic points, e.g. the front left corner or the center of the right
side. The objective is to align the sides of the cuboid with the physical sides of the
vehicle. As can be seen in Fig. 3, all points on the cuboid sides have a fixed position with
respect to a virtual vehicle coordinate system with its origin at the rotation point. The
Z-distance \rho between the rotation point and the rear side is a given (vehicle-specific) constant.
In practice, \rho = 1 m is a good approximation for most road vehicles.</p>
      <p>Accordingly, if the dimension of the object is known correctly, it is straightforward
to describe the rotation point relative to a given object side or corner. This means that all
corners or sides observable by a sensor can be used to constrain the position of the rotation
point.</p>
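      <p>For example, if the rear left corner of the cuboid has been located in object coordinates, the rotation point follows directly from the width w and the rear offset \rho. A sketch under these assumptions (the helper and its inputs are illustrative):</p>
      <preformat>
def rot_point_from_rear_left_corner(corner_X, corner_Z, w, rho):
    """Derive a rotation point measurement from an observed corner.

    The rotation point lies on the longitudinal axis (lateral center)
    at distance rho in front of the rear side (rho = 1 m in practice).
    corner_X, corner_Z : rear left corner in object coordinates
    w                  : cuboid width
    """
    X_rot = corner_X + w / 2.0   # move to the lateral center
    Z_rot = corner_Z + rho       # move rho towards the front
    return X_rot, Z_rot
      </preformat>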
      <p>Based on this idea, we introduce a second vector z_{rot} of K direct measurements
for the rotation point, with</p>
      <p>z_{rot} = [{}^o \tilde{P}_{rot,1}, \ldots, {}^o \tilde{P}_{rot,K}]^T \quad (5)</p>
      <p>and concatenate this vector with the point measurement vector in Eq. (3) to the total
measurement vector z with</p>
      <p>z = [z_p^T, z_{rot}^T]^T \quad (6)</p>
      <p>[Figure 4. (a) Dense disparity image (SGM). (b) Free space. (c) Stixel representation.]</p>
      <p>The additional measurements are intended to stabilize the rotation point position
and to prevent the filter from wrongly compensating for ambiguities, caused by error-prone point
positions in the object model, by shifting the rotation point outside the object during rotational
movements. This problem will be addressed in detail in the experimental results (see
Section 6). The unknown object dimension is updated outside the Kalman filter by low-pass
filtering in this approach, whenever dimensional measurements are available.</p>
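      <p>The dimension update outside the filter can be as simple as an exponential low pass; a minimal sketch with an assumed smoothing factor:</p>
      <preformat>
def update_dimension(dim_old, dim_measured, alpha=0.1):
    """Slow low-pass update of one cuboid dimension (w, l, or h).

    alpha is an assumed smoothing factor; a small value changes the
    stored dimension slowly and rejects single outlier measurements.
    """
    return (1.0 - alpha) * dim_old + alpha * dim_measured
      </preformat>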
      <p>At this point the extended model is independent of where the actual measurements
for the rotation point or cuboid dimensions come from. In the following, we present an
example realization based on dense stereo data.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Stixel-based Scene Representation</title>
      <p>Real-time implementations of dense stereo algorithms, such as Semi-Global Matching
(SGM) [17], on dedicated hardware [18] provide significantly more information on the
3D environment compared to sparse stereo methods. The gain in information and
precision allows for improved scene reconstruction and object modeling.</p>
      <p>However, more information also means more data to process. Thus,
efficient data representations are beneficial to ease further interpretation. We use a very
compact but powerful scene representation that has recently been proposed by
Badino et al. [10] for the intelligent vehicle domain.</p>
      <p>It is based on the fact that traffic scenes typically consist of a relatively planar free
space which is limited by 3D obstacles that are nearly perpendicular to the ground. The
so-called Stixel World represents the 3D scene by a set of rectangular sticks, named
“stixels”, as shown in Fig. 4(c). Each stixel is defined by its 3D position relative to the
camera and stands vertically on the ground, having a certain height. Each stixel limits
the free space and approximates the object boundaries. As illustrated in Fig. 4, the Stixel
World is created by the following steps:</p>
      <p>First, we compute a dense disparity image using SGM. Fig. 4(a) shows that SGM is
able to model object boundaries precisely. In addition, the smoothness constraint used in
the algorithm leads to smooth estimates in low-contrast regions, as can be seen on the
street and the untextured parts of the vehicles and buildings.</p>
      <p>In the second step, a stochastic occupancy grid is generated from the stereo
disparities using the method presented in [19]. An occupancy grid is a two-dimensional array
which models occupancy evidence of the environment. Only those 3D measurements
lying above the road are registered as obstacles in the occupancy grid. From this grid
the freespace shown in Fig. 4(b) is computed. The dynamic programming approach used
here turns out to be highly robust with respect to disparity noise.</p>
      <p>Each free space point of the polygon in Fig. 4(b) indicates not only the limit of
the free space but also the base point of a potential obstacle located at that position.
Following the considered image column upward, the disparity stays nearly constant but jumps
to a smaller value above the object. This allows us to compute the upper boundary (i.e.
the height) of the obstacles in a second pass of dynamic programming.</p>
      <p>Given the base and top point of an obstacle in a certain image column, all object
disparities are averaged using a robust estimator to determine the distance of this stixel. This
averaging significantly reduces the disparity noise and improves the depth accuracy. In
practice, we use a stixel width of 3–7 pixels, which further improves the results. The final
result is depicted in Fig. 4(c).</p>
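      <p>A sketch of the per-stixel depth computation, here using the median of the column disparities between base and top point as the robust estimator (our choice for illustration; the text leaves the estimator open):</p>
      <preformat>
import numpy as np

def stixel_distance(disparity, col, v_base, v_top, fu, b):
    """Estimate the distance of one stixel from its column disparities.

    disparity      : dense disparity image (H x W), e.g. from SGM
    col            : image column of the stixel
    v_base, v_top  : rows of base and top point (v_top above v_base)
    fu, b          : focal length in pixels and stereo baseline in m
    """
    d = disparity[v_top:v_base, col]   # disparities on the obstacle
    d = d[d > 0]                       # drop invalid measurements
    d_robust = np.median(d)            # robust average over the column
    return fu * b / d_robust           # triangulated distance in meters
      </preformat>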
      <p>The sketched stixel representation features the following properties:</p>
      <p>Compactness: A significant reduction of the data volume is offered. If, for
example, the width of the stixels is set to 5 pixels, a scene from a VGA image can be
represented by 640/5 = 128 stixels instead of 300,000 disparities.</p>
      <p>Completeness: The geometrical information contained in this representation is
sufficient for many recognition tasks in driver assistance.</p>
      <p>Robustness: Outliers in the data have minimal or no impact on the resulting
representation.</p>
      <p>It is straightforward to cluster groups of stixels into objects based on depth
discontinuities. Throughout this article it is assumed that vehicles at intersections are sufficiently
isolated, i.e., the left and right end of a stixel cluster as well as all stixels in between
correspond to a single object.</p>
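      <p>Such a clustering can be sketched as a single left-to-right scan over the stixel distances; the discontinuity threshold is an assumption:</p>
      <preformat>
def cluster_stixels(distances, max_gap=2.0):
    """Group neighboring stixels into objects at depth discontinuities.

    distances : stixel distances in meters, ordered left to right
    max_gap   : assumed threshold (m) for a depth discontinuity
    Returns a list of (first_index, last_index) clusters.
    """
    clusters, start = [], 0
    for i in range(1, len(distances)):
        if abs(distances[i] - distances[i - 1]) > max_gap:
            clusters.append((start, i - 1))   # close current cluster
            start = i
    clusters.append((start, len(distances) - 1))
    return clusters
      </preformat>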
    </sec>
    <sec id="sec-5">
      <title>4. Initial Pose Refinement</title>
      <p>The motion-based object detection method initializes objects according to the initial
point cloud and with an expected dimension of typical road vehicles. As can be seen in
Fig. 5(b), the white box, indicating the projection of the initial object hypothesis onto the
image plane, is not accurately aligned with the object boundaries. At the same time, the
corresponding stixel cluster provides a very good segmentation as visualized in Fig. 5(a).
Therefore, the motion-based object detection and initialization method is extended by an
initial pose refinement step. The objective is to compute an improved initial vehicle pose
that is consistent with the stixel data and that places the rotation point closer to the actual
center rear axle.</p>
      <p>The idea is as follows: The image columns of the left and right stixels of the
corresponding stixel cluster, denoted as u_l and u_r respectively, define the viewing range
and constrain the expected vehicle position in the lateral direction (see Fig. 5(c)). At the
same time, the disparity values d_l and d_r of the boundary stixels introduce constraints on
the distance.</p>
      <p>[Figure 5. (a) Stixel cluster and pose hypothesis. (b) Refinement result with refined pose. (c) Geometric constraints: the image columns u_l and u_r of the boundary stixels define the viewing rays from the camera origin through the image plane to the object corners.]</p>
      <p>Depending on the given object pose hypothesis \langle\hat{x}, \hat{D}\rangle, the leftmost and rightmost stixels
are directly linked to corresponding object corners. The inner stixels cannot be assigned
to a concrete point on the visible object sides that easily, although they also provide
valuable information on depth. Due to perspective, one or two vehicle sides are visible in
the image at a time. Thus, we divide the inner stixels based on an expectation of the
number of image columns covered by each visible side. The median disparity d_{s_i} over
all stixels assigned to a given object side s_i, i \in \{1, 2\}, is taken as an additional depth
constraint on the center of that side. Since the median is more robust to outliers than
the mean, inaccuracies in assigning the inner stixels to object sides are acceptable.</p>
      <p>Formally, we can summarize the constraints in a vector c as</p>
      <p>c = [u_l, u_r, d_l, d_r, d_{s_1}, d_{s_2}]^T \quad (7)</p>
      <p>Now, f_{\langle\hat{x},\hat{D}\rangle}, with</p>
      <p>f_{\langle\hat{x},\hat{D}\rangle}(y) = c \quad (8)</p>
      <p>y = [{}^o X_{rot}, {}^o Z_{rot}, \psi, \lambda(\hat{x})]^T \quad (9)</p>
      <p>defines the functional model between the constraints and the parameters y to be
refined.</p>
      <p>Here, \lambda(\hat{x}) denotes the size of the object side presumably covered by more stixels
based on the pose prior \hat{x}, i.e., only one dimension, width or length, is estimated at
a time. With increasing stereo uncertainty at larger distances (e.g. &gt; 40 m for a
0.3 m stereo baseline), the parameter vector is reduced to contain only the rotation point
position, since reliable size measurements cannot be obtained and the object motion gives
a much better estimate of the orientation.</p>
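      <p>To illustrate the constraint construction, a sketch that splits the inner stixels between the two visible sides at an expected column u_split derived from the pose prior (the split-column heuristic and the fallbacks are our assumptions):</p>
      <preformat>
import numpy as np

def build_constraints(cols, disps, u_split):
    """Assemble the constraint vector c of Eq. (7) for a stixel cluster.

    cols    : image columns of the stixels, ordered left to right
    disps   : corresponding stixel disparities
    u_split : expected column separating the two visible vehicle sides
    """
    u_l, u_r = cols[0], cols[-1]     # viewing range boundaries
    d_l, d_r = disps[0], disps[-1]   # boundary depth constraints
    side1, side2 = [], []
    for c, d in zip(cols[1:-1], disps[1:-1]):   # inner stixels only
        (side2 if c > u_split else side1).append(d)
    # Per-side median disparities; fall back to the boundary stixels
    # if one side receives no inner stixels.
    d_s1 = np.median(side1) if side1 else d_l
    d_s2 = np.median(side2) if side2 else d_r
    return np.array([u_l, u_r, d_l, d_r, d_s1, d_s2])
      </preformat>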
      <p>
        The parameters are estimated using a maximum likelihood estimation [
        <xref ref-type="bibr" rid="ref16">15</xref>
        ]. Since
f_{\langle\hat{x},\hat{D}\rangle} is nonlinear, it has to be linearized at y_0, derived from the pose prior. Then, the
parameter updates \Delta y, with y \approx y_0 + \Delta y, are computed as
      </p>
      <p>\Delta y = (A^T \Sigma_{cc}^{-1} A)^{-1} A^T \Sigma_{cc}^{-1} (c - f_{\langle\hat{x},\hat{D}\rangle}(y_0)) \quad (10)</p>
      <p>where the matrix A denotes the Jacobian \partial f_{\langle\hat{x},\hat{D}\rangle} / \partial y, and
\Sigma_{cc} the covariance matrix of the constraints.</p>
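      <p>A sketch of the iterated estimation of Eq. (10), with the functional model f and its Jacobian passed in as callables (their derivation from the projection geometry is omitted here):</p>
      <preformat>
import numpy as np

def refine_pose(f, jacobian, y0, c, Sigma_cc, iterations=3):
    """Iterated weighted least-squares solution of Eq. (10).

    f, jacobian : functional model f(y) and its Jacobian A(y)
    y0          : linearization point derived from the pose prior
    c           : constraint vector of Eq. (7)
    Sigma_cc    : covariance matrix of the constraints
    Returns the refined parameters and the covariance of the updates.
    """
    W = np.linalg.inv(Sigma_cc)        # constraint weights
    y = np.array(y0, dtype=float)
    for _ in range(iterations):        # typically three iterations
        A = jacobian(y)
        N = A.T @ W @ A                # normal equation matrix
        dy = np.linalg.solve(N, A.T @ W @ (c - f(y)))
        y += dy
    # (A^T Sigma_cc^-1 A)^-1 also serves as measurement noise in Sec. 5
    return y, np.linalg.inv(N)
      </preformat>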
      <p>This estimation procedure is iterated a few times (typically, three iterations are
sufficient). If the updates converge to approximately zero, the initial object pose is replaced
by the refined pose. Otherwise, it is left unchanged to prevent a degradation. Fig. 5(b) shows
the result of the refinement step for the poor initialization example discussed before. As
can be seen, the orange box approximates the object much better, and the rotation point
has been moved from the centroid of the point cloud towards the rear.</p>
      <p>Note that the refinement approach proposed above assumes an object to be
completely visible in the image. Special handling of partial occlusions is outside the scope of
this article.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Stixel Measurements</title>
      <p>The pose refinement method proposed above is used not only at initialization, but also to
yield direct measurements of the rotation point during the tracking phase. The
measurement vector z_{rot} (see Sec. 2) is set to z_{rot} = [{}^o X_{rot}, {}^o Z_{rot}]^T. The Kalman filter state
prediction is used as the pose prior for computing the rotation point measurements.</p>
      <p>The measurement noise can be derived from the estimation procedure, since
\Sigma_{\Delta y \Delta y} = (A^T \Sigma_{cc}^{-1} A)^{-1} gives the covariance of the parameter
updates. The final covariance matrix is given by error propagation.</p>
      <p>The simultaneously estimated size \lambda(\hat{x}) is used to update the corresponding cuboid
dimension, width or length, by a slow low pass outside the Kalman filter. In addition,
height measurements are easily obtained from the stixel cluster.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Experimental Results</title>
      <p>The proposed system has been tested on various real-world intersection scenes. The
tracking results of an example sequence are superimposed in Fig. 6. The bounding box
indicates the object pose, and the carpet on the ground the predicted driving path based on
the current motion state, assuming constant yaw rate and constant acceleration. Optical
flow vectors are also visualized. The object is successfully tracked through the maneuver
until it leaves the visual field of the camera.</p>
      <p>The estimated trajectory of this sequence (black solid line) and the corresponding
object poses (green solid boxes) are shown in Fig. 7(a) from a bird’s eye view. Without
additional rotation point measurements the filtering fails in this sequence, indicated by
the second trajectory in this figure (red dashed boxes).</p>
      <p>The Kalman filter minimizes the residual between predicted and measured object
points. If the filter is allowed to change the rotation point without any geometric
constraints, it is possible that the rotation point drifts away from the object point cloud, as
occurs in this example (see Fig. 7(b)). A reliable prediction of the turn maneuver is
then no longer possible, and thus the object track is rejected. This demonstrates the importance
of the geometric constraints on the rotation point.</p>
      <p>Further example results are depicted in Fig. 8. The Kalman filter state estimates of
the yaw rate and velocity of the first row sequence are shown in Fig. 9. The velocity
profile shows a typical acceleration behavior at turn maneuvers. The driver first slightly
reduces the velocity and then accelerates after the maximum turn rate is reached.</p>
      <p>The system runs stably at an 80 ms cycle time on 640×480 images, including 40
ms for the freespace and stixel computation, 25 ms for feature tracking and ego-motion
computation, as well as 2–3 ms for tracking and pose refinement of a single object on
a current quad-core processor, without exploiting massively parallel computing, e.g.,
on the graphics card.</p>
      <p>[Figures 7 and 8. Bird’s eye views of the estimated trajectories; the axes show lateral position X and distance Z.]</p>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>We have presented a hybrid vehicle tracking approach that combines a feature-based
point cloud model with geometric constraints for the application of tracking turning vehicles
at intersections.</p>
      <p>The Stixel World provides a powerful and efficient representation for the precise
localization and segmentation of object boundaries. This information has been used for an
initial pose refinement step and to derive additional measurements that constrain the
position of the rotation point to the object’s lateral center during filtering. The experimental
results have shown a significant improvement of the tracking of oncoming vehicles at
intersections.</p>
      <p>The realization based on dense stereo stixels has demonstrated the practical usage
of the generic measurement model. Alternative image-based methods for extracting the
object silhouette in the image or other range sensors, such as lidar scanners, could be
integrated accordingly.</p>
      <p>Further investigations will include the direct integration of the single stixels into the
object shape model and addressing the problems arising in extremely dense traffic scenes,
which have been excluded in this contribution.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Veeraraghavan</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Papanikolopoulos</surname>
          </string-name>
          , “
          <article-title>Combining multiple tracking modalities for vehicle tracking at traffic intersections,”</article-title>
          <source>in IEEE Conf. on Robotics and Automation</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Atev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arumugam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Masoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Janardan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Papanikolopoulos</surname>
          </string-name>
          , “
          <article-title>A vision-based approach to collision prediction at traffic intersections,” Intelligent Transportation Systems</article-title>
          , IEEE Transactions on, vol.
          <volume>6</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>423</lpage>
          , Dec.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ottlik</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Nagel</surname>
          </string-name>
          , “
          <article-title>Initialization of model-based vehicle tracking in video sequences of innercity intersections</article-title>
          ,
          <source>” Int. J. Comput. Vision</source>
          , vol.
          <volume>80</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>225</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Danescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nedevschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meinecke</surname>
          </string-name>
          , and T. Graf, “
          <article-title>Stereovision based vehicle tracking in urban traffic environments,” Intelligent Transportation Systems</article-title>
          , IEEE Conference on, pp.
          <fpage>400</fpage>
          -
          <lpage>404</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Barrois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hristova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Woehler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kummert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hermes</surname>
          </string-name>
          , “
          <article-title>3D pose estimation of vehicles using a stereo camera,” in Intelligent Vehicles Symposium</article-title>
          , IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Beymer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McLauchlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Coifman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          , “
          <article-title>A real-time computer vision system for measuring traffic parameters,” in Computer Vision</article-title>
          and Pattern Recognition, San Juan, Puerto Rico,
          <year>1997</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stiller</surname>
          </string-name>
          , “
          <article-title>Fusing optical flow and stereo disparity for object tracking</article-title>
          ,
          <source>” IEEE 5th International Conference on Intelligent Transportation Systems</source>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Leibe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cornelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cornelis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Van Gool</surname>
          </string-name>
          , “
          <article-title>Dynamic 3D scene analysis from a moving vehicle,” in Computer Vision and Pattern Recognition, CVPR</article-title>
          . IEEE Conference on,
          <year>2007</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Barth</surname>
          </string-name>
          and U. Franke, “
          <article-title>Where will the oncoming vehicle be the next second?” in Intelligent Vehicles Symposium</article-title>
          , IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Badino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Franke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          , “
          <article-title>The stixel world - a compact medium level representation of the 3D world,” in DAGM Symposium</article-title>
          , Jena, Germany,
          <year>September 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bar-Shalom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. Rong</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kirubarajan</surname>
          </string-name>
          ,
          <article-title>Estimation with Applications To Tracking and Navigation</article-title>
          . John Wiley &amp; Sons, Inc,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Wedel</surname>
          </string-name>
          , U. Franke,
          <string-name>
            <given-names>H.</given-names>
            <surname>Badino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Cremers</surname>
          </string-name>
          , “
          <article-title>B-spline modeling of road surfaces for freespace estimation</article-title>
          ,”
          <source>in IEEE Intelligent Vehicles Symposium</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>828</fpage>
          -
          <lpage>833</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Badino</surname>
          </string-name>
          , “
          <article-title>A robust approach for ego-motion estimation using a mobile stereo platform</article-title>
          ,” in
          <source>1st Intern. Workshop on Complex Motion (IWCM04)</source>
          , Guenzburg, Germany, October 12-14,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Rep.</surname>
          </string-name>
          CMU-CS-
          <volume>91</volume>
          -132,
          <year>April 1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>McGlone</surname>
          </string-name>
          , Ed.,
          <source>Manual of Photogrammetry</source>
          , 5th ed.
          <source>Amer. Soc. Photogrammetry</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>U.</given-names>
            <surname>Franke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Badino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrig</surname>
          </string-name>
          , “
          <article-title>6D-vision: Fusion of stereo and motion for robust environment perception</article-title>
          ,
          <source>” in 27th DAGM Symposium</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Hirschmüller</surname>
          </string-name>
          , “
          <article-title>Accurate and efficient stereo processing by semi-global matching and mutual information,” in Computer Vision and Pattern Recognition, CVPR</article-title>
          , vol.
          <volume>2</volume>
          , June 2005, pp.
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          vol.
          <volume>2</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Eberli</surname>
          </string-name>
          , and T. Meyer, “
          <article-title>A real-time low-power stereo engine using semi-global matching</article-title>
          ,” in International Conference on Computer Vision Systems, ICVS,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Badino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vaudrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Franke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mester</surname>
          </string-name>
          , “
          <article-title>Stereo-based free space computation in complex traffic scenarios,”</article-title>
          <source>in IEEE Southwest Symposium on Image Analysis and Interpretation</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>