Vehicle Tracking at Urban Intersections Using Dense Stereo

Alexander BARTH¹, David PFEIFFER and Uwe FRANKE
Daimler AG, Group Research and Advanced Engineering, Sindelfingen, Germany

Abstract. A new approach for vehicle tracking at urban intersections based on stereo vision is proposed. Objects are represented as rigid 3D point clouds and tracked by means of extended Kalman filtering. In this contribution we combine the advantages of a generic feature-based 3D point cloud model with vehicle-specific geometric and kinematic constraints to estimate the pose and motion state of oncoming vehicles at intersections. Real-time dense stereo disparity maps provide new opportunities for reconstructing the 3D driving scene. An efficient and compact Stixel World representation is computed that segments the scene into drivable freespace and obstacles on the ground. Based on this data we derive the silhouette of an object in the image and constrain its pose in space during turning maneuvers. The system has been successfully tested on various real-world scenarios and runs in real-time on VGA images in our demonstration car.

Keywords. Vehicle Tracking, Driver Assistance Systems, Dense Stereo Vision, Kalman Filtering

Introduction

Detecting and tracking other traffic participants at urban intersections has attracted special attention in the intelligent vehicle domain due to the large number of accidents still occurring day-to-day. Monitoring moving objects at such accident hotspots with stationary cameras, typically from an elevated position, has been addressed by many researchers in the past, e.g. [1,2,3].

Previous work on vision-based vehicle tracking from a moving platform mainly concentrates on highway scenarios. However, precise information on the behavior of the oncoming and cross traffic at intersections provides a fundamental basis for future driver assistance and safety applications.

In general, one can distinguish between geometric and feature-based vehicle tracking approaches. Geometric approaches try to fit a geometric model, e.g. a cuboid [4,5] or more sophisticated vehicle models [3], to the given sensor data. Such approaches perform well as long as the model is a sufficient approximation of the real object and the data is reliable.

¹ Corresponding Author: Alexander Barth, Daimler AG, GR/PAP, HPC 050/G024, Sindelfingen, Germany; E-mail: alexander.barth@daimler.com.

Feature-based methods, for example [6,7,8,9], model an object by a set of characteristic features, e.g. gray value or color statistics, edges, corners, etc. These statistics can be determined online and are usually more flexible than geometric models. A drawback of such methods is that, without any geometric constraints, detected objects may be incomplete, i.e., parts of the object that are not covered by a feature are missing, or features belonging to two different physical objects are merged, i.e., the separation between two close objects can fail.

In [9], we have proposed a feature-based vehicle tracking approach that simultaneously estimates the pose and motion parameters of a rigid point cloud representing the vehicle's shape. Object points are detected and grouped based on the Gestalt principle of common fate, i.e., points with common motion are likely to belong to the same object. The system has been successfully applied to predict the driving path of oncoming vehicles in country road scenes.
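To make the common-fate idea concrete, the following minimal sketch groups 3D points whose estimated velocity vectors (and positions) agree within given tolerances. It is an illustration only, not the grouping procedure of [9]; the function name and thresholds are our own placeholders.

```python
import numpy as np

def group_by_common_fate(positions, velocities, v_tol=1.0, d_tol=3.0):
    """Greedy grouping of 3D points with similar velocity vectors
    (Gestalt principle of common fate). Thresholds are illustrative."""
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    n = len(positions)
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        for j in range(i + 1, n):
            if labels[j] >= 0:
                continue
            same_motion = np.linalg.norm(velocities[i] - velocities[j]) < v_tol
            close_by = np.linalg.norm(positions[i] - positions[j]) < d_tol
            if same_motion and close_by:
                labels[j] = next_label
        next_label += 1
    return labels
```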
In this contribution, we extend this feature-based point cloud model with a geometric model to overcome two fundamental assumptions made in the original approach. First, it is no longer required that the vehicle dimension can be reconstructed sufficiently well from the observed point cloud. Second, a good estimate of the vehicle's center rear axle position is essential to predict the object pose during highly dynamic turn maneuvers; however, it cannot be observed from motion as long as the vehicle is moving straight or not moving at all. The geometric model allows adding direct measurements of the center rear axle position based on an object's image silhouette and dense stereo disparity maps.

To be able to process the large amount of dense stereo data in real-time, an efficient Stixel World representation, which has been recently proposed by Badino et al. [10], is used. This representation models both the drivable freespace and its boundaries, corresponding to obstacles above the ground.

In Section 1, we first briefly summarize our feature-based vehicle tracking approach and then extend this model in Section 2 by geometric constraints. Section 3 gives a short overview of the stixel representation, which is used in Sections 4 and 5 for an initial pose refinement and to derive additional filter measurements, respectively. Experimental results on real-world scenes are shown and discussed in Section 6.

1. Feature-Based Vehicle Tracking Approach

The object model consists of a state vector x, including pose and motion parameters of an observed object, and a rigid 3D point cloud Θ:

$OBJ = \{x, \Theta\}.$  (1)

The main idea is to estimate x based on the 3D displacement of the rigid point cloud in a stereo image sequence using an extended Kalman filter [11]. Fig. 1(a) gives an overview of the general system.

Formally, vehicles are modeled as a rigid body whose pose relative to the ego vehicle is defined by the transformation of a local object coordinate system with respect to the ego coordinate system attached to the ego vehicle. The object pose can be fully described by an (arbitrary) reference point on the object, $P_{ref}$, representing the object origin, and the Euler angle ψ, indicating the rotation around the height axis (see Fig. 1(b)).

Figure 1. (a) System overview. (b) Object coordinate systems and pose parameters.

Incorporating vehicle-specific characteristics, it is further assumed that the Z-axis of the object coordinate system is ideally aligned with the longitudinal axis of the vehicle, i.e., ψ corresponds to the moving direction. Lateral movements are restricted to circular path motion based on a simplified bicycle motion model. This motion model is parametrized by the velocity v and acceleration v̇ in the moving direction as well as the yaw rate ψ̇, i.e., the change of orientation. The model further requires the object origin to be located at the center rear axle of the vehicle. We will denote this characteristic point as the rotation point, $P_{rot}$, in the following. Its position relative to the reference point is typically not known at initialization and has to be estimated.

Pose and motion parameters are summarized in the following state vector:

$x = [\,\underbrace{{}^{e}X_{ref},\ {}^{e}Z_{ref},\ {}^{o}X_{rot},\ {}^{o}Z_{rot},\ \psi}_{\text{pose}},\ \underbrace{v,\ \dot{\psi},\ \dot{v}}_{\text{motion}}\,]^{T},$  (2)

with reference point ${}^{e}P_{ref} = [{}^{e}X_{ref}, 0, {}^{e}Z_{ref}]^{T}$ in ego coordinates and rotation point ${}^{o}P_{rot} = [{}^{o}X_{rot}, 0, {}^{o}Z_{rot}]^{T}$ in object coordinates.
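To illustrate the kind of state prediction such a circular-path motion model implies, the following sketch propagates a planar pose under constant yaw rate and constant acceleration over one time step. It is a generic constant-turn-rate approximation using our own naming and axis conventions (X lateral, Z longitudinal), not necessarily the exact process model used in the filter.

```python
import numpy as np

def predict_pose(X, Z, psi, v, psi_dot, v_dot, dt):
    """Propagate position (X, Z) and heading psi along a circular arc,
    assuming constant yaw rate psi_dot and acceleration v_dot over dt."""
    v_mean = v + 0.5 * v_dot * dt            # average speed over the interval
    if abs(psi_dot) < 1e-6:                  # straight-motion limit
        X += v_mean * dt * np.sin(psi)
        Z += v_mean * dt * np.cos(psi)
    else:
        R = v_mean / psi_dot                 # signed turn radius
        X += R * (np.cos(psi) - np.cos(psi + psi_dot * dt))
        Z += R * (np.sin(psi + psi_dot * dt) - np.sin(psi))
    return X, Z, psi + psi_dot * dt, v + v_dot * dt
```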
For simplicity, a planar ground is assumed, i.e., ${}^{e}Y_{ref} = {}^{o}Y_{rot} = 0$ independent of the lateral and longitudinal position. This model can be easily extended by incorporating height information on the 3D geometry of the ground, e.g., using a height-varying road model as proposed in [12]. The ego-motion is estimated and compensated, before the filter prediction step, using the method proposed in [13].

The object dimension is not explicitly modeled. Instead, it is assumed that the object's shape is sufficiently represented by a set of 3D points. Each point ${}^{o}P_{m} = [{}^{o}X_{m},\ {}^{o}Y_{m},\ {}^{o}Z_{m}]^{T} \in \Theta$, $1 \le m \le M$, has a fixed position within the object coordinate system and can be observed in terms of an image coordinate $\langle u_{m}(t), v_{m}(t) \rangle$ and stereo disparity $d_{m}(t)$ at time t. These point measurements build the measurement vector $z_{p}$ with

$z_{p}(t) = [u_{1}(t), v_{1}(t), d_{1}(t), \ldots, u_{M}(t), v_{M}(t), d_{M}(t)]^{T}.$  (3)

Figure 2. (a) Object point cloud at initialization and after 10 time steps. The initial point cloud, spread over 6 m in longitudinal direction, is successively refined. (b) Example 6D-Vision motion field used for object detection. The color encodes lateral velocity (red fast, green slow). Points at the front of vehicle no. 1 show larger lateral velocities compared to points at the rear, since the vehicle is turning.

The projection of each point onto the image plane is tracked using a feature tracker, e.g. the well-known KLT tracker [14], to be able to reassign measurements of the same 3D point over a sequence of images. The nonlinear measurement model directly follows from the transformation between object and camera coordinates and the well-known projection equations of a finite perspective camera.

Since the exact position of a given object point is typically not known at initialization, it has to be estimated from the noisy measurements. For real-time applicability, the problem of motion estimation is separated from the problem of shape reconstruction. Thus, instead of estimating shape and motion simultaneously by integrating Θ into x, the point positions are refined outside the Kalman filter.

For each object point ${}^{o}P_{m}$, observed several times in terms of ${}^{o}\tilde{P}_{m}(t)$, a maximum likelihood estimate, assuming uncorrelated measurements and zero-mean Gaussian measurement noise [15], is given by

${}^{o}P_{m}(t+1) = \left( \sum_{j=t_{m}}^{t} C_{m}^{-1}(j) \right)^{-1} \sum_{j=t_{m}}^{t} C_{m}^{-1}(j)\, {}^{o}\tilde{P}_{m}(j),$  (4)

where $C_{m}(t)$ denotes the 3 × 3 covariance matrix of ${}^{o}\tilde{P}_{m}(t)$, and $t_{m}$ the discrete time step at which point ${}^{o}P_{m}$ has been added to the model. An example of an initial noisy object point cloud and the same point cloud refined over 10 time steps is given in Figure 2(a).

Moving objects are detected from the 3D motion field of a number of feature points distributed over the whole image (see Figure 2(b)). The motion vectors are estimated based on the principle of 6D-Vision, i.e., the pointwise fusion of depth and motion by Kalman filtering [16].

The vehicle tracking is initialized from a cluster of points moving in the same direction with equal velocity. The average velocity vector gives the moving direction (initial orientation) and the initial vehicle speed.
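As a minimal illustration of Eq. (4), the sketch below computes the information-weighted mean of all observations of a single object point in batch form; the actual system presumably accumulates the two sums recursively per time step. Function and variable names are our own.

```python
import numpy as np

def refine_point(measurements, covariances):
    """Maximum likelihood estimate of a 3D object point from repeated noisy
    observations (Eq. 4): an information-weighted mean, assuming uncorrelated,
    zero-mean Gaussian measurement noise."""
    info_sum = np.zeros((3, 3))     # sum of inverse covariances C_m^{-1}(j)
    weighted_sum = np.zeros(3)      # sum of C_m^{-1}(j) * measurement
    for P, C in zip(measurements, covariances):
        C_inv = np.linalg.inv(C)
        info_sum += C_inv
        weighted_sum += C_inv @ np.asarray(P, dtype=float)
    # Solve info_sum * P_hat = weighted_sum for the refined point position.
    return np.linalg.solve(info_sum, weighted_sum)
```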
Both the reference point and the rotation point are initialized at the centroid of the initial point cloud.

Figure 3. Geometric box model with the rotation point at the center rear axle, located at a constant distance ρ from the vehicle's rear. The object corners as well as the side centers have a fixed position with respect to the rotation point, describable in terms of w, h, l, and ρ. (a) Perspective view. (b) Bird's eye view.

2. Geometric Object Model

We now extend the object model in Eq. (1) by a cuboid D, yielding

$OBJ = \{x, \Theta, D\}$  (5)

with $D = [w, l, h]^{T}$, approximating the object dimension in terms of width w, length l, and height h, independent of the currently observed point cloud.

This has several advantages. A cuboid covers a certain region in space or on the image plane that can be used to associate new points with an object. Decoupling the object dimension from the point cloud is extremely helpful if parts of the point cloud are occluded or lost by the feature tracker. In addition, restricting the cuboid dimension to the expected size of road vehicles allows for rejecting points with similar motion patterns belonging to different, close-by objects in dense traffic scenes.

The cuboid model is used to attach a basic semantic meaning to the vehicle sides (front, rear, left, right) or characteristic points, e.g. the front left corner or the center of the right side. The objective is to align the sides of the cuboid with the physical sides of the vehicle. As can be seen in Fig. 3, all points on the cuboid sides have a fixed position with respect to a virtual vehicle coordinate system with the origin at the rotation point. The Z-distance ρ between the rotation point and the rear side is a given (vehicle-specific) constant. In practice, ρ = 1 m is a good approximation for most road vehicles.

Accordingly, if the dimension of the object is known correctly, it is straightforward to describe the rotation point relative to a given object side or corner. This means that all corners or sides observable by a sensor can be used to constrain the position of the rotation point. Based on this idea, we introduce a second vector $z_{rot}$ of K direct measurements for the rotation point, with $z_{rot} = [\,{}^{o}\tilde{P}_{rot,1}, \ldots, {}^{o}\tilde{P}_{rot,K}\,]^{T}$, and concatenate this vector with the point measurement vector in Eq. (3) to the total measurement vector z with

$z = \begin{bmatrix} z_{p} \\ z_{rot} \end{bmatrix}.$  (6)

Figure 4. (a) Dense stereo results overlaid on the image of an urban traffic situation. The colors encode the distance, red means close, green represents far. Note that SGM delivers measurements even for most pixels on the road. (b) Freespace obtained from an evaluation of the stereo occupancy grid. (c) Stixel representation for this situation; the colors encode the lateral distance to the expected driving corridor shown in blue.

The additional measurements are intended to stabilize the rotation point position and to prevent the filter from wrongly compensating ambiguities, caused by error-prone point positions in the object model, by shifting the rotation point outside the object during rotational movements. This problem will be addressed in detail in the experimental results (see Section 6). The unknown object dimension is updated outside the Kalman filter by low-pass filtering in this approach, whenever dimensional measurements are available.
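The sketch below illustrates the two update mechanisms just described: deriving a direct rotation point measurement from one observed cuboid corner via the known offsets of Fig. 3, and the slow low-pass update of a cuboid dimension. The object-frame convention (rotation point at the origin, left side at X = −w/2, rear side at Z = −ρ) and the smoothing factor are our own assumptions, not values from the paper.

```python
import numpy as np

def rotation_point_from_corner(corner_xz, corner_id, w, l, rho=1.0):
    """Direct measurement of the rotation point (center rear axle) from one
    observed cuboid corner, given width w, length l and rear-axle offset rho
    (about 1 m for most road vehicles). Assumed frame: X lateral, Z
    longitudinal, left side at X = -w/2, rear side at Z = -rho."""
    offsets = {
        "rear_left":   np.array([+w / 2.0, +rho]),
        "rear_right":  np.array([-w / 2.0, +rho]),
        "front_left":  np.array([+w / 2.0, rho - l]),
        "front_right": np.array([-w / 2.0, rho - l]),
    }
    return np.asarray(corner_xz, dtype=float) + offsets[corner_id]

def update_dimension(current, measured, alpha=0.1):
    """Slow low-pass update of a cuboid dimension outside the Kalman filter."""
    return (1.0 - alpha) * current + alpha * measured
```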
At this point the extended model is independent of where the actual measurements for the rotation point or cuboid dimensions come from. In the following, we present an example realization based on dense stereo data.

3. Stixel-based Scene Representation

Real-time implementations of dense stereo algorithms, such as Semi-Global Matching (SGM) [17], on dedicated hardware [18] provide significantly more information on the 3D environment compared to sparse stereo methods. The gain in information and precision allows for improved scene reconstruction and object modeling.

However, more information also means more data to process. Thus, efficient data representations are beneficial to ease further interpretation. We use a compact yet powerful scene representation that has been recently proposed by Badino et al. [10] for the intelligent vehicle domain. It is based on the fact that traffic scenes typically consist of a relatively planar free space which is limited by 3D obstacles that are nearly perpendicular to the ground.

The so-called Stixel World represents the 3D scene by a set of rectangular sticks, named "stixels", as shown in Fig. 4(c). Each stixel is defined by its 3D position relative to the camera and stands vertically on the ground, having a certain height. Each stixel limits the free space and approximates the object boundaries.

As illustrated in Fig. 4, the Stixel World is created by the following steps: First, we compute a dense disparity image using SGM. Fig. 4(a) shows that SGM is able to model object boundaries precisely. In addition, the smoothness constraint used in the algorithm leads to smooth estimates in low-contrast regions, as can be seen, for example, on the street and the untextured parts of the vehicles and buildings.

In the second step, a stochastic occupancy grid is generated from the stereo disparities using the method presented in [19]. An occupancy grid is a two-dimensional array which models occupancy evidence of the environment. Only those 3D measurements lying above the road are registered as obstacles in the occupancy grid. From this grid the freespace shown in Fig. 4(b) is computed. The dynamic programming approach used here turns out to be highly robust with respect to disparity noise.

Each free space point of the polygon in Fig. 4(b) indicates not only the interruption of the free space but also the base point of a potential obstacle located at that position. Following the considered image column upward, the disparity is nearly constant but jumps to a smaller value above the object. This allows us to compute the upper boundary (i.e. the height) of the obstacles in a second pass of dynamic programming.

Given the base and top point of an obstacle in a certain image column, all object disparities are averaged using a robust estimator to determine the distance of this stixel. This averaging significantly reduces the disparity noise and improves the depth accuracy. In practice we use a stixel width of 3–7 pixels, which further improves the results. The final result is depicted in Fig. 4(c).

The sketched stixel representation features the following properties:

• Compactness: A significant reduction of the data volume is offered. If, for example, the width of the stixels is set to 5 pixels, a scene from a VGA image can be represented by 640/5 = 128 stixels instead of roughly 300,000 disparities.
• Completeness: The geometrical information contained in this representation is sufficient for many recognition tasks in driver assistance.
• Robustness: Outliers in the data have minimal or no impact on the resulting representation.

It is straightforward to cluster groups of stixels into objects based on depth discontinuities. Throughout this article it is assumed that vehicles at intersections are sufficiently isolated, i.e., the left and right end of a stixel cluster as well as all stixels in between correspond to a single object.

4. Initial Pose Refinement

The motion-based object detection method initializes objects according to the initial point cloud and with an expected dimension of typical road vehicles. As can be seen in Fig. 5(b), the white box, indicating the projection of the initial object hypothesis onto the image plane, is not accurately aligned with the object boundaries. At the same time, the corresponding stixel cluster provides a very good segmentation, as visualized in Fig. 5(a). Therefore, the motion-based object detection and initialization method is extended by an initial pose refinement step. The objective is to compute an improved initial vehicle pose that is consistent with the stixel data and that places the rotation point closer to the actual center rear axle.

The idea is as follows: The image columns of the left and right stixel of the corresponding stixel cluster, denoted as $u_{l}$ and $u_{r}$ respectively, define the viewing range and constrain the expected vehicle position in lateral direction (see Fig. 5(c)). At the same time, the disparity values $d_{l}$ and $d_{r}$ of the boundary stixels introduce constraints on the distance.

Depending on the given object pose hypothesis $\langle \hat{x}, \hat{D} \rangle$, the left-most and right-most stixels are directly linked to corresponding object corners. The inner stixels cannot be assigned to a concrete point on the visible object sides as easily, although they also provide valuable information on depth.

Figure 5. Six constraints on depth and lateral position are derived from a stixel cluster and assigned to four characteristic object points that depend on visibility properties of the pose prior. The initial pose is iteratively refined by a maximum likelihood estimation. (a) Stixel cluster. (b) Refinement result. (c) Geometric constraints.

Due to perspective, one or two vehicle sides are visible in the image at one time. Thus, we divide the inner stixels based on the expected number of image columns covered by each visible side. The median disparity $d_{s_{i}}$ over all stixels assigned to a given object side $s_{i}$, $i \in \{1, 2\}$, is taken as an additional depth constraint on the center of that side. Since the median is more robust to outliers compared to the mean, inaccuracies in assigning the inner stixels to object sides are acceptable. Formally, we can summarize the constraints in a vector c as

$c = [u_{l}, u_{r}, d_{l}, d_{r}, d_{s_{1}}, d_{s_{2}}]^{T}.$  (7)

Now, $f_{\langle \hat{x}, \hat{D} \rangle}$, with

$f_{\langle \hat{x}, \hat{D} \rangle}(y) = c,$  (8)

defines the functional model between the constraints and the parameters y to be refined:

$y = [{}^{o}X_{rot},\ {}^{o}Z_{rot},\ \psi,\ \Omega(\hat{x})]^{T}.$  (9)

Here, $\Omega(\hat{x})$ denotes the size of the object side presumably covered by more stixels based on the pose prior $\hat{x}$, i.e., only one dimension, width or length, is estimated at a time. With increasing stereo uncertainty at larger distances (e.g. > 40 m for a 0.3 m stereo baseline), the parameter vector is reduced to contain only the rotation point position, since reliable size measurements cannot be obtained and the object motion gives a much better estimate of the orientation.
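The following sketch assembles the constraint vector of Eq. (7) from a stixel cluster: the boundary stixels contribute the lateral and depth constraints, and the inner stixels are split between the two expected visible sides before taking the median disparity of each group. The splitting rule (a single split column derived from the pose prior) and all names are our own simplifications for illustration.

```python
import numpy as np

def stixel_constraints(columns, disparities, split_col):
    """Assemble c = [u_l, u_r, d_l, d_r, d_s1, d_s2]^T (Eq. 7) from a stixel
    cluster. 'columns' and 'disparities' are arrays sorted by image column;
    'split_col' is the column where the two visible sides are expected to
    meet, derived from the pose prior. Illustrative sketch only."""
    columns = np.asarray(columns, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    u_l, u_r = columns[0], columns[-1]            # boundary stixel columns
    d_l, d_r = disparities[0], disparities[-1]    # boundary stixel disparities
    side1 = disparities[columns <= split_col]     # stixels assigned to side 1
    side2 = disparities[columns > split_col]      # stixels assigned to side 2
    d_s1 = np.median(side1) if side1.size else d_l
    d_s2 = np.median(side2) if side2.size else d_r
    return np.array([u_l, u_r, d_l, d_r, d_s1, d_s2])
```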
The parameters are estimated using a maximum likelihood estimation [15]. Since $f_{\langle \hat{x}, \hat{D} \rangle}$ is nonlinear, it has to be linearized at $y_{0}$, derived from the pose prior. Then, the parameter updates $\Delta y$, with $y \approx y_{0} + \Delta y$, are computed as

$\Delta y = \left( A^{T} \Sigma_{cc}^{-1} A \right)^{-1} A^{T} \Sigma_{cc}^{-1} \left( c - f_{\langle \hat{x}, \hat{D} \rangle}(y_{0}) \right),$  (10)

where the matrix $A$ denotes the Jacobian $\left. \partial f_{\langle \hat{x}, \hat{D} \rangle} / \partial y \right|_{y_{0}}$, and $\Sigma_{cc}$ the covariance matrix of the constraints.

This estimation procedure is iterated a few times (typically three iterations are sufficient). If the updates converge to approximately zero, the initial object pose is updated by the refined pose. Otherwise it is left unchanged to prevent a degradation. Fig. 5(b) shows the result of the refinement step for the poor initialization example discussed before. As can be seen, the orange box approximates the object much better and the rotation point has been moved from the centroid of the point cloud towards the rear.

Note that the refinement approach proposed above assumes an object to be completely visible in the image. Special handling of partial occlusions is outside the scope of this article.

5. Stixel Measurements

The pose refinement method proposed above is used not only at initialization, but also to yield direct measurements of the rotation point during the tracking phase. The measurement vector $z_{rot}$ (see Sec. 2) is set to $z_{rot} = [{}^{o}X_{rot}, {}^{o}Z_{rot}]^{T}$. The Kalman filter state prediction is used as pose prior for computing the rotation point measurements.

The measurement noise can be derived from the estimation procedure, since $\Sigma_{\Delta y \Delta y} = \left( A^{T} \Sigma_{cc}^{-1} A \right)^{-1}$ gives the covariance of the parameter updates. The final covariance matrix is given by error propagation. The simultaneously estimated size $\Omega(\hat{x})$ is used to update the corresponding cuboid dimension, width or length, by a slow low-pass filter outside the Kalman filter. In addition, height measurements are easily obtained from the stixel cluster.

6. Experimental Results

The proposed system has been tested on various real-world intersection scenes. The tracking results of an example sequence are superimposed in Fig. 6. The bounding box indicates the object pose, and the carpet on the ground indicates the predicted driving path based on the current motion state, assuming constant yaw rate and constant acceleration. Optical flow vectors are also visualized.

Figure 6. Example tracking results of an intersection scene with the estimated pose and predicted driving path superimposed. A second oncoming object is detected as soon as it starts moving (right image).

Figure 7. Additional rotation point measurements prevent the Kalman filter from minimizing the measurement error of the observed noisy object points by shifting the rotation point to a position outside the object. (a) Estimated trajectories in ego coordinates. (b) Drift of P_rot w.r.t. the initial pose.

The object is successfully tracked through the maneuver until it leaves the visual field of the camera. The estimated trajectory of this sequence (black solid line) and the corresponding object poses (green solid boxes) are shown in Fig. 7(a) from a bird's eye view. Without additional rotation point measurements, the filtering fails in this sequence, as indicated by the second trajectory in this figure (red dashed boxes).
The Kalman filter minimizes the residual between predicted and measured object points. If the filter is allowed to change the rotation point without any geometric constraints, it is possible that the rotation point drifts away from the object point cloud, as occurs in this example (see Fig. 7(b)). A reliable prediction of the turn maneuver is then no longer possible; thus, the object track is rejected. This demonstrates the importance of the geometric constraints on the rotation point.

Further example results are depicted in Fig. 8. The Kalman filter state estimates of the yaw rate and velocity of the first-row sequence are shown in Fig. 9. The velocity profile shows a typical acceleration behavior at turn maneuvers. The driver first slightly reduces the velocity and then accelerates after the maximum turn rate is reached.

The system stably runs at 80 ms cycle time on 640 × 480 images, including 40 ms for the freespace and stixel computation, 25 ms for feature tracking and ego-motion computation, as well as 2–3 ms for tracking and pose refinement of a single object, on a current quad-core processor, without exploiting massively parallel computing, e.g., on the graphics card.

Figure 8. Example results with bounding box and predicted driving path superimposed (first column), stixel clusters and SGM stereo (second column), and resulting trajectories from bird's eye view (third column).

7. Conclusion

We have presented a hybrid vehicle tracking approach that combines a feature-based point cloud model with geometric constraints for the application of tracking turning vehicles at intersections.

The Stixel World provides a powerful and efficient representation for the precise localization and segmentation of object boundaries. This information has been used for an initial pose refinement step and to derive additional measurements that constrain the position of the rotation point to the object's lateral center during filtering. The experimental results have shown a significant improvement in the tracking of oncoming vehicles at intersections.

The realization based on dense stereo stixels has demonstrated the practical usage of the generic measurement model. Alternative image-based methods for extracting the object silhouette in the image or other range sensors, such as lidar scanners, could be integrated accordingly. Further investigations will include the direct integration of the single stixels into the object shape model and will address the problems arising in extremely dense traffic scenes, which have been excluded in this contribution.

Figure 9. Motion state estimation results: (a) estimated yaw rate, (b) estimated velocity. The precise yaw rate and velocity estimates of the oncoming traffic provide useful information for future driver assistance systems.

References

[1] H. Veeraraghavan and N. Papanikolopoulos, "Combining multiple tracking modalities for vehicle tracking at traffic intersections," in IEEE Conf. on Robotics and Automation, 2004.
[2] S. Atev, H. Arumugam, O. Masoud, R. Janardan, and N.
Papanikolopoulos, "A vision-based approach to collision prediction at traffic intersections," IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 4, pp. 416–423, Dec. 2005.
[3] A. Ottlik and H. H. Nagel, "Initialization of model-based vehicle tracking in video sequences of inner-city intersections," Int. J. Comput. Vision, vol. 80, no. 2, pp. 211–225, 2008.
[4] R. Danescu, S. Nedevschi, M. Meinecke, and T. Graf, "Stereovision based vehicle tracking in urban traffic environments," in IEEE Conference on Intelligent Transportation Systems, 2007, pp. 400–404.
[5] B. Barrois, S. Hristova, C. Woehler, F. Kummert, and C. Hermes, "3D pose estimation of vehicles using a stereo camera," in IEEE Intelligent Vehicles Symposium, 2009.
[6] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, "A real-time computer vision system for measuring traffic parameters," in Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997, pp. 495–501.
[7] T. Dang, C. Hoffmann, and C. Stiller, "Fusing optical flow and stereo disparity for object tracking," in IEEE 5th International Conference on Intelligent Transportation Systems, 2002, pp. 112–117.
[8] B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool, "Dynamic 3D scene analysis from a moving vehicle," in Computer Vision and Pattern Recognition, CVPR. IEEE Conference on, 2007, pp. 1–8.
[9] A. Barth and U. Franke, "Where will the oncoming vehicle be the next second?" in IEEE Intelligent Vehicles Symposium, 2008.
[10] H. Badino, U. Franke, and D. Pfeiffer, "The stixel world - a compact medium level representation of the 3D world," in DAGM Symposium, Jena, Germany, September 2009.
[11] Y. Bar-Shalom, X. Rong Li, and T. Kirubarajan, Estimation with Applications To Tracking and Navigation. John Wiley & Sons, Inc., 2001.
[12] A. Wedel, U. Franke, H. Badino, and D. Cremers, "B-spline modeling of road surfaces for freespace estimation," in IEEE Intelligent Vehicles Symposium, 2008, pp. 828–833.
[13] H. Badino, "A robust approach for ego-motion estimation using a mobile stereo platform," in 1st Intern. Workshop on Complex Motion (IWCM04), Guenzburg, Germany, October 12–14, 2004.
[14] C. Tomasi and T. Kanade, "Detection and tracking of point features," Carnegie Mellon University, Tech. Rep. CMU-CS-91-132, April 1991.
[15] C. McGlone, Ed., Manual of Photogrammetry, 5th ed. Amer. Soc. Photogrammetry, 2004.
[16] U. Franke, C. Rabe, H. Badino, and S. Gehrig, "6D-vision: Fusion of stereo and motion for robust environment perception," in 27th DAGM Symposium, 2005, pp. 216–223.
[17] H. Hirschmüller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in Computer Vision and Pattern Recognition, CVPR, vol. 2, June 2005, pp. 807–814.
[18] S. Gehrig, F. Eberli, and T. Meyer, "A real-time low-power stereo engine using semi-global matching," in International Conference on Computer Vision Systems, ICVS, 2009.
[19] H. Badino, T. Vaudrey, U. Franke, and R. Mester, "Stereo-based free space computation in complex traffic scenarios," in IEEE Southwest Symposium on Image Analysis and Interpretation, 2008.