Conference & Workshop on Assistive Technologies for People with Vision & Hearing Impairments
Assistive Technology for All Ages
CVHI 2007, M.A. Hersh (ed.)

TERRAIN ANALYSIS FOR BLIND WHEELCHAIR USERS: COMPUTER VISION ALGORITHMS FOR FINDING CURBS AND OTHER NEGATIVE OBSTACLES

James Coughlan and Huiying Shen
The Smith-Kettlewell Eye Research Institute
2318 Fillmore St. San Francisco, CA 94115
Phone: 415-345-2146, Email: coughlan@ski.org

Abstract: We are developing computer vision algorithms that sense important terrain features as an aid to wheelchair navigation, interpreting visual information obtained from images collected by video cameras mounted on the wheelchair. This paper focuses specifically on a novel computer vision algorithm for detecting curbs and other negative obstacles (i.e. anything below the level of the ground, such as holes and drop-offs), which are important and ubiquitous features on and near sidewalks and other walkways. The algorithm we develop extracts as much information as possible from depth information obtained from stereo video cameras (i.e. pairs of cameras mounted close together); other information (e.g. monocular cues such as intensity edges) will be incorporated in the future. We demonstrate experimental results on typical sidewalk scenes.

Keywords: blind, visually impaired, wheelchair, assistive technology, computer vision, curbs, obstacle detection

1. Introduction

Approximately one in ten blind persons uses a wheelchair, and independent travel is currently next to impossible for this population. This paper describes computer vision algorithms for detecting curbs and other negative obstacles, which are important and ubiquitous features of sidewalks and other walkways that are especially difficult for blind travelers to find.
The algorithms are intended to analyze images acquired from a stereo video camera (see Figure 1a,b) mounted on a wheelchair and processed by a portable computer (also carried on the wheelchair); the information they provide will be communicated to the traveler using synthesized speech, audible tones and/or tactile feedback, and is meant to augment rather than replace the information from existing wayfinding skills. Detecting negative obstacles is challenging since the depth cues signalling negative obstacles are noisy. Our detection algorithm addresses this problem by filtering the depth information in a way that greatly reduces the noise while preserving important information.

2. State-Of-The-Art and Related Technology

The specific problems of visually impaired wheelchair riders have received little study (Greenbaum et al, 1998). Indeed, the only commercial device targeted at this population is a version of the laser cane by Nurion Inc., mounted on the arm of a wheelchair (Gill, 2000). The laser's fixed pencil beam drastically limits its "field of view," while four added ultrasonic sensors detect only large, tall obstacles within one foot.

Some technology developed in robotics and autonomous vehicle navigation research may eventually be useful in the design of navigation aids for wheelchairs, but has limitations that prevent it from being adopted in the near future. For instance, 3-D sensing for environmental mapping in robotics is performed using a single- or double-axis lidar (similar to radar but using laser light rather than radio waves). Although lidars produce very accurate distance measurements, they are still expensive and bulky, which has prevented their widespread use.
Computer vision, the design of software to interpret visual information obtained from cameras, is a promising technology that overcomes many of the limitations inherent in the above modalities: it relies only on relatively inexpensive and compact hardware (digital cameras and a computer), and can sense obstacles within a wide field of view (and at distances of up to several meters or more).

3. Computer Vision Algorithms for Finding Obstacles

We begin this section with a brief overview of stereo vision (stereopsis); see Forsyth and Ponce 2002 for a comprehensive introduction. Stereo vision is a powerful computer vision method for recovering 3-D scene structure, which works by comparing the differences between images taken by two cameras placed a short distance apart (like human eyes) to estimate depth (see Figure 1a). The fixed geometric relationship between the cameras simplifies depth estimation, making it a fast and relatively robust calculation. For our application, depth estimation is used to determine the ground plane, i.e. the plane that the wheelchair is rolling on, and to locate obstacles, which are points in the scene that lie significantly above or below the ground plane.

3.1 Overview of Stereo Vision

Stereo vision exploits the fact that a single point in a scene appears in slightly different locations in neighboring views of the scene. If the views are from two suitably aligned and calibrated cameras with parallel lines of sight, a feature in the left image is horizontally shifted relative to the corresponding feature in the right image. This image shift, called the disparity, is directly related to the distance from the camera to the point in the scene: distant points have small disparity, while nearby points have large disparity. The following thought experiment illustrates this fact.
If you alternately open and close your left and right eyes while looking at the sky, distant objects (such as stars or the sun) will appear in the same place in both eyes, while nearby objects (such as your hand pointing towards a distant object) will appear in different locations in each eye.

Figure 1 (a) Stereo video camera. (b) Left image. (c) Right image. (d) Disparity map d(x,y) produced by stereo algorithm: brighter green means higher disparity and thus closer to the camera; black pixels indicate no estimate at that location. The disparity map clearly shows that the person in the images is closer to the camera than the walls behind him.

Stereo algorithms determine the correspondences between points in the left and right images, thereby establishing the disparity d(x,y) everywhere in the image (see Figure 1b,c,d for an example). We note that the process of finding correspondences is a challenging problem that causes much of the noise and errors in stereo vision algorithms (and is an active area of research in computer vision, see Scharstein and Szeliski 2002). Geometric triangulation then yields an equation for distance in terms of disparity: the depth of any point in the scene is inversely proportional to its disparity. A fundamental (albeit somewhat non-intuitive) result is that the disparity map corresponding to a planar surface in a scene is itself a planar (i.e. linear) function of the image coordinates x,y. (We omit the derivation, which is straightforward; see Forsyth and Ponce 2002.) We will exploit this relationship to locate the dominant plane, or ground plane, in the image.

3.2 Past Applications of Computer Vision Stereo for Finding Obstacles

A variety of work has been done on computer vision algorithms for finding negative obstacles using stereo depth information, primarily in the context of autonomous vehicles (Bellutta et al 2000, Labayrade et al 2002).
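The inverse relation between disparity and depth can be sketched in a few lines of Python. The function name, focal length and baseline below are illustrative assumptions, not the parameters of the camera used in this work:

```python
# Illustrative sketch of the triangulation relation Z = f * B / d.
# The focal length (in pixels) and baseline (in meters) are assumed
# example values, not our camera's actual parameters.

def depth_from_disparity(d_pixels, focal_px=480.0, baseline_m=0.1):
    """Depth is inversely proportional to disparity."""
    if d_pixels <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_pixels

# Nearby points have large disparity, distant points small disparity:
print(depth_from_disparity(48.0))  # a nearby point, roughly 1 m away
print(depth_from_disparity(8.0))   # a distant point, roughly 6 m away
```

Note that because depth varies as 1/d, a fixed error in the disparity estimate produces a much larger depth error for distant points than for nearby ones, which is why distant curb edges are hard to detect reliably.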
Our work focuses on a different domain, in which the features of interest are difficult to detect from depth information alone, because the depth changes that characterize these obstacles may be small relative to the distances at which they are viewed and may be swamped by noise in the depth information estimated from computer vision stereo algorithms. (For instance, a curb is approximately 10-15 cm high but may be viewed at distances of several meters.) In contrast with recent work by Lu and Manduchi (2005), which combines depth information and monocular intensity information to infer the locations of depth discontinuities that signal the presence of curbs, we focus on extracting as much information as possible from depth information alone.

3.3 Proposed Algorithm for Finding Obstacles

We propose an algorithm for finding the depth discontinuities that signal the presence of obstacles which makes very little use of monocular cues. The key to our approach is that we smooth the noisy disparity map obtained by the stereo camera in such a way that the important discontinuities are preserved while much of the noise is eliminated. Our goal is not to argue that monocular cues should be avoided, but rather to demonstrate a novel technique for making the disparity information more reliable. Indeed, in future work we envision developing a hybrid approach that augments the depth information with monocular cues such as intensity edges.

Figure 2 Disparity map of a street scene (the right image of the scene is shown in Fig. 2a), with the stereo camera pointed towards a sidewalk curb. In Fig. 2b the disparity map is rendered in 2-D in a false color scale: dark blue indicates no disparity estimate, and disparities increase from blue to red. In Fig. 2c the disparity map is rendered as a 3-D plot (with the same colors as in Fig. 2b, with height proportional to disparity, and rotated for ease of viewing).
Notice that the overall shape of the disparity map is planar, corresponding to the ground plane, which is visible despite the high level of noise.

The main steps of our algorithm are as follows. The first – and most important – step is to apply a median filter (Forsyth and Ponce 2002) to the raw disparity map: Figs. 2 and 3 show the disparity map before and after filtering, respectively. This filter operates as follows: at each pixel, the median value of the disparities within a square neighborhood centered about the pixel (we chose a neighborhood size of 21 x 21 for our 240 x 320 disparity images) is computed, and the filtered value at that pixel is then set to this median. An important property of the median filter is that it smooths out small spikes in the disparity map (visible as blue dots in Fig. 2b), while preserving important structures in the disparity map – especially step-like edges that signal discontinuities such as the transition from a sidewalk to the street.

Figure 3 Disparity map from Fig. 2, after smoothing by a median filter. Note that the noise is reduced significantly, but important discontinuities are preserved. Dark blue corresponds to points for which no disparity estimate is available.

The next step is to find edges, i.e. discontinuities, in the filtered disparity map. These are determined by estimating the magnitude of the spatial gradient (i.e. the magnitude of the vector whose components are the partial derivatives of the disparity with respect to x and y). The edge map corresponding to the filtered disparity map in Fig. 3 is shown in Fig. 4a. Notice that many of these edges correspond to depth discontinuities in the scene.

Figure 4 Edge map (a), with darkness proportional to the magnitude of the disparity gradient, and deviation from ground plane (b).
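The filtering and edge steps above can be sketched on synthetic data as follows. The 21 x 21 window matches the one used here, but the synthetic scene, noise level and 120 x 160 image size are assumptions chosen purely for illustration:

```python
# Sketch of the median-filter and gradient-magnitude steps on a synthetic
# disparity map: a planar "ground plane" ramp with a step edge (a curb-like
# discontinuity) at row 60, corrupted by impulsive disparity spikes.
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
rows, cols = 120, 160
y, x = np.mgrid[0:rows, 0:cols]

# Planar disparity map (linear in image coordinates) with a step edge.
disparity = 5.0 + 0.05 * y + 0.01 * x
disparity[y >= 60] += 2.0            # step discontinuity at row 60

# Impulsive noise: spurious disparity spikes, as in raw stereo output.
noisy = disparity.copy()
spikes = rng.random(disparity.shape) < 0.02
noisy[spikes] += rng.uniform(5.0, 20.0, spikes.sum())

# Step 1: median filtering suppresses the spikes but keeps the step edge.
filtered = median_filter(noisy, size=21)

# Step 2: edge strength = magnitude of the spatial disparity gradient.
gy, gx = np.gradient(filtered)
edge_strength = np.hypot(gx, gy)

# The strongest edge in each column should sit near the step at row 60.
edge_rows = np.argmax(edge_strength, axis=0)
print(np.median(edge_rows))          # should be close to 60
```

The key property this illustrates is that the median, unlike a linear smoothing filter, discards isolated spike values entirely while leaving a step edge essentially unblurred.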
The ground plane is then determined by finding the plane that best fits as much of the disparity map as possible (allowing for the many pixels that do not lie on this plane). Once the ground plane has been determined, the difference between any pixel's disparity and the disparity of the ground plane at that pixel reflects how close the corresponding point is to the ground plane (in 3-D). This difference is shown in Fig. 4b.

The final stage of the algorithm is to apply a series of tests at each pixel to decide whether the pixel belongs to a significant depth discontinuity near the ground plane, such as a curb. First, the magnitude of the edge strength at that pixel must be above a minimum threshold; furthermore, to avoid spurious edges, the edge must not be a neighbor of a pixel without any disparity estimate (which would make the gradient difficult to estimate accurately). (Such spurious edges are common near the border of the image.) Second, the pixel must lie sufficiently close to the ground plane. Third, the pixel must be sufficiently close to the camera, both because it is hard to reliably estimate depth discontinuities at a distance, and because nearby depth discontinuities are more important than distant ones for our wheelchair application. Fourth, to eliminate spurious edges due to saturation of the image intensity (for instance, from objects that are very bright, or at least much brighter than other objects in the image), which creates substantial noise in the disparity map, we eliminate from consideration any pixels whose image intensity has a saturated value (i.e. 255 for a standard 8-bit camera). The results of this algorithm are shown in the next section.

3.4 Experimental Results

We demonstrate our algorithm on four images of sidewalks (Fig. 5). The results provide evidence that the algorithm is able to detect important depth discontinuities near the ground plane – those which are most likely to correspond to curbs and other nearby obstacles.
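The ground-plane step described in Section 3.3 exploits the fact that a planar surface has disparity that is a linear function of image coordinates, d(x,y) = a*x + b*y + c. The paper does not specify the fitting procedure; one concrete possibility, sketched here on synthetic data, is iterated least squares that discards off-plane pixels between rounds. All scene values and the inlier threshold are assumptions:

```python
# Sketch of a robust ground-plane fit to a disparity map, followed by the
# per-pixel deviation from the plane (cf. Fig. 4b). Iterated least squares
# is one possible method; the paper does not specify its own.
import numpy as np

rng = np.random.default_rng(1)
rows, cols = 120, 160
y, x = np.mgrid[0:rows, 0:cols]

# Synthetic disparity: ground plane plus noise, with an off-plane region
# (e.g. the street below a curb) in the bottom rows.
a_true, b_true, c_true = 0.01, 0.05, 5.0
d = a_true * x + b_true * y + c_true + rng.normal(0.0, 0.05, (rows, cols))
d[y >= 100] -= 3.0                   # region lying below the ground plane

def fit_plane(xs, ys, ds):
    """Least-squares fit of d = a*x + b*y + c."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, ds, rcond=None)
    return coeffs

xs, ys, ds = x.ravel().astype(float), y.ravel().astype(float), d.ravel()
coeffs = fit_plane(xs, ys, ds)
for _ in range(3):
    # Refit using only pixels near the current plane estimate, so that
    # off-plane pixels do not bias the fit (threshold in disparity units).
    residual = ds - (coeffs[0] * xs + coeffs[1] * ys + coeffs[2])
    inliers = np.abs(residual) < 0.5
    coeffs = fit_plane(xs[inliers], ys[inliers], ds[inliers])
a, b, c = coeffs

# Per-pixel deviation from the ground plane: small |deviation| means the
# point lies near the ground; large negative values flag negative obstacles.
deviation = d - (a * x + b * y + c)
print(round(a, 3), round(b, 3))      # should recover roughly 0.01 and 0.05
```

The deviation map then feeds the "sufficiently close to the ground plane" test described above, with the street region standing out as a large negative deviation.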
Two of these images show curb cuts, places where a curb slopes downward to meet the ground (to provide a ramp allowing wheelchairs and other wheeled vehicles to move between the sidewalk and the street). In these images, the portions of the curb that are substantially elevated off the ground are detected as obstacle edges, while the portion that is low to the ground is not detected. In the other images, some of the borders between the sidewalk and the dirt patches (where bushes are planted) are detected, as are nearby cars.

Figure 5 Experimental results: detected obstacle edges in red, superimposed on the original image of each scene. The result in the lower left corresponds to the scene in Figs. 2-4.

However, the algorithm in its current form has serious limitations. The result in the lower left of Fig. 5 completely misses the most distant edge of the curb (near the top of the image), because the discontinuity corresponding to that edge in the disparity map is so faint. Also, only a small number of edges from positive obstacles such as trees and bushes are detected. Finally, the alignment between the detected edges and the true edges is imprecise. All of these limitations arise from noise in the disparity maps; in order to circumvent these limitations and improve the algorithm's performance, other cues (such as monocular cues, e.g. intensity edges) should be incorporated.

4. Conclusions

We have devised a simple algorithm for finding obstacle edges using stereo vision. The key feature of the algorithm is that it smooths the depth information so as to reduce noise while preserving important information, enabling the algorithm to rely almost exclusively on depth information. Experimental results on sidewalk images demonstrate the feasibility of our approach, which in future work will be extended to include monocular image information such as intensity edges.
Quantitative measures of performance, such as the fraction of curbs that are successfully detected, will need to be assessed to guide the development of the algorithms and to fully understand their strengths and weaknesses. Ultimately we envision that computer vision algorithms will function as part of a comprehensive system for wheelchair navigation that integrates multiple sensor modalities (such as ultrasound and laser), since no one modality is reliable enough to use in isolation.

References

Bellutta, P., R. Manduchi, L. Matthies, K. Owens and A. Rankin (2000). Terrain perception for DEMO III, Intelligent Vehicle Symposium 2000.

Forsyth, D. and J. Ponce (2002). Computer Vision: A Modern Approach. Prentice Hall.

Gill, J. (2000). Personal electronic mobility devices. In Information for Professionals Working with Visually Disabled People. http://www.tiresias.org

Greenbaum, M.G., S. Fernandes and S.F. Wainapel (1998). Use of a motorized wheelchair in conjunction with a guide dog for the legally blind and physically disabled, Arch Phys Med Rehabil, vol. 79, no. 2, pp. 216-217.

Labayrade, R., D. Aubert and J.P. Tarel (2002). Real time obstacle detection in stereo vision on non flat road geometry through 'V-disparity' representation, Proceedings of IEEE Intelligent Vehicle Symposium, Versailles, France, June 18-20, 2002.

Lu, X. and R. Manduchi (2005). Detection and localization of curbs and stairways using stereo vision, IEEE International Conference on Robotics and Automation (ICRA '05), Barcelona, April 2005.

Scharstein, D. and R. Szeliski (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, vol. 47, no. 1/2/3, pp. 7-42.

Acknowledgements: We would like to thank Roberto Manduchi for many helpful discussions. The authors were supported by the National Science Foundation (grant no. IIS0415310).