=Paper=
{{Paper
|id=Vol-2665/paper24
|storemode=property
|title=Image segmentation based on RGBD data
|pdfUrl=https://ceur-ws.org/Vol-2665/paper24.pdf
|volume=Vol-2665
|authors=Elena Medvedeva,Elisaveta Varco
}}
==Image segmentation based on RGBD data ==
Elena Medvedeva, Department of Radio Electronics, Vyatka State University (VyatSU), Kirov, Russia, emedv@mail.ru

Elisaveta Varco, Department of Radio Electronics, Vyatka State University (VyatSU), Kirov, Russia, varkoelizaveta2011@hotmail.com

Image Processing and Earth Remote Sensing. VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020).

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The paper proposes a method of image segmentation based on the joint use of color and depth data. The method consists of two stages. The first stage is RGB image segmentation based on contour detection and subsequent filling of closed regions. It is followed by joint color and depth segmentation. Depth data make it possible to distinguish between pixels of different objects that have similar brightness characteristics, and thus improve the quality of image segmentation. To reduce computational cost, we propose detecting contours in the high order bit planes of a digital image using the mathematical model of a two-dimensional Markov chain. The experimental results show that the proposed method is effective.

Keywords: RGBD segmentation, two-dimensional Markov chain, contour detection, depth map.

===I. Introduction===

Segmentation is used to solve a number of tasks related to the detection and recognition of static and dynamic objects in video surveillance, autonomous driving and other applications.

Traditional segmentation methods mainly rely on color or brightness features. With these methods, the quality of segmentation depends strongly on the scene: smooth or sharp changes in lighting, shadows cast by objects, complex backgrounds, etc. Although much work has been done in this field over the years, none of the existing segmentation techniques obtains satisfactory results from color data alone.

New RGBD sensors such as the Microsoft Kinect, which provide synchronized depth and color video frames, have opened up new opportunities for object detection and recognition. Unlike RGB data, depth data are considered more robust to changes in lighting and to dynamic background objects, and can serve as an effective additional feature for image segmentation.

Fusion of color and depth has recently become a new research topic in computer vision. A number of papers offer methods for segmenting RGBD data: methods combining background subtraction algorithms with depth data [1]; methods using convolutional neural networks [2]; clustering [3]; methods combining contour, brightness and depth [4]; and others.

However, almost all segmentation methods that combine depth and color data are either insufficiently flexible or require significant computational resources. Research in this area therefore remains relevant.

The aim of this paper is to develop a method for image segmentation based on the joint use of brightness and depth data that improves the quality of segmentation at a reduced computational cost.

===II. Image segmentation based on RGBD data===

In the RGB color space, each component is a digital halftone image whose pixels are represented by g-bit binary numbers. The D component is also a multi-bit digital image (a depth map), in which each element carries the distance from the camera to the corresponding point of the observed scene.

RGBD data can be segmented in two ways: color-based segmentation first and depth-based segmentation second, or vice versa. It is preferable to use color data at the first stage because depth maps have a number of defects: lost and distorted depth values, uneven and noisy object boundaries, incorrectly measured depth for materials with mirror or fine-grained surfaces, and so on. Using depth data at the first stage would therefore significantly distort object boundaries and break object contours at the second stage.

In this paper, the RGB image is segmented first. To improve the accuracy of the selected boundaries of objects of interest, we use a method based on contour detection with subsequent filling of pixels in closed image regions. The second stage is joint segmentation of color and depth data. Depth data make it possible to distinguish pixels of different objects that have similar brightness or color characteristics, and thus to improve the quality of image segmentation.

Digital halftone images corresponding to the color components can be represented as a set of bit binary images (BBI). The most informative (detailed) regions appear in the high order BBI of a digital halftone image, whereas the low order BBI are binary images resembling two-dimensional noise. We therefore propose to detect the contours of objects of interest in the high order BBI. To detect the contours, one can use the mathematical model of two-dimensional Markov chains with two equally probable states <math>M_1</math>, <math>M_2</math> and matrices of horizontal and vertical transition probabilities [5, 6]

<math>\Pi_1 = \begin{pmatrix} \pi_{11}^{(1)} & \pi_{12}^{(1)} \\ \pi_{21}^{(1)} & \pi_{22}^{(1)} \end{pmatrix}, \qquad \Pi_2 = \begin{pmatrix} \pi_{11}^{(2)} & \pi_{12}^{(2)} \\ \pi_{21}^{(2)} & \pi_{22}^{(2)} \end{pmatrix},</math>

defined for each bit plane <math>l = 1, \ldots, g</math>. Because only 2×2 transition probability matrices are involved, this approach to contour detection keeps the computational cost low.

Fig. 1 shows an element <math>\nu_3^{(l)}</math> of a two-dimensional binary image together with its neighboring elements <math>\nu_1^{(l)}</math> and <math>\nu_2^{(l)}</math>.

Fig. 1. Fragment of the bit plane of the digital halftone image.

According to the mathematical model of a two-dimensional Markov random process, the amount of information in the element <math>\nu_3^{(l)}</math> for the various combinations of states of the neighboring elements <math>\nu_1^{(l)}</math>, <math>\nu_2^{(l)}</math> is determined by the formulas [5, 6]:

<math>
\begin{align}
I\left(\nu_3^{(l)} = M_i \mid \nu_1^{(l)} = M_i,\ \nu_2^{(l)} = M_i\right) &= -\log \frac{\pi_{ii}^{(1)}\,\pi_{ii}^{(2)}}{\pi_{ii}^{(3)}}; \\
I\left(\nu_3^{(l)} = M_i \mid \nu_1^{(l)} = M_i,\ \nu_2^{(l)} = M_j\right) &= -\log \frac{\pi_{ii}^{(1)}\,\pi_{ij}^{(2)}}{\pi_{ij}^{(3)}}; \\
I\left(\nu_3^{(l)} = M_i \mid \nu_1^{(l)} = M_j,\ \nu_2^{(l)} = M_i\right) &= -\log \frac{\pi_{ij}^{(1)}\,\pi_{ii}^{(2)}}{\pi_{ij}^{(3)}}; \\
I\left(\nu_3^{(l)} = M_i \mid \nu_1^{(l)} = M_j,\ \nu_2^{(l)} = M_j\right) &= -\log \frac{\pi_{ij}^{(1)}\,\pi_{ij}^{(2)}}{\pi_{ii}^{(3)}},
\end{align}
</math> (1)

where <math>\pi_{ij}^{(r)}</math> (<math>i, j = 1, 2</math>; <math>r = 1, 2, 3</math>) are the elements of the transition probability matrices of one-dimensional two-state Markov chains: <math>\Pi_1</math> (horizontal), <math>\Pi_2</math> (vertical) and <math>\Pi_3 = \Pi_1 \Pi_2</math>. The elements of the transition probability matrices are assumed to be known a priori, having been estimated from a large number of real image samples.

The calculated amount of information is compared with a threshold to decide whether the analyzed element is a contour point. The threshold is the average of the minimum amount of information and the amount of information when at least one of the neighboring elements is in a different state.

For an 8-bit digital halftone image with 256 brightness levels, the high order (8th) bit plane alone makes it possible to select all light regions with brightness from 128 to 255 on a dark background or, conversely, all dark objects on a background with brightness of 128 and above. To highlight regions in less contrasting images with indistinct boundaries, contours must also be detected in the next bit planes (the 7th or 6th) of the digital halftone image. In this case, the contour image is the sum of the contour images of several bit planes.

The proposed contour detection method requires little computation, since it reduces to comparisons with two neighboring elements. It yields a closed contour one pixel wide. This property is important for the next step, filling closed regions with color.

To fill closed regions, the range of brightness values <math>[Y_{\min};\ Y_{\max}]</math> of the object is specified, and all elements within the object area are assigned the average brightness value <math>Y_{\mathrm{av}}</math> (or a label with a specified value). The line seed fill algorithm [7] was chosen for filling. It gives a significant gain in memory and processing time because it stores only one seed element for each span of the filled region.

After this processing, an object may be split into several parts or have inaccurate borders due to uneven illumination, shadows or glare. In addition, extraneous objects in the background of the scene may appear in the image along with the objects of interest. All these factors affect the quality of the subsequent object detection, classification and recognition.

At the second stage, the range of depth values <math>[X_{\min};\ X_{\max}]</math> that the object of interest can take is set on the depth map, and a mask is formed. The mask is then superimposed on the result of the RGB image segmentation, and the final selection of objects is performed. This procedure makes it possible to distinguish objects that have similar brightness or color characteristics but different distances from the camera, and also improves the segmentation of objects under uneven lighting, in the presence of shadows, etc.

Fig. 2 shows a flowchart of the algorithm.

Fig. 2. Flowchart of RGBD image segmentation algorithm. (Flowchart blocks: RGB image; depth map; image decomposition into g-bit binary images (8th, 7th, 6th and 5th BBI); contour detection; filling of image segments based on brightness threshold; mask formation based on distance threshold; segmentation result.)

===III. Experimental results===

The RGBD Object Dataset [8] was used in the experiments. It contains pairs of sequences of color images and depth maps, as well as segmentation results based on depth and color data obtained with the RANSAC algorithm and an adaptive Gaussian mixture (AGM) model [9]. Each video sequence consists of 199 frames, and each image contains a single object of interest.

Fig. 3 shows examples of segmentation results: (a) the original RGB image; (b) reference marking; (c) segmentation using the RANSAC algorithm and AGM; (d) segmentation based on brightness data; (e) the result of joint segmentation by brightness and depth. Brightness segmentation of the image "Apple" uses the 8th BBI of the R component, the images "Banana" and "Scissors" use the 8th BBI of the B component, and the image "Coffee mug" uses the 8th BBI of the G component.

The results in Fig. 3d show that the segmentation algorithm based on contour detection accurately localizes object boundaries.

The additional use of depth data (Fig. 3e) further improves segmentation quality: it removes selected fragments that are close in brightness to the object of interest, suppresses shadows, etc. Comparing Fig. 3e with Fig. 3c shows that the developed method selects objects of interest more accurately than the method proposed in [9].

The segmentation can be performed automatically for typical images (or sequences of video frames) in which the objects of interest have similar brightness and depth characteristics.
Precision (P) and recall (R) criteria were used to assess the quality of segmentation, and the error coefficient E was calculated [10]:

<math>\mathrm{Precision} = \frac{TP}{TP + FP},</math> (2)

<math>\mathrm{Recall} = \frac{TP}{TP + FN},</math> (3)

<math>E = \frac{FP + FN}{TP + TN + FP + FN},</math> (4)

where TP is the number of true positives, TN of true negatives, FP of false positives and FN of false negatives.

Precision is the fraction of the pixels assigned to the segmented region that actually belong to it. Recall is the fraction of the pixels truly belonging to the region that the segmentation assigned to it. The error coefficient E is the ratio of all erroneously classified pixels to the total number of pixels.

Reference segmentation images were used to calculate precision, recall and the error coefficient. Table I contains the segmentation quality estimates for the developed method and the known method [9]. The estimates were calculated for individual images and averaged over the entire video sequence.

{| class="wikitable"
|+ TABLE I. Estimation of the results of segmentation algorithms
! rowspan="2" | Video sequence
! colspan="3" | Based on brightness
! colspan="3" | Based on brightness and depth
! colspan="3" | Based on RANSAC and AGM model [9]
|-
! P !! R !! E !! P !! R !! E !! P !! R !! E
|-
| Apple || 0.93 || 0.58 || 0.0086 || 0.89 || 0.98 || 0.0015 || 0.90 || 0.97 || 0.0013
|-
| Banana || 0.93 || 0.93 || 0.0018 || 0.93 || 0.93 || 0.0018 || 0.80 || 0.97 || 0.0027
|-
| Scissors || 0.80 || 0.91 || 0.0026 || 0.82 || 0.97 || 0.0021 || 0.78 || 0.93 || 0.0026
|-
| Coffee mug || 0.83 || 0.63 || 0.0118 || 0.98 || 0.80 || 0.0030 || 0.95 || 0.94 || 0.0014
|-
| Comb || 0.89 || 0.40 || 0.0237 || 0.91 || 0.90 || 0.0035 || 0.85 || 0.94 || 0.0035
|}

Joint segmentation gives precision values similar to those of brightness segmentation, but it increases recall (up to 2.1 times) and reduces the segmentation error (up to 5.7 times).

===IV. Conclusion===

The proposed method of image segmentation based on the joint use of color and depth data makes it possible to accurately select the boundaries of objects of interest and to effectively distinguish pixels of different objects that have similar brightness characteristics. Because contours are detected in the high order bit planes of the digital image using the mathematical model of a two-dimensional Markov chain, the computational cost of the algorithm is reduced. The algorithm can be used in a number of object detection and recognition tasks in video surveillance systems, autonomous driving, etc.

Fig. 3. Comparison of RGBD data segmentation methods (panels (a) to (e) for the images "Apple", "Banana", "Scissors" and "Coffee mug").

===References===

[1] R. Trabelsi, I. Jabri, F. Smach and A. Bouallegue, "Efficient and fast multi-modal foreground-background segmentation using RGBD data," Pattern Recognition Letters, vol. 97, pp. 13-20, 2017.

[2] X. Qi, R. Liao, J. Jia, S. Fidler and R. Urtasun, "3D Graph Neural Networks for RGBD Semantic Segmentation," IEEE International Conference on Computer Vision, Venice, Italy, pp. 5199-5208, 2017.

[3] H. Yalic and A. Can, "Automatic Object Segmentation on RGB-D Data using Surface Normals and Region Similarity," Proceedings of the 13th Intern. Joint Conf. on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 4, pp. 379-386, 2018.

[4] F. Xiang, C. Chen, C. Wang and C.-C. Jay Kuo, "Image segmentation using contour, surface, and depth cues," IEEE Intern. Conf. on Image Processing, 2017.

[5] E.P. Petrov, I.S. Trubin, E.V. Medvedeva and S.M. Smolskiy, "Mathematical Models of Video-Sequences of Digital Half-Tone Images," Integrated models for information communication systems and networks: design and development, IGI Global, pp. 207-241, 2013.

[6] E.V. Medvedeva and E.E. Kurbatova, "Image Segmentation Based on Two-Dimensional Markov Chains," Computer Vision in Control Systems-2. Innovations in Practice, Springer International Publishing Switzerland, pp. 277-295, 2015.

[7] T. Pavlidis, "Algorithms for Graphics and Image Processing," M.: Radio and Communications, 1986.

[8] RGBD Object Dataset [Online]. URL: http://rgbd-dataset.cs.washington.edu/dataset.html (23.05.2019).

[9] K. Lai, L. Bo, X. Ren and D. Fox, "A Large-Scale Hierarchical Multi-View RGB-D Object Dataset," IEEE Intern. Conf. on Robotics and Automation, 2011.

[10] S. Russell and P. Norvig, "Artificial Intelligence: A Modern Approach," M.: Williams Publishing House, 2006.