=Paper=
{{Paper
|id=Vol-2665/paper24
|storemode=property
|title=Image segmentation based on RGBD data 
|pdfUrl=https://ceur-ws.org/Vol-2665/paper24.pdf
|volume=Vol-2665
|authors=Elena Medvedeva,Elisaveta Varco
}}
==Image segmentation based on RGBD data ==
<pdf width="1500px">https://ceur-ws.org/Vol-2665/paper24.pdf</pdf>
<pre>
             Image Segmentation Based on RGBD Data
                         Elena Medvedeva                                                                               Elisaveta Varco
                  Department of Radio Electronics                                                              Department of Radio Electronics
                  Vyatka State University, VyatSU                                                              Vyatka State University, VyatSU
                           Kirov, Russia                                                                                Kirov, Russia
                          emedv@mail.ru                                                                       varkoelizaveta2011@hotmail.com

   Abstract—The paper proposes a method of image                                            II. IMAGE SEGMENTATION BASED ON RGBD DATA
segmentation based on the joint usage of color and depth data.
The method consists of two stages. The first stage involves RGB                 In the RGB color space, each component is a digital
image segmentation based on contour detection and the                        halftone image. Its pixels are represented by g-bit binary
subsequent filling of closed regions. This procedure is followed             numbers. The D component is also a multi-bit digital
by joint color and depth segmentation. Depth data make it                    image (depth map) where each element corresponds to the
possible to distinguish between pixels with similar brightness               information about the distance from the camera to each
characteristics for different objects and improve the quality of             point of the observed scene.
image segmentation. To reduce computational resources, we
suggest that contours should be detected in high order bit                       There are two ways to perform RGBD data
planes of a digital image using the mathematical model of a                  segmentation: color-based segmentation at the first stage
two-dimensional Markov chain. The experimental results show                  followed by depth-based segmentation at the second, or
that the proposed method is effective.                                       vice versa. It is preferable to use color
                                                                             data at the first stage. This is due to a number of defects on
   Keywords—RGBD segmentation, two-dimensional Markov
chain, contour detection, depth map.                                         the depth map – lost and distorted depth values, uneven and
                                                                             noisy object boundaries, incorrectly measured depth values
                          I. INTRODUCTION                                    for some materials with mirror or fine-grained surfaces, and
    Segmentation is used to solve a number of tasks related to               so on. Therefore, using depth data at the first stage will
detection and recognition of static and dynamic objects in                   significantly distort the object boundaries and break the
video surveillance, autonomous driving, and others.                          object contours at the second one.
    Traditional segmentation methods are mainly focused on                       In this paper, firstly, the RGB image is segmented. To
the use of color or brightness features. According to these                  improve the accuracy of selected boundaries of objects of
methods, the quality of image segmentation depends                           interest, we use the method based on detecting contours with
significantly on the pattern of the scene: smooth or sharp                   subsequent pixel filling in closed image regions. The second
changes in lighting; shadows created by objects; complex                     stage involves joint segmentation of color and depth data.
backgrounds, etc. Much work has been done in the field                       Depth data make it possible to distinguish pixels with
over the years; however, none of the existing segmentation                   similar brightness or color characteristics for different
techniques is able to obtain satisfactory results based on color             objects and thus to improve the quality of image
data alone.                                                                  segmentation.
    New RGBD sensors, for instance, the Microsoft Kinect,                       Digital halftone images corresponding to color
which provide synchronized depth and color video frames,                     components can be represented by a set of bit binary
have opened up new opportunities to solve the tasks related                  images (BBI). The most informative (detailed) regions are
to object detection and recognition. Unlike RGB data, depth                  highlighted on the high order BBI of the digital halftone
data are considered to be more resistant to changes in lighting
                                                                             image. The low order BBI are binary images in the form of
and dynamic background objects and can be an effective
                                                                             two-dimensional noise. Therefore, we propose to detect the
additional feature for image segmentation.
                                                                             contours of objects of interest in the high order BBI of the
    Fusion of color and depth has become a new research                      digital halftone image. To detect the contours, it is possible
topic in the field of computer vision recently. A number of                  to use the mathematical model based on two-dimensional
papers offer various methods for segmenting RGBD data:
methods based on combining background subtraction                            Markov chains with two equally probable states $M_1^{(l)}$,
algorithms with depth data [1]; methods using convolutional                  $M_2^{(l)}$ and matrices of probabilities of horizontal
neural networks [2]; clustering [3]; contour, brightness and                 ${}^{1}\Pi^{(l)} = \begin{pmatrix} {}^{1}\pi_{11}^{(l)} & {}^{1}\pi_{12}^{(l)} \\ {}^{1}\pi_{21}^{(l)} & {}^{1}\pi_{22}^{(l)} \end{pmatrix}$
depth [4], and others.                                                       and vertical
                                                                             ${}^{2}\Pi^{(l)} = \begin{pmatrix} {}^{2}\pi_{11}^{(l)} & {}^{2}\pi_{12}^{(l)} \\ {}^{2}\pi_{21}^{(l)} & {}^{2}\pi_{22}^{(l)} \end{pmatrix}$
    However, almost all segmentation methods based on                        ($l = 1, \ldots, g$) transitions [5, 6].
combining depth and color data are either insufficiently
flexible or require significant computational resources.
Therefore, research in this area is an urgent task.
                                                                                This approach to detecting contours will reduce
   The aim of this paper is to develop a method for image                    computational resources by using 2×2 transition
segmentation based on the joint usage of brightness and                      probability matrices.
depth data which can improve the quality of segmentation
with reduced computational resources.                                           Fig. 1 shows an element $\nu_3^{(l)}$ of a two-dimensional binary
                                                                             image together with its neighborhood of adjacent elements
                                                                             $\Lambda_{i,j,k} = \{\nu_1^{(l)}, \nu_2^{(l)}\}$.
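
The decomposition of a color component into bit binary images described in this
section can be illustrated with a short sketch. It assumes NumPy and an unsigned
integer channel. The function name bit_planes and the 1-based plane index
(l = g is the high order plane) are illustrative choices, not taken from the paper.

```python
import numpy as np


def bit_planes(channel, g=8):
    """Split a g-bit channel into g binary images, keyed by plane number l.

    l = 1 is the low order plane and l = g the high order plane
    (an indexing assumption made for this sketch).
    """
    channel = np.asarray(channel).astype(np.uint32)
    return {l: ((channel >> (l - 1)) & 1).astype(np.uint8) for l in range(1, g + 1)}


# Hypothetical 1x4 red channel: two dark and two light pixels.
red = np.array([[0, 127, 128, 255]], dtype=np.uint8)
planes = bit_planes(red)
print(planes[8])  # the high order plane separates values >= 128 from the rest
```

In this toy example the 8th plane alone already splits the pixels into a dark
and a light group, which is why the high order planes are natural candidates
for contour detection.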


Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Image Processing and Earth Remote Sensing


                                                                                                                                                                        property is important when performing the following
                                                                                                                                                                        procedure: filling closed regions with color.
                                                                                                                                                                           To fill closed regions with color, the range of brightness
                                                                                                                                                                        values $[Y_{\min}; Y_{\max}]$ for the object is specified. All the
                                                                                                                                                                        elements within the object area are assigned an average
                                                                                                                                                                        brightness value $Y_{\mathrm{avg}}$ (or a label with a specified value). To
[Figure 1: element $\nu_3^{(l)}$ with its neighbors $\nu_1^{(l)}$ (left) and $\nu_2^{(l)}$ (above).]                                                                    fill the regions with color, the line seed fill algorithm was
Fig. 1. Fragment of the bit plane of the digital halftone image.                                                                                                        chosen [7]. It provides a significant gain in memory and
                                                                                                                                                                        processing time by storing only one seed element for each
   In accordance with the mathematical model of a two-dimensional                                                                                                       filled region. As a result of such image processing, the
random Markov process, the amount of information in the                                                                                                                 object can be divided into several parts or have inaccurate
element $\nu_3^{(l)}$ for various combinations of the                                                                                                                   borders due to uneven illumination, the presence of shadows
neighboring elements $\Lambda_{i,j,k} = \{\nu_1^{(l)}, \nu_2^{(l)}\}$                                                                                                   or glare. In addition, extraneous objects in the background of
is determined using                                                                                                                                                     the scene can be seen in the image along with the objects of
the formulas [5, 6]:                                                                                                                                                    interest. All these factors will influence the quality of
                                                                                                                                                                        solution of the subsequent tasks of image detection,
                                                                                                                                                                        classification and recognition.

    $I(\nu_3^{(l)} = M_i^{(l)} \mid \nu_1^{(l)} = M_i^{(l)}, \nu_2^{(l)} = M_i^{(l)}) = -\log \frac{{}^{1}\pi_{ii}^{(l)} \, {}^{2}\pi_{ii}^{(l)}}{{}^{3}\pi_{ii}^{(l)}}$;   (1)
    $I(\nu_3^{(l)} = M_i^{(l)} \mid \nu_1^{(l)} = M_i^{(l)}, \nu_2^{(l)} = M_j^{(l)}) = -\log \frac{{}^{1}\pi_{ii}^{(l)} \, {}^{2}\pi_{ij}^{(l)}}{{}^{3}\pi_{ij}^{(l)}}$;
    $I(\nu_3^{(l)} = M_i^{(l)} \mid \nu_1^{(l)} = M_j^{(l)}, \nu_2^{(l)} = M_i^{(l)}) = -\log \frac{{}^{1}\pi_{ij}^{(l)} \, {}^{2}\pi_{ii}^{(l)}}{{}^{3}\pi_{ij}^{(l)}}$;
    $I(\nu_3^{(l)} = M_i^{(l)} \mid \nu_1^{(l)} = M_j^{(l)}, \nu_2^{(l)} = M_j^{(l)}) = -\log \frac{{}^{1}\pi_{ij}^{(l)} \, {}^{2}\pi_{ij}^{(l)}}{{}^{3}\pi_{ii}^{(l)}}$,

where ${}^{r}\pi_{ij}^{(l)}$ ($i, j = 1, 2$; $r = 1, 2, 3$) are elements of the transition                                                                                 At the second stage, the range of values
probability matrices of one-dimensional Markov chains with two states:                                                                                                  $[X_{\min}; X_{\max}]$ that the object of interest can take is set
${}^{1}\Pi^{(l)}$ (horizontal), ${}^{2}\Pi^{(l)}$ (vertical), and                                                                                                       on the depth map, and a mask is formed. Next, the mask is
${}^{3}\Pi^{(l)} = {}^{1}\Pi^{(l)} \times {}^{2}\Pi^{(l)}$.                                                                                                             superimposed on the result of segmentation of the RGB
                                                                                                                                                                        image and the final stage of selecting objects is performed.
                                                                                                                                                                            This procedure makes it possible to distinguish between objects
                                                                                                                                                                        that have similar brightness or color characteristics but
                                                                                                                                                                        different range characteristics, and to improve the
                                                                                                                                                                        segmentation of objects in uneven lighting, the presence of
                                                                                                                                                                        shadows, etc.

                                                                                                                                                                           Fig. 2 shows a flowchart explaining the algorithm.

                                                                                                                                                                                          III. EXPERIMENTAL RESULTS

                                                                                                                                                                           The RGBD Object Dataset [8] was used in the experiments.
                                                                                                                                                                        The RGBD dataset contains pairs of sequences of color
   The elements of the transition probability matrices are
                                                                                                                                                                        images and depth maps, as well as segmentation results based
supposed to be known a priori and obtained from a large
                                                                                                                                                                        on depth and color data, using the RANSAC algorithm and
number of samples of real images.
                                                                                                                                                                        an adaptive Gaussian mixture (AGM) model [9]. Each video
    After the calculated amount of information is compared                                                                                                              sequence consists of 199 frames. Each image contains
with the threshold, a decision is made on whether the analyzed                                                                                                          only one object of interest.
element is a contour point. The threshold
                                                                                                                                                                            Fig. 3 shows example results of the segmentation algorithms: (a) –
value is calculated as the average value between the
                                                                                                                                                                        the original RGB image; (b) – reference marking; (c) –
minimum amount of information and the amount of
                                                                                                                                                                        segmentation using the RANSAC algorithm and AGM; (d) –
information when at least one of the neighboring elements
                                                                                                                                                                        segmentation based on brightness data; (e) – the result of
assumes a different state.
                                                                                                                                                                        joint segmentation according to brightness and depth. The
    For an 8-bit digital halftone image represented by 256                                                                                                              brightness segmentation of the image “Apple” is performed
brightness values, it is possible to select all light regions with                                                                                                      using the 8th BBI of the R component; segmentation of the images
brightness ranging from 128 to 255 on a dark background using                                                                                                           “Banana” and “Scissors” uses the 8th BBI of the B component;
the high order (8th) bit plane, or, conversely, all dark objects on                                                                                                     segmentation of the image “Coffee mug” uses the 8th BBI of
a background with brightness above 128. To highlight regions                                                                                                            the G component.
in less contrasting images with indistinct boundaries, it is
                                                                                                                                                                            The results given (Fig.3d) prove that the segmentation
necessary to detect the contours in the following binary images of
                                                                                                                                                                        algorithm based on contour detection accurately localizes
the 7th or 6th bit of the digital halftone image. In this case, the
                                                                                                                                                                        the boundaries of objects.
contour image will represent the sum of contour images of
several bits.                                                                                                                                                               Additional use of depth data (Fig.3e) makes it possible to
                                                                                                                                                                        improve the quality of segmentation: to remove the selected
    The proposed method of contour detection requires
                                                                                                                                                                        fragments which are close in brightness to the object of
insignificant computational resources which are determined
                                                                                                                                                                        interest, get rid of shadows, etc. In addition, when
by comparison operations with two neighboring elements.
                                                                                                                                                                        comparing the results in Fig. 3e and Fig.3c, it can be seen
As a result, a one-pixel-wide closed contour is obtained. This
                                                                                                                                                                        that the developed method allows more accurate selection of
                                                                                                                                                                        objects of interest than the method proposed in [9].
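
The contour detection rule of Section II (the amount of information in an element
compared with a threshold) can be sketched as follows. This is only an illustration
under several assumptions that are not confirmed by the paper: formulas (1) are read
as negative logarithms, both chains are symmetric, the probabilities p_h and p_v of
staying in the same state are hypothetical values, the neighbors are taken to the
left of and above the element (as Fig. 1 suggests), and the threshold uses the
smallest amount of information obtained when one neighbor differs. The name
contour_mask is illustrative.

```python
import numpy as np


def contour_mask(plane, p_h=0.9, p_v=0.9):
    """Mark contour points in one bit plane by thresholding information amounts.

    p_h and p_v are assumed probabilities of staying in the same state along
    the horizontal and vertical chains (hypothetical values).
    """
    plane = np.asarray(plane, dtype=np.uint8)
    a, b = p_h, 1.0 - p_h      # horizontal chain: same state, changed state
    c, d = p_v, 1.0 - p_v      # vertical chain: same state, changed state
    p3_same = a * c + b * d    # diagonal element of the product matrix
    p3_diff = a * d + b * c    # off-diagonal element of the product matrix

    # info[s1, s2]: s1 = 1 if the left neighbor equals the element,
    # s2 = 1 if the upper neighbor equals the element.
    info = np.empty((2, 2))
    info[1, 1] = -np.log2(a * c / p3_same)
    info[1, 0] = -np.log2(a * d / p3_diff)
    info[0, 1] = -np.log2(b * c / p3_diff)
    info[0, 0] = -np.log2(b * d / p3_same)

    # Midpoint between the minimum amount and the smallest amount obtained
    # when one neighbor differs (an assumption of this sketch).
    threshold = 0.5 * (info[1, 1] + min(info[1, 0], info[0, 1]))

    # Left and upper neighbors; border elements are compared with themselves.
    left = np.concatenate([plane[:, :1], plane[:, :-1]], axis=1)
    up = np.concatenate([plane[:1, :], plane[:-1, :]], axis=0)
    same_left = (left == plane).astype(int)
    same_up = (up == plane).astype(int)
    return info[same_left, same_up] > threshold


# Hypothetical 5x5 bit plane with a 3x3 object.
plane = np.zeros((5, 5), dtype=np.uint8)
plane[1:4, 1:4] = 1
mask = contour_mask(plane)
print(mask.astype(int))
```

With these assumptions the rule reduces to comparing each element with its two
neighbors, which agrees with the remark that the method needs only comparison
operations with two neighboring elements.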


VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                                                                                                              106


[Figure 2: flowchart. RGB image -> image decomposition into g-bit binary images (panels: 8 BBI, 7 BBI, 6 BBI, 5 BBI)
 -> contour detection (panel: 8 layer-borders) -> filling of image segments based on brightness threshold.
 Depth map -> mask formation based on distance threshold. The two branches are combined (+) into the segmentation result.]

Fig. 2. Flowchart of RGBD image segmentation algorithm.
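
The last step of the flowchart, where the depth mask is superimposed on the result
of the brightness-based stage, can be sketched as below. The sketch assumes NumPy,
a boolean object mask from the first stage and a depth map of the same shape. The
function name joint_segmentation, the millimetre units and the sample values are
hypothetical.

```python
import numpy as np


def joint_segmentation(color_mask, depth, x_min, x_max):
    """Keep first-stage object pixels whose depth lies in [x_min, x_max]."""
    color_mask = np.asarray(color_mask, dtype=bool)
    depth = np.asarray(depth)
    depth_mask = (depth >= x_min) & (depth <= x_max)
    return color_mask & depth_mask


# Hypothetical 2x3 example: depth in millimetres, 0 marks a lost measurement.
color_mask = np.array([[1, 1, 0],
                       [1, 1, 1]])
depth = np.array([[800, 2500, 900],
                  [850,    0, 820]])
result = joint_segmentation(color_mask, depth, x_min=700, x_max=1000)
print(result.astype(int))
```

Pixels that pass the brightness stage but lie outside the depth range (a distant
background fragment or a lost depth value in this toy example) are removed.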

   The segmentation process can be performed                                 TABLE I.  ESTIMATION OF THE RESULTS OF SEGMENTATION ALGORITHMS.
automatically for typical images (or sequences of video
frames) in which objects of interest have similar                            Video        Based on             Based on brightness  Based on RANSAC
characteristics in brightness and depth.                                     sequence     brightness           and depth            and AGM model [9]
                                                                                          P     R     E        P     R     E        P     R     E
   Precision ($P$) and recall ($R$) criteria were used to                    Apple        0.93  0.58  0.0086   0.89  0.98  0.0015   0.90  0.97  0.0013
assess the quality of segmentation, and the error coefficient                Banana       0.93  0.93  0.0018   0.93  0.93  0.0018   0.80  0.97  0.0027
($E$) was calculated [10]:                                                   Scissors     0.80  0.91  0.0026   0.82  0.97  0.0021   0.78  0.93  0.0026
                                                                             Coffee mug   0.83  0.63  0.0118   0.98  0.80  0.0030   0.95  0.94  0.0014
    $\mathrm{Precision} = \frac{TP}{TP + FP}$,   (2)                         Comb         0.89  0.40  0.0237   0.91  0.90  0.0035   0.85  0.94  0.0035
    $\mathrm{Recall} = \frac{TP}{TP + FN}$,   (3)
    $E = \frac{FP + FN}{TP + TN + FP + FN}$,   (4)                               Joint segmentation yields precision values similar to
                                                                             those of brightness segmentation but increases the recall
                                                                             score (up to 2.1 times) and reduces the segmentation error
where TP – true positives; TN – true negatives; FP – false                   (up to 5.7 times).
positives; FN – false negatives.
                                                                                                              IV. CONCLUSION
    The precision within the segmented region is the
percentage of pixels which actually belong to the given                          The proposed method of image segmentation based on
region in relation to all the pixels that are assigned to this               the joint usage of color and depth data makes it possible to
region. The recall criterion measures the percentage of all                  accurately select the boundaries of objects of interest and
truly defined pixels which belong to the segmented region in                 effectively distinguish the pixels with similar brightness
relation to all the pixels. The error coefficient E takes all the            characteristics for different objects. Due to the detected
error pixels into account in relation to the total number of                 contours in high order bit planes of the digital image using
pixels.                                                                      the mathematical model of two-dimensional Markov chain, it
                                                                             is possible to reduce the computational resources when
   Reference segmentation images were used to calculate                      implementing the algorithm. The algorithm can be used to
precision, recall and error coefficient.                                     solve a number of tasks related to object detection and
    Table 1 contains the results of assessments of the quality               recognition in video surveillance systems, autonomous
of image segmentation using the developed method and the                     driving, etc.
known method [9]. The assessments were calculated using
individual images and averaged over the entire video
sequence.
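
   As an illustration of the precision, recall and error coefficient in Eqs. (2)-(4),
the following minimal sketch computes them from a predicted and a reference binary
mask and averages them over a sequence of frames. The function names and the
convention of returning 0 when a denominator is zero are assumptions made for the
example and are not taken from the paper.

```python
import numpy as np


def segmentation_scores(predicted, reference):
    """Return (precision, recall, error) for one pair of binary masks."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)

    tp = int(np.sum(predicted & reference))    # object pixels found
    tn = int(np.sum(~predicted & ~reference))  # background pixels kept as background
    fp = int(np.sum(predicted & ~reference))   # background marked as object
    fn = int(np.sum(~predicted & reference))   # object pixels missed

    # Assumption: an undefined ratio (zero denominator) is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0   # Eq. (2)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Eq. (3)
    error = (fp + fn) / (tp + tn + fp + fn)          # Eq. (4)
    return precision, recall, error


def sequence_scores(predicted_frames, reference_frames):
    """Average the per-frame scores over a video sequence."""
    scores = [segmentation_scores(p, r)
              for p, r in zip(predicted_frames, reference_frames)]
    return tuple(float(v) for v in np.mean(scores, axis=0))


# Toy example: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 1, 1],
             [0, 0, 0]]
p, r, e = segmentation_scores(predicted, reference)
print(f"P = {p:.3f}, R = {r:.3f}, E = {e:.3f}")  # P = 0.667, R = 0.667, E = 0.333
```

Averaging is done per frame, in line with the statement that the scores were
calculated for individual images and then averaged over the video sequence.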


VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)
Image Processing and Earth Remote Sensing

   [Figure: image rows a) to e) for the sequences Apple, Banana, Scissors and Coffee mug]

Fig. 3. Comparison of RGBD data segmentation methods.

                             REFERENCES

[1] R. Trabelsi, I. Jabri, F. Smach and A. Bouallegue, “Efficient and fast
    multi-modal foreground-background segmentation using RGBD data,”
    Pattern Recognition Letters, vol. 97, pp. 13-20, 2017.
[2] X. Qi, R. Liao, J. Jia, S. Fidler and R. Urtasun, “3D Graph Neural
    Networks for RGBD Semantic Segmentation,” IEEE International
    Conference on Computer Vision, Venice, Italy, pp. 5199-5208, 2017.
[3] H. Yalic and A. Can, “Automatic Object Segmentation on RGB-D Data
    using Surface Normals and Region Similarity,” Proceedings of the 13th
    Intern. Joint Conf. on Computer Vision, Imaging and Computer
    Graphics Theory and Applications, vol. 4, pp. 379-386, 2018.
[4] F. Xiang, C. Chen, C. Wang and C.-C. Jay Kuo, “Image segmentation
    using contour, surface, and depth cues,” IEEE Intern. Conf. on Image
    Processing, 2017.
[5] E.P. Petrov, I.S. Trubin, E.V. Medvedeva and S.M. Smolskiy,
    “Mathematical Models of Video-Sequences of Digital Half-Tone
    Images,” Integrated Models for Information Communication Systems and
    Networks: Design and Development, IGI Global, pp. 207-241, 2013.
[6] E.V. Medvedeva and E.E. Kurbatova, “Image Segmentation Based on
    Two-Dimensional Markov Chains,” Computer Vision in Control
    Systems-2. Innovations in Practice, Springer International Publishing
    Switzerland, pp. 277-295, 2015.
[7] T. Pavlidis, “Algorithms for Graphics and Image Processing,” M.:
    Radio and Communications, 1986.
[8] RGBD Object Dataset [Online]. URL:
    http://rgbd-dataset.cs.washington.edu/dataset.html (23.05.2019).
[9] K. Lai, L. Bo, X. Ren and D. Fox, “A Large-Scale Hierarchical
    Multi-View RGB-D Object Dataset,” IEEE Intern. Conf. on Robotics and
    Automation, 2011.
[10] S. Russell and P. Norvig, “Artificial Intelligence: A Modern
    Approach,” M.: Williams Publishing House, 2006.



</pre>