<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Self-Tuning Semantic Image Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergey Milyaev</string-name>
          <email>sergey.milyaev@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Barinova</string-name>
          <email>obarinova@graphics.cs.msu.su</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Voronezh State University</institution>
        </aff>
      </contrib-group>
      <fpage>59</fpage>
      <lpage>66</lpage>
      <abstract>
        <p>In this paper we present a method for nding optimal parameters of graph Laplacian-based semantic segmentation. This method is fully unsupervised and provides parameters individually for each image. In the experiments on Graz dataset the accuracy of segmentation obtained with the parameters provided by our method is very close to the accuracy of segmentation obtained with the parameters chosen on the test set.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Semantic segmentation framework</title>
      <p>Let us denote W : Wij = exp
d (xi; xj ))2
- a weight matrix with Gaussian
kernel. Let gi = Pj wij stand for a sum of W along the i-th row. Let D be
a diagonal matrix with values gi on diagonal. Graph Laplacian is de ned as a
matrix L = W D.</p>
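      <p>A minimal NumPy sketch of this construction (hypothetical random features, with the bandwidth ε folded into the squared distance; illustrative only, not the authors' implementation):</p>

```python
import numpy as np

def graph_laplacian(X, eps):
    # Squared pairwise distances d(x_i, x_j)^2, scaled by the bandwidth eps.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / eps
    W = np.exp(-sq)                 # Gaussian-kernel weight matrix
    D = np.diag(W.sum(axis=1))      # diagonal matrix of the row sums g_i
    return D - W                    # graph Laplacian L = D - W

X = np.random.rand(10, 3)           # hypothetical superpixel features
L = graph_laplacian(X, eps=0.5)
print(np.allclose(L.sum(axis=1), 0.0))  # rows of L sum to zero: True
```

      <p>The row sums of L vanish by construction, which is a quick sanity check on the implementation.</p>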
      <p>Methods for image segmentation and matting minimize the following energy function with respect to the vector f = (f_1, …, f_N):</p>
      <p>E(f) = Σ_i c_i (f_i − y_i)² + Σ_{i,j} w_ij (f_i − f_j)².  (1)</p>
      <p>In matrix form, (1) reads</p>
      <p>E(f) = (f − y)ᵀ C (f − y) + fᵀ L f,  (2)</p>
      <p>where C denotes a square diagonal matrix with c_i on the diagonal and y denotes an N-dimensional vector of initial likelihood scores y_i. This optimization problem reduces to solving a sparse linear system:</p>
      <p>(L + C) f = C y.  (3)</p>
      <p>The distance d is parameterized by feature weights α_i, i = 2, …, l, with α_1 fixed to 1. Therefore the parameters of the graph Laplacian are the weights α_i, i = 2, …, l, and the kernel bandwidth ε. Below we show that the optimal value of ε is determined by the values of α_i, i = 2, …, l.</p>
      <p>Choosing the kernel bandwidth with α fixed. We start by fixing the parameters α_i, i = 2, …, l. As shown in [5], if we assume that L provides a good approximation of the Laplace-Beltrami operator, then condition (5) below holds.</p>
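      <p>The sparse linear system (L + C) f = C y can be solved directly with SciPy; the sketch below uses a small synthetic problem (all sizes and values are hypothetical):</p>

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
N = 50
pts = rng.random((N, 2))                       # toy superpixel features
sq = ((pts[:, None] - pts[None, :]) ** 2).sum(-1)
W = np.exp(-sq / 0.1)                          # Gaussian-kernel weights
L = sparse.csc_matrix(np.diag(W.sum(axis=1)) - W)

y = rng.random(N)                              # initial likelihood scores y_i
C = sparse.diags(np.full(N, 0.5))              # diagonal confidences c_i

f = spsolve((L + C).tocsc(), C @ y)            # solve (L + C) f = C y
print(np.allclose((L + C) @ f, C @ y))         # True
```

      <p>In practice W is sparse (each superpixel is linked only to a few neighbours), so the system solves in a fraction of a second.</p>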
      <p>The object/background segmentation algorithm then consists of: 1) computing the graph Laplacian matrix L; 2) solving the sparse linear system (3); 3) thresholding the output. We assume that the initial estimates y_i and the confidences c_i are provided by local models (e.g. an appearance model of a specific category).</p>
      <p>This framework can be extended to multi-class segmentation. Let K denote the number of labels corresponding to object categories. We solve (3) for each label l vs. all other labels 1, …, l−1, l+1, …, K and obtain the values y_i(l) for all image pixels; at the end, the i-th image pixel is assigned to the label l_max = arg max_{l=1,…,K} y_i(l).</p>
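      <p>With hypothetical one-vs-rest scores y_i(l) arranged as a K × N array, the final assignment is a single argmax:</p>

```python
import numpy as np

# Hypothetical outputs of (3) for K = 3 labels over 5 pixels: y[l, i] = y_i(l).
y = np.array([[0.90, 0.20, 0.10, 0.40, 0.30],
              [0.05, 0.70, 0.20, 0.50, 0.10],
              [0.05, 0.10, 0.70, 0.10, 0.60]])

labels = y.argmax(axis=0)  # l_max for each pixel
print(labels)              # [0 1 2 1 2]
```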
    </sec>
    <sec id="sec-3">
      <title>Self-tuning method</title>
      <p>Suppose that the distance function d is represented as a weighted sum of metrics d_k : ℝ × ℝ → ℝ₊, k = 1, …, K:</p>
      <p>d(x_i, x_j)² = (1/ε) Σ_{k=1}^{K} α_k d_k(x_i, x_j)².  (4)</p>
      <p>If L provides a good approximation of the Laplace-Beltrami operator, then [5]</p>
      <p>log Σ_{i,j} w_ij(ε) ≈ (m/2) log(ε) + log(N² (2π)^{m/2} / vol(M)),  (5)</p>
      <p>where m is the dimensionality of the corresponding manifold M and w_ij are the elements of the weight matrix W.</p>
      <p>Consider the plot of log Σ_{i,j} w_ij with respect to log ε. Figure 3 shows this plot with respect to log ε and log α for one image from the GrabCut dataset. According to (5), if the approximation is good, then the slope of this plot should be about half the dimensionality of the corresponding manifold.</p>
      <p>In the limit ε → ∞, w_ij → 1, so Σ_{i,j} w_ij → N². On the other hand, as ε → 0, w_ij → δ_ij, so Σ_{i,j} w_ij → N. These two limiting values set two asymptotes of the plot and imply that the logarithmic plot cannot be linear for all values of ε.</p>
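      <p>These asymptotes are easy to verify numerically; the toy check below (hypothetical random points) evaluates Σ_{i,j} w_ij at extreme bandwidths:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 2))                 # hypothetical superpixel features
N = len(X)
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)

def total_weight(eps):
    return np.exp(-sq / eps).sum()      # sum of w_ij at bandwidth eps

# eps -> infinity: every w_ij -> 1, so the sum tends to N^2 = 400.
print(round(total_weight(1e9)))         # 400
# eps -> 0: w_ij -> delta_ij, so only the N diagonal terms survive.
print(round(total_weight(1e-9)))        # 20
```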
      <p>Therefore, in order to get a better approximation of the Laplace-Beltrami operator with α_1, …, α_K fixed, we have to choose the value of ε from the linear region of the logarithmic plot. We use the point of maximum derivative as the point of maximum linearity.</p>
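      <p>A numerical sketch of this selection rule (on hypothetical random data): evaluate log Σ w_ij on a grid of log ε and take ε at the point of maximum derivative.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((30, 2))
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)

# The logarithmic plot: log sum_{i,j} w_ij as a function of log eps.
log_eps = np.linspace(-12.0, 6.0, 200)
log_sum = np.array([np.log(np.exp(-sq / np.exp(t)).sum()) for t in log_eps])

# Numerical slope of the plot; eps is chosen at the maximum-derivative point.
slope = np.gradient(log_sum, log_eps)
eps_opt = np.exp(log_eps[slope.argmax()])
print(eps_opt > 0.0)  # True
```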
      <p>Unsupervised learning of α_1, …, α_K and ε. As follows from (5), the slope of the logarithmic curve near the optimal value of ε has to be close to m/2, where m is the dimensionality of the manifold M. In our case m = 1, therefore the slope of the logarithmic plot has to be 0.5. If the plot has a different slope in the linear region, this indicates that the second term in (5) is large.</p>
      <p>In order to find optimal values of α_2, …, α_K we solve the following optimization problem:</p>
      <p>(α_2(opt), …, α_K(opt)) = arg min_{α_2,…,α_K} |S(α_2, …, α_K) − 0.5|,  (6)</p>
      <p>where S(α_2, …, α_K) is the slope of the logarithmic plot at the point of maximum derivative.</p>
      <p>S(α_2, …, α_K) can be estimated numerically: we compute log Σ_{i,j} w_ij for different values of ε and estimate the slope of this function at the point of maximum derivative. Therefore the optimization problem (6) can be solved using standard optimization methods, e.g. the Nelder-Mead simplex method.</p>
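      <p>The numerical estimate of S plus Nelder-Mead can be sketched on synthetic two-group features with a single weight α; everything below, including whether a slope of 0.5 is attainable on this toy data, is an assumption for illustration:</p>

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
r = rng.random((25, 3))   # e.g. colour-like features (hypothetical)
x = rng.random((25, 2))   # e.g. spatial features (hypothetical)
dr = ((r[:, None] - r[None, :]) ** 2).sum(-1)
dx = ((x[:, None] - x[None, :]) ** 2).sum(-1)

def S(alpha):
    # Slope of log sum w_ij vs log eps at the point of maximum derivative,
    # for the combined distance d^2 = (dr + alpha * dx) / eps, cf. (4).
    sq = dr + alpha * dx
    log_eps = np.linspace(-10.0, 5.0, 150)
    log_sum = np.array([np.log(np.exp(-sq / np.exp(t)).sum()) for t in log_eps])
    return np.gradient(log_sum, log_eps).max()

# Problem (6): drive the slope toward m/2 = 0.5 over alpha.
res = minimize(lambda a: abs(S(abs(a[0])) - 0.5), x0=[1.0], method="Nelder-Mead")
alpha_opt = abs(res.x[0])
```

      <p>Because there is only one free weight here, the bin-search described later is an equally valid (and cheaper) choice.</p>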
      <p>The unsupervised learning method for graph Laplacian therefore has two steps: 1) find α_2, …, α_K by solving the optimization problem (6); 2) with α_2, …, α_K fixed, choose ε as the point of maximum derivative of the logarithmic plot.</p>
      <p>Implementation details. For the experiments in this work we use the distance function from [3]:</p>
      <p>d̃²(x_i, x_j) = ‖r_i − r_j‖²/σ_r² + ‖x_i − x_j‖²/σ_g²,  (7)</p>
      <p>where r encodes the mean RGB color of the superpixel, x encodes the coordinates of the center of the superpixel, and σ_r &gt; 0 and σ_g &gt; 0 are the parameters of the method: the scale of chromatic neighbourhoods and the scale of geometric neighbourhoods respectively.</p>
      <p>This distance function (7) can be rewritten in the form of (4) as follows:</p>
      <p>d̃²(x_i, x_j) = (1/ε) (‖r_i − r_j‖² + α ‖x_i − x_j‖²),  (8)</p>
      <p>where ε = 0.5 σ_r² and α = σ_r²/σ_g². Therefore, the distance function has two parameters, ε and α.</p>
      <p>In the second step of the learning method we use a sigmoid fit of the logarithmic plot. The shape of the logarithmic plot can be approximated with a sigmoid function T(x) = A/(B + exp(Cx + D)) + E. Since the asymptotes of the sigmoid are set by (5) and the slope in the linear region of the sigmoid should be 0.5, the sigmoid has only one free parameter, which controls the shift of the sigmoid along the horizontal axis. Figure 3 illustrates the choice of ε according to the sigmoid approximation.</p>
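      <p>One way to realize such a one-parameter fit is with scipy.optimize.curve_fit, hard-coding the asymptotes log N and log N² and the 0.5 slope so that only the horizontal shift t0 remains free; the parameterization below is an assumption for illustration, not the authors' exact form:</p>

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
X = rng.random((30, 2))
N = len(X)
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)

log_eps = np.linspace(-12.0, 6.0, 200)
log_sum = np.array([np.log(np.exp(-sq / np.exp(t)).sum()) for t in log_eps])

lo, hi = np.log(N), 2.0 * np.log(N)   # asymptotes fixed by the two limits
k = 2.0 / np.log(N)                   # makes the maximum slope exactly 0.5

def T(t, t0):
    # Sigmoid with a single free parameter: the horizontal shift t0.
    return lo + (hi - lo) / (1.0 + np.exp(-k * (t - t0)))

(t0,), _ = curve_fit(T, log_eps, log_sum, p0=[0.0])
eps_fit = np.exp(t0)                  # eps chosen at the sigmoid midpoint
```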
      <p>In most cases the slope S(α) of the logarithmic plot is a monotonic function of α. Monotonicity of S(α) allows using a simple bin-search for the optimization problem (6).</p>
      <p>[Figure: (a) input image; (b) local model; (c) thresholded (b); (d) Laplacian; (e) thresholded (d)]</p>
      <p>In all experiments the graph Laplacian operated on superpixels produced by image over-segmentation methods. Each superpixel was linked with a fixed number of its nearest neighbours, and the distances to other superpixels were assumed infinite. For all experiments we used confidences that are a linear function of the outputs of local appearance models: c_i = 0.5 (1 − |p_i − 0.5|).</p>
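      <p>When S(α) is monotone, the bin-search for (6) reduces to standard bisection; a generic sketch with a stand-in monotone slope function:</p>

```python
def bisect_slope(S, lo, hi, target=0.5, iters=60):
    # Binary search assuming S is monotonically increasing on [lo, hi]
    # and that the target slope is bracketed.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if S(mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Stand-in monotone function with S(alpha) = 0.5 at alpha = 1.
alpha = bisect_slope(lambda a: a / (a + 1.0), 0.0, 10.0)
print(round(alpha, 6))  # 1.0
```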
      <p>The Graz dataset (available at http://www.emt.tugraz.at) contains 1096 images of three classes: "person", "bike" and "car". In our experiments we solved a separate binary segmentation problem for each category. To measure the quality of segmentation we used a standard metric: the percentage of incorrectly classified pixels in the image.</p>
      <p>In our experiments we used the open-source VlBlocks toolbox (code available at http://vlblocks.org/index.html), which implements the method described in [6]. We chose it for comparison for the following reasons. First, it allows using different local appearance models. The method has a parameter N, the number of neighbouring superpixels whose features are used for classification of each particular superpixel; we therefore report performance metrics for different values of N to illustrate the performance of the proposed graph Laplacian framework applied to different local models. Second, the toolbox includes an implementation of a discrete CRF with graph-cut inference, which we use for comparison. Note that this CRF model uses types of features (color and spatial coordinates of superpixels) similar to those used in our graph Laplacian.</p>
      <p>[Figure: (a) input image; (b) N=0; (c) N=1; (d) N=2; (e) N=3; (f) N=4]</p>
      <p>In our experiments on the Graz dataset we used the same over-segmentation and the same local appearance model based on SVM as [6]. To obtain the initial estimates y_i for the graph Laplacian framework we scaled the SVM outputs to the [0, 1] interval for each image.</p>
      <p>In the first experiment the parameters ε and α were validated on the GrabCut dataset. In the second experiment we validated the parameters on the test set. In the third experiment we used our unsupervised learning method to choose the parameters individually for each image. We also compared with the VlBlocks implementation of a CRF with graph-cut inference. The strategy for choosing the internal parameters of the CRF was the same as in [6].</p>
      <p>Table 1 contains the results of the comparison. Our unsupervised learning gives results comparable to the upper bound on the performance of graph Laplacian with fixed parameters from the second experiment.</p>
      <p>[Table 1: performance values for the "cars", "bike" and "person" classes at N = 0, 1, 2, 3, 4]</p>
      <p>Table 1. Performance on the Graz dataset at equal precision and recall rates for the "cars", "bike" and "person" classes. First row: local appearance model (from the VlBlocks toolbox). Second row: result of applying a discrete CRF with graph-cut inference (from the VlBlocks toolbox). Third row: graph Laplacian with parameters validated on the GrabCut dataset. Fourth row: graph Laplacian with parameters validated on the test set. Fifth row: graph Laplacian with parameters learnt individually for each image. For each appearance model used in our experiments (we varied the number of neighboring regions as in [6]) the best result is shown in bold font. Underlined are the best overall results.</p>
      <p>The value of the performance gain compared to the local appearance model differs for different values of the parameter N: the smaller N is, the smaller the neighborhood considered by the low-level model, and the more significant the gain in performance attained by both the CRF and graph Laplacian.</p>
      <p>The gain in performance of graph Laplacian is almost uniformly higher than the performance gain obtained by the discrete CRF. Figure 2 shows results provided by the local appearance model (SVM) and the corresponding results of using graph Laplacian with learnt parameters. Figure 3 shows how the results vary for different local models.</p>
      <p>The running time is as follows: the learning phase takes about 0.2 seconds on average; solving the linear system (3) takes about 0.02 seconds on average.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented a method for tuning the internal parameters of graph Laplacian in a fully unsupervised manner, individually for each test image. The proposed method has a low computational cost and shows better performance than a discrete CRF with graph-cut inference. In future work we plan to use more complex distance functions and to investigate the case when the distance function has more parameters.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Grady</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random walks for image segmentation</article-title>
          .
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          <volume>28</volume>
          (
          <issue>11</issue>
          ) (
          <year>2006</year>
          )
          <fpage>1768</fpage>
          -
          <lpage>1783</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Levin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lischinski</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A closed form solution to natural image matting</article-title>
          .
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Duchenne</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Audibert</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keriven</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponce</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Segonne</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Segmentation by transduction</article-title>
          .
          <source>In: CVPR</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Belkin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niyogi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Laplacian eigenmaps for dimensionality reduction and data representation</article-title>
          .
          <source>Neural Computation</source>
          <volume>15</volume>
          (
          <year>2003</year>
          )
          <fpage>1373</fpage>
          -
          <lpage>1396</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Coifman</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shkolnisky</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sigworth</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Graph laplacian tomography from unknown random projections</article-title>
          .
          <source>IEEE Trans. on Image Processing</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fulkerson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soatto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Class segmentation and object localization with superpixel neighborhoods</article-title>
          . In: ICCV. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>