<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-spectral Image Recognition Using Solution Trees Based on Similarity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladimir B. Berikov</string-name>
          <email>berikov@math.nsc.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor A. Pestunov</string-name>
          <email>pestunov@ict.sbras.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman M. Kozinets</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey A. Rylov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computational Technologies SB RAS</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Novosibirsk State University</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sobolev Institute of Mathematics</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>A method for constructing decision trees based on the mutual similarity of objects is proposed. The method allows obtaining complex decision boundaries that have a clear logical interpretation. The results of the experiments confirm the effectiveness of the method for multispectral image recognition.</p>
      </abstract>
      <kwd-group>
        <kwd>recognition</kwd>
        <kwd>multispectral image</kwd>
        <kwd>decision tree</kwd>
        <kwd>similarity of observations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Decision trees (DT) have a number of attractive properties:
– they give one an opportunity to analyze information of different types (i.e., both quantitative and qualitative
characteristics describing objects), including in the presence of missing feature values;
– they automatically determine the most informative features for each classified object and use them for making a decision;
– they find probabilistic logical rules that reflect cause-and-effect relationships of the phenomenon under study;</p>
      <p>
        – in combination with an ensemble approach (e.g., decision forest, boosting on trees [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ]), DT is able to find
sufficiently stable solutions with high generalizing ability.
      </p>
      <p>
        A recent review of existing methods for DT induction is given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Despite a large number of known
approaches, there is still a need for efficient methods with high generalization ability. There are several
possible ways to improve quality. The first approach is to find a criterion that will enhance the predictive ability of
decisions by optimally combining the accuracy and complexity of the tree for the given data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The second approach
involves the development of more sophisticated techniques for representing the tree (for example, using linear
decision boundaries in the tree nodes) and applying “deeper” algorithms for searching the optimal tree structure [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>A “classical” DT is a tree-like graph, in the nodes of which conditions of two possible types are tested. If X is a
numerical attribute, then the condition “X(a) &lt; b” is examined, where a is an arbitrary object from the statistical
population, X(a) is the value of X for object a, b is some value of the attribute. If X is a categorical attribute, then the
condition “X(a)=b” is checked. Depending on whether the test is true or false, the left or the right sub-node is chosen. The
leaves (terminal nodes) of the tree are associated with the values (class labels) of the predicted feature. The paths
from the root node to leaves represent classification rules. To find an optimal DT, a recursive partition of feature
space is performed.</p>
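<p>The two node tests just described can be illustrated with a minimal sketch (ours, not the paper's code); the function name and argument layout are illustrative assumptions.</p>

```python
# Minimal sketch (illustrative, not the paper's code) of the two "classical"
# univariate DT node tests: a numerical test "X(a) below b" and a categorical
# test "X(a) equals b". Returns True when the left sub-node is chosen.

def univariate_test(value, threshold=None, category=None):
    if threshold is not None:      # numerical attribute: value below threshold
        return threshold > value
    return value == category       # categorical attribute: exact match

# Route an object with X(a) = 3.5 through a node testing "X(a) below 5".
goes_left = univariate_test(3.5, threshold=5.0)   # True
```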
      <p>This approach has a significant drawback: the partitioning of feature space occurs strictly parallel to the feature
axes (in the case of numerical features), even if the real boundary between classes has a linear shape (Figure 1). To
approximate the boundary, it is necessary to use a more complex tree structure (with many additional nodes) that
often has a negative influence on the efficiency of decisions.</p>
      <p>
        Some works (e.g., [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) propose oblique DT (ODT, also called multivariate DT) with more complicated types of
statements having the form ∑ cᵢXᵢ(a) + c₀ &lt; 0, where the summation is carried out over a subset of numerical
features and c₀, c₁, … are real-valued coefficients. The coefficients are estimated by optimizing a given quality
functional for the subset of objects in the tree node. A number of algorithms for ODT induction exist:
– Classification and Regression Trees - Linear Combination (CART-LC) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ];
– Simulated Annealing Decision Tree (SADT) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ];
– Linear Machine Decision Trees (LMDT) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ];
– OC1 system [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ];
– oblique DT based on the Support Vector Machine (SVM-ODT) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], etc.
      </p>
<p>Despite the significant improvement of prediction accuracy, this approach also has a number of limitations. First
of all, the found linear boundaries are not easy to interpret, in contrast with the simple rules of a “classical” univariate DT.
Another limitation is that ODT is applicable only to multivariate data and cannot be used for data described with pairwise
similarity matrices.</p>
      <p>
        To overcome the latter difficulty, the work [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] suggests Similarity Forest method in which an ensemble of ODT
is built. Each variant of ODT is defined by a randomly chosen pair of data points from different classes; the splitting
boundary is the hyperplane perpendicular to the segment connecting the pair and crossing its midpoint. In the
experiments, the proposed algorithm demonstrated sufficiently high accuracy in comparison with a number of
other methods, especially in the presence of missing feature values, even when the input information has the form of
multidimensional data. However, the obtained ensemble decision is hard to interpret because it includes a large
number of generated trees.
      </p>
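<p>Under the assumption of Euclidean similarity, the Similarity Forest split rule described above reduces to a nearest-of-two test; the following sketch (ours, not the authors' code) illustrates it.</p>

```python
# Sketch of the midpoint-hyperplane split: the hyperplane perpendicular to
# the segment between points p and q, crossing its midpoint, assigns x to
# p's side exactly when x is nearer to p than to q (squared distances
# suffice, so no square root is needed).

def sf_side(x, p, q):
    d2p = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
    d2q = sum((xi - qi) ** 2 for xi, qi in zip(x, q))
    return "p" if d2q > d2p else "q"

# (0, 0) is nearer to p = (0, 1) than to q = (0, -3), so it falls on p's side.
side = sf_side((0, 0), (0, 1), (0, -3))   # "p"
```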
      <p>The method proposed in this paper aims to eliminate the above-mentioned drawbacks. We propose to use a more
general type of statements regarding the similarity of observations. The similarity can be calculated using various
metrics in different feature subspaces. This type of decision tree allows one to get more complex decision boundaries,
which at the same time have a clear logical interpretation for the user.</p>
      <p>The developed algorithm was experimentally investigated on model data and multispectral satellite images.
</p>
    </sec>
    <sec id="sec-2">
      <title>Similarity-based decision tree (SBDT) in pattern recognition problems</title>
<p>In this work, we consider a pattern recognition problem formulated as follows. Let us denote by Γ the general
population of objects under consideration, and by the point x = x(a) = (x(1), …, x(m)) a feature description of an
object a ∈ Γ, where m is the dimensionality of the feature space F. Let Y be a set of class labels. We consider a binary
classification problem: Y = {−1, +1}, although the results can be extended to a multi-class scenario. Denote by F the
set of feature descriptions of objects from Γ. Let y∗: F → Y be an objective function with values assigned to the
points of a finite set (the training, or learning, sample) X_L ⊂ F. We need to build a decision function f: F → Y which
belongs to a given family; f should approximate y∗ and minimize the estimate of the misclassification probability at
any point x ∈ F. Let X_T be another subset of F used for evaluating the performance of the decision function,
X_L ∩ X_T = ∅. Denote X = X_L ∪ X_T, and let d be the size of X and l be the size of X_L.</p>
<p>We propose a modification of DT in which, instead of the standard tests, more general statements of the type “object a
is more similar to the set A than to the set B in feature subspace F′ according to metric ρ” are examined in the
internal nodes. Here A and B are subsets of the learning sample, typically of small cardinality. In this work, we assume
that each of the sets A, B includes exactly one object (its description is called a support point). We shall also assume
that F′ = F and the metric ρ is the Euclidean metric.</p>
<p>Suppose T is a binary tree with t internal nodes, and P = {p_1, …, p_p}, N = {n_1, …, n_n} are the sets of support
points from the positive and negative classes, respectively. For each internal node v_k of the tree, k = 1, …, t, we define
the tested statement as follows: “x ∈ M1”, where M1 is the set of points of the feature space which are closer to p_i
than to n_j (figure 2). Thus, the data is separated linearly (figure 3).</p>
      <p>For any x ∈ X we define the matrix S_x with elements
s(i, j) = 1, if ρ(x, p_i) − ρ(x, n_j) &lt; 0, and s(i, j) = 0 otherwise, for i = 1, …, p, j = 1, …, n,
where ρ is a metric in the feature space F.</p>
      <p>Let us transform the matrix S_x into a vector s_x of size p·n by the reshaping procedure. Then each point x is
described by the vector s_x of size p·n. In this way, X′ = {s_x : x ∈ X} is a new feature representation of X.</p>
      <p>Consider an example of the feature transformation based on the data shown in figure 2. As the number of all
possible pairs equals one, the matrix S_x has only one element. For the objects represented by circles we have s = 1,
and for the objects represented by triangles, s = 0.</p>
    </sec>
    <sec id="sec-3">
      <title>Support points selection</title>
<p>The proposed form of DT splits data points based on their relative position. The support point selection method
should take into account their informativeness for a given sample. In this work, three methods of support point
selection were implemented.</p>
      <p>
        The first approach uses Relief feature selection algorithm introduced in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Let X+ and X− be the subsamples of the +1 and −1 classes. After generating the vectors s_x for all x ∈ X,
applying Relief allows extracting the most informative sets P ⊂ X+ and N ⊂ X−, reducing the feature dimension.
      </p>
      <p>
        The second way is based on Support vector machine (SVM) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. SVM builds a separating hyperplane with
the maximum distance (margin width) between points of different classes. Data points placed on the border of the
margin are called support vectors. When SVM is trained on X_L, this set is divided into support and non-support
vectors. In our approach, the support vectors compose the sets of support points P and N.
      </p>
<p>The last method is based on the k-means clustering algorithm. We generate several subsamples of a given size from
X_L and apply k-means (k = 2) to each subsample to extract the cluster medoids, which are taken as support points.</p>
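<p>A possible reading of this selection scheme is sketched below; the parameter names q (number of subsamples) and r (subsample size), the fixed iteration count, and the plain-Python 2-means are our illustrative assumptions, not the paper's settings.</p>

```python
# Hedged sketch of k-means based support-point selection: draw q random
# subsamples of size r, run a small Lloyd-style 2-means on each, and keep
# each cluster's medoid (the subsample point nearest to the cluster centre)
# as a support point.

import math
import random

def two_means_medoids(sample, iters=10):
    """Run 2-means on `sample` and return the two cluster medoids."""
    c0, c1 = random.sample(sample, 2)
    g0, g1 = [], []
    for _ in range(iters):
        g0 = [x for x in sample if math.dist(x, c1) >= math.dist(x, c0)]
        g1 = [x for x in sample if math.dist(x, c0) > math.dist(x, c1)]
        if g0:
            c0 = tuple(sum(col) / len(g0) for col in zip(*g0))
        if g1:
            c1 = tuple(sum(col) / len(g1) for col in zip(*g1))
    med0 = min(g0, key=lambda x: math.dist(x, c0)) if g0 else c0
    med1 = min(g1, key=lambda x: math.dist(x, c1)) if g1 else c1
    return med0, med1

def select_support_points(data, q=5, r=20):
    """Collect 2*q candidate support points from q random subsamples."""
    points = []
    for _ in range(q):
        sub = random.sample(data, min(r, len(data)))
        points.extend(two_means_medoids(sub))
    return points
```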
<p>In addition, we used the kernel k-means algorithm, which employs a kernel function instead of the scalar product.
The kernel function implicitly transforms the initial feature space into another space of higher dimensionality, where
the configuration of the data points is changed, often resulting in a linearly separable form.</p>
<p>When using the SVM or k-means based selection methods, we get linear computational complexity, while using
Relief or kernel k-means results in complexity quadratic in the sample size.</p>
      <p>The proposed method was experimentally studied on three two-dimensional model datasets. Each dataset consists
of 100 elements belonging to one of two classes. Figure 4 shows the original datasets: «Moons», «Circles» and
«Linear». The parameter values for the CART algorithm were left at their defaults. Support points for the SBDT
algorithm were selected with the k-means algorithm. Figures 5 and 6 show the classification results obtained by the
CART and SBDT algorithms, respectively. They also indicate the classification error probability values obtained by
the “moving exam” (leave-one-out) method.</p>
    </sec>
    <sec id="sec-4">
      <title>Construction of similarity-based decision tree</title>
<p>
        We chose the CART algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as an add-on method used in SBDT construction. The proposed SBDT algorithm
can be represented by the following steps:
      </p>
      <p>Step 1. Find the sets of support points P and N of classes +1, −1.</p>
      <p>Step 2. Compute the vector s_x for all x ∈ X.</p>
      <p>Step 3. Build a decision tree in the new feature space X′ = {s_x : x ∈ X} by the CART method.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental study on model data</title>
<p>These examples demonstrate that the SBDT method provides higher classification accuracy
compared to the CART algorithm. The figures also show that the decision boundaries constructed by the CART
algorithm are coarser.</p>
    </sec>
    <sec id="sec-6">
      <title>Experimental study on satellite data</title>
<p>To compare the performance of the CART and SBDT algorithms on real data, a Landsat-8 satellite image of the city
of Iskitim (Novosibirsk region) was used. The RGB composite of this image is shown in Figure 7a. Figure 7b shows
the map of this image produced by visual-instrumental methods, containing representatives of six classes. It was used
to train and to test the classifiers; 10% of the mapped pixels were used for training. As a result, the classification
accuracy of the CART and SBDT algorithms is 88% and 98%, respectively. Figure 7c shows the SBDT image
classification result.</p>
<p>Thus, the proposed SBDT decision tree construction method, based on the mutual similarity of objects, provides
higher classification quality compared to the CART method, not only on model data but also on real data. The SBDT
method allows obtaining more accurate decision boundaries, which have a clear logical interpretation. The results of
the experiments confirm the effectiveness of the method for multispectral satellite image classification.</p>
      <p>Acknowledgements. This work was supported by the RFBR (Grant No. 18-04-0063a) and the Interdisciplinary
Integration Research program of the SB RAS for 2018-2020 (project No. 37).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Lbov</surname>
            <given-names>G.S.</given-names>
          </string-name>
          <article-title>Logical Functions in the Problems of Empirical Prediction</article-title>
          //
          <source>Handbook of Statistics</source>
          .
          <year>1982</year>
          . Vol.
          <volume>2</volume>
          . Amsterdam: North-Holland Publishing Company. P.
          <fpage>479</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Breiman</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olshen</surname>
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stone</surname>
            <given-names>C.J.</given-names>
          </string-name>
          .
          <source>Classification and Regression Trees</source>
          . New York: Routledge.
          <year>1984</year>
          . 368 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Breiman</surname>
            <given-names>L</given-names>
          </string-name>
          .
          <article-title>Random forests</article-title>
          //
          <source>Machine Learning</source>
          .
          <year>2001</year>
          . Vol.
          <volume>45</volume>
          (
          <issue>1</issue>
          ). P.
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Schapire</surname>
            <given-names>R.E.</given-names>
          </string-name>
          <article-title>The boosting approach to machine learning: An overview // Nonlinear estimation and classification</article-title>
          . Springer.
          <year>2003</year>
          . New York: Springer. P.
          <fpage>149</fpage>
          -
          <lpage>171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kotsiantis</surname>
            <given-names>S.B.</given-names>
          </string-name>
          <article-title>Decision trees: a recent overview</article-title>
          // Artificial Intelligence Review.
          <year>2013</year>
          . Vol.
          <volume>39</volume>
          (
          <issue>4</issue>
          ). P.
          <fpage>261</fpage>
          -
          <lpage>283</lpage>
          . doi: https://doi.org/10.1007/s10462-011-9272-4.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Berikov</surname>
            <given-names>V.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lbov</surname>
            <given-names>G.S.</given-names>
          </string-name>
          <article-title>Choice of optimal complexity of the class of logical decision functions in pattern recognition problems</article-title>
          // Doklady Mathematics.
          <year>2007</year>
          . Vol.
          <volume>76</volume>
          . N. 3. P.
          <fpage>969</fpage>
          -
          <lpage>971</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Lbov</surname>
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berikov</surname>
            <given-names>V.B.</given-names>
          </string-name>
          <article-title>Recursive Method of Formation of the Recognition Decision Rule in the Class of Logical Functions</article-title>
          //
          <source>Pattern Recognition and Image Analysis</source>
          .
          <year>1993</year>
          . Vol.
          <volume>3</volume>
          (
          <issue>4</issue>
          ). P.
          <fpage>428</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Bucy</surname>
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diesposti</surname>
            <given-names>R.S.</given-names>
          </string-name>
          <article-title>Decision tree design by simulated annealing</article-title>
          //
          <source>ESAIM: Mathematical Modelling and Numerical Analysis</source>
          .
          <year>1993</year>
          . Vol.
          <volume>27</volume>
          . N. 5. P.
          <fpage>515</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Utgoff</surname>
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodley</surname>
            <given-names>C.E.</given-names>
          </string-name>
          <article-title>An incremental method for finding multivariate splits for decision trees</article-title>
          <source>// Proc. Seventh Int. Conf. on Machine Learning</source>
          .
          <year>1990</year>
          . P.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Murthy</surname>
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasif</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salzberg</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>A system for induction of oblique decision trees //</article-title>
          <source>Journal of artificial intelligence research</source>
          .
          <year>1994</year>
          . Vol.
          <volume>2</volume>
          . P.
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Menkovski</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christou</surname>
            <given-names>I.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efremidis</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Oblique decision trees using embedded support vector machines in classifier ensembles //</article-title>
          <source>Proc. 7th IEEE Int. Conf. on Cybernetic Intelligent Systems</source>
          .
          <year>2008</year>
          . P.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Sathe</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aggarwal</surname>
            <given-names>C.C.</given-names>
          </string-name>
          <article-title>Similarity forests //</article-title>
          <source>Proc. 23rd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining</source>
          .
          <year>2017</year>
          . P.
          <fpage>395</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kira</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rendell</surname>
            <given-names>L.</given-names>
          </string-name>
          <article-title>A Practical Approach to Feature Selection //</article-title>
          <source>Proc. Ninth Int. Conf. on Machine Learning</source>
          .
          <year>1992</year>
          . P.
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Cortes</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            <given-names>V</given-names>
          </string-name>
          .
          <article-title>Support-vector networks</article-title>
          //
          <source>Machine Learning</source>
          .
          <year>1995</year>
          . Vol.
          <volume>20</volume>
          (
          <issue>3</issue>
          ). P.
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>