=Paper= {{Paper |id=Vol-1901/paper39 |storemode=property |title=A learning based feature point detector |pdfUrl=https://ceur-ws.org/Vol-1901/paper39.pdf |volume=Vol-1901 |authors=Alexander Verichev }} ==A learning based feature point detector == https://ceur-ws.org/Vol-1901/paper39.pdf
                                      A learning based feature point detector
                                                                  A. Verichev1
                                  1
                                   Samara National Research University, 34 Moskovskoe Shosse, 443086, Samara, Russia


Abstract

We propose a learning-based image feature points detector. Instead of giving an explicit definition for feature point we apply the methods of
machine learning to infer it inductively using a representative training set. This allows for a flexible tuning of the proposed detector to a
specific problem that is described by a training set of desired responses. To increase feature points' repeatability and robustness to various
image transformations the feature space of the learning algorithm includes raw image moments and image moment invariants. Experiments
demonstrate high flexibility in tuning the detector to a specific task, acceptable repeatability of the feature points and robustness to various
image transformations.

Keywords: image feature points; image feature points detector; image moments; image moment invariants; machine learning



1. Introduction

   Feature point is a piece of information which is relevant to solving a certain application-related computational task. Feature
points find their use in numerous applications such as image stitching, stereo correspondence, locating and tracking of a moving
object, object detection and recognition, and others [1, 2]. The ubiquitous usage of feature points is a direct consequence of their
properties [3]:
 Repeatability: Given two images of the same object or scene, a high percentage of the features detected on the scene visible
  in both images should be found in both images.
 Informativeness: The intensity patterns underlying the detected features should show a lot of variation.
 Locality: The features should be local, so as to reduce the probability of occlusion and to allow simple model approximations
  of the geometric and photometric deformations between two images.
 Quantity: The number of detected features should be sufficiently large, such that a reasonable number of features are detected
  even on small objects.
 Accuracy: The detected features should be accurately localized.
 Efficiency: The detection of features in a new image should allow for time-critical applications.
   Algorithms and methods that detect image feature points by making local decisions are called feature points detector. An
abundance of image feature points detectors is known, most of which are based on a certain criterion - a heuristics that implicitly
defines what a term feature point constitutes. Generally these heuristics can be classified into three categories [4]:
 Gradient-based: A majority of image feature points detectors is based on computation of gradients of intensity function, for
  example Förstner [5], Harris [6], Shi-Tomasi [7].
 Template-based: Feature points are found by comparing the intensity of surrounding pixels with that of center pixels which is
  governed by some template. The well-known template-based detectors are SUSAN [8], FAST [9], AGAST [10].
 Contour-based: A feature point is defined as the intersecting point of two adjacent edge lines, examples are DoG-curve [11],
  ANDD [12].
   However, formulating a heuristics for an image feature points detector requires a well-formed application-dependent
definition of the term feature point, which in turn requires some level of expertise in the application domain. Moreover, a strictly
stated criterion, although sharpening performance, diminishes its flexibility to adjust to a particular problem, which renders all
the possible usages outside the destined application moot.
   The goal of this work is to dispense with defining the term feature point altogether and focus on the properties we wish the
feature points to possess. With that goal in mind we resort to machine learning methods. Image raw moments and image
moment invariants are used along with some other local characteristics of image points to form a feature space of a learning
algorithm. The detector is trained to solve a specific problem on a relevant and carefully collected training set. This effectively
defines the term feature point implicitly, since it's inductively inferred from the training examples.
   The proposed method is described in full detail in section 2, along with the learning algorithm, its feature space and the
procedures for collecting training and test sets. Evaluation criteria of a trained detector's performance and the results of
experimental evaluation are described in section 3. We conclude with a discussion of these results.

2. Proposed method

   The proposed learning-based feature points detector is based on the idea of transforming detection task into a classification
task as suggested in [13], which boils down to training the detector's classifier on a set of the desired responses.




3rd International conference “Information Technology and Nanotechnology 2017”                                                              247
                                    Image Processing, Geoinformation Technology and Information Security / A. Verichev
2.1. Feature space

   The first step towards constructing our detector is to define the classifier's feature space, which is an ℝ15 vector space. Each
pixel of an image 𝐼[𝑥, 𝑦] is mapped to a certain vector in this feature space using a locally defined operator 𝑃9×9 → ℝ15 , where
𝑃 = {𝑛: 0 ≤ 𝑛 < 256} is a set of intensities of a grayscale image. The features of the feature space are described below.
   The first two features are standard deviation of a standardized local area, 𝜙1 , and standard deviation divided by the norm of
the local area, 𝜙2 :

               1              1
         𝜙1 = √ ∑4𝑖=−4 ∑4𝑗=−4 2 (𝐼[𝑥 + 𝑖, 𝑦 + 𝑗] − 𝐼 )̅ 2 ,
                         80           𝑛
                                                                                                                               (1)
                    𝜙1
         𝜙2 =            ,
                     𝑛


where norm 𝑛 and local mean 𝐼 ̅ are defined:

                1
         𝐼̅ =        ∑4𝑖=−4 ∑4𝑗=−4 𝐼[𝑥 + 𝑖, 𝑦 + 𝑗],
                81



         𝑛 = √∑4𝑖=−4 ∑4𝑗=−4(𝐼[𝑥 + 𝑖, 𝑦 + 𝑗])2 .


The use of these features is motivated by their sensitivity to monotonous and textured areas.
  The next four features are chosen to be central image moments of a local image area: 𝜙𝑡+3 = 𝜇𝑡𝑡 , 0 ≤ 𝑡 ≤ 3. The central
moments are defined [14]:

                                           1
         𝜇𝑖𝑗 = ∑4𝑘=−4 ∑4𝑙=−4 𝑘 𝑖 ∙ 𝑙 𝑗 ∙        𝐼[𝑥 + 𝑘, 𝑦 + 𝑙].                                                               (2)
                                           81


   To induce invariance to rotation transformations the following Hu invariant image moments and Flusser moments are
used [15, 16]:

         𝜙7 = 𝜇20 + 𝜇02 ,

         𝜙8 = (𝜇20 − 𝜇02 )2 + 4𝜇11
                                2
                                   ,

         𝜙9 = (𝜇30 − 3𝜇12 )2 + (3𝜇21 − 𝜇03 )2 ,

         𝜙10 = (𝜇30 + 𝜇12 )2 + (𝜇21 + 𝜇03 )2 ,

         𝜙11 = (𝜇30 − 3𝜇12 )(𝜇30 + 𝜇12 )[(𝜇30 + 𝜇12 )2 − 3(𝜇21 + 𝜇03 )2 ] + (3𝜇21 − 𝜇03 )(𝜇21 + 𝜇03 )[3(𝜇30 + 𝜇12 )2 −         (3)
              (𝜇21 + 𝜇03 )2 ],

         𝜙12 = (𝜇20 − 𝜇02 )[(𝜇30 + 𝜇12 )2 − (𝜇21 + 𝜇03 )2 ] + 4(𝜇30 + 𝜇12 )(𝜇21 + 𝜇03 ),

         𝜙13 = (3𝜇21 − 𝜇03 )(𝜇30 + 𝜇12 )[(𝜇30 + 𝜇12 )2 − 3(𝜇21 + 𝜇03 )2 ] − (𝜇30 − 3𝜇12 )(𝜇21 + 𝜇03 )[3(𝜇30 + 𝜇12 )2 −
              (𝜇21 + 𝜇03 )2 ],

         𝜙14 = 𝜇11 [(𝜇30 + 𝜇12 )2 − (𝜇03 + 𝜇21 )2 ] − (𝜇20 − 𝜇02 )(𝜇30 + 𝜇12 )(𝜇03 + 𝜇21 ).

   Moments calculation is an intensive computational task that requires of a lot of operations. To reduce the number of
arithmetical operations we apply the recursive method of moments calculation based on the use of integer factorial
polynomials [16].
   The last feature that characterizes misalignment of centre of local area and its centre of mass is defined:

        𝜙15 = √(𝑥𝑐 − 𝑥)2 + (𝑦𝑐 − 𝑦)2 ,                                                                                          (4)

where 𝑥𝑐 = 𝜇10 /𝜇00 and 𝑦с = 𝜇01 /𝜇00 .
   The set of the features 𝜙𝑖 , 1 ≤ 𝑖 ≤ 15, defined by (1) - (4), with a usual addition and scalar multiplication operations form
the feature vector space.




3rd International conference “Information Technology and Nanotechnology 2017”                                                  248
                                     Image Processing, Geoinformation Technology and Information Security / A. Verichev
2.2. Tuning the detector

2.2.1. Collecting a training set
   Tuning the detector requires a training set that consists of the desired detector's responses. Depending on the application there
are various ways the set can be obtained:
 manually, involving experts of the domain;
 automatically, using well-known feature points detectors such as Harris or Canny;
 combining the two.
   In case there is a human involvement of any kind it is inevitable for a training set to contain a so called training noise [17].
Besides, in a typical scenario a number of feature points is small compared to the other points. To alleviate these negative effects
the neighbouring points of the feature points can be considered feature points as well.
   Provided an application requires high level of robustness to certain transformations, a training set can be enlarged to contain
the so called virtual examples [18]. To this end every image used to form a training set is transformed according to some
transformation. Since the parameters of that transformation are known, the elements of the original image can be mapped onto
the transformed image, which makes it possible to extract feature vectors of the points of the transformed image that correspond
to the feature points of the original image. These new feature vectors are the virtual examples that convey information about
various effects the transformation have on the feature vectors.

2.2.2. Training a classifier
   With a training set at hand we can pose and solve a supervised learning problem. Since the number of the feature vectors in a
training set is typically quite large we chose to apply nonparametric probability density estimation approach. Let 𝐷 =
{(𝒙𝑖 , 𝑦𝑖 )}𝑁
            𝑖=1 denote training set, where 𝒙𝑖 is a feature vector, 𝑦𝑖 is its label, 𝑦𝑖 ∈ {𝐶1 , 𝐶2 }. 𝐶1 corresponds to feature points and 𝐶2
corresponds to the other points. Then, an estimation of conditional probability density function is defined as follows:

                                             ‖𝒙−𝒙𝑗 ‖
         𝑝̂ (𝒙|𝐶𝑖 ) ∝ ∑𝑁
                       𝑗=1[𝑦𝑗 = 𝐶𝑖 ] 𝐾 (               ),                                                                              (5)
                                                 ℎ


where 𝐾 is a kernel function, ℎ is kernel’s width parameter. By the Bayes’ Theorem:

         𝑝̂ (𝐶𝑖 |𝒙) ∝ 𝑝̂ (𝒙|𝐶𝑖 ) ∙ 𝜋̂𝑖 ,                                                                                               (6)

where 𝜋̂𝑖 is an estimate of prior probability of ith class:

                1
         𝜋̂𝑖 = ∑𝑁
                𝑗=1[𝑦𝑗 = 𝐶𝑖 ].                                                                                                         (7)
                𝑁


   Define a characteristic function of a feature point 𝑙(𝒙):

         𝑙(𝒙) = 𝑙𝑛(𝑝̂ (𝐶1 |𝒙)) − 𝑙𝑛(𝑝̂ (𝐶2 |𝒙)).                                                                                       (8)

   In order to smooth the detector’s response we filter the characteristic function 𝑙(𝒙) using a local peak filter. The peak filter
suppresses non-maximal values in a local 3×3 neighbourhood of the point 𝒙:

                    𝑙(𝒙), 𝑙(𝒙) > 𝑙(𝒈) + 𝛿 ∀𝒈 ∈ 𝑊 ∖ {𝒙}
        𝑙̃(𝒙) = {                                     ,                                                                                (9)
                       0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

where 𝑊 is a set of all feature vectors from the local neighbourhood, 𝛿is some threshold.
  From (8) and (9) we infer the decision rule:

                                             𝜋
                                             ̂
                𝐶 , 𝑙̃(𝒙) > 𝑡 = 𝑙𝑛 ( 2)                                                                                               (10)
        𝑦(𝒙) = { 1                  𝜋
                                    ̂1 .
                𝐶2 , 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

3. Experimental evaluation

3.1. Experimental setup

   To experimentally evaluate the proposed detector we built a set of images. The set contains a series of 10 overlapping images
of 6 different scenes, 60 images in total. Figure 1 shows three images of one of these scenes. Each of the 6 groups of images was
split in relation 8:2 to form training set 𝐷 and test set 𝐶, respectively. We chose to use Harris [6] corner detector to detect feature
points. The training set was enlarged by the virtual examples as described in section 2.2.1 and the transformations that were
applied are described in section 3.3.

3rd International conference “Information Technology and Nanotechnology 2017”                                                           249
                                     Image Processing, Geoinformation Technology and Information Security / A. Verichev




                                                             Fig. 1. Example images of a scene.


3.2. Evaluation of training accuracy

    Let 𝑉 = {(𝒙𝑖 , 𝑦𝑖 )}𝑁
                        𝑖=1 be training or test set. The primary criterion of detector’s performance on the set V is its accuracy:

                   1
         𝐴(𝑉) = ∑𝑁
                 𝑖=1[𝑦(𝒙𝑖 ) = 𝑦𝑖 ].                                                                                                  (11)
                   𝑁


   Besides the accuracy two more criteria are used: precision 𝑃 and recall 𝑅 [19]. Precision is the fraction of relevant instances
over the retrieved instances, while recall is the fraction of relevant instances among the retrieved ones over the total number of
relevant instances in the set.
   Let 𝐹𝑃, 𝐹𝑁 and 𝑇𝑃 denote false positives, false negatives and true positives, respectively. Then,

         𝐹𝑃(𝑉) = ∑𝑁
                  𝑖=1[𝑦(𝒙𝑖 ) = 𝐶1 ] ∙ [𝑦𝑖 = 𝐶2 ],


         𝐹𝑁(𝑉) = ∑𝑁
                  𝑖=1[𝑦(𝒙𝑖 ) = 𝐶2 ] ∙ [𝑦𝑖 = 𝐶1 ],                                                                                    (12)

         𝑇𝑃(𝑉) = ∑𝑁
                  𝑖=1[𝑦(𝒙𝑖 ) = 𝑦𝑖 ].


    Precision and recall are defined:

                        𝑇𝑃(𝑉)
         𝑃(𝑉) =                  ,
                   𝑇𝑃(𝑉)+𝐹𝑃(𝑉)
                                                                                                                                     (13)
                        𝑇𝑃(𝑉)
         𝑅(𝑉) =                  .
                   𝑇𝑃(𝑉)+𝐹𝑁(𝑉)


   The proposed detector was first trained on the training set. Accuracy, precision and recall were evaluated on the training set
𝐷 and test set 𝐶. The results are shown in table 1. Taking into account a fairly large size of the sets, the data suggests an
adequate quality of training.

3.3. Repeatability evaluation of the detector

   As mentioned in introduction, repeatability is one of the most important properties of the feature points. Along with its
importance, repeatability allows for an objective and qualitative evaluation. Hence, we used repeatability to evaluate the
performance of the proposed detector.
          Table 1. Accuracy, precision and recall of the trained detector .
                                     𝐴(𝐷)                𝑃(𝐷)               𝑅(𝐷)
          Training set, 𝐷              0.997            0.905             0.960
          Test set, 𝐶                  0.9766           0.730             0.580

    The procedure for repeatability evaluation is outlined below.
 An original image is used to find a set of feature points 𝑃𝑜 .
 The original image is transformed by one of the transformations (cf. the next list below).
 The transformed image is used to find a set of feature points 𝑃𝑡 .
 Since parameters of the transformation are known, coordinates of the points 𝑃𝑜 of the original image can be mapped onto the
  transformed image. Thus, the points in the set 𝑃𝑜 are mapped onto the transformed image, forming a set 𝑃𝑚 .
 The sets 𝑃𝑚 and 𝑃𝑡 are matched. Two points 𝑎 ∈ 𝑃𝑚 and 𝑏 ∈ 𝑃𝑡 are considered equal if 𝑎 ∈ 𝑉𝜀 (𝑏), 𝜀 = 2.0.


3rd International conference “Information Technology and Nanotechnology 2017”                                                         250
                                  Image Processing, Geoinformation Technology and Information Security / A. Verichev
 As a result of the comparison performed in the previous step we find three sets of points: 𝑃𝑇𝑃 are the points found on both
  sets, 𝑃𝐹𝑃 are new points that were not found on the original image but were found on the transformed image, 𝑃𝐹𝑁 are the
  missed points that were found on the original image and were not found on the transformed image. The cardinalities of these
  sets are, respectively, 𝑇𝑃, 𝐹𝑃 and 𝐹𝑁 values of the proposed detector. These values are used to calculate the detector’s
  accuracy, precision and recall.
   To evaluate repeatability we used the following transformations of the images:
 rotation by angle 𝛼, −45° ≤ 𝛼 ≤ 45°, 𝛼 is increased by 3°;
 sub-pixel shift by 𝑡, 0.25 ≤ 𝑡 ≤ 0.75, 𝑡 is increased by 0.05;
 scaling by 𝑠, 0.5 ≤ 𝑠 < 1.5, 𝑠 is increased by 0.1




           (a)                                                                             (b)




                                                 (c)
                              Fig. 2. Repeatability of the detector evaluated for various transformations: (a) rotation, (b) scaling, (c) translation.


   The results of the repeatability evaluation of the proposed detector that was trained on the training set 𝐷 are shown on fig. 2.
The detector’s performance can be considered adequate on rotated images for −9° < 𝛼 < 9° and on scaled images for 0.8 ≤
𝑠 ≤ 1.2. The performance on shifted images is high for the whole range of the parameter 𝑡.

4. Conclusion

   In this paper we investigated a relatively new approach to feature point detection. Contrary to the standard approach to the
problem, we didn’t formulate any heuristics-based definition of the term feature point but tried to infer it inductively using the
methods of machine learning and a representative training set. This enabled us to tune the proposed detector to a specific
problem at hand. The results of the experimental evaluation of the detector verify that such a tuning is in fact possible.
Moreover, the detector showed acceptable robustness to rotation and scaling transformation, and high robustness to sub-pixel
shift transformation. This suggests a great potential of the learning-based approach to feature points detection.


Acknowledgements

   The reported study was funded by RFBR according to the research project №17-29-03190-ofi.


3rd International conference “Information Technology and Nanotechnology 2017”                                                                            251
                                     Image Processing, Geoinformation Technology and Information Security / A. Verichev
References

[1] Szeliski R. Computer Vision: Algorithms and Applications. London: Springer, 2011; 812 p.
[2] Denisova AY, Myasnikov VV. Anomaly detection for hyperspectral imaginary. Computer Optics 2014; 38(2); 287–296.
[3] Tuytelaars T, Mikolajczyk R. Local invariant feature detectors: a survey. Foundations and trends® in computer graphics and vision 2008; 3(3): 177–280.
     DOI: 10.1561/0600000017.
[4] Li Y, Wang S, Tian Q, Ding X. A survey of recent advances in visual feature detection. Neurocomputing 2015; 149: 736–751. DOI:
     10.1016/j.neucom.2014.08.003.
[5] Förstner W, Gülch E. A fast operator for detection and precise location of distinct points, corners and centres of circular features. Proc. ISPRS
     intercommission conference on fast processing of photogrammetric data 1998; 281–305.
[6] Harris C, Stephens M. A combined corner and edge detector. Alvey vision conference 1988; 15(50): 147–151.
[7] Shi J, Tomasi C. Good features to track. Proc. Intl Conf. on Comp. Vis. and Pat. Recog (CVPR) 1994; 593–600.
[8] Smith SM, Brady J.M. SUSAN – A new approach to low level image processing. International Journal of Computer Vision 1997; 23(1): 45–78. DOI:
     10.1023/A:1007963824710.
[9] Rosten E, Drummond T. Machine learning for high-speed corner detection. European Conference on Computer Vision 2006; 430–443. DOI: 10.1007/11744023_34.
[10] Mair E, Hager GD, Burschka D, Suppa M, Hirzinger G. Adaptive and generic corner detection based on the accelerated segment test. European conference on
     Computer Vision 2010; 183–196. DOI: 10.1007/978-3-642-15552-9_14.
[11] Zhang X, Wang HA, Smith WB, Ling X, Lovell BC, Yang D. Corner detection based on gradient correlation matrices of planar curves. Pattern Recognition 2010;
     43(4): 1207–1223. DOI: 10.1016/j.patcog.2009.10.017.
[12] Shui PL, Zhang WC. Corner detection and classification using anisotropic directional derivative representations. IEEE Transactions on Image Processing
     2013; 22(8): 3204–3218. DOI: 10.1109/TIP.2013.2259834.
[13] Chernov AV, Myasnikov VV, Sergeyev VV. Fast Method for Local Image Processing and Analysis. Pattern Recognition and Image Analysis 1999; 9(2): 237–238.
[14] Flusser J, Suk T. Pattern recognition by affine moment invariants. Pattern Recognition and Image Analysis 1993; 26(1): 167–174. DOI: 10.1016/0031-
     3203(93)90098-H.
[15] Hu MK. Visual pattern recognition by moment invariants. IRE transactions on information theory 1962; 8(2): 179–187. DOI: 10.1109/TIT.1962.1057692.
[16] Myasnikov VV. Constructing efficient linear local features in image processing and analysis problems. Automation and Remote Control 2010; 72(3): 514–527. DOI:
     10.1134/S0005117910030124.
[17] Theodoridis S. Machine learning: a Bayesian and optimization perspective. San Diego: Academic Press, 2015; 1062 p.
[18] Alpaydin E. Introduction to machine learning. Cambridge: MIT press, 2014; 584 p.
[19] Hastie T, Tibshirani R, Frieman J. Elements of statistical learning: data mining, inference, and prediction. London: Springer, 2011; 745 p.




3rd International conference “Information Technology and Nanotechnology 2017”                                                                                 252