Point-based Medialness for Animal and Plant Identification

Prashant Aparajeya, Goldsmiths, University of London, U.K. — p.aparajeya@gold.ac.uk
Frederic Fol Leymarie, Goldsmiths, University of London, U.K. — ffl@gold.ac.uk

ABSTRACT
We introduce the idea of using a perception-based medial point description [9] of a natural form (2D, static or in movement) as a framework for a part-based shape representation which can then be efficiently used in biological species identification and matching tasks. The first step is one of fuzzy medialness measurements of 2D segmented objects from intensity images, which emphasises the main shape characteristics of an object's parts (e.g. concavities and folds along a contour). We distinguish interior from exterior shape description. Interior medialness is used to characterise deformations from straightness, corners and necks, while exterior medialness identifies the main concavities and inlets, which are useful to verify the extent of parts and to reason about articulation and movement. In a second step we identify a set of characteristic feature points built from three types. We define (i) an interior dominant point as a well-localised peak value in the medialness representation, while (ii) an exterior dominant point is evaluated by identifying a region of concavity subtended by a minimum angular support. Furthermore, (iii) convex points are extracted from the form to further characterise the elongation of parts. Together, these feature points are sufficiently invariant to shape movement, where the articulation in moving objects is characterised by exterior dominant points.
In the third step, a robust shape matching algorithm is designed that finds the most relevant targets from a database of templates by comparing the dominant feature points in a scale, rotation and translation invariant way (inspired by the SIFT method [17]). The performance of our method has been tested on several databases. The robustness of the algorithm is further tested by perturbing the data-set at different scales.

Keywords
2D shape analysis, dominant points, information retrieval, medialness representation, shape compression, articulated movement.

Figure 1: (a) Sketch of a cat built from a small number of approximate disks (visible sketched lines); (b) corresponding segmented and binarised image; (c) classical internal medial-axis approximation; (d) external medial-axis; (e) 2D shock graph; (f) proposed interior medialness map; (g) recovered concave (green dots) and convex (red dots) points, and (h) final dominant (medial) point set (in blue for internal ones, in green for external/concave ones, and in red for convex points) obtained via our method.

1. INTRODUCTION
In this short communication we introduce our proposed 2D shape representation for biological objects (animals and plants), which is inspired by results and techniques from cognitive psychology, artistic rendering and animation, and computer vision (Fig. 1.(h)). An artist will often draw different poses of an animal in movement by using various combinations of primitive structures of different sizes (here approximate disks of various radii, Fig. 1.(a)). Different body movements are characterised by a particular orientation and combination of these primitives. From the point of view of psychophysical investigations on the perception of shape movements by humans, Kovács et al. have shown that such articulated movements of a biological character can be best captured via a minimal set of dominant features, potentially being represented as isolated points [9].
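The medial-point description of Kovács et al. [9], which we build on, accumulates the fraction of boundary length falling inside an annulus centered at a candidate point (the D function, formally defined in Section 2). The following is a minimal sketch of that measure, assuming a densely sampled closed contour; the function name and the sampling scheme are ours, not from the paper:

```python
import numpy as np

def medialness(p, contour, eps):
    """D_eps(p): fraction of contour (arc) length falling inside the
    annulus of inner radius M(p) and thickness eps centered at p.
    `contour` is an (N, 2) array of densely sampled boundary points."""
    p = np.asarray(p, dtype=float)
    d = np.linalg.norm(contour - p, axis=1)   # |p - b_i| for every boundary sample
    m = d.min()                               # M(p): distance to the nearest contour point
    # approximate arc length carried by each sample (closed contour)
    seg = np.linalg.norm(np.roll(contour, -1, axis=0) - contour, axis=1)
    T = seg.sum()                             # normalising factor: total perimeter
    return seg[d <= m + eps].sum() / T        # accumulated medialness in [0, 1]
```

For a unit circle, the measure evaluates to 1 at the center (the whole contour falls inside the annulus) and decays for off-center points.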
Inspired by these two approaches to the perception of natural motions, we have investigated a possible scheme based on the notion of robust medialness presented by Kovács et al. that can efficiently capture the important structural part-based information commonly used in artistic drawings and animations. The main advantage over other classical medial-based representations of 2D shape is one of combined compactness, robustness and the capacity to deal with articulated movements.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: S. Vrochidis, K. Karatzas, A. Karpinnen, A. Joly (eds.): Proceedings of the International Workshop on Environmental Multimedia Retrieval (EMR 2014), Glasgow, UK, April 1, 2014, published at http://ceur-ws.org

Shape representation towards matching has been addressed in many ways by computer scientists in recent years, including by directly characterising and grouping contour points [24, 15], by contour analysis [2, 10, 5, 21], by using Blum's medial axis transform (MAT) and its related shock graph [8], by combining contour and skeleton [11, 1], or by using instead the inner distance [14] (some of the main medial representations are illustrated in Fig. 1, including our proposed method). Most related to our approach are contour enclosure-based symmetries [7] and the medial point transform [22], which compute similar medialness maps, but do not apply these to the retrieval of dominant points and their use in shape matching. Other classical approaches similarly emphasise either boundary information (e.g. Fourier, wavelet and scale-space analyses of closed contours) or interior information (e.g. primitive retro-fitting or approximation) [18]. The general approach to matching is then to find good ways to put in correspondence the whole shape representation from a query with an equivalent complete shape representation of a target object (e.g. extracting a skeleton from a segmented image and defining a process to match it with another skeleton description in the DB). We propose instead to find an efficient medial representation which remains discrete (point-based), is (at least approximately) invariant to scaling, rotations and translations, and can be the basis of a feature vector map for efficient query-target matching tasks as produced in the discipline of Information Retrieval. Note that we do not require a completely segmented object and thus will also address partial shape matching. Note also that most of the well-established shape-based approaches do not consider deformations and articulated movements, while we do.

Our current aim is to develop a shape matching algorithm which is invariant to translation, scale and rotation, inspired by the now classic SIFT approach [16, 17]. Our shape representation is derived from the region-based medial point description of shape proposed by Kovács et al. [9] in cognitive science and perception studies. The purpose of evaluating such a medialness measurement is to provide a description of the shape which is local, compact, can easily be applied at different spatial scales, and mimics human sensitivity to contour stimuli. This process maps the whole shape information onto a small number of points we call "dominant", and hence makes it compact. Contrarily to classical medial-based representations, ours is not overly sensitive to small boundary deformations and furthermore gives a high response in those regions where the object has high curvature with large boundary support and in the vicinity of joints (between well-delineated parts, such as the limbs of an animal). We augment the medial dominant points with main contour points indicating significant convex and concave features, thus bringing together with our notion of medialness the main 2D point-based shape systems proposed over the years in the fields of cognitive psychology and computer vision: the so-called "codons" denoting contour parts [19] and the high-curvature convexities often used in scale-space analyses [3].

Mathematically, the medialness of a point in the image space is defined as the containment of sets of boundary segments falling into an annulus of thickness parameterised by a tolerance value (ε) and with interior radius taken as the minimum radial distance of the point from the boundary [9] (Fig. 2).

Figure 2: From [9, Fig. 2] with permission: the D function for a simple shape, defined as an accumulation of curve segments falling inside the annulus neighborhood of thickness ε (thick boundary segments within the gray ring) centered around the circle (with center p). M(p) is taken as the minimum radial distance from point p to the nearest contour point.

On completion of the medialness measurements, each pixel in the transformed image space holds local shape information (of accumulated medialness). Assuming figure-ground separation, thickness variations, bulges and necks of an object are captured via the interior medialness measurement. In the work of Kovács et al. it is shown that humans are most sensitive to a small number of localised areas of medialness which coarsely correspond to joints for animated bodies [9]. Our equivalent (extended) notion is defined as dominant points and can be applied to any objects, animated or not. Dominant points are constrained to be a relatively small number of points of high medialness obtained by filtering out the less informative, redundant and noisy data from the initial medialness image space.

To identify internal dominant points, a morphological top-hat transform [23] is applied to isolate peaks in the medialness signal. Peaks are filtered using an empirically derived threshold. The selected peaks are then each characterised by a single representative point. To avoid considering large numbers of nearby isolated peaks, which are characteristic of object regions with many small deformations, only peaks at a given minimum distance away from each other are retained. The extraction of external dominant points is achieved by combining a concavity measure together with the length of support on the contour. Again, a spatially localised filtering is applied to isolate representative dominant points. Furthermore, to improve the robustness of our representation, we extract the set of convex points to capture blob-like structures of the shape. We have observed that the articulation and movement of limbs can be captured via such additional dominant points. Together, the selected dominant points (internal and external) and convex points are then considered as the representative feature points of the shape.

Our matching algorithm is designed in such a way that it first compares internal dominant points of a query object with internal representative dominant points of target shapes in a database. External dominant points are then similarly processed, and convex points are used in a final refinement step. The matching algorithm first analyses the amount of scale, rotation and translation of the query w.r.t. the target image. These values are then applied over the query image to find the best possible matching location in the target image.

2. MEDIALNESS MEASURES & FEATURE EXTRACTION
A medial point is defined by computing the D function as a distance metric (to boundary segments). The D value at any point in the transformed space represents the degree to which this point is associated with a percentage of the bounding contour pixels of the object, within a tolerance of value ε (after Kovács et al. [9]; Fig. 2). Formally, D_ε is defined as:

D_ε(p) = (1/T) ∫_{|p−b| ≤ M(p)+ε} db,

for any point p = [x_p, y_p], with the vector b(t) = [x(t), y(t)] describing the 2D bounding contour B of the object, and the normalising factor T = ∫_{b∈B} db. The metric M(p) is taken as the smallest distance between p and the bounding contour: M(p) = min_{0≤t≤1} |p − b(t)|.

As an example, we performed an (interior) medialness measurement on a dog in a typical standing posture to illustrate how an increasing value of the tolerance (ε) operates an averaging effect on medialness (Fig. 3), as previously observed by Kovács et al. [9]: as ε increases, smaller symmetries are discarded in favor of those at a larger scale.

Figure 3: Top row: Medialness and tolerance. Left: A dog standing; the other images to the right show variations in D_ε for increasing tolerance (ε). Middle row: The D_ε-function for a sequential set of frames capturing the profiles of a running leopard. The maxima (white spots) of the function are good candidates as primitives for biological motion representation (after [9]). Bottom row: Corresponding feature points automatically extracted from the medialness measures visible in the middle row (blue: interior dominant; green: exterior dominant; red: convex points).

2.1 Dominant Points Extraction
Medialness measurement is currently done separately for internal and external regions to take advantage of the perceptual figure-ground dichotomy known to be a powerful cue in humans. This also enables our method to consider articulated objects as potential targets in pattern recognition tasks.

2.1.1 Internal Dominant Points
Medialness increases with "whiteness" in our transformed images (visualisation of medialness). To select points of internal dominance, a "white" top-hat transform is applied, resulting in a series of bright white areas. The white top-hat transform is defined as the difference of an input function (here an image of medialness measures as a grey-level 2D function) with the morphological opening of this image by a flat structural element (a disk with its radius as a parameter). Opening is a set operator on functions which "removes" small objects from the foreground of an image, placing them in the background (augmenting the local function set values) [23, 6]. This filtering is followed by a thresholding to discard remaining areas of relatively low medialness significance. Figure 4.(b) shows the result obtained after applying the white top-hat transform on a medialness image.

We still require to further process the output of the top-hat transform to isolate the most dominant points amongst the remaining selected medialness points, which tend to form clusters. To do so, a flat circular structuring element of radius ε/2 (but of at least 2 pixels in width) is applied over the top-hat image such that within the element only the value which maximises medialness is retained. We further impose that no remaining points of locally maximised medialness are too close; this is currently implemented by imposing a minimum distance of length 2ε between any pair of selected points. We have found that in practice this is sufficient to avoid the clustering of final interior dominant points (Fig. 4.(c)).

Figure 4: Illustration of the three successive steps in isolating internal dominant points: (a) medialness representation of (the interior of) a standing dog figure; (b) corresponding top-hat transform; and (c) internal dominant points illustrated as black dots together with the original object's boundary.

2.1.2 External Dominant Points
In practice, if an object can be deformed or is articulated, salient concavities can be identified in association with those deforming or moving areas (such as the joints of a human body). Considering this empirical observation, the location of an external dominant point can be made invariant to such deformation/articulation only up to a certain extent. For example, if the location of an external dominant point is initially relatively far away from the corresponding contour segment, a slight change in the boundary shape near the movable part (such as an arm movement) can considerably change the position of that associated dominant point (Fig. 5, left). On the other hand, if a point is located very close to the contour, it can easily be due to noise or small perturbations in the boundary. Therefore, to be able to retrieve reliable external dominant points, it is first required to provide an adapted definition of concavity as a significant shape feature.

Figure 5: Left: External medialness processing on a humanoid. The articulated movement of the left arm changes the location and orientation of the associated external dominant point (at the concave curvature peak). If the external dominant point is reasonably far from the contour, then it proves difficult to retrieve a (shape-based) match with the modified form. Blue arrows show the local support for concavity while brown arrows indicate the direction of flow of medialness (away from the concavity). Right: Top: Detection of concave regions (on a butterfly object) using angular support. Bottom: Detected concave points.

We define a (contour) point of local concavity if it falls under a threshold angular region, under the constraint of a length of support which itself depends on the tolerance value (ε). The value of the threshold (θ) is tunable but is always less than π, which permits control of the angular limit of the concave region. A point whose local concavity is larger than θ is considered a "flat" point. In our experiments we tuned the value of θ from 5π/6 to 8π/9. In association, we define an external circular region (of radius a function of ε) centered at each locus containing candidate external dominant points. Each such region may provide only one representative dominant point, where the dominance of a particular point is decided by the maximum containment of boundary points inside the associated annulus (of medialness) and corresponds to the maximum length of support. Finally, we position the representative dominant point to be near the contour, at a fixed distance outside the form (Fig. 5, right).

2.1.3 Convex Points
Our final shape feature is a set of convex points, where a shape has a sharp local internal bending and gives the signature of a blob-like part or a significant internal curvature structure (i.e. a peak in curvature with large boundary support). The goal is to represent an entire protruding sub-structure using one or a few boundary points. Traditionally, such protrusions have a significant contribution in characterising shape [19, 11, 2, 21]. The process of extraction of convex points is similar to the extraction of concave points, the main difference being the value of the threshold angle (θ), where π < θ ≤ 2π. In our experiments we have found useful values to be in the range 5π/4 to 4π/3. Such convex and concave points are complementary to each other and have been used in the "codon" theory of shape description: a codon is delimited by a pair of negative curvature extrema denoting concavities and a middle representative positive maximum of curvature denoting a convexity [19]. In our case we relate these two sets with the extremities of the traditional medial axis of H.
Blum: the end points of interior branches correspond to the centers of positive extrema of curvature, and the end points of exterior branches are mapped to negative extrema of curvature of the boundary. The repositioning of these extrema near the boundary is alike the end points of the PISA (Process Inferring Symmetry Axis) representation of M. Leyton [13]. Together, the three sets (concave, convex and interior dominant) form a rich enough point-based description of medialness to allow us to efficiently address applications with articulated movement for real image data.

2.2 Articulation
Anatomically, an animal's articulated movement is dependent on the point of connection between two bones or elements of a skeleton. Our results show that exterior dominant points (representative of significant concavities) have a high potential to trace such articulations, unless the shape is highly deformed [12]. For usual movements (e.g. walking or jogging), these feature points remain present and identifiable in association with an underlying bone junction and hence can provide a practical signature for it (Fig. 3, bottom).

3. MATCHING ALGORITHM
Our objective is to design a robust matching algorithm that will match dominant points (query to targets) in an efficient way, in both time (or equivalent numerical complexity) and accuracy. First, both the internal and external dominant points are separately extracted from the query and target images. Internal dominant points are the keypoints for evaluating the scale, rotation and translation of the query image w.r.t. a target image. After finding the scale, rotation and translation of the query image, the next task is to improve the correctness of the matching algorithm. For this, external dominant points can play a role and improve the final accuracy.

The following information is associated with each dominant point: <X, Y, R, MM>, where (X, Y) is the 2D location of the dominant point, R is the minimum radial distance of the point from the contour, and MM is the medialness measurement. The test case (query Q) is represented as Q = {Q_I, Q_E, Q_C}, for internal (Q_I) and external (Q_E) dominant points and convex (Q_C) points, while the target image (T) is similarly represented as T = {T_I, T_E, T_C}. Each element of Q_I is matched with each element of T_I following four stages.

Stage I finds the scale (β) and translation of the query image by matching a dominant point. Stage II is a check on the scale β to make sure at least one more internal dominant point can be used in the matching process (if not, move to a different dominant point not yet considered). In stage III, the rotation of the query image (w.r.t. the target) is evaluated. Finally, stage IV modifies the Cartesian positions of each feature point of the query image by applying the evaluated scale, rotation and translation, and we proceed to measure a matching performance value.

Stage I: Take an element (q_i) from set Q_I and match it with each element (t_j) of set T_I. For each pair (q_i, t_j), the scale (β) is evaluated as β = R_{t_j} / R_{q_i} (Fig. 6). The translation (of the query image w.r.t. the target) for the matching pair (q_i, t_j) is defined via two offsets, one for each axial direction: t_x − q_x and t_y − q_y, where q_i = (q_x, q_y) and t_j = (t_x, t_y).

Figure 6: Illustration of the evaluation of scale (β). In the query image on the left, for a particular dominant point falling inside the green circle, two possible matching locations in the target image are shown: cases I and II. In each case the circle's radius is dictated by the minimum distance to the contour (from medialness), and the scale (β) is given as the ratio of the minimum radial distances of target vs query.

Stage II: Now take the next element (q_{i+k}) from set Q_I and match it again with each element (t_j) of set T_I. For each pair (q_{i+k}, t_j) find the scale β′. If the ratio of β′ over β is under the tolerance level T_s, then go to stage III. Otherwise, repeat at another point, check the same tolerance criterion T_s, and repeat if necessary until all the elements of Q_I are counted. The value β′ (if it is under the tolerance level) ensures compatibility under scaling of the query image (with respect to a target) and helps in finding the matching location of the next internal dominant point to consider (red arrows in Fig. 7).

Stage III: We define the orientation (α) of an image by the angle between a line joining two matched dominant points (as obtained from stages I and II) and the positive (reference) x-axis. If (q_i, q_{i+k}) are the matching dominant points in Q_I and (t_j, t_{j+l}) are the matched dominant points in T_I, then the orientations α_Q and α_T are defined as the angles between the matching dominant points (Fig. 7). The rotation (θ) of the image Q is thus defined by the difference of orientations, i.e. rotation(θ) = α_T(t_j, t_{j+l}) − α_Q(q_i, q_{i+k}).

Figure 7: For both query (test) and target images, the orientation (α) is the angle between a line joining the two matching internal dominant points (shown with blue arrows) and the positive x-axis. The required rotation (θ) of the query image w.r.t. the target is given by the difference in orientations.

Stage IV: Upon obtaining the values of translation, rotation and scale of the image Q (w.r.t. T), our next task is to transform the positions of all feature points (Q_I, Q_E and Q_C) of the image Q into the space of image T and finally check for a match. This is done as follows:

1. Construct the 4 × 4 homogeneous matrix H to perform the required linear (rotation and scaling) and affine (translation) transforms for all feature points found in image Q.
2. Calculate the modified coordinate positions by matrix multiplication of H with the feature point positions.
3. For each modified q_i (q_i ∈ Q_I), if there is a t_j (t_j ∈ T_I) within a tolerance radius of r × ε, their β-values are then compared. If the β ratio is within T_s, then count it as a match.
4. Repeat step 3 for external dominant and convex points, i.e. each element of Q_E and Q_C with T_E and T_C respectively.

Consider M_I, M_E and M_C as the sets of internally, externally and convex matching feature points. Intuitively, more shape discrimination is present in internal and external dominant points, while convex points add details (end points of protruding parts delimited by external (concave) points). Hence we make use of the following heuristic: our matching metric is biased towards internal and external dominant points. We express the problem of finding the best matching location of Q in T as the maximisation of the F-measure:

F = 2 × (ΣM_I + ΣM_E) / [(ΣQ_I + ΣQ_E) + (ΣT_I + ΣT_E)]   (1)

To handle situations where many F-measures have the same value, we then compute a percentage of matched convex points, F_C, and maximise this value. Mathematically,

F_C = 2 × ΣM_C / (ΣQ_C + ΣT_C)   (2)

There is more than one way to combine the information from the three sets of feature points. Currently, we first use the internal dominant points to reduce the number of targets to further consider for a potential match. Then we add external (concave) dominant points to further reduce the number of candidates, and finally use convex points only if we still have multiple candidates left. Note that we do not yet use in practice additional information implicitly available, such as the "codon" structure, i.e. how pairs of concavities are associated with specific convex points [19]. This structural information would be useful for finer matching under articulated motions (of limbs). We are currently exploring this combined use of convex points with external dominant points and will report elsewhere on its potential advantage.

4. EXPERIMENTAL RESULTS
Our current matching algorithm is hierarchical in its use of feature points (internal, then external, then convex) and has proven useful even with common body articulated movements, but as pointed out above, this could be refined in the near future by using more structural information. At this early stage in our research program we have performed an extensive experiment on different heterogeneous databases containing biological forms (large animals, plants and insects) to verify the performance in efficiency and accuracy of our novel method. Four main types of datasets were used for this purpose: (i) animals taking a static posture and in movement, (ii) humans taking a static posture or in action (articulated movement), (iii) insects, and (iv) plants (only leaf forms currently). Also, we performed some random re-scaling, rotation and translation for the verification of invariance under such transformations (Fig. 9), and added a number of occlusions, by performing random cuts, to test the method's response beyond affine transforms. Furthermore, the robustness of the algorithm is also evaluated by applying a "structural noise" (introducing scalable random geometric deformations), which we designed by performing randomised morphological set operations on the segmented (binarised) objects. In our experiments we have defined three levels of perturbation: small or less perturbed, medium, and large or highly perturbed (examples in Fig. 8). We note that other methods relying on smooth continuous contours (such as methods based on the use of codons or curvature scale-space, as well as many of the traditional medial-axis methods) will have great difficulty in dealing with such deformations, which are to be expected in noisy image captures and under varying environmental conditions such as due to decay and erosion.
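The "structural noise" just described can be emulated by applying morphological operators at random image locations. Below is a minimal sketch, assuming a binary (figure/ground) mask as input; the patch size, operator choice and iteration counts are illustrative assumptions, not the exact settings used in our experiments:

```python
import numpy as np

def dilate(a):
    """4-neighbourhood binary dilation via shifted copies of the mask."""
    out = a.copy()
    out[1:, :] |= a[:-1, :]; out[:-1, :] |= a[1:, :]
    out[:, 1:] |= a[:, :-1]; out[:, :-1] |= a[:, 1:]
    return out

def erode(a):
    """Erosion as the morphological dual: complement, dilate, complement."""
    return ~dilate(~a)

def perturb(mask, n_ops=30, radius=2, seed=0):
    """Randomised local dilations/erosions on a binary object mask,
    emulating scalable structural boundary noise; `radius` controls
    the perturbation level (small / medium / large)."""
    rng = np.random.default_rng(seed)
    out = mask.astype(bool).copy()
    h, w = out.shape
    size = 6 * radius                     # side of each perturbed patch (assumption)
    for _ in range(n_ops):
        y = int(rng.integers(0, h - size))
        x = int(rng.integers(0, w - size))
        win = out[y:y + size, x:x + size]
        op = dilate if rng.random() < 0.5 else erode
        for _ in range(radius):           # iterate the operator to scale the deformation
            win = op(win)
        out[y:y + size, x:x + size] = win
    return out
```

Increasing `radius` (and `n_ops`) moves the output from lightly to heavily perturbed, mirroring the three levels used in our tests.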
Figure 9: Top-10 results on some of the samples for deer, horse and humanoid forms when using our F-measure as the basis for ranking. The leftmost image of each ranking result is the query, while the images on the top row are the target images matched at successive rank locations from best to worst. The bottom row then shows the overlay (in orange) of the query on the respective target image (after transformation) and their spatial differences. The top two series show tests for invariance under scaling, rotation and translation. The third and fourth series show the behavior of matching using the F-measure in the presence of articulated movements.

Figure 8: Three levels of (added) perturbation: I - none (originals), II - small, III - medium and IV - large.

To construct the different databases we used the standard MPEG-7 [10], ImageCLEF-2013 [4] and Kimia (Brown University's) [20] datasets. Furthermore, we also initiated our own database, where we selected different sequences of animals in motion (from videos). From these datasets, we have collected a total of 1130 samples belonging to Animals other than insects (520 samples, including Human, Horse, Rat, Cat, Panther, Turtle, Elephant, Bat, Deer, Dog, and Ray forms), Insects (410 samples, including Butterfly, Bug, Mosquito, Ant, and other miscellaneous insect forms), and Plant leaves (200 samples, including Acer campestre, Acer opalus, Acer platanoides, Acer pseudoplatanus, Acer saccharinum, Anemone hepatica, Ficus carica, Hedera helix, Liquidambar styraciflua, Liriodendron tulipifera, and Populus alba specimens).

We note that we are limiting the sizes of our test databases as we require well-segmented binary forms (distinguishing figure from ground) to initiate our medialness transform (e.g. ImageCLEF includes circa 5000 plant images, while we have segmented only 200 of these thus far). This is a limitation of our current approach, but we are working on extensions to grey-level and color images (as non-segmented inputs). Also, rather than focus on one type of biological form, say butterflies, we decided to test and show the potential power of our approach for a number of very different biological forms, from plant leaves to various species of insects to larger animals (including humans).

Furthermore, to check the robustness of our algorithm, we deformed each such sample at the previously indicated three levels of perturbation, thus bringing our total dataset count to 1130×4 = 4520. From this database, we took each sample as a different query, resulting in a total comparison set of 4520×4520 ≈ 20.43 million forms, and exploited our current ranking metric (F alone, or with F_C) to find the returned top-10 matches. Examples of such top-10 results can be seen in Figures 9, 10, 11, and 12.

Figure 10: Top-10 results on some of the samples from the Plant database (ImageCLEF-2013 [4]) with shape perturbations, using our F-measure as the basis for ranking. NB: The first match is always the desired target.

Figure 12: Top-10 results on some of the samples from the Insect database, including structural perturbations, when using our F-measure as the basis for ranking.

In our experiments, the matching algorithm always finds the best fitting shape area for the query in the target image (Fig. 13). For empirical analysis, we performed two individual comparisons: (a) precision obtained on different sets of data types (Fig. 14), and (b) precision obtained at different perturbation levels (Fig. 15). When the query image belongs to the original set, the retrieval rate at different ranks is very high, while the performance significantly decreases for high levels of perturbation. Note however that even under a large structural perturbation a given form which may have lost some of its significant shape features (e.g. a limb) can still be matched with perceptually similar targets, as judged by a human observer and validated here as we know the ground truth.
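The ranking used throughout these experiments (F first, with F_C from Eq. (2) as a tie-breaker) reduces to counting matched feature points after alignment. A minimal sketch of Eqs. (1) and (2), assuming already-transformed 2D point sets and a simplified proximity test (no one-to-one assignment is enforced); all function names are ours:

```python
import numpy as np

def count_matches(q, t, tol):
    """Count query points having at least one target point within `tol`
    (a simplification: one-to-one assignment is not enforced here)."""
    q = np.atleast_2d(np.asarray(q, dtype=float))
    t = np.atleast_2d(np.asarray(t, dtype=float))
    d = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=2)  # pairwise distances
    return int((d.min(axis=1) <= tol).sum())

def f_measure(QI, QE, TI, TE, tol):
    """Eq. (1): F-measure biased towards internal and external dominant points."""
    MI = count_matches(QI, TI, tol)   # matched internal dominant points
    ME = count_matches(QE, TE, tol)   # matched external dominant points
    return 2.0 * (MI + ME) / (len(QI) + len(QE) + len(TI) + len(TE))

def f_c(QC, TC, tol):
    """Eq. (2): percentage of matched convex points, used to break F ties."""
    return 2.0 * count_matches(QC, TC, tol) / (len(QC) + len(TC))
```

Targets are then sorted by F (and by F_C among equal-F candidates) to produce the top-10 rankings reported above.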
Figure 11: Top-10 results on some of the samples for butterfly forms from the Insect database, including structural perturbations, when using our F-measure as the basis for ranking. NB: A partial shape query (3rd series from the top) returns valid and interesting results.

Figure 13: A special query image, obtained as a juxtaposition of two different insect cuts, finds a best-fitting location in the target image and results in an interesting series of part-based matches (retrieving the correct pair of individual insects, before juxtaposition, from the DB).

Figure 14: Result analysis (precision) on the different datasets containing originals of large animal, plant and insect images, using our proposed F-measure as the ranking metric.

Figure 15: Result analysis of the entire dataset for different levels of perturbation, using our F-measure as the ranking metric. NB: Most of the images are also randomly rotated, scaled and translated.

5. CONCLUSION
The strengths of our approach include: (1) emphasizing a representation of 2D shapes based on results from studies of human perception via robust medialness analysis; (2) introducing an algorithmic chain providing an implementation of this shape analysis, tested on current reference 2D binary animal and plant databases; (3) mapping the whole form onto a small number of dominant feature points; (4) achieving accuracy for top-ranked matches (exactness), with graceful (observable) degradation properties for the following highest-ranking results; (5) testing and demonstrating robustness under some levels of articulation and other structural (noise, cuts) perturbations. Our method also shows promising results for part-based matching tasks, in the context of occlusions, cuts and mixed object parts, as well as for shape reconstruction and compression, all important topics which will require further investigation.

Acknowledgments
This work was partially funded by the European Union (FP7 – ICT; Grant Agreement #258749; CEEDs project). List of external databases used in our research: (i) Brown University's Shape DB, (ii) MPEG-7 shape DB, (iii) ImageCLEF DB.

6. REFERENCES
[1] X. Bai, W. Liu, and Z. Tu. Integrating contour and skeleton for shape classification. In ICCV Workshops, pages 360–7, 2009.
[2] S. Berretti, A. Del Bimbo, and P. Pala. Retrieval by shape similarity with perceptual distance and effective indexing. IEEE Trans. on Multimedia, 2(4):225–39, 2000.
[3] M. Bober and F. Mokhtarian. Curvature Scale Space Representation. Springer, 2003.
[4] B. Caputo et al. ImageCLEF 2013: the vision, the data and the open challenges. In Information Access Evaluation, pages 250–68. Springer, 2013.
[5] L. Chen, R. Feris, and M. Turk. Efficient partial shape matching using the Smith-Waterman algorithm. In CVPR Workshops, pages 1–6, 2008.
[6] E. R. Dougherty and R. A. Lotufo. Hands-on Morphological Image Processing. SPIE, 2003.
[7] M. Kelly and M. D. Levine. Annular symmetry operators. In ICCV, pages 1016–21, 1995.
[8] B. B. Kimia. On the role of medial geometry in human vision. Journal of Physiology-Paris, 97(2):155–90, 2003.
[9] I. Kovács, Á. Fehér, and B. Julesz. Medial-point description of shape. Vision Research, 38(15):2323–33, 1998.
[10] L. J. Latecki and R. Lakamper. Shape similarity measure based on correspondence of visual parts. IEEE PAMI, 22(10):1185–90, 2000.
[11] F. Leymarie and M. D. Levine. Simulating the grassfire transform using an active contour model. IEEE PAMI, 14(1):56–75, 1992.
[12] F. Fol Leymarie et al. Point-based medialness for movement computing. In ACM Proc. of MOCO, pages 31–6, Paris, France, 2014.
[13] M. Leyton. Process Grammar. Springer, 2012.
[14] H. Ling and D. Jacobs. Shape classification using the inner-distance. IEEE PAMI, 29(2):286–99, 2007.
[15] Z. Liu, J. An, and F. Meng. A robust point matching algorithm for image registration. In ICMV, volume SPIE 8350, 2011.
[16] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, volume 2, pages 1150–7, 1999.
[17] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int'l J. of Computer Vision, 60(2):91–110, 2004.
[18] Y. Mingqiang et al. A survey of shape feature extraction techniques. In Pattern Recognition Techniques & Applications, pages 43–90. InTech, 2008.
[19] W. Richards and D. D. Hoffman. Codon constraints on closed 2D shapes. CVGIP, 31(3):265–81, 1985.
[20] T. Sebastian et al. Recognition of shapes by editing their shock graphs. IEEE PAMI, 26(5):550–71, 2004.
[21] P. Srestasathiern and A. Yilmaz. Planar shape representation and matching under projective transformation. CVIU, 115(11):1525–35, 2011.
[22] G. J. van Tonder and Y. Ejima. Flexible computation of shape symmetries within the maximal disk paradigm. IEEE SMC, Part B, 33(3):535–40, 2003.
[23] L. Vincent. Morphological grayscale reconstruction in image analysis. IEEE Trans. on Image Processing, 2(2):176–201, 1993.
[24] P. B. Van Wamelen et al. A fast expected time algorithm for the 2-D point pattern matching problem. Pattern Recognition, 37(8):1699–711, 2004.