Automated Silhouette Extraction for Mountain Recognition

Daniel Braun, Heinrich-Heine-Universität, Institut für Informatik, Universitätsstraße 1, 40225 Düsseldorf, Germany, braun@cs.uni-duesseldorf.de
Michael Singhof, Heinrich-Heine-Universität, Institut für Informatik, Universitätsstraße 1, 40225 Düsseldorf, Germany, singhof@cs.uni-duesseldorf.de


ABSTRACT
With the rise of digital photography and the easy sharing of images over the internet, a huge number of images exists about which there is no notion of what they show. In order to overcome this problem, we introduce a method, using mountain recognition as an example, that is able to automatically recognise a mountain shown in a photograph.

Our method does not require GPS information stored in the image, since most images are not GPS tagged, either because the device has no GPS sensor or because it has been deactivated to reduce power consumption, which is often the case with smartphones. Instead, we propose a method that is able to automatically extract the mountain's silhouette from a given image. This silhouette is then cleaned by removing artefacts and outliers, such as trees and clouds, with a histogram based approach. Finally, the cleaned silhouette can be compared to reference data in order to recognise the mountain that is shown in the picture. For this, time series comparison techniques can be used to find matching silhouettes. However, because of the huge number of reference silhouettes to compare against, we argue that a preselection of those silhouettes is necessary and point out approaches to this problem.

Categories and Subject Descriptors
I.4.8 [IMAGE PROCESSING AND COMPUTER VISION]: Scene Analysis—Object recognition; I.4.6 [IMAGE PROCESSING AND COMPUTER VISION]: Segmentation—Edge and feature detection; H.2.8 [DATABASE MANAGEMENT]: Database Applications—Data mining

Keywords
Object Recognition, Image Annotation, Outlier Detection, Image Segmentation, Skyline Recognition, Mountain Recognition, Time Series

27th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 26.05.2015 - 29.05.2015, Magdeburg, Germany. Copyright is held by the author/owner(s).

1.   INTRODUCTION
Sharing our experiences with digital images is a significant part of today's life, which is partly a result of the high availability of digital cameras, for instance in smartphones, and the heavy use of social networks, which simplify the publication and sharing of images. As a consequence, the number of images on the world wide web increases significantly. For example, this can be seen on the image sharing platform Instagram, where users share an average of 70 million new photos per day [1].

As a result of such high numbers of images, searching for photos which show specific objects is challenging, because the majority of these images are not properly tagged with the names of every object seen in them. Hence, the need for efficient algorithms for automatic object recognition rises. In the last decades there have been many advances in this research field, but especially the automatic identification of landmarks, which are subject to weather changes, erosion and vegetation, is still a challenging task, even though the number of images with attached GPS data, which marks the geo-position of the camera when the photo was shot, is rising.

The growing spread of devices capable of generating GPS tags for images, like smartphones and digital cameras with GPS units, enables many possibilities for a subsequent geo-localisation of images, because GPS tags can significantly reduce the number of possible sights, buildings or landmarks to compare with. However, too many images exist without the advantage of GPS tags, so that an automatic geo-localisation without prior knowledge of the camera position is still a valuable aim.

Our focus lies on automatic landmark recognition, which we describe using the example of mountain recognition in images. To answer the question which mountain can be seen in a given image, we match the skyline of the mountain in the image against silhouettes of mountains in our database. For this purpose, we have to automatically extract the exact skyline from the image, which is a difficult task, because the segmentation of the image can lead to artefacts, for instance due to weather conditions, noise, obstacles or overexposure.

In this paper we introduce a baseline segmentation algorithm, which uses an outlier detection algorithm to identify and eliminate these artefacts. The article is structured as follows: In the next section we discuss other papers related to our algorithm. We then introduce our algorithm, which consists of the three steps silhouette extraction, silhouette cleaning and silhouette matching. The section on the latter gives a perspective on how to use the cleaned silhouette in further steps. The last section summarises this paper and outlines future work.
2.   RELATED WORK
The number of publications on automated mountain recognition has increased significantly in the last two decades. Given the high number of publicly available digital elevation maps (DEMs), the consensus in many publications [2, 3, 4, 5, 10] is to use image-to-model matching for mountain recognition and orientation identification.

Many approaches, like [3, 4, 10], make use of known GPS data for the image to align it with the terrain. This reduces the search space for possible mountain silhouettes significantly, because the search for the correct mountains is limited to checking the surroundings of the camera's position.

Baboud et al. [3] need an estimated field of view to calculate the rotation which maps the image to the terrain. To this end, the authors introduce a robust matching metric, using extracted edges in combination with a search space reduction to further reduce computation time. [10] uses a standard Sobel filter for the edge extraction. To identify the edges which are part of the mountain's contours, the authors propose the use of the Random Ferns classifier. Afterwards they match the resulting contour map against the contour map extracted from the DEM. In [4] a 360-degree panorama of the surroundings of the image location is synthesized from a given DEM and used for matching the skyline extracted from the image. For that, they use a vector cross correlation (VCC) technique to find candidate matches. After further refinement and recalculation of the VCC for each peak they can label the peaks with high precision.

All three approaches show good results for their task, but the need for GPS data for the processed image does not match our problem specification, in contrast to Baatz et al. [2], whose focus lies on images without GPS tags. They use an approach based on a cost function for the assignment of a pixel to the sky or the foreground, respectively. They combine this approach with relevance-feedback-like user interaction, where the user can mark parts of the sky or the foreground. This user intervention was needed for 49% of the images in their dataset, which was collected during their research. Thankfully, they published this dataset, so it is used in this paper as well. After the contour extraction, they match the coded contourlet against the contours extracted from a DEM at several points on a predefined grid. Finally, when they find a suitable match, they recalculate the geo-position of the camera. Naval et al. [5] also try to find the camera position and orientation, using a DEM to position the camera in the world. For this purpose they match the skyline extracted from an image against a synthetic skyline from a DEM. In contrast to both works, we try to obtain an automatically cleaned silhouette, thus removing obstacles and other artefacts, from the processed image.

Tao et al. [12] focus on the identification of the sky seen in an image and the search for images with a specific sky appearance. To this end they define different sky attributes, like for example the sun position, which they extract individually. Finally they present their complete system SkyFinder, in which an attribute based search is implemented. On top of that, they provide a sky replacement algorithm for changing the sky in an image. However, the recognition of the skyline is not part of this system.

3.   SILHOUETTE RECOGNITION
The silhouette recognition process consists of three steps. The first step is the extraction of the silhouette from a given picture. This silhouette is stored as a polygonal chain. During the second step, this silhouette is cleaned by identifying and removing outliers. The cleaned silhouette is then used as the input to the third and final step, which consists of matching the silhouette against the reference data in order to identify the structure in the picture.

3.1    Silhouette Extraction
Extracting the silhouette is the first step of the mountain identification process described in this paper. The task of this step is the separation of the sky from the rest of the processed image. We therefore have a binary segmentation task to solve, in which we check every pixel p of an image and decide whether p shows a part of the sky or not. For a human this would be easy to do in most cases, even though it would take too much work and time to segment great numbers of pictures manually. Because of that, we use a seeded region growing algorithm, like the one described in [11], exploiting the fact that in most cases the pixels at the upper bound of the image are part of the sky. In the following, we first describe some difficulties which can make the segmentation task more complex. After that, our baseline segmentation algorithm is explained.

The segmentation of an image generally suffers from different problems, like under- or overexposure, which results in a smaller difference between the pixel colours, or blur, which can be, for example, a result of a lossy compression of the image. In addition, our binary segmentation task has some problems of its own to deal with. The first difficulty is a consequence of the motif itself, because the weather in mountainous terrain is very volatile. This, and the fact that, for example in the Alps, many mountain peaks are covered with snow, can make the extraction of the mountain silhouette from an image imprecise. Furthermore, differently coloured bands of cloud can lead to an extracted silhouette which lies in the sky and is therefore not part of the real skyline. The second difficulty is a result of possible obstacles, which hide the real silhouette of the mountain or lead to smaller segments within the sky segment.

Even though we are aware of these problems, our naive segmentation algorithm cannot handle all of them. For the identification of artefacts resulting from segmentation errors, we use the second step of our chain, which is described in section 3.2. Our segmentation algorithm uses the upper row of pixels as seed for the sky segment. This means that we mark the upper pixels as sky and process every neighbouring pixel to let this segment grow. For this purpose we first convert the processed photo to a gray-scale image G. Now we can define $V_G$ as the overall variance of the brightness values. Having $V_G$, we process every pixel $p_{(x,y)}$ which is not yet part of the sky segment and mark it as a sky candidate if it holds that

$$B_{p_{(x,y)}} - \mathrm{mean}^r_{p_{(x,y)}} < \gamma \cdot \sqrt{V_G}$$

with $B_{p_{(x,y)}}$ the brightness of the pixel $p_{(x,y)}$, $\mathrm{mean}^r_{p_{(x,y)}}$ the mean of the brightness in a neighbourhood of the pixel with radius r, and γ a factor to scale the impact of the standard deviation. This means that we mark a pixel if its distance to a local mean of brightness values is smaller than a fixed percentage of the global standard deviation of all brightness values.
The idea behind this term is that the border between sky and ground has, in most cases, a stronger contrast than the rest of the image, especially the sky. Therefore, the distance to the mean will be higher at the skyline than at the border of possible clouds. By tying this bound to the standard deviation, we take into account images where the brightness is very homogeneous, for example due to overexposure, so that the contrast of the skyline decreases.

In our experience this naive assumption shows good results for r = 5 and γ = 0.1 on most images from the Swiss dataset published in [2]. In the future we will test more complex algorithms, like for instance the skyline extraction algorithm proposed in [8], with the expectation that they will yield even better results.

Figure 1: Two possible neighbourhoods of a pixel. (Left) 4-connected neighbourhood of the pixel p. (Right) 8-connected neighbourhood of the pixel p.

After we have marked all possible candidates, we can finally let the sky segment grow. For that, we check for every pixel $p_{(x,y)}$ whether it is a 4-connected neighbour (see the left side of figure 1) of a pixel of the sky segment and is marked as a sky candidate (see the previous step). If so, the pixel is marked as sky. This step is repeated until no more pixels can be added to the sky segment. At this point, we have a binary image which represents the skyline as the transition between the sky and the ground, like the one shown in figure 2.

Figure 2: The result of the segmentation algorithm. (Left) The original image from the dataset of [2]. (Right) The binary image after the segmentation. White pixels mark the ground segment.
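To make this step concrete, the following minimal sketch implements the candidate marking and the seeded growing in Python with NumPy and SciPy. The function name, the use of a uniform filter for the local mean, and the absolute difference as the "distance" to the local mean are our assumptions; the paper's own implementation may differ.

```python
import numpy as np
from collections import deque
from scipy.ndimage import uniform_filter

def segment_sky(gray: np.ndarray, r: int = 5, gamma: float = 0.1) -> np.ndarray:
    """Binary sky/ground segmentation as described above: a pixel is a
    sky candidate if its brightness differs from the local mean by less
    than gamma * sqrt(V_G); the sky segment is grown from the top row
    over 4-connected candidate pixels."""
    h, w = gray.shape
    threshold = gamma * np.sqrt(gray.var())            # gamma * sqrt(V_G)
    local_mean = uniform_filter(gray.astype(float), size=2 * r + 1)
    candidate = np.abs(gray - local_mean) < threshold  # sky candidates

    sky = np.zeros((h, w), dtype=bool)
    sky[0, :] = True                                   # upper row is the seed
    queue = deque((0, x) for x in range(w))
    while queue:                                       # grow the sky segment
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and candidate[ny, nx] and not sky[ny, nx]:
                sky[ny, nx] = True
                queue.append((ny, nx))
    return ~sky                                        # True marks the ground segment
```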
For the silhouette extraction, we search a silhouette S = (v1, ..., vn) where every vertex vi = (xi, yi) is a two-dimensional point, with xi the x-coordinate and yi the y-coordinate of that point in the picture. We start at the upper left pixel p(0,0) and search for the first non-sky pixel as start vertex v1 of our polygonal chain, which lies in the left pixel column with x1 = 0. Now, there are two possibilities: either we find no such pixel, in which case we check the next column until we find one or reach the lower right pixel of the image, or a possible skyline pixel is found and the algorithm tracks the transition between the sky and the ground in the following manner.

With vi as the last extracted vertex of the silhouette, the algorithm checks the 8-connected neighbours of this point (see the right side of figure 1) and chooses as next vertex the non-sky point with the lowest angle between the incoming line, defined by the two last vertices vi−1 and vi, and the outgoing line, defined by vi and the neighbouring point. This can easily be done, as shown in figure 3, by checking the 8-connected neighbours of vi in clockwise direction, starting at the predecessor. The only exception is the start point v1, where we set v0 = (−1, y1). This means that in this case we choose the direct left neighbour, which lies outside of the image, as predecessor.

Figure 3: An example of how to choose the next silhouette point. Starting at vi−1, the predecessor of the current vertex vi, the algorithm searches in clockwise direction for the next ground pixel (blue). Therefore the third pixel will be chosen.
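A sketch of this tracing step, assuming the segmentation result is available as a boolean ground mask indexed as ground[y][x]; the clockwise neighbour ordering and the stopping criterion at the right image border are our assumptions:

```python
# 8-connected neighbour offsets (dx, dy) in clockwise order on screen
# coordinates (y grows downwards), starting east of the centre pixel.
NEIGHBOURS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def trace_silhouette(ground, start):
    """Follow the sky/ground transition: from the predecessor of the
    current vertex, scan the 8-neighbourhood clockwise and take the
    first ground pixel as the next vertex."""
    h, w = len(ground), len(ground[0])
    silhouette = [start]
    prev, current = (start[0] - 1, start[1]), start  # virtual predecessor v0
    while current[0] < w - 1:                        # stop at the right border
        cx, cy = current
        back = NEIGHBOURS.index((prev[0] - cx, prev[1] - cy))
        for k in range(1, 9):                        # clockwise after the predecessor
            dx, dy = NEIGHBOURS[(back + k) % 8]
            nx, ny = cx + dx, cy + dy
            if 0 <= nx < w and 0 <= ny < h and ground[ny][nx]:
                prev, current = current, (nx, ny)
                silhouette.append(current)
                break
        else:
            break                                    # isolated pixel, no continuation
    return silhouette
```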

Now we have the border of the mountain as a chain of neighbouring pixels. To reduce the number of vertices without losing any information, we eliminate, in the last step of the extraction phase, all vertices which contribute no further information to the final silhouette. For this, we define the information gain I of a vertex vj, with 1 < j < n, as

$$I_{v_j} = \begin{cases} 1, & \text{if } \angle v_{j-1} v_j v_{j+1} \neq 180^\circ \\ 0, & \text{otherwise.} \end{cases}$$

This means that every vertex which lies on one line with its predecessor and successor carries no further information for the extracted silhouette. After deleting every vertex with $I_{v_j} = 0$, we obtain our final polygonal chain, which can now be analyzed in the cleaning phase described in the following section.
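This elimination rule can be implemented in a few lines; in the following sketch we express the 180° condition through a cross and dot product test for straight continuation, which is our formulation of the definition above:

```python
def thin_silhouette(vertices):
    """Remove every vertex with information gain 0, i.e. every vertex
    whose predecessor and successor lie on one line with it."""
    def straight(a, b, c):
        # angle at b is exactly 180 degrees: zero cross product and
        # both segments pointing in the same direction (dot > 0)
        ab = (b[0] - a[0], b[1] - a[1])
        bc = (c[0] - b[0], c[1] - b[1])
        return ab[0] * bc[1] - ab[1] * bc[0] == 0 and ab[0] * bc[0] + ab[1] * bc[1] > 0

    kept = [vertices[0]]
    for j in range(1, len(vertices) - 1):
        if not straight(kept[-1], vertices[j], vertices[j + 1]):
            kept.append(vertices[j])
    kept.append(vertices[-1])
    return kept
```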
3.2      Silhouette Cleaning
The extracted silhouette from the previous step may contain different disturbances. In this step, we try to get rid of those, so that they cannot affect the silhouette matching step described in section 3.3. Examples of such disturbances are manifold, ranging from objects in front of the mountain, such as trees, to errors produced during the extraction process, for instance caused by a low contrast between the mountain line and the sky. Essentially, after finding an outlier, we currently cut elevations within this part of the silhouette away.

Figure 4: Silhouette as black line with outliers in red on original image.

Some of these outliers are showcased in figure 4. Here, the black line is the silhouette that has been extracted by the extraction step as described in section 3.1. The red parts are those parts of the silhouette that are automatically detected as outliers. Examples are the tree tops on the left, which have correctly been marked as outliers. In the center of the picture, a part of the mountain has been cut out: here the contrast between rock and blue sky gets too low for the extraction step to distinguish them. Right next to this, clouds around the lower peak in the back get recognised as mountain. Since the outline of this outlier is not atypical for a mountain silhouette, it is not recognised. Farther to the right, there are two other outliers where clouds are recognised as mountain by the extraction step. These are, in this case, marked red and thus detected by the silhouette cleaning step.

Note that there are a few parts of the silhouette that are marked as outliers but are not described above. These are false positives. However, since the image here is used to showcase the different kinds of outliers, they are not discussed further.

The silhouette cleaning part of our algorithm is, again, divided into three parts. As input, it gets the silhouette S = (v1, ..., vn) extracted from a picture.

For the recognition of patterns inside the polygonal chain, this representation has disadvantages, because all vertices are given as absolute positions. For outlier detection, a model in which similar structures look similar in their representation is beneficial. Therefore, we convert S to a representation AP = (ap1, ..., apn), with api = (li, ai). Here, we set

$$l_i = \begin{cases} |v_i - v_{i-1}|, & \text{if } i > 1 \\ 0, & \text{else} \end{cases}$$

and ai as the angle between the vector vi+1 − vi and the x-axis for i < n, and an = 0°. Figure 5 illustrates this. During this step, points where the angle does not change are removed. Also, artefacts consisting of angles of 180° between two consecutive segments are removed.

Figure 5: Conversion of polygonal chain for better pattern recognition. (In the depicted example, l1 = 0 and a1 = 45°; l2 = |v2 − v1| and a2 = −45°; l3 = |v3 − v2| and a3 = 0°.)
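A sketch of this conversion; we compute the angles with atan2 in image coordinates (y pointing downwards), so the sign convention relative to figure 5 is our assumption:

```python
import math

def to_length_angle(vertices):
    """Convert S = (v1, ..., vn) into AP = (ap1, ..., apn) with
    ap_i = (l_i, a_i): l_i the length of the segment from v_{i-1}
    to v_i (l_1 = 0) and a_i the angle between v_{i+1} - v_i and
    the x-axis (a_n = 0)."""
    n = len(vertices)
    ap = []
    for i, v in enumerate(vertices):
        length = math.hypot(v[0] - vertices[i - 1][0],
                            v[1] - vertices[i - 1][1]) if i > 0 else 0.0
        if i < n - 1:
            nxt = vertices[i + 1]
            angle = math.degrees(math.atan2(nxt[1] - v[1], nxt[0] - v[0]))
        else:
            angle = 0.0
        ap.append((length, angle))
    return ap
```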
The basic idea of the outlier detection method itself is to compare histograms created from parts of the polygonal chain AP to a reference histogram. The latter is created from the silhouettes of the ground truth data given in [2].

The mentioned histograms have the two dimensions segment length and angle size, just like the points in the transformed polygonal chain AP, and are normalised. This means that the frequencies of all buckets sum to 1. For the reference data, we compute one histogram from each image's silhouette. Finally, the mean of all these histograms is computed and becomes the reference histogram Hr.

In order to find outliers, we use a sliding window approach over the polygonal chain APi of a given input image. We use a section of m successive segments api, ..., api+m−1 to compute a histogram Hi with the same bucket distribution as Hr. Then, for every point used to compute Hi, we store the distance between Hi and Hr. By this approach, multiple distance values get stored for most points as the sliding window moves on. The final distance di for a point api is then computed as the average of all distances stored for this point.

As distance function we use the following:
Definition 1. Let G = (g1, ..., gk), H = (h1, ..., hk) be normalised histograms with the same bucket distribution. The above average distance of G to H is defined by

$$D(G, H) := \max(|aab(G)|, |aab(H)|) - |aab(G) \cap aab(H)|,$$

where

$$aab(F) := \Big\{\, i \in \{1, \ldots, k\} \;\Big|\; f_i \geq \frac{1}{k} \,\Big\}$$

with F being a normalised histogram with k buckets.

This can be implemented to run in time linear in the number of buckets or, for histograms of a fixed length, in constant time.
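Definition 1 translates directly into code; a sketch for histograms given as flat frequency sequences:

```python
def above_average_distance(g, h):
    """D(G, H) = max(|aab(G)|, |aab(H)|) - |aab(G) ∩ aab(H)| for two
    normalised histograms over the same k buckets, where aab(F) is
    the set of buckets with frequency at least 1/k."""
    k = len(g)
    aab_g = {i for i, f in enumerate(g) if f >= 1 / k}
    aab_h = {i for i, f in enumerate(h) if f >= 1 / k}
    return max(len(aab_g), len(aab_h)) - len(aab_g & aab_h)
```

For example, for G = (0.5, 0.5, 0, 0) and H = (0.25, 0.25, 0.25, 0.25) we get aab(G) = {1, 2} and aab(H) = {1, 2, 3, 4}, so D(G, H) = 4 - 2 = 2.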
Based on a number of images used as training data, the mean distance µ and the standard deviation σ can be determined in the same way as described above. We use these precomputed values to find two thresholds τin > τout, such that we characterise a point api as a strong anomaly if its distance satisfies

$$d_i \geq \mu + \tau_{in} \cdot \sigma$$

and as a weak anomaly if

$$d_i \geq \mu + \tau_{out} \cdot \sigma.$$

We also determine a length threshold l. For a part of the polygonal chain to be recognised as an outlier, at least l successive points api, ..., api+l−1 must be strong anomalies. An outlier can have any length equal to or greater than l. Finally, if we find such an outlier, we expand it by adding adjacent points that are weak anomalies. By this, it is possible for outliers to merge.

As an example, say ap7, ..., ap20 are strong anomalies because their distances d7, ..., d20 are bigger than µ + τin · σ. If we set l = 4 for this example, these fourteen points are recognised as an outlier o = {7, ..., 20}. Now, let us assume that ap5 and ap6 are weak anomalies, as well as ap22. Then ap5 and ap6 belong to the extended outlier, because they are both adjacent to the outlier. On the other hand, ap22 does not belong to the extended outlier, because ap21 is not part of the outlier, since it is not a weak anomaly.
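The following sketch combines the sliding window, the per-point averaging and the two-threshold rule. Here, histogram() stands for an assumed bucketing function that bins a window of (length, angle) pairs into the bucket grid of Hr, and above_average_distance is the distance from Definition 1 as sketched above:

```python
def detect_outliers(ap, h_r, histogram, m, mu, sigma, tau_in, tau_out, l):
    """Return a boolean mask over ap marking detected outliers: runs of
    at least l strong anomalies, expanded by adjacent weak anomalies."""
    n = len(ap)
    sums, counts = [0.0] * n, [0] * n
    for i in range(n - m + 1):                   # slide the window
        dist = above_average_distance(histogram(ap[i:i + m]), h_r)
        for j in range(i, i + m):                # every point in the window
            sums[j] += dist
            counts[j] += 1
    d = [s / c for s, c in zip(sums, counts)]    # averaged distance d_i

    strong = [di >= mu + tau_in * sigma for di in d]
    weak = [di >= mu + tau_out * sigma for di in d]

    outlier = [False] * n
    i = 0
    while i < n:                                 # find runs of >= l strong anomalies
        j = i
        while j < n and strong[j]:
            j += 1
        if j - i >= l:
            outlier[i:j] = [True] * (j - i)
        i = max(j, i + 1)

    changed = True                               # expansion by weak anomalies;
    while changed:                               # this also lets outliers merge
        changed = False
        for i in range(n):
            if weak[i] and not outlier[i] and (
                    (i > 0 and outlier[i - 1]) or (i + 1 < n and outlier[i + 1])):
                outlier[i] = True
                changed = True
    return outlier
```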
Once all outliers have been detected, the next step is to remove them. Currently, we use a very simple approach in which, in the original silhouette S, the overhanging part of the outlier is replaced by a straight line. This is based on the notion that most outliers in the silhouette are caused by either trees or clouds. By just removing them, in many cases we get results that resemble the original shape of the mountain in a simple way.

Let o = {i, ..., j} with i + l ≤ j be an outlier. Then we draw a straight line from vi to vj. Now, while the distance between vs, starting with s = i + 1, and that straight line is smaller than a preset value d, we search for the vertex vs with the largest index that is still close enough to the original straight line. The same is done from the other end of the outlier, so that we find a vertex ve. This results in four vertex indices i ≤ s < e ≤ j (in general, it is possible that s ≥ e; in this case, no substitution is performed). From this, the part between vs and ve is finally replaced by a straight line between these two vertices.

By this, we obtain a cleaned silhouette that can be used in further steps for the silhouette matching.
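A sketch of this substitution; the perpendicular point-to-line distance is our reading of the distance to the straight line, and walking inwards while the distance stays below d is our reading of "the vertex with the largest index that is close enough":

```python
import math

def remove_outlier(vertices, i, j, d):
    """Replace the part of the silhouette between v_s and v_e by a
    straight line, where v_s and v_e are found by walking inwards from
    v_i and v_j while the distance to the chord v_i v_j stays below d."""
    def dist_to_chord(p, a, b):
        # perpendicular distance of p from the line through a and b
        num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
        return num / math.hypot(b[0] - a[0], b[1] - a[1])

    a, b = vertices[i], vertices[j]
    s = i
    while s + 1 < j and dist_to_chord(vertices[s + 1], a, b) < d:
        s += 1
    e = j
    while e - 1 > i and dist_to_chord(vertices[e - 1], a, b) < d:
        e -= 1
    if s >= e:                                # footnoted case: no substitution
        return vertices
    return vertices[:s + 1] + vertices[e:]    # straight connection v_s -> v_e
```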

3.3    Silhouette Matching
Once the silhouette is extracted and cleared of outliers, it is time to match it against the reference data in order to determine the mountain shown in the picture. Currently, we have not implemented any matching method yet. However, we introduce some methods that we are going to test in the future.

The converted silhouettes AP = (ap1, ..., apn) from the previous section can easily be reinterpreted as time series by setting AP′ = (ap′1, ..., ap′n) with

$$ap'_i = (l'_i, a_i) = \Big(\sum_{j=1}^{i} l_j,\; a_i\Big)$$

for api = (li, ai). By this conversion, every point ap′i has the length of the polygonal chain up to that point as its first component. Since li > 0 for all i > 1, the length component of AP′ is strictly monotonically increasing, just like the time dimension of a time series. This holds because we do not allow identical points in our silhouette. With this, it is possible to compare different silhouettes for similarity using time series comparison techniques such as [6, 9]. These methods have the advantage of being rotation invariant, which, in our case, means that the image does not have to have the same image section as our reference data.
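A sketch of this conversion; itertools.accumulate yields the running chain length that serves as the time axis:

```python
from itertools import accumulate

def to_time_series(ap):
    """Turn AP = ((l_1, a_1), ..., (l_n, a_n)) into AP' where the first
    component is the cumulative chain length, strictly increasing like
    the time dimension of a time series."""
    lengths = accumulate(l for l, _ in ap)
    return [(t, a) for t, (_, a) in zip(lengths, ap)]
```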
Due to the great number of mountains and peaks that exist, and the fact that every peak can be photographed from different angles, there are hundreds of thousands of silhouettes to match for each query image. It is therefore clear that, even with a high-performance matching method, a preselection is necessary.

There exist many methods suitable for this, such as the technique presented in [7], which uses the position of the sun and the time of day at which the photo has been taken to compute the approximate location on earth. However, this method has an average localisation error of about 100 km. Because of this, it is useful for a rough preselection but not sufficient in our case.

Therefore, we aim to reuse the idea of our outlier detection method for matching. Instead of computing histograms of short sequences of the silhouette, in this case the histogram HS over the whole silhouette AP is computed. This is then compared to the reference images' histograms HRi, i ∈ {1, ..., nref}, with nref the number of reference images, which we, as discussed in section 3.2, have to compute anyway. The comparison of two histograms with our distance function is linear in the number of buckets in the histograms or, since this number is fixed, constant. Now, let d(HS, HRi) denote the distance between the histogram of the new silhouette S and the ith reference histogram. If d(HS, HRi) is small, this does not necessarily mean that the silhouettes are identical or even really similar, because the histogram representation does not preserve the order of the contained elements. On the other hand, if the distance between two silhouettes is large, we can say that those silhouettes are not similar.

Due to this, we plan to use the time series comparison methods from the beginning of this section only on those reference silhouettes that yield small histogram distances. Furthermore, we will evaluate whether the position determination method presented in [7] is able to boost the performance of our solution.
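As a sketch, this preselection amounts to ranking the reference histograms by their distance to HS, using the distance function from section 3.2, and keeping only the closest candidates for the expensive time series matching; the candidate budget is an assumed parameter, since the paper does not fix one:

```python
def preselect(h_s, reference_histograms, budget=100):
    """Return the indices of the reference silhouettes whose
    whole-silhouette histograms are closest to H_S; only these are
    passed on to the time series comparison."""
    ranked = sorted(range(len(reference_histograms)),
                    key=lambda i: above_average_distance(h_s, reference_histograms[i]))
    return ranked[:budget]
```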
4.   CONCLUSION AND FUTURE WORK
In this work we presented a new approach to motif recognition based on silhouettes, using the example of mountains. Our method consists of three steps, of which the first two have already been implemented, while we are working on the third step. First results show that we are able to extract silhouettes with relatively few errors from images and that our outlier detection step does indeed find meaningful anomalies. We have currently tested the first two steps of our approach, as described in sections 3.1 and 3.2, on 18 images from the reference dataset. Results for the first of these images can be seen in figures 2 and 4. Of these 18 images, 17 result in good silhouettes, i.e. silhouettes with relatively few outliers, of which most get marked and are correctable. The last image, though, does not get recognised correctly due to low contrast in terms of brightness. This is illustrated by figure 6. We do, however, notice this, because most of the silhouette gets marked as outlier.

Figure 6: Silhouette as black line with outliers in red on original image.

The next step is to implement a silhouette matching algorithm, as already outlined in section 3.3. Of course, it is necessary to benchmark the parts of that step in order to find weaknesses early on. Once the system is complete, we aim to evaluate it on the whole dataset of [2]. Based on these results, we will tune the parameters of our method to find settings that work well in general.

We also aim to create a mountain recognition corpus of our own, since [2] focuses on images of mountains in Switzerland only. Our corpus is intended to be international and should feature images of mountains from all over the world.

Another interesting perspective is to test whether our framework works on other types of images, such as city skylines or pictures of single buildings. For these tasks the method itself could be quite similar, since skylines and buildings are mostly photographed with the sky as background, too. Furthermore, we aim to test our method in more diverse areas, such as the recognition of certain objects in MRI scans or X-ray images. These tasks will naturally make it necessary to change some parts of our approach, because there it will be interesting to be able to tell, for example, which organs are shown in a picture or, as a next step, to identify different kinds of tumours automatically.

5.   REFERENCES
[1] Instagram @ONLINE, accessed April 7, 2015. https://instagram.com/press/.
[2] G. Baatz, O. Saurer, K. Köser, and M. Pollefeys. Large Scale Visual Geo-Localization of Images in Mountainous Terrain. In Computer Vision - ECCV 2012, Lecture Notes in Computer Science, pages 517-530. 2012.
[3] L. Baboud, M. Čadík, E. Eisemann, and H.-P. Seidel. Automatic Photo-to-terrain Alignment for the Annotation of Mountain Pictures. In Proc. of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, pages 41-48, 2011.
[4] R. Fedorov, P. Fraternali, and M. Tagliasacchi. Mountain Peak Identification in Visual Content Based on Coarse Digital Elevation Models. In Proc. of the 3rd ACM International Workshop on Multimedia Analysis for Ecological Data, MAED '14, pages 7-11, 2014.
[5] P. C. Naval Jr., M. Mukunoki, M. Minoh, and K. Ikeda. Estimating Camera Position and Orientation from Geographical Map and Mountain Image. In 38th Research Meeting of the Pattern Sensing Group, Society of Instrument and Control Engineers, pages 9-16, 1997.
[6] E. J. Keogh, L. Wei, X. Xi, S.-H. Lee, and M. Vlachos. LB Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures. In VLDB, pages 882-893, 2006.
[7] J.-F. Lalonde, S. G. Narasimhan, and A. A. Efros. What Do the Sun and the Sky Tell Us About the Camera? International Journal of Computer Vision, 88(1):24-51, 2010.
[8] W.-N. Lie, T. C.-I. Lin, T.-C. Lin, and K.-S. Hung. A robust dynamic programming algorithm to extract skyline in images for navigation. Pattern Recognition Letters, 26(2):221-230, 2005.
[9] J. Lin, R. Khade, and Y. Li. Rotation-invariant similarity in time series using bag-of-patterns representation. Journal of Intelligent Information Systems, 39(2):287-315, 2012.
[10] L. Porzi, S. R. Bulò, P. Valigi, O. Lanz, and E. Ricci. Learning Contours for Automatic Annotations of Mountains Pictures on a Smartphone. In Proc. of the International Conference on Distributed Smart Cameras, ICDSC '14, pages 13:1-13:6, 2014.
[11] F. Y. Shih and S. Cheng. Automatic seeded region growing for color image segmentation. Image and Vision Computing, 23(10):877-886, 2005.
[12] L. Tao, L. Yuan, and J. Sun. SkyFinder: Attribute-based Sky Image Search. In ACM SIGGRAPH 2009 Papers, SIGGRAPH '09, pages 68:1-68:5, 2009.