         Ontology Based Image Recognition: A Review

                    Sandeepak Bhandari                                                             Audrius Kulikajevas
            Department of Software Engineering                                            Department of Multimedia Engineering
             Kaunas University of Technology                                                Kaunas University of Technology
                    Kaunas, Lithuania                                                              Kaunas, Lithuania
                Sandeepak525@gmail.com                                                          akulikajevas@gmail.com


Abstract — Due to the lack of domain knowledge about the semantics of an image, the image retrieval rate is usually unsatisfactory. To improve this, image labels must be provided by the author of the dataset. Applying ontology in digital image recognition allows relevant information such as timelines, features and visualizations to be extracted to help understand and interpret the image, so that we can focus on the most relevant information.

Keywords — Ontology, features, image labeling, classification, semantic gap, cognitive vision.

Copyright held by the author(s).

I. INTRODUCTION

Computer vision has lately seen a resurgence in popularity with the increased interest in machine learning, specifically neural networks. Despite this, the majority of work is restricted to specific domain knowledge such as specific object instance recognition, face recognition, etc., which leaves these types of computer systems lacking the flexibility and adaptability needed for object recognition in other domains. The term cognitive vision [1] has been introduced to encapsulate the attempt to achieve more robust and adaptive computer vision systems. Cognitive vision systems can infer spatial relations, such as that an ocean is often near a beach or that a lake is beside a grassland [2]. With this paper we try to evaluate previous research on object recognition tasks with the addition of semantics in the form of domain ontologies. This paper is organized as follows: Section II provides a short introduction to related work. Section III provides an in-depth analysis of Semantic Web applications in the field of object recognition and tasks related to it, such as segmentation. Section IV describes possible applications of the Semantic Web in domain-specific object recognition tasks. Finally, Section V presents our conclusions on the effectiveness of using ontology in the field of computer vision.

II. RELATED WORK

There has been a variety of recent studies in the field of cognitive computing. With the rise of machine learning and neural networks it is worth re-evaluating the benefits of applying ontologies to existing object classification and recognition tasks. One of the most important tasks when it comes to object recognition is the creation of an image labeling system for the identification of ground truths. Some of the possibilities for creating the ground truths involve image annotation by keywords, free-text annotations or annotations based on ontologies [3], which allow a hierarchical structure to be added to a collection of keywords in order to produce a taxonomy. Furthermore, ontologies can be used for activity recognition from video feeds [4], [5], allowing cognitive vision systems to semantically identify activities from low-level events. Whereas traditional object recognition methods rely on detecting an individual object as a whole, ontology-based methods are capable of detecting objects based on the composition of their individual components [6]. This makes ontology a powerful tool for the future of computer vision and machine learning.

Anne-Marie Tousch et al. [7] have contributed a survey of semantic techniques used for image annotation. In the paper, the authors analyze the nature of the semantics used to describe an image and point out three main levels of semantic description. The first level concerns objective, concrete object descriptions in relation to abstract ones: for example, a crying person is an objective description, while inferred pain would be a subjective one without deeper knowledge of the semantic context. The second level compares generic versus specific objects, also referred to as individual instances in object recognition, i.e. a bridge vs. the Golden Gate Bridge. The third and final semantic level is split into four facets: time, localization, events and objects. Another contribution to the discussion concerns semantic analysis, an explanation of the semantic gap, and observations on the discrepancy between the human ability to recognize an enormous number of objects almost instantly and the possibilities of image recognition done by machines. They clarify the term semantics, in the sense that they (and by extension we) use it in their paper, as an image description in natural language, with semantic analysis referring to any kind of transcription of an image into a linguistic expression. Furthermore, they describe several approaches to semantic analysis using unstructured vocabularies, such as:

1. Direct methods using a plain representation of data and plain statistical methods.

2. Linguistic methods based on the use of an intermediate visual vocabulary between raw numerical data and high-level semantics.

3. Compositional methods where parts of an image are identified before the whole image or its parts are annotated.

4. Structural methods where a geometry of parts is used.
5. Hierarchical compositional methods where a hierarchy of parts is constructed for recognition.

6. Communicating methods where information is shared between categories.

7. Hierarchical methods that search for hierarchical relationships between categories.

8. Multilabel methods assigning several global labels simultaneously to an image.

Finally, the authors touch on semantic image analysis using structured vocabularies, where they distinguish two main relationship types commonly found in ontologies: Is-A-Part-Of and Is-A, where the latter signifies inheritance and the former signifies composition, such as being part of a bigger model. However, according to the authors such relationships are not descriptive enough, and they suggest a finer organization of methods according to how semantic relations are introduced into the system (a minimal sketch of these two relation types is given after the list below):

1. Linguistic methods where the semantic structure is used at the level of vocabulary, independently from the image, e.g. to expand the vocabulary.

2. Compositional methods that use meronyms, e.g. components.

3. Communication methods that use semantic relations to share information between concepts, be it for feature extraction or for classification.

4. Hierarchical methods that use Is-A relations to improve categorization and/or to allow classification at different semantic levels.
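To make the two relation types concrete, the following minimal sketch (our own illustration, not code from [7]; the concept names are hypothetical) shows how Is-A links can expand a flat annotation vocabulary into a hierarchy, while Is-Part-Of links relate detected components to the objects they may belong to:

```python
# Toy vocabulary with the two ontology relations discussed above.
IS_A = {            # child -> parent (inheritance)
    "golden_gate_bridge": "bridge",
    "bridge": "man_made_structure",
    "sedan": "car",
    "car": "vehicle",
}
IS_PART_OF = {      # component -> whole (composition)
    "wheel": "car",
    "handle": "cup",
}

def expand_is_a(label):
    """Return the label together with all of its ancestors along Is-A links."""
    chain = [label]
    while label in IS_A:
        label = IS_A[label]
        chain.append(label)
    return chain

def wholes_for_parts(parts):
    """Map detected components to the objects they may be part of (Is-Part-Of)."""
    return {IS_PART_OF[p] for p in parts if p in IS_PART_OF}

if __name__ == "__main__":
    print(expand_is_a("golden_gate_bridge"))      # ['golden_gate_bridge', 'bridge', 'man_made_structure']
    print(wholes_for_parts(["wheel", "handle"]))  # {'car', 'cup'}
```

Annotating an image with the whole Is-A chain instead of a single keyword is exactly what allows classification at different semantic levels in the hierarchical methods above.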
There have been other contributions to the discussion of automatic image annotation techniques that apply semantics [8], where different approaches to automatic image annotation are reviewed: 1) generative-model-based image annotation, 2) discriminative-model-based image annotation, and 3) graph-model-based image annotation. Due to the rapid advancement of digital technology in the last few years, there has been an increasingly large number of images available on the Web, making manual annotation of images an impossible task. However, we will be focusing on the application of semantics to each step of image recognition individually, along with applications of semantics to a narrow domain.

III. SEMANTIC WEB IN OBJECT RECOGNITION

Image retrieval is a key task to be solved in the science of computer vision. While more classical approaches have to an extent worked in the past for simple object recognition, more general approaches are required in modern times. In this section we present past research on the application of ontology to individual steps of image recognition in order to improve image retrieval performance. We also present TABLE I for a comparison between such methods.

A. Semantics application in image segmentation

One of the main tasks in any image recognition software is the image segmentation step. During this step, an image is segmented into viable detection Regions of Interest (ROIs) to optimize the following recognition steps. However, image segmentation is a very challenging, albeit necessary, task for any kind of image recognition algorithm; therefore any optimization of it is a welcome addition to any cognitive vision system. In this section of the paper we focus on the application of Semantic Web technologies to optimize the results of segmentation algorithms. The quality of recognition highly depends on the ability to segment any given frame. Previously, classical algorithms such as watershed have been used to achieve image segmentation. However, such algorithms lacked the efficiency, precision and robustness required in real-world scenarios where occlusion, motion and illumination play key roles in the scene. There has been some research in adapting convolutional neural networks to the segmentation task, for example Mask R-CNN [9]. However, because CNNs highly depend on the domain they have been trained on, they have trouble detecting changes even in domains they are familiar with.

In [10] a method was proposed which involves simultaneous image segmentation and detection of simple objects, partially imitating how human vision works. The initial region labeling is performed based on the regions' low-level descriptors and concepts stored in an ontological knowledge base. This allows the proposed technique to associate each region with a fuzzy set of candidate labels. Afterwards, a merging process is performed based on new similarity measures and merging criteria that are defined at the semantic level with the use of fuzzy set operations. Furthermore, this approach is invariant to the chosen region growing algorithm and can be applied to any of them with certain modifications; to demonstrate this, the authors apply semantics to watershed and RSST segmentation, experimentally showing that semantic watershed reached 90% accuracy and semantic RSST 88%, compared to 82% for the classical RSST approach. Other experiments have shown a similar increase in accuracy of 7-8%. To achieve these improvements, the authors propose adjusting the merging processes as well as the termination criteria of the classical region growing algorithms. What is more, a novel ontological representation of context is introduced, combining fuzzy theory and fuzzy algebra with characteristics derived from the Semantic Web, such as reification. The membership degrees of the labels assigned to the regions derived from the semantic segmentation are re-estimated according to a context-based membership degree readjustment algorithm, which utilizes ontological knowledge to optimize the membership degrees of the detected concepts for each region in the scene. While the contextualization and initial region labeling steps are domain specific and require the domain of the images to be provided as input, the rest of the approach is domain independent.

In another paper [11], the authors have shown that applying semantics to a convolutional neural network for the task of image segmentation can greatly increase performance. The authors experimentally prove that a neural network trained end-to-end, pixel-to-pixel on semantic segmentation can achieve the best asymptotic and absolute performance without the downsides of other methods, such as patch-wise training, which lacks the efficiency of convolutional training, or the required inclusion of superpixels such as the ones used in [12], where they are used to generate semantic object parts. However, the superpixels used in the latter additionally provide a bridge between low-level and high-level features by incorporating semantic knowledge, allowing the labels of individual segmented regions to be inferred.
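The core idea of merging regions on the basis of their fuzzy candidate-label sets can be illustrated with the following sketch (our simplification, not the algorithm of [10]): each region carries label membership degrees, similarity is the height of the fuzzy intersection (min inside, max over labels), and adjacent regions are merged by fuzzy union (max) when that similarity is high enough.

```python
def fuzzy_similarity(labels_a, labels_b):
    """Height of the fuzzy intersection of two candidate-label sets."""
    common = set(labels_a) & set(labels_b)
    return max((min(labels_a[l], labels_b[l]) for l in common), default=0.0)

def merge_pass(regions, adjacency, threshold=0.6):
    """One greedy pass: merge the first adjacent pair whose similarity exceeds the threshold."""
    for a, b in adjacency:
        if fuzzy_similarity(regions[a], regions[b]) >= threshold:
            # The fuzzy union (max) of memberships becomes the merged region's label set.
            merged = {l: max(regions[a].get(l, 0.0), regions[b].get(l, 0.0))
                      for l in set(regions[a]) | set(regions[b])}
            regions[a] = merged
            del regions[b]
            adjacency = [(x, y) for x, y in adjacency if b not in (x, y)]
            return regions, adjacency, True
    return regions, adjacency, False

if __name__ == "__main__":
    regions = {1: {"sky": 0.8, "sea": 0.3},      # toy regions with fuzzy candidate labels
               2: {"sky": 0.7, "cloud": 0.4},
               3: {"sand": 0.9}}
    adjacency = [(1, 2), (2, 3)]
    changed = True
    while changed:
        regions, adjacency, changed = merge_pass(regions, adjacency)
    print(regions)   # regions 1 and 2 merge into a "sky"-dominated region; region 3 stays separate
```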
B. Semantics application to image labeling

Convolutional neural networks have shown great performance in their ability to correctly detect objects and actions in the domains they are familiar with. However, in order for a network to interpret the image it sees correctly, it needs a vast number of data samples to compare against. There already exist databases of labeled images, such as ImageNet, that can provide useful training data, although with the rapid development of social media, automatic techniques capable of effectively understanding and labeling the media are required. Content-aware systems are capable of indexing, searching, retrieving, filtering and recommending multimedia from the vast quantity of media posted on social networks. However, such unconstrained data exhibits a very high complexity of objects, events and interactions in consumer videos. Such unconstrained domains create numerous problems for previously available video analysis algorithms, such as the ones capable of recognizing human activity. What is more, home videos, being the most prevalent genre, suffer from feature extraction problems due to poor lighting conditions, occlusion, clutter in the scene, shaking and/or low-resolution cameras and various other background noise. In the reviewed paper [13] the authors try to address these problems by introducing a new attribute-learning framework that learns a unified semi-latent attribute space. Latent attributes are used to represent all shared aspects of the data which are not explicitly included in users' sparse and incomplete annotations. Latent attributes serve as complementary annotations to the user-specified attributes and are instead discovered by the model through joint learning of the semi-latent attribute space. This gives the authors a mechanism for semantic feature reduction from the raw data in multiple modalities to a unified lower-dimensional semantic attribute space. These semi-latent attributes are used to bridge the semantic gap with reduced dependence on the completeness of the attribute ontology and the accuracy of the training attribute labels. The described method gives the authors the flexibility to learn a full semantic attribute space of the video feed irrespective of how well defined and complete the user-provided data about it is. Furthermore, they have managed to improve multitask and N-shot learning by leveraging latent attributes, gone beyond existing zero-shot learning approaches by exploiting latent attributes, and leveraged attributes in conjunction with multimodal data to improve cross-media understanding, enabling new tasks such as explicitly learning which modalities attributes appear in. Finally, the proposed method is applicable to large multimedia datasets, as it is expressed in a significantly more scalable way than previously available techniques, making it invariant to the length of the given input video or the density of usable features in it.

Other research [14] supports the notion that multimedia resources "in the wild" are growing at a staggering rate and that this rapidly increasing number of resources has brought an urgent need to develop intelligent methods to organize and process them. In that paper, the Semantic Link Network model is used to organize multimedia resources. A Semantic Link Network (SLN) is designed to establish associated relations among various resources (e.g., Web pages or documents in a digital library), aiming to extend a loosely connected network with no semantics (e.g., the Web) into an association-rich network. Since the theory of cognitive science considers that associated relations can make a resource more comprehensible to users, the motivation of SLN is to organize the associated resources loosely distributed over the Web so as to effectively support Web intelligence activities such as browsing, knowledge discovery and publishing. The tags and surrounding texts of multimedia resources are used to represent the semantic content, and the relatedness between tags and surrounding texts is implemented in the Semantic Link Network model. Datasets of about 100 thousand images with social tags from Flickr are used to evaluate the proposed method. Two data mining tasks, clustering and searching, are performed with the proposed framework, demonstrating its effectiveness and robustness.
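The kind of pairwise tag relatedness such a Semantic Link Network relies on can be approximated very simply, as in the following hedged sketch (our own illustration, not the implementation of [14]): relatedness is estimated from tag co-occurrence over a collection of socially tagged images.

```python
from collections import Counter
from itertools import combinations

def tag_relatedness(tag_sets):
    """Return {(tag_a, tag_b): co-occurrence count / min(individual counts)}."""
    single = Counter(t for tags in tag_sets for t in set(tags))
    pair = Counter(frozenset(p) for tags in tag_sets
                   for p in combinations(sorted(set(tags)), 2))
    return {tuple(sorted(p)): c / min(single[t] for t in p) for p, c in pair.items()}

if __name__ == "__main__":
    # Toy stand-in for Flickr-style social tags attached to three images.
    tagged_images = [{"beach", "sea", "sand"}, {"sea", "boat"}, {"beach", "sand", "sunset"}]
    scores = tag_relatedness(tagged_images)
    print(scores[("beach", "sand")])            # 1.0 -> strongly related, link the resources
    print(scores.get(("boat", "sand"), 0.0))    # 0.0 -> never co-occur, no semantic link
```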
C. Semantic Web application to recognize objects based on individual parts

The Semantic Web (ontologies) gives us the powerful ability to infer certain attributes of an object based on domain knowledge of that object's individual components. This allows us, instead of recognizing a specific object instance or its class, to recognize the individual components available in the scene and, based on the known Is-A/Is-Part-Of relationships, infer what kind of objects can be assembled from those components. A novel way of applying ontologies to the task of object recognition was the application of object semantics based on what the object is used for [15]. In that paper, the semantics were used for tool recognition, where the type of tool is inferred from its functionality in relation to the human hand. The authors assert that objects do not change functionality based on changes to their details; a cup with multiple handles, for example, would still be considered a cup. Instead, they focus on object parts and their combinations to assign a function to a tool. This approach has the advantage that the system is not trained on arbitrarily different individual objects and instead detects the parts that contribute to the fundamental tool functionality. The proposed method consists of three main stages: preprocessing, object signature extraction and object similarity calculation. Object signature extraction is subdivided into two steps: part signature extraction and pose signature extraction. During part signature extraction a support vector machine (SVM) is used to find the characteristic descriptions of a given object. Pose signatures describe how parts are attached to each other, which provides information on how the parts are rotated with respect to one another and the locations at which they are connected. Once the features are extracted, a function analyzer algorithm is run which allows objects to be compared and functional meanings to be assigned to them, thus achieving these tasks: recognizing the object, generalizing between different objects with different numbers of parts, assigning multiple functions to the same object, and providing the ability to find another use for an object. The authors have shown that their method, unlike deep convolutional neural networks, does not require extensive training sets and can generalize from very few training samples.
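The part-to-function idea can be sketched as follows (a simplified illustration in the spirit of [15]; the two-dimensional part descriptors, labels and function table are invented for the example): an SVM assigns part types to low-level part signatures, and a small lookup table stands in for the ontology that maps part compositions to tool functions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D "part signatures" (e.g. elongation, curvature) with their part types.
X_train = np.array([[0.9, 0.1], [0.8, 0.2],    # handle-like parts
                    [0.2, 0.9], [0.1, 0.8]])   # head-like parts
y_train = np.array(["handle", "handle", "head", "head"])
part_classifier = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# Functions inferred from part compositions (Is-Part-Of style knowledge).
FUNCTIONS = {
    frozenset(["handle", "head"]): "striking tool",
    frozenset(["handle"]): "graspable object",
}

def infer_function(part_descriptors):
    """Classify each detected part, then look the composition up in the function table."""
    parts = frozenset(part_classifier.predict(np.asarray(part_descriptors)))
    return FUNCTIONS.get(parts, "unknown function")

if __name__ == "__main__":
    print(infer_function([[0.85, 0.15], [0.15, 0.85]]))  # handle + head -> "striking tool"
```

Because the lookup is driven by part composition rather than whole-object appearance, adding a second handle to the toy example above would not change the inferred function, mirroring the cup-with-many-handles argument made by the authors.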
TABLE I. ONTOLOGY BASED IMAGE RETRIEVAL METHODS

1. Keyword-based image retrieval (text based, field based, structure based)
   Pros: uses diverse keywords; uses one or more image properties; keywords describe the image data.
   Cons: cannot describe an image completely and semantically.

2. Low-level feature: color (histogram and moments, dominant color, color cluster, etc.)
   Pros: color-similarity-based retrieval; color coherence vector based retrieval.
   Cons: color alone cannot describe the full image content.

3. Low-level feature: shape (Fourier transform, curvature scale, template matching, etc.)
   Pros: template matching method.
   Cons: shape alone cannot describe the full image content.

4. Low-level feature: texture (wavelet transform, edge statistics, Gabor filters, statistical based, etc.)
   Pros: color-similarity-based retrieval; color coherence vector based retrieval.
   Cons: color and texture alone cannot describe the full image content.

5. Scale Invariant Feature Transform (SIFT)
   Pros: predicting amino acid changes in protein structure; features for image retrieval.
   Cons: scientific application; concluded to find a better descriptor.

6. Speeded Up Robust Features (SURF)
   Pros: novel scale- and rotation-invariant feature description; CBIR visual attention model.
   Cons: not fully intended to attempt filling the semantic gap.

7. Ontology-based image retrieval methods
   Pros: CBIR with high-level semantics; ontology-based cognitive vision.
   Cons: basic ontology with restricted visual features.
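As a point of reference for rows 2 to 4 of TABLE I, the following hedged sketch (not taken from any of the reviewed papers) shows a purely low-level retrieval baseline: images are ranked by gray-level histogram intersection, which captures the intensity distribution but nothing about what the images depict, which is precisely the limitation listed in the Cons column.

```python
import numpy as np

def gray_histogram(image, bins=16):
    """Normalized intensity histogram of an image given as a 2-D array in [0, 255]."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical intensity distributions."""
    return float(np.minimum(h1, h2).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.integers(0, 256, size=(32, 32))
    database = {"dark_scene": rng.integers(0, 64, size=(32, 32)),
                "bright_scene": rng.integers(0, 256, size=(32, 32))}
    q = gray_histogram(query)
    ranked = sorted(database,
                    key=lambda k: histogram_intersection(q, gray_histogram(database[k])),
                    reverse=True)
    print(ranked)   # 'bright_scene' ranks first purely because its intensities match the query
```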



[Figure: block diagram with the components Domain, Examples, Image samples, Knowledge acquisition, Ontology formation, Visual concept ontology, Knowledge base, Automatic concept extraction, Image database, Similarity measure, Reducing semantic gap, Retrieval process, Image retrieval engine, User query, Retrieval results.]

Fig. 1. Design of the ontology-based image retrieval process

The context representation in Fig. 1 can be framed using an ontology built from the image concepts described above. Applying Description Logics (DL), the knowledge representation can be formed. DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer) are used for this implementation, which is available with OWL (Web Ontology Language). Rules describing the relations between image features in the ontology can likewise be defined using DL. Once the concept ontology is framed (for instance a spatial ontology), the similarity matching of the user query against the extracted image features is evaluated through the ontology hierarchy. This brings the user query closer to the images in the database. A few tools have been developed for this purpose, notably "OntoVis", which performs three tasks: domain knowledge acquisition, ontology-driven visual acquisition and image instance management. The benefit of using a visual concept ontology is to fill the semantic gap between low-level and high-level concepts as much as possible.
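The following minimal sketch illustrates the query-matching step described above (it is our own illustration, not the OntoVis or DL-based implementation; the concept names are hypothetical): a user query is expanded through the concept hierarchy so that images annotated with any subclass of the queried concept are also retrieved, which is one way the gap between the query and low-level annotations is narrowed.

```python
SUBCLASS_OF = {          # toy Is-A hierarchy standing in for the concept ontology
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "man_made_object",
}

def descendants(concept):
    """The concept itself plus everything that is transitively a subclass of it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def retrieve(query_concept, annotations):
    """Return images whose annotations contain the query concept or any subclass of it."""
    wanted = descendants(query_concept)
    return [img for img, labels in annotations.items() if wanted & labels]

if __name__ == "__main__":
    annotations = {"img1.jpg": {"car", "road"}, "img2.jpg": {"tree"}, "img3.jpg": {"truck"}}
    print(retrieve("vehicle", annotations))   # ['img1.jpg', 'img3.jpg']
```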
IV. SEMANTIC WEB IN DOMAIN SPECIFIC TASKS

A. Application in robotics

Mobile robots are one of the key applications that benefit from object recognition technologies. More and more robots are entering human living and working environments, where, to operate successfully, they face a multitude of real-world challenges such as having to handle many objects located in different places. To overcome these challenges, a solution for robot efficiency needs to be found, both in terms of computational efficiency and in terms of reducing the number of detected false positives. The addition of ontology has been shown to be beneficial in object recognition tasks for methods such as RoboEarth [16], where the addition of a semantic mapping layer has experimentally been shown to decrease computational time by checking only the 10 most promising object annotations in large object databases. RoboEarth describes a methodology its authors name "action recipes", which semantically describe what actions need to be performed to complete a specific task, such as mapping the environment or determining the location of an object. For example, the "ObjectSearch" action recipe, based on prior partial room knowledge and its landmarks, is capable of inferring potential locations from which the desired object might be detected.
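A hedged sketch of this candidate-pruning idea follows (our simplification, not RoboEarth's implementation; the room priors and object names are invented): a semantic map of the current room is used to rank the object database, and only the most promising annotations, here the top k, are passed to the expensive recognition step.

```python
ROOM_PRIORS = {   # hypothetical P(object | room type), as could be encoded in a semantic map
    "kitchen": {"cup": 0.9, "kettle": 0.8, "plate": 0.7, "drill": 0.05},
    "workshop": {"drill": 0.9, "hammer": 0.85, "cup": 0.3},
}

def promising_candidates(room_type, object_database, k=10):
    """Rank all known objects by their prior for this room and keep the k best."""
    priors = ROOM_PRIORS.get(room_type, {})
    ranked = sorted(object_database, key=lambda obj: priors.get(obj, 0.0), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    database = ["cup", "kettle", "plate", "drill", "hammer", "sofa", "monitor"]
    # Only these few candidates would be matched against the camera image.
    print(promising_candidates("kitchen", database, k=3))   # ['cup', 'kettle', 'plate']
```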
B. Application in Geographic Information Science

GEOBIA (Geographic Object-Based Image Analysis) [17] is not only a hot topic of current remote sensing and geographical research; it is believed to be a new paradigm in remote sensing and Geographic Information Science (GIScience). It aims to develop automated methods for partitioning remote sensing (RS) imagery into meaningful image objects, and to assess their characteristics through spatial, spectral, textural and temporal features, thus generating new geographic information in a GIS-ready format.

Geographic Object-Based Image Analysis [18] represents the most innovative new trend for processing remote sensing images to have appeared during the last decade. However, its application is mainly based on expert knowledge, which consequently highlights important scientific issues with respect to the robustness of the methods applied in GEOBIA. In that paper, the authors argue that GEOBIA would benefit from a further technical enhancement involving knowledge representation techniques such as ontologies. They summarize the main applications of ontologies in GEOBIA, especially data discovery, automatic image interpretation, data interoperability, workflow management and data publication. Among the broad spectrum of applications for ontologies, they mention systems engineering, interoperability and communication. Because the method is a part of systems engineering, GEOBIA experts follow a series of analytical procedures to develop a system designed to produce geographic information. These procedures principally involve (i) data discovery and (ii) data processing and analysis, i.e., image interpretation.

C. Application to sports events

In [19], the authors present an ontology-based information extraction and retrieval system and its application in the soccer domain. In general, the authors deal with three issues in semantic search, namely usability, scalability and retrieval performance. They propose a keyword-based semantic retrieval approach. The performance of the system is improved considerably using domain-specific information extraction, inferencing and rules. Scalability is achieved by adopting a semantic indexing approach and representing the whole world as small independent models. The system is implemented using state-of-the-art Semantic Web technologies and its performance is evaluated against traditional systems as well as query expansion methods. Furthermore, a detailed evaluation is provided to observe the performance gain due to domain-specific information extraction and inferencing.

The authors present a novel semantic retrieval framework and its application in the soccer domain, which includes all the aspects of the Semantic Web, namely ontology development, information extraction, ontology population, inferencing, semantic rules, semantic indexing and retrieval. When these technologies are combined with the comfort of a keyword-based search interface, the authors obtain a user-friendly, high-performance and scalable semantic retrieval system. The evaluation results show that this approach can easily outperform both the traditional approach and the query expansion methods. Moreover, the authors observed that the system can answer complex semantic queries without requiring formal queries such as SPARQL, and that it can get close to the performance of SPARQL, which is the best that can be achieved with semantic querying. Finally, the authors show how structural ambiguities can be resolved easily using semantic indexing.
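The semantic indexing idea can be sketched as follows (a minimal illustration in the spirit of [19], with invented facts): each entity is indexed under its own name and under labels inferred from the ontology, namely its class and the transitive superclasses, so that a plain keyword query can retrieve documents that never mention the keyword literally.

```python
from collections import defaultdict

INSTANCE_OF = {"Buffon": "goalkeeper", "Juventus": "club"}   # toy soccer ontology facts
SUBCLASS_OF = {"goalkeeper": "player", "player": "person"}

def semantic_labels(entity):
    """The entity name, its class and all transitive superclasses."""
    labels, cls = [entity], INSTANCE_OF.get(entity)
    while cls:
        labels.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return labels

def build_index(documents):
    """Map every semantic label of every mentioned entity to the documents naming it."""
    index = defaultdict(set)
    for doc_id, entities in documents.items():
        for entity in entities:
            for label in semantic_labels(entity):
                index[label.lower()].add(doc_id)
    return index

if __name__ == "__main__":
    documents = {"match_report_1": ["Buffon", "Juventus"], "transfer_news_2": ["Juventus"]}
    index = build_index(documents)
    print(sorted(index["goalkeeper"]))   # ['match_report_1'], found without the literal keyword
```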
V. CONCLUSIONS

In this paper we present the importance and usefulness of applying ontology to image recognition tasks. We have reviewed different ontology-based techniques, compared them to more classical approaches such as SIFT and SURF, and provided a list of benefits and possible drawbacks of using such techniques. We conclude that applying semantics can greatly improve not only the overall performance of object recognition but also the performance and quality of the individual tasks required for object recognition, such as image segmentation. Moreover, we have found that ontology can be used to substantially reduce the semantic gap, i.e. the difference between the understanding of images by humans and the interpretation of images by machines, allowing for better automation in training neural networks, as dataset preparation can be offloaded to a machine instead of being produced by hand. Finally, we have discussed the concept of the Semantic Web, on the basis of which an ontology can be formed.
REFERENCES

[1] P. Auer et al., "A Research Roadmap of Cognitive Vision," IST Project IST-2001-35454, 2005.
[2] J. P. Schober, T. Hermes, and O. Herzog, "Content-based image retrieval by ontology-based object recognition," KI-2004 Workshop on Applications of Description Logics, 2004.
[3] A. Hanbury, "A survey of methods for image annotation," J. Vis. Lang. Comput., vol. 19, no. 5, pp. 617–627, 2008.
[4] D. Tahmoush and C. Bonial, "Applying Attributes to Improve Human Activity Recognition," IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 2015.
[5] U. Akdemir, P. Turaga, and R. Chellappa, "An ontology based approach for activity recognition from video," Proc. 16th ACM Int. Conf. on Multimedia, pp. 709–712, 2008.
[6] S. Tongphu, B. Suntisrivaraporn, B. Uyyanonvara, and M. N. Dailey, "Ontology-based object recognition of car sides," 9th Int. Conf. on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012.
[7] A. M. Tousch, S. Herbin, and J. Y. Audibert, "Semantic hierarchies for image annotation: A survey," Pattern Recognit., vol. 45, no. 1, pp. 333–345, 2012.
[8] D. Zhang, M. M. Islam, and G. Lu, "A review on automatic image annotation techniques," Pattern Recognit., vol. 45, no. 1, pp. 346–362, 2012.
[9] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," 2017.
[10] T. Athanasiadis, P. Mylonas, Y. Avrithis, and S. Kollias, "Semantic image segmentation and object labeling," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 3, pp. 298–311, 2007.
[11] E. Shelhamer, J. Long, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017.
[12] M. Zand, S. Doraisamy, A. A. Halin, and M. R. Mustaffa, "Ontology-Based Semantic Image Segmentation Using Mixture Models and Multiple CRFs," IEEE Trans. Image Process., vol. 25, no. 7, pp. 3233–3248, 2016.
[13] Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong, "Learning multimodal latent attributes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 303–316, 2014.
[14] U. Manzoor and M. A. Balubaid, "Semantic Image Retrieval: An Ontology Based Approach," vol. 4, no. 4, pp. 1–8, 2015.
[15] M. Schoeler and F. Worgotter, "Bootstrapping the Semantics of Tools: Affordance Analysis of Real World Objects on a Per-part Basis," IEEE Trans. Cogn. Dev. Syst., vol. 8, no. 2, pp. 84–98, 2016.
[16] L. Riazuelo et al., "RoboEarth Semantic Mapping: A Cloud Enabled Knowledge-Based Approach," IEEE Trans. Autom. Sci. Eng., vol. 12, no. 2, pp. 432–443, 2015.
[17] H. Y. Gu, H. T. Li, L. Yan, and X. J. Lu, "A framework for Geographic Object-Based Image Analysis (GEOBIA) based on geographic ontology," Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., vol. 40, no. 7W4, pp. 27–33, 2015.
[18] D. Arvor, L. Durieux, S. Andrés, and M. A. Laporte, "Advances in Geographic Object-Based Image Analysis with ontologies: A review of main contributions and limitations from a remote sensing perspective," ISPRS J. Photogramm. Remote Sens., vol. 82, pp. 125–137, 2013.
[19] S. Kara, Ö. Alan, O. Sabuncu, S. Akpınar, N. K. Cicekli, and F. N. Alpaslan, "An ontology-based retrieval system using semantic indexing," Inf. Syst., vol. 37, no. 4, pp. 294–305, 2011.
[20] M. Wróbel, J. T. Starczewski, and C. Napoli, "Handwriting recognition with extraction of letter fragments," Int. Conf. on Artificial Intelligence and Soft Computing, pp. 183–192, 2017.
[21] J. T. Starczewski, S. Pabiasz, N. Vladymyrska, A. Marvuglia, C. Napoli, and M. Woźniak, "Self organizing maps for 3D face understanding," Int. Conf. on Artificial Intelligence and Soft Computing, pp. 210–217, 2017.