SemNav: How Rich Semantic Knowledge Can Guide Robot Navigation In Indoor Spaces

Snehasis Banerjee, Balamurali Purushothaman
TCS Research & Innovation, Tata Consultancy Services
{snehasis.banerjee, balamurali.p}@tcs.com

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. We have developed SemNav, an ontological representation tailored to indoor spaces and to service-robot tasks such as navigation and target finding. The representation has been tested in real-life settings. This paper positions semantic web technology as a key element of decision making in robotic tasks such as navigation and object finding.

Keywords: Ontology, Cognitive Robotics, Semantic Robot Navigation

1 Background

A major challenge in robotics is the successful execution of complex tasks in dynamic and uncertain environments. For example, it is difficult for a robot to find objects that are partially observable, out of sight, or not where they were expected to be. Knowledge of the robot, the objects, and the environment, enriched with semantic relations, aids in such scenarios. The approaches presented in related works [1] and [2] are enhanced and adapted for this problem.

2 SemNav Ontology

SemNav was created by first listing competency questions specific to the navigation and object-finding problem. Starting from a seed ontology of relations, image scenes [3] were processed to extract objects and captions, from which semantic relations were derived. These relations were aligned with WordNet to resolve ambiguity. The most prominent object relations are listed below; an illustrative RDF encoding sketch follows Section 3.

(1) Occlusion: a larger object can occlude a smaller one (e.g., a jug occluding a cup)
(2) Co-location: two objects are in close proximity (e.g., water glass and jug)
(3) RCC-8 classes: qualitative spatial relations based on the region connection calculus
(4) Location: an object is usually found in specific zones (e.g., a pillow in a bedroom)
(5) Disjoint: two objects do not co-occur in a scene (e.g., jug and towel)
(6) Dimension: ranges of length, width, and height of objects (e.g., mobile phones, remotes)
(7) Shape: objects can be abstracted to simple forms (e.g., TV and window as rectangles)
(8) Color and texture similarity: visually indistinguishable objects (e.g., wooden furniture)
(9) Attribute relations from SPARQL endpoints (e.g., DBpedia) and commonsense ontologies: objects' shared resources (e.g., electricity), inter-dependence, use, etc.

Fig. 1. (a) System using SemNav for robotic tasks. (b) OWL snapshot of SemNav.

3 Semantic Navigation

Once the SemNav ontology (Fig. 1a) is populated with relations and RDF instance data parsed from scenes, and human-validated on random samples, the knowledge base for semantic navigation (Fig. 1b) is ready. The robot processes the current scene with computer vision techniques, identifying obstacles and free space. Given a goal (e.g., find the object 'cup'), the decision module consults the knowledge store and matches it against the current scene and past scene history with reference to the GeoSem Map, a geocentric navigation map storing object instances and their relations. The semantic decision module then instructs the navigation module to plan a path to the next spot with the highest probability of reaching the target, which in turn drives the robot's motor wheels in that direction. The knowledge store's relation weights are updated through exploration; a minimal sketch of this decision step is given after the encoding example below.
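The paper does not fix a concrete vocabulary for the relations of Section 2, so the following is only a minimal sketch: it encodes a few of the listed relations as RDF triples with rdflib, using a hypothetical sem: namespace and hypothetical property names (canOcclude, coLocatedWith, usuallyLocatedIn, disjointWith, maxHeight), and then asks where a cup is likely to be found.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import XSD

# Hypothetical namespace and property names; SemNav's actual vocabulary
# is not given in the paper.
SEM = Namespace("http://example.org/semnav#")

g = Graph()
g.bind("sem", SEM)

# (1) Occlusion: a larger object can occlude a smaller one
g.add((SEM.Jug, SEM.canOcclude, SEM.Cup))
# (2) Co-location: objects found in close proximity
g.add((SEM.WaterGlass, SEM.coLocatedWith, SEM.Jug))
# (4) Location: object usually found in a specific zone
g.add((SEM.Jug, SEM.usuallyLocatedIn, SEM.Kitchen))
g.add((SEM.Pillow, SEM.usuallyLocatedIn, SEM.Bedroom))
# (5) Disjoint: objects that do not co-occur in a scene
g.add((SEM.Jug, SEM.disjointWith, SEM.Towel))
# (6) Dimension: height bound in metres as a datatype property
g.add((SEM.Cup, SEM.maxHeight, Literal(0.15, datatype=XSD.decimal)))

# Where should the robot look for a cup? Zones hosting objects that
# can occlude it are promising candidates.
query = """
SELECT ?zone WHERE {
    ?occluder sem:canOcclude sem:Cup .
    ?occluder sem:usuallyLocatedIn ?zone .
}
"""
for row in g.query(query, initNs={"sem": SEM}):
    print(row.zone)   # -> http://example.org/semnav#Kitchen
```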
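The decision step of Section 3 can be sketched in the same spirit. The paper does not specify how relation weights are stored or how candidate spots are scored; the sketch below assumes numeric confidence weights, an additive score over relations linking a spot to the target, and a simple moving-average update from exploration feedback. All names (Spot, DecisionModule, the relation labels) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Spot:
    """A candidate location in the GeoSem Map (hypothetical structure)."""
    name: str
    zone: str                                    # semantic zone, e.g. "Kitchen"
    visible: set = field(default_factory=set)    # objects observed at this spot

class DecisionModule:
    def __init__(self, weights):
        # weights: (subject, relation, object) -> confidence in [0, 1]
        self.w = weights

    def score(self, spot, target):
        """Additive heuristic: sum weights of relations linking spot to target."""
        s = self.w.get((target, "usuallyLocatedIn", spot.zone), 0.0)
        for obj in spot.visible:
            s += self.w.get((target, "coLocatedWith", obj), 0.0)
            s += self.w.get((obj, "canOcclude", target), 0.0)
        return s

    def next_spot(self, spots, target):
        """Pick the spot with the highest probability of reaching the target."""
        return max(spots, key=lambda sp: self.score(sp, target))

    def update(self, spot, target, found, lr=0.1):
        """Adjust the location prior from exploration feedback."""
        key = (target, "usuallyLocatedIn", spot.zone)
        old = self.w.get(key, 0.5)
        self.w[key] = old + lr * ((1.0 if found else 0.0) - old)

# Usage: the robot prefers the kitchen spot where a jug is visible.
dm = DecisionModule({("cup", "usuallyLocatedIn", "Kitchen"): 0.8,
                     ("cup", "coLocatedWith", "jug"): 0.6,
                     ("jug", "canOcclude", "cup"): 0.4})
spots = [Spot("s1", "Kitchen", {"jug"}), Spot("s2", "Bedroom")]
best = dm.next_spot(spots, "cup")            # -> s1
dm.update(best, "cup", found=True)           # strengthen the prior
```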
References

1. Anagnostopoulos, C., et al.: OntoNav: A semantic indoor navigation system. In: 1st Workshop on Semantics in Mobile Environments (SME'05), Ayia Napa, Cyprus. Citeseer (2005)
2. Bruno, B., et al.: Knowledge representation for culturally competent personal robots: requirements, design principles, implementation, and assessment. International Journal of Social Robotics, pp. 1-24. Springer (2019)
3. Krishna, R., et al.: Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123(1), 32-73. Springer (2017)