Using Knowledge Representation and Reasoning Tools in the Design of Robots

Mohan Sridharan
Electrical and Computer Engineering
The University of Auckland, New Zealand
m.sridharan@auckland.ac.nz

Michael Gelfond
Department of Computer Science
Texas Tech University, USA
michael.gelfond@ttu.edu

Abstract

The paper describes the authors' experience in using knowledge representation and reasoning tools in the design of robots. The focus is on the systematic construction of models of the robot's capabilities and its domain at different resolutions, and on establishing a clear relationship between the models at the different resolutions.

1 Introduction

Our prior paper described an architecture for robots [Zhang et al., 2014], whose reasoning system we view (in this paper) as an interplay between a logician and a statistician. The former has an abstract, coarse-resolution view of the world. The logician perceives an office domain, for instance, as a collection of connected rooms containing various types of objects, and assumes that the robot is capable of successfully moving between two adjacent rooms, and of finding an object located in a room. If the robot's goal is to find a specific book, the logician can design a plan of abstract actions directing the robot to go to the library, where books are normally stored. The first action of this plan, e.g., move to the lab, which happens to be the first room on the way to the library, will be passed to a statistician, who has a rather different, fine-resolution view of the world. According to the statistician, the same office domain consists of small grid cells belonging to the different rooms. The statistician believes that the robot is usually capable of moving to a neighboring cell, and of checking a cell for a target object, but these actions are non-deterministic—they only succeed with some probability that the statistician knows a priori or can learn by running some experiments. For the coarse-resolution action of moving to the lab, the statistician thus has the robot move from cell to cell, observe its position, and revise its belief of its position. If the statistician has a sufficiently high confidence that the abstract action passed by the logician has been executed, i.e., that the robot has moved to the lab, this information is reported back to the logician. Otherwise, the statistician reports failure after some time, and the logician has to diagnose and replan. Even this simple scenario shows that our robot's architecture should be capable of representing and manipulating both logical and probabilistic knowledge. In this paper we report our experience in the systematic design of such a robot using knowledge representation and reasoning tools tailored towards different reasoning tasks. In comparison with our prior work [Zhang et al., 2014], we introduce precise (and some new) definitions of basic notions used to build mathematical models of the domain.

We started with describing the state transition diagram modeling possible trajectories of the robot's domain in action description language ALd [Gelfond and Inclezan, 2013]. Using ALd allowed us to concisely represent the diagram even for domains with complex relationships between fluents, and to represent recorded histories of the robot's actions and observations. We expanded the standard notion of recorded history to include the robot's default knowledge about the initial situation, which simplifies the diagnostics task. Although there exist action languages that allow causal laws specifying default values of fluents at arbitrary time steps [Lee et al., 2013], such languages are too powerful for our purposes, and occasionally pose difficulties with representing all exceptions to such defaults when the domain is expanded. Reasoning with theories in ALd is performed by reducing planning, diagnostics, and other robotic tasks to computing answer sets of a program in CR-Prolog, a variant of Answer Set Prolog (ASP) that supports representation and reasoning with defaults and their direct and indirect exceptions [Balduccini and Gelfond, 2003]. This reduction, which considers histories with initial state defaults, is the first novel contribution; it expands previous work that established a close relationship between action languages and logic programming with answer set semantics. Existing efficient solvers help automate the necessary reasoning [Leone et al., 2006].

The second novel contribution is the precise definition of the world view of the statistician as a refinement of the transition diagram of the logician. The new diagram can be viewed as the result of increasing the resolution of the robot's ability to see the world. In our example, the new world view includes cells in rooms, which were so far invisible, and relations and actions involving these cells, e.g., the location of an object in a cell, and the action of moving from a cell to its neighbor. Construction of such a refinement may require some amount of domain knowledge, but the statistician's diagram is related to the logician's diagram in a precise way that is captured by our novel mathematical definition of refinement. We provide detailed guidelines for the construction of the refinement's action theory, which includes axioms establishing the relationship between fluents of the coarse-resolution diagram and their fine-resolution counterparts, as well as the axioms describing the effects of observations.
After the refinement of the domain is constructed, the fine-resolution description is randomized, i.e., modified to consider the non-deterministic effects of fine-resolution actions and observations. The third novel contribution is to expand the action language ALd by a construct that supports the natural representation of such effects. Probabilities computed experimentally by the statistician are then associated with the purely logical diagram to obtain a probabilistic diagram. Reasoning in this new diagram also considers belief states, i.e., probability distributions over states of the logical diagram. Theoretically, the statistician can now use probabilistic graphical models such as a partially observable Markov decision process (POMDP) to select and execute the "best" possible action, make observations, and update the belief state until the abstract action provided by the logician is completed with high probability, or all hope of doing so is lost. However, POMDP algorithms need a representation of the diagram that lists all possible combinations of physical states and actions, which can become intractable even for a comparatively small collection of fluents. To avoid this problem, the robot zooms to the part of the fine-resolution diagram that is relevant to the execution of the coarse-resolution action provided by the logician. For instance, to execute the action "move from room R1 to room R2", with the two rooms being next to each other, the fluents and actions related to other rooms are eliminated, dramatically reducing the size of the corresponding diagram. The fourth novel contribution is a precise definition of zooming, which helps automate this process. Finally, the zoomed part of the randomized fine-resolution diagram is represented in the format suitable for use with POMDP solvers. The corresponding policy is invoked to execute a sequence of concrete actions that implements the abstract action, with the action outcomes being added to the coarse-resolution history.

2 Related Work and Example Domain

Logic-based representations and probabilistic graphical models have been used to control sensing, navigation and interaction for robots [Bai et al., 2014; Hawes et al., 2010]. Formulations based on probabilistic representations (by themselves) make it difficult to perform commonsense reasoning, whereas approaches based on logic programming tend to require considerable prior knowledge of the domain and the agent's capabilities, and make it difficult to merge new, unreliable information with an existing knowledge base. Theories of reasoning about actions and change, and the non-monotonic logical reasoning ability of ASP, have been used by an international research community, e.g., for natural language human-robot interaction [Chen et al., 2012], control of unmanned aerial vehicles [Balduccini et al., 2014], and coordination of robot teams [Saribatur et al., 2014]. However, the basic version of ASP does not support probabilistic representation of uncertainty, whereas a lot of the information extracted from sensors and actuators is represented probabilistically.

Researchers have designed architectures for robots that combine logic-based and probabilistic algorithms for task and motion planning [Kaelbling and Lozano-Perez, 2013], couple declarative programming and continuous-time planners for path planning in teams [Saribatur et al., 2014], combine a probabilistic extension of ASP with POMDPs for human-robot dialog [Zhang and Stone, 2015], combine logic programming and reinforcement learning to discover domain axioms [Sridharan et al., 2016], or use a three-layered organization for knowledge and reasoning with first-order logic and probabilities in open worlds [Hanheide et al., 2015]. Some general formulations that combine logical and probabilistic reasoning include Markov logic networks [Richardson and Domingos, 2006], Bayesian logic [Milch et al., 2006], and probabilistic extensions to ASP [Baral et al., 2009; Lee and Wang, 2015]. However, algorithms based on first-order logic do not support non-monotonic logical reasoning and do not provide the desired expressiveness—it is not always possible to associate numbers with logic statements to express degrees of belief. Algorithms based on logic programming do not support one or more of the desired capabilities, such as incremental revision of (probabilistic) information, or reasoning with large probabilistic components. As a step towards addressing these limitations, we have developed architectures that couple declarative programming and probabilistic graphical models [Zhang et al., 2014; 2015]. Here, we expand on our prior work to explore the systematic construction of robots with the desired knowledge representation and reasoning capabilities. We illustrate these design steps and our contributions using the following example.

Example 1. [Office Domain] Consider a robot assigned the goal of moving specific objects to specific places in an office domain. This domain contains:
• The sorts place, thing, robot, and object, with object and robot being subsorts of thing. Sorts textbook, printer, and kitchenware are subsorts of the sort object.
• Four places: office, main_library, aux_library, and kitchen, some of which are directly accessible from each other.
• An instance of the sort robot, called rob1, and a number of instances of subsorts of the sort object.

Although this domain may appear simplistic, it illustrates many of the representation, reasoning, perception, and actuation challenges that exist in more complex robotics domains.
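As a minimal illustration of Example 1's sort hierarchy, the subsort relation can be encoded as parent links and queried transitively. This Python sketch is our own illustrative encoding; the paper specifies sorts declaratively (e.g., in SPARC), not in Python.

```python
# Sort hierarchy of Example 1, encoded as parent links (illustrative
# encoding; the paper declares these sorts in its ASP-based representation).
PARENT = {
    "object": "thing",
    "robot": "thing",
    "textbook": "object",
    "printer": "object",
    "kitchenware": "object",
}

def is_subsort(s, t):
    """True if sort s equals t or is (transitively) a subsort of t."""
    while s is not None:
        if s == t:
            return True
        s = PARENT.get(s)
    return False

# Instances of the domain, grouped by their most specific sort
# (tb1, tb2 are hypothetical textbook names used only for illustration).
INSTANCES = {"robot": ["rob1"], "textbook": ["tb1", "tb2"],
             "place": ["office", "main_library", "aux_library", "kitchen"]}
```

With this encoding, any axiom stated for thing automatically applies to textbooks, mirroring how ground instances are obtained from the sort hierarchy.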
3 Logician's Description

We start with the logician's view of the world.

Action Language ALd: The logician's state transition diagram is specified as a system description (theory) in a variant of action language ALd, which allows statements of the form:

  a causes f(x̄) = y if body
  f(x̄) = y if body
  impossible a0, ..., an if body

The first statement describes an action's direct effect—if action a is executed in a state satisfying condition body, the value of fluent f in the resulting state will be y. For instance:

  move(R, Pl) causes loc(R) = Pl

says that a robot R moving to place Pl will end up in Pl. The second statement is a state constraint, which says that f(x̄) = y in any state that satisfies body. For instance:

  loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)

guarantees that an object grasped by a robot shares the robot's location. The third statement prohibits the simultaneous execution of actions a0, ..., an in a state satisfying body. The functions in these statements are of two types: those whose values can be changed by actions (fluents) and those whose values cannot (statics). Fluents can be basic or defined; the former are subject to the laws of inertia, while the latter are defined in terms of other fluents. The formal semantics of the original ALd is discussed in [Gelfond and Kahl, 2014]. Unfortunately, it only allows boolean fluents, a restriction that is removed in our variant of the language.

Histories with defaults: In addition to the action theory of ALd, the logician's knowledge contains a recorded history, a collection of the robot's observations and actions. In action languages, such a recorded history typically consists of statements of the form obs(f, y, true, i) (or obs(f, y, false, i))—at step i of the robot's trajectory, the value of fluent f is observed to be (or not to be) y; and hpd(a, i)—action a was successfully executed (i.e., happened) at step i. The recorded history defines a collection of models, trajectories of the system compatible with this record. This syntax of histories was rather limited for our purposes, and the robot can benefit substantially from some forms of default knowledge. In Example 1, the robot can, for instance, be told that books are normally kept in the library. To address this limitation, we introduced an additional type of historical record:

  initial default f(x̄) = y if body

to assume that if the initial state satisfies body, the value of f(x̄) in this state is y. For instance, the default statement:

  initial default loc(X) = main_library if textbook(X)

says that textbooks are normally kept in the library, whereas the default statement:

  initial default loc(X) = office if textbook(X), loc(X) ≠ main_library

gives the second most likely location for a textbook. Such defaults may substantially simplify the planning of the logician. For instance, the plan for finding a textbook tb will consist of going directly to the library and looking for the book there. However, defaults are also useful for diagnostics. For instance, if tb is not found in the library, i.e., the history includes obs(loc(tb), main_library, false, i), the robot would realize that this observation defeats the first default, try the second default, and look for the book in the office. To ensure such behavior, we had to redefine the notion of a model of a history, and design a new algorithm for computing such models. To this end, given an ALd theory DH and recorded history H, we construct the program Π(DH, H) consisting of the standard encoding of DH into ASP, and the collection of atoms representing the observations and actions of H together with the initial state defaults. This encoding allows indirect exceptions to defaults (like the observation above) and uses CR-Prolog, which justifies our departure from standard ASP.

Planning and diagnostics: The description of the logician's knowledge, as provided above, is sufficient for adequately performing planning and diagnostics—these tasks are reduced to computing answer sets of the program Π(DH, H) combined with a standard encoding of the robot's goal. This program is passed to an efficient ASP solver—we use SPARC, which expands CR-Prolog and provides explicit constructs to specify objects, relations, and their sorts [Balai et al., 2013]. Atoms of the form occurs(action, step) belonging to the answer set obtained by solving this program, e.g., occurs(a1, 1), ..., occurs(an, n), represent the shortest sequence of abstract actions for achieving the logician's goal. Prior research results in the theory of action languages and ASP ensure that the plan is provably correct. In a similar manner, suitable atoms in the answer set can be used for diagnostics, e.g., to explain unexpected observations.

Example 2. [Logician's view of the world]
The logician's system description DH of the domain in Example 1 consists of a sorted signature ΣH and axioms describing the transition diagram τH. ΣH defines the names of objects and functions available for use; e.g., the sorts are place, thing, robot, and object, with object and robot being subsorts of thing, and textbook, printer, and kitchenware being subsorts of object. The statics include a relation next_to(place, place), which describes whether two places are next to each other. The domain's fluents are loc: thing → place and in_hand: robot × object → boolean. These are basic fluents subject to the laws of inertia. The domain's actions are move(robot, place), grasp(robot, object), and putdown(robot, object). The domain dynamics are defined using axioms that consist of causal laws:

  move(R, Pl) causes loc(R) = Pl
  grasp(R, Ob) causes in_hand(R, Ob)
  putdown(R, Ob) causes ¬in_hand(R, Ob)

state constraints:

  loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)

and executability conditions such as:

  impossible move(R, Pl2) if loc(R) = Pl1, ¬next_to(Pl1, Pl2)
  impossible grasp(R, Ob) if loc(R) ≠ loc(Ob)
  impossible putdown(R, Ob) if ¬in_hand(R, Ob)

For any given domain, the part of ΣH described so far is unlikely to change substantially. However, the last step in the construction of ΣH, which populates the basic sorts with specific objects, e.g., robot = {rob1}, place = {r1, ..., rn}, and textbook = {tb1, ..., tbn}, is likely to undergo frequent revisions. Ground instances of axioms are obtained by replacing variables by ground terms from the corresponding sorts.

In the transition diagram τH described by DH, actions are assumed to be deterministic, and the values of fluents are assumed to be observable; these assumptions aid in fast, tentative planning and diagnostics for achieving the goals. This domain representation should ideally be tested extensively by including various recorded histories of the domain, which may include histories with prioritized defaults, and using the resulting programs to solve various reasoning tasks.
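The logician's planning task over the deterministic diagram τH can be illustrated with a small search. The Python sketch below is a stand-in of our own, not the paper's method: the paper reduces planning to computing answer sets (the occurs(action, step) atoms) with SPARC, whereas here we search the same transition diagram directly, with executability conditions mirroring Example 2's axioms.

```python
from collections import deque

# Illustrative stand-in for the logician's planner. A state is
# (robot_location, book_location, holding_book); tb1 is a hypothetical
# textbook instance. The successor relation mirrors Example 2's causal
# laws and executability conditions.
NEXT_TO = {("office", "main_library"), ("main_library", "office")}
PLACES = ("office", "main_library")

def successors(state):
    r, b, h = state
    for pl in PLACES:
        if (r, pl) in NEXT_TO:              # move only between adjacent places
            yield ("move", pl), (pl, pl if h else b, h)
    if not h and r == b:                    # grasp only if co-located
        yield ("grasp", "tb1"), (r, b, True)
    if h:                                   # putdown only if holding
        yield ("putdown", "tb1"), (r, b, False)

def plan(start, goal):
    """Breadth-first search: a shortest sequence of abstract actions."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, actions = queue.popleft()
        if goal(state):
            return actions
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

# Goal: textbook tb1 in the office, not in the robot's hand.
p = plan(("office", "main_library", False),
         lambda s: s[1] == "office" and not s[2])
```

The four-step result corresponds to the answer-set atoms occurs(move(rob1, main_library), 1), ..., occurs(putdown(rob1, tb1), 4) that the logician would obtain from the solver.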
4 Statistician's Description

In this section we describe our design of the statistician.

Refinement: First we specify the deterministic version of the world of the statistician. It is given by a system description DL defining a transition diagram τL, which serves as a (deterministic) refinement of the logician's diagram τH. Recall that τL is the result of an increase in resolution which magnifies some objects in the signature ΣH of the system description DH of τH. The newly discovered parts of a magnified object are referred to as its refined components. In our example, for instance, every room is magnified and is viewed as a collection of its component cells.

The refinement will be represented by a system description DL with signature ΣL. We say that a signature ΣL refines signature ΣH if it is obtained by:
• Replacing every basic sort stH of ΣH whose elements were magnified by its coarse-resolution version stL* = stH and its fine-resolution counterpart stL = {o1, ..., om} consisting of the components of the magnified elements of stH. For example, ΣL replaces sort place = {r1, ..., rn} of rooms in ΣH by:
  place* = {r1, ..., rn}
  place = {c1, ..., cm}
where rooms are magnified and viewed as collections of newly discovered objects, their component cells c1, ..., cm.
• Introducing a static relation component between magnified objects from stL* and the corresponding newly discovered objects from stL; e.g., relation component(c, r) is defined to be true iff cell c is in room r.
• Replacing sort fluent of ΣH by its coarse-resolution copy fluent* (a defined fluent) and its fine-resolution counterpart fluent (a basic fluent). In our example, the new fluents obtained by refinement are:
  loc*: thing → place*
  loc: thing → place
Other fluents (and their signatures) are unchanged.
• Obtaining the actions of ΣL by replacing the magnified parameters of the original actions from ΣH by their fine-resolution counterparts. In our example, ΣL will contain the original actions grasp and putdown, and a new action move(robot, cell) of a robot moving to an (adjacent) cell. ΣL will also include a knowledge-producing action test(robot, fluent, value) that activates algorithms on the robot to check the value of an observable fluent of DL in a given state, e.g., test(R, loc(Th), Cell). Note that test is a general action belonging to every refinement of ΣH. We also add fluents to describe the results of testing; e.g., observed(robot, fluent, value) is true if the most recent (direct/indirect) observation of fluent returned value.

The axioms of the refined system description DL include the axioms of DH, and domain-dependent axioms relating coarse-resolution fluents and their fine-resolution counterparts; e.g., in our illustrative domain we have:

  loc*(Th) = Rm if component(Cl, Rm), loc(Th) = Cl

DL also includes the new static component(c, r) for every cell c within every room r—next_to(c1, c2) is defined in a similar manner for every pair of adjacent cells accessible from each other. More importantly, we add to DL basic knowledge fluents that model the direct and indirect knowledge effects of sensing, and introduce axioms relating these fluents and the test actions. For instance:

  test(R, F, Y) causes dir_obs(R, F, Y) = true if F = Y

In our example, robot rob1 testing whether the location of object o is cell c will make the basic knowledge fluent dir_obs(rob1, loc(o), c) true if the object is indeed there. Note that dir_obs may have three possible values, true, false, and undef—it is initially set to the third value. Another fluent, indir_obs(rob1, loc(o), R), holds if loc(o) is observed to be true in some component cell of room R. A fluent's value is observed if it is observed directly or indirectly.

The designers of the statistician should make sure that the diagram τL specified by our system description DL is indeed a refinement, i.e., that it matches the following formal definitions of the refinement of a state and of a system description.

Definition 1. [Refinement of a state]
A state δ of τL is said to be a refinement of a state σ of τH if:
• For every magnified fluent f from the signature ΣH:
  f(x) = y ∈ σ iff f*(x) = y ∈ δ
• For every other fluent of ΣH:
  f(x) = y ∈ σ iff f(x) = y ∈ δ

Definition 2. [Refinement of a system description]
Let DL and DH be system descriptions with transition diagrams τL and τH respectively. DL is a refinement of DH if:
• The states of τL are the refinements of the states of τH.
• For every transition ⟨σ1, aH, σ2⟩ of τH, every fluent f in a set F of simultaneously observable fluents, and every refinement δ1 of σ1, there is a path P in τL from δ1 to a refinement δ2 of σ2 such that:
  – Every action of P is executed by the robot which executes aH.
  – Every state of P is a refinement of σ1 or σ2, i.e., no unrelated fluents are changed.
  – observed(R, f, Y) = true ∈ δ2 if (f = Y) ∈ δ2, and observed(R, f, Y) = false ∈ δ2 if (f = Y) ∉ δ2.
Randomization: Our next step is to expand DL to capture the non-determinism in action execution and observations on the robot, which is essential for the statistician. To do so, we extended ALd by non-deterministic causal laws:

  a causes f(x̄) : {Y : p(Y)} if body
  a causes f(x̄) : sort_name if body

The first statement says that if a is executed in a state satisfying body, f may take on any value from the set {Y : p(Y)} ∩ range(f) in the resulting state; the second statement says that f may take any value from sort_name ∩ range(f). The randomized fine-resolution system description DLR is then obtained by replacing each action's deterministic causal laws in DL by non-deterministic ones, declaring the affected fluent to be a random fluent. For instance, in our example, the non-deterministic causal law for move is:

  move(R, C2) causes loc(R) = {C : range(loc(R), C)}

where the defined fluent range is given by:

  range(loc(R), C) if loc(R) = C
  range(loc(R), C) if loc(R) = C1, next_to(C, C1)

so a robot moving to a cell can end up in any cell that is within range—the robot's current cell and its neighboring cells are all within range.

To complete our model of the statistician with probabilistic information, we run experiments that sample specific instances of each ground non-deterministic causal law in DLR, have the robot execute the corresponding action multiple times, and collect statistics (e.g., counts) of the number of times each outcome of the corresponding fluent is obtained. These statistics, collected in an initial training phase, are used to compute the causal probabilities of action outcomes, and the probability of observations being correct. Local symmetry assumptions are used to simplify this collection of statistics; e.g., movement from a cell to one of its neighbors is assumed to be the same for any cell, given a specific robot. The designer provides the required domain-specific information. In our example, if rob1 in cell c1 may reach {c1, c2, c3} when executing move(rob1, c2), the probabilities of these outcomes may be 0.1, 0.8, and 0.1 respectively. Similarly, the robot may compute that 0.85 is the probability with which it can recognize a specific object in a specific cell. Any prior beliefs about these probabilities (e.g., from a human) can be used as the initial belief that is revised by the experimental trials.

Zooming: The statistician uses DLR and the computed probabilities for the fine-resolution execution of a transition T = ⟨σ1, aH, σ2⟩ ∈ τH. Since reasoning probabilistically about all of DLR may result in incorrect behavior and can be computationally intractable, the statistician first identifies the part τLR(T) of the diagram τLR that is necessary for the fine-resolution execution of aH—we call this operation zooming. The size of τLR(T) is decreased with respect to τLR by only considering refinements of σ1 and σ2, the states of τLR relevant to T. Next, fluents and actions not relevant to the execution of aH are removed based on the following definitions.

Definition 3. [Direct relevance]
An element y of a basic sort stH of DH is directly relevant to a transition T of τH if:
• element y occurs in aH; or
• for some f, f(x̄) = y belongs to σ1 or σ2 but not both.

Consider the transition corresponding to robot rob1 moving from the kitchen to the office, i.e., aH = move(rob1, office). For this transition, element rob1 of sort robot, and elements office and kitchen of sort place, are relevant.

Definition 4. [Zoom]
To construct DLR(T), we need to determine the signature and the axioms describing the transition diagram τLR(T). The signature of DLR(T) is constructed as follows:
1. If stH is a sort of DH with at least one element directly relevant to T:
  • If sort stH is not magnified, it is its own zoomed counterpart stLz.
  • If sort stH is magnified, stLz is the set of components of the elements of stH directly relevant to T.
The zoomed counterparts form a hierarchy of basic sorts of DLR(T) (with the subclass relation inherited from DLR). In our example, the sorts of DLR(T) are robotLz = {rob1} and placeLz = {ci : ci ∈ kitchen ∪ office}.
2. The functions of DLR(T) are those of DLR restricted to the identified sorts. The functions in our example are loc(rob1), taking values from placeLz, next_to(placeLz, placeLz), range(loc(rob1), placeLz), and properly restricted functions related to testing these functions' values.
3. The actions of DLR(T) are restricted to the sorts identified above. In our example, the relevant actions are of the form move(rob1, ci) and test(rob1, loc(rob1), ci), where ci are individual elements of placeLz.

The axioms of DLR(T) are those of DLR that are restricted to the signature of DLR(T). In our example, this removes the causal laws for grasp and putdown, and removes the state constraint related to fluent in_hand in DLR. Furthermore, in the causal law and executability condition corresponding to the action move, C can only take values from placeLz, i.e., any cell in the kitchen or the office.

Example 3. [Example of zoom]
Assume that robot rob1 has to execute aH = grasp(rob1, tb1) to pick up textbook tb1—rob1 is, and tb1 is assumed to be, in the office. Zooming constructs the following signature:
• The relevant sorts of DH are robot, object, and place, and stLz = {robotLz, objectLz, placeLz} are the zoomed counterparts in DLR(T), with robotLz = {rob1}, objectLz = {tb1}, and placeLz = {ci : ci ∈ office}.
• The functions of DLR(T) include (a) loc(robotLz) and loc(objectLz), basic fluents that take values from placeLz; (b) the static next_to(placeLz, placeLz); (c) the defined fluent range(loc(robotLz), placeLz); and (d) the knowledge fluent dir_obs restricted to the zoomed sorts, e.g., dir_obs(robotLz, loc(robotLz), placeLz) and dir_obs(robotLz, loc(objectLz), placeLz).
• The actions of DLR(T) include (a) move(robotLz, placeLz); (b) grasp(robotLz, objectLz); (c) putdown(robotLz, objectLz); and (d) knowledge-producing actions to test the locations of rob1 and tb1, e.g., test(robotLz, loc(objectLz), placeLz).
grasp(rob1 ,tb1 ) causes in hand(rob1 ,tb1 ) = {true, f alse} The observations and action outcomes obtained by exe- test(rob1 , loc(tb1 ), c j ) causes dir obs(rob1 , loc(tb1 ), c j ) cuting the POMDP policy correspond to observations and = {true, f alse} if loc(tb1 ) = c j fluents in DLR (T ). This information, in turn, revises H , to be used for subsequent reasoning by the logician. If τH impossible move(rob1 , c j ) if loc(rob1 ) = ci , ¬next to(c j , ci ) is constructed correctly, and the statistics collected during where range(loc(rob1 ),C) may hold for values in {ci , c j , ck } the initial training phase correctly model the domain dy- within the range of the robot’s current location (ci ), and namics, following an optimal policy produced by an exact z are elements of placeL . The states of τLR (T ) thus include POMDP solver is most likely (among all possible policies) atoms of the form loc(rob1 ) = ci and loc(tb1 ) = c j , where to take the robot to the desired goal state [Littman, 1996; z ci and c j are values in placeL , in hand(rob1 ,tb1 ), direct ob- Sondik, 1971]. Since we use an approximate solver for com- servations of these atoms, and statics such as next to(ci , c j ). putational efficiency, we obtain a bound on the regret (i.e., Specific actions include move(rob1 , ci ), grasp(rob1 ,tb1 ), loss in value) due to the computed policy [Ong et al., 2010]. putdown(rob1 ,tb1 ) and test actions. Due to space constraints, we provide (below) a simplistic ex- ample of constructing a POMDP. POMDP construction: The statistician uses system descrip- tion DLR (T ), and the learned probabilities, to construct a Example 4. [POMDP construction] POMDP for the probabilistic implementation of a in state H Consider rob1 in the o f f ice and the implementation of aH = σ1 of τH . It may be possible to use other (more computation- move(rob1 , kitchen), with one cell in each room. 
Due to ally efficient) probabilistic models for implementing specific zooming, the robot only reasons about its location. Assume actions. Also, it may be possible to use specific (computation- that a move from a cell to a neighboring cell succeeds with ally efficient) heuristic or probabilistic algorithms for specific probability 0.85—otherwise, the robot remains where it is. tasks such as path planning. However, POMDPs provide (a) Assume that all non-terminal actions have unit cost. Termi- principled and quantifiable trade-off between accuracy and nating the POMDP policy after reaching cell 1 (in kitchen) computational efficiency in the presence of uncertainty; and receives a large positive reward (100), whereas termination (b) near-optimal solution if the POMDP is modeled accu- while in cell 0 (in o f f ice) receives a large negative reward rately. Also, our architecture only constructs a POMDP for (−100). Elements of the POMDP are described (below) in a small (relevant) part of the domain, significantly reducing the format used by the POMDP solver [Ong et al., 2010]. the computational complexity of solving the POMDP. Fur- The states correspond to possible robot locations, and absb thermore, many POMDPs for any given domain can be pre- is a terminal state. The actions correspond to the robot mov- computed, solved and used as needed. ing to specific cells, testing its cell location, or terminating L L L L A POMDP is described by a tuple hS , A , Z , T , O , R i L L policy execution (transitioning to absb). The robot observes for a specific goal state. Elements of this tuple correspond its cell location, or receives no observation. Knowledge- to the set of states, set of actions, set of values of observable producing actions do not cause a state transition, and actions fluents, the state transition function, the observation function, that change the state do not provide observations. 
The observations and action outcomes obtained by executing the POMDP policy correspond to observations and fluents in DLR(T). This information, in turn, revises H, to be used for subsequent reasoning by the logician. If τH is constructed correctly, and the statistics collected during the initial training phase correctly model the domain dynamics, following an optimal policy produced by an exact POMDP solver is most likely (among all possible policies) to take the robot to the desired goal state [Littman, 1996; Sondik, 1971]. Since we use an approximate solver for computational efficiency, we obtain a bound on the regret (i.e., loss in value) due to the computed policy [Ong et al., 2010]. Due to space constraints, we provide (below) a simple example of constructing a POMDP.

Example 4. [POMDP construction]
Consider rob1 in the office and the implementation of aH = move(rob1, kitchen), with one cell in each room. Due to zooming, the robot only reasons about its location. Assume that a move from a cell to a neighboring cell succeeds with probability 0.85; otherwise, the robot remains where it is. Assume that all non-terminal actions have unit cost. Terminating the POMDP policy after reaching cell 1 (in the kitchen) receives a large positive reward (100), whereas termination while in cell 0 (in the office) receives a large negative reward (−100). The elements of the POMDP are described (below) in the format used by the POMDP solver [Ong et al., 2010]. The states correspond to possible robot locations, and absb is a terminal state. The actions correspond to the robot moving to specific cells, testing its cell location, or terminating policy execution (transitioning to absb). The robot observes its cell location, or receives no observation. Knowledge-producing actions do not cause a state transition, and actions that change the state do not provide observations.

    discount: 0.99
    states: robot-0 robot-1 absb
    actions: move-0 move-1 test-robot-0 test-robot-1 done
    observations: rob-found rob-not-found none

    % Transition function format
    % T : action : S x S' -> [0, 1]
    T: move-0
    1 0 0
    0.85 0.15 0
    0 0 1

    T: move-1
    0.15 0.85 0
    0 1 0
    0 0 1

    T: test-robot-0
    identity

    T: test-robot-1
    identity

    T: done
    uniform

    % Observation function format
    % O : action : s_i : z_i -> [0, 1] (or)
    % O : action : S x Z -> [0, 1]
    O: move-0 : * : none 1
    O: move-1 : * : none 1

    O: test-robot-0
    0.95 0.05 0
    0.05 0.95 0
    0 0 1

    O: test-robot-1
    0.05 0.95 0
    0.95 0.05 0
    0 0 1

    O: done : * : none 1

    % Reward function format
    % R : action : s_i : s_i' : real value
    R: * : * : * : -1
    R: done : robot-0 : * : -100
    R: done : robot-1 : * : 100

Our architecture constructs such data structures for complex domains by zooming to the relevant part of the system description (as described earlier). Also, for each state s_i and action a_j, the corresponding ASP program Π(DLR(T), s_i, a_j) is solved to (a) identify inconsistencies and eliminate impossible states, e.g., the robot and an object cannot be in different locations when the robot is holding the object; and (b) identify possible state transitions, eliminating impossible transitions in the construction of the transition function.
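The state-elimination step can be illustrated with a small sketch. The Python predicate `consistent` below is a hypothetical stand-in for solving the ASP program Π(DLR(T), s_i, a_j), and the cell names are illustrative; it encodes the example rule that a held object must share the robot's location.

```python
from itertools import product

# Candidate states: (robot location, object location, in_hand flag).
# "c1"/"c2" are illustrative cell names, not the paper's sorts.
cells = ["c1", "c2"]

def consistent(state):
    """Stand-in for the ASP consistency check: a held object
    must be at the robot's location."""
    loc_rob, loc_obj, in_hand = state
    return (not in_hand) or (loc_rob == loc_obj)

candidates = list(product(cells, cells, [True, False]))
states = [s for s in candidates if consistent(s)]
# 2 of the 8 candidates (robot holding the object while the object
# is recorded in a different cell) are eliminated.
print(len(candidates), len(states))  # 8 6
```

The surviving states then index the rows and columns of the transition function, and the same check applied to (state, action, state) triples prunes impossible transitions.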
Algorithm 1: Control loop
Input: coarse-resolution system description DH and history H; randomized fine-resolution system description DLR; coarse-resolution goal.
Output: robot is in a state satisfying the goal; reports failure if this is impossible.
 1  while goal is not achieved do
 2      Logician uses τH and H to find a plan of abstract actions a^H_1, ..., a^H_n to achieve the goal.
 3      if no plan exists then
 4          return failure
 5      end
 6      i := 1, continue1 := true
 7      while continue1 do
 8          Check pre-requisites of a^H_i.
 9          if pre-requisites not satisfied then
10              continue1 := false
11          else
12              Statistician zooms to the relevant part of τLR for executing a^H_i and constructs a POMDP.
13              Statistician solves the POMDP to compute an action policy to implement a^H_i.
14              continue2 := true
15              while continue2 do
16                  Statistician invokes the POMDP policy to select and execute an action, obtain an observation, and update the belief state.
17                  if terminal action invoked then
18                      Statistician communicates success or failure of a^H_i to the logician, to be recorded in H.
19                      continue2 := false
20                      i := i + 1
21                      continue1 := (i < n + 1)
22                  end
23              end
24          end
25      end
26  end
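Algorithm 1's control flow can be skeletonized as follows. The callables (plan_abstract, zoom_and_solve, and so on) are hypothetical stand-ins for the logician's and statistician's components described in the text, not an actual API.

```python
# Skeleton of Algorithm 1; each callable stands in for a component
# of the architecture (logician's planner, statistician's POMDP
# machinery), which are assumptions for this sketch.
def control_loop(plan_abstract, prerequisites_hold, zoom_and_solve,
                 execute_policy, goal_achieved):
    while not goal_achieved():
        plan = plan_abstract()      # logician: abstract plan from tau_H and H
        if plan is None:
            return "failure"
        for action in plan:
            if not prerequisites_hold(action):
                break               # replan from the revised history
            policy = zoom_and_solve(action)  # zoomed POMDP -> action policy
            execute_policy(policy)  # run until a terminal action; the
                                    # outcome is recorded in the history
    return "success"
```

As in Algorithm 1, a failed pre-requisite check abandons the remaining abstract actions and triggers replanning with the revised history, and the loop terminates only when the coarse-resolution goal holds.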
5 Reasoning System
Algorithm 1 describes the reasoning loop. For any given goal, the logician reasons with the system description DH and a recorded history H with initial state defaults, accounting for any discrepancies between observations and predictions, to compute a plan of abstract actions. For an abstract action aH, the statistician zooms to the relevant part of the randomized refinement of DH. The corresponding system description DLR(T) and probabilities are used to construct and solve a POMDP. The POMDP policy is invoked repeatedly (until termination) to execute a sequence of concrete actions. The corresponding outcomes are reported to the logician to be used for subsequent reasoning. Correctness of this control loop is ensured by (1) applying the algorithm that reduces coarse-resolution planning and diagnostics to computing answer sets of the corresponding program; (2) using the refinement and zoom operations as described above; and (3) using POMDPs to probabilistically execute an action sequence for each abstract action in the logician's plan.

Experimental trials, both in simulation and on physical robots assisting humans in finding and moving objects in an office domain, indicate that our architecture provides significantly higher efficiency and accuracy in comparison with using just POMDPs with hierarchical decompositions, similar to our prior work [Sridharan et al., 2015]. These results are not described here because the focus is on describing the steps in the design process, and due to space limitations.
6 Conclusions
This paper described the systematic design of robots capable of representing and reasoning with logic-based and probabilistic descriptions of domain knowledge and uncertainty. Our architecture uses tightly-coupled transition diagrams of the domain at two levels of granularity, with a fine-resolution diagram being a refinement of a coarse-resolution diagram. For any given goal, non-monotonic logical reasoning at the coarse resolution plans a sequence of abstract actions. Each abstract action is implemented probabilistically as a sequence of concrete actions by zooming to the relevant part of the fine-resolution description, constructing and solving a POMDP, and invoking a policy until termination. The corresponding outcomes revise the coarse-resolution history for subsequent reasoning. The design steps are illustrated using examples, which indicate that the architecture supports reasoning at the sensorimotor and cognitive levels despite violations of defaults and unreliable observations and actions.

Acknowledgements
The authors thank Jeremy Wyatt and Shiqi Zhang for discussions related to the architecture described in this paper. The first author was supported in part by the US Office of Naval Research Science of Autonomy award N00014-13-1-0766.

References
[Bai et al., 2014] Haoyu Bai, David Hsu, and Wee Sun Lee. Integrated Perception and Planning in the Continuous Space: A POMDP Approach. International Journal of Robotics Research, 33(8), 2014.
[Balai et al., 2013] Evgenii Balai, Michael Gelfond, and Yuanlin Zhang. Towards Answer Set Programming with Sorts. In International Conference on Logic Programming and Nonmonotonic Reasoning, Corunna, Spain, 2013.
[Balduccini and Gelfond, 2003] Marcello Balduccini and Michael Gelfond. Logic Programs with Consistency-Restoring Rules. In AAAI Spring Symposium on Logical Formalization of Commonsense Reasoning, pages 9–18, 2003.
[Balduccini et al., 2014] Marcello Balduccini, William C. Regli, and Duc N. Nguyen. An ASP-Based Architecture for Autonomous UAVs in Dynamic Environments: Progress Report. In International Workshop on Non-Monotonic Reasoning (NMR), Vienna, Austria, July 17-19, 2014.
[Baral et al., 2009] Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic Reasoning with Answer Sets. Theory and Practice of Logic Programming, 9(1):57–144, January 2009.
[Chen et al., 2012] Xiaoping Chen, Jiongkun Xie, Jianmin Ji, and Zhiqiang Sui. Toward Open Knowledge Enabling for Human-Robot Interaction. Human-Robot Interaction, 1(2):100–117, 2012.
[Gelfond and Inclezan, 2013] Michael Gelfond and Daniela Inclezan. Some Properties of System Descriptions of ALd. Journal of Applied Non-Classical Logics, Special Issue on Equilibrium Logic and Answer Set Programming, 23(1-2):105–120, 2013.
[Gelfond and Kahl, 2014] Michael Gelfond and Yulia Kahl. Knowledge Representation, Reasoning and the Design of Intelligent Agents. Cambridge University Press, 2014.
[Hanheide et al., 2015] Marc Hanheide, Moritz Gobelbecker, Graham Horn, Andrzej Pronobis, Kristoffer Sjoo, Patric Jensfelt, Charles Gretton, Richard Dearden, Miroslav Janicek, Hendrik Zender, Geert-Jan Kruijff, Nick Hawes, and Jeremy Wyatt. Robot Task Planning and Explanation in Open and Uncertain Worlds. Artificial Intelligence, 2015.
[Hawes et al., 2010] Nick Hawes, Jeremy Wyatt, Mohan Sridharan, Henrik Jacobsson, Richard Dearden, Aaron Sloman, and Geert-Jan Kruijff. Architecture and Representations. In Cognitive Systems, volume 8 of Cognitive Systems Monographs, pages 51–93. Springer Berlin Heidelberg, April 2010.
[Kaelbling and Lozano-Perez, 2013] Leslie Kaelbling and Tomas Lozano-Perez. Integrated Task and Motion Planning in Belief Space. International Journal of Robotics Research, 32(9-10), 2013.
[Lee and Wang, 2015] Joohyung Lee and Yi Wang. A Probabilistic Extension of the Stable Model Semantics. In AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning, Stanford, USA, March 2015.
[Lee et al., 2013] Joohyung Lee, Vladimir Lifschitz, and Fangkai Yang. Action Language BC: Preliminary Report. In International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, August 3-9, 2013.
[Leone et al., 2006] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic, 7(3):499–562, 2006.
[Littman, 1996] Michael Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, Department of Computer Science, Providence, USA, March 1996.
[Milch et al., 2006] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Statistical Relational Learning. MIT Press, 2006.
[Ong et al., 2010] Sylvie C. Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under Uncertainty for Robotic Tasks with Mixed Observability. International Journal of Robotics Research, 29(8):1053–1068, July 2010.
[Richardson and Domingos, 2006] Matthew Richardson and Pedro Domingos. Markov Logic Networks. Machine Learning, 62(1), 2006.
[Saribatur et al., 2014] Zeynep G. Saribatur, Esra Erdem, and Volkan Patoglu. Cognitive Factories with Multiple Teams of Heterogeneous Robots: Hybrid Reasoning for Optimal Feasible Global Plans. In International Conference on Intelligent Robots and Systems (IROS), Chicago, USA, 2014.
[Sondik, 1971] Edward J. Sondik. The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, 1971.
[Sridharan et al., 2015] Mohan Sridharan, Michael Gelfond, Shiqi Zhang, and Jeremy Wyatt. A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics. Technical report, unrefereed CoRR abstract: http://arxiv.org/abs/1508.03891, August 2015.
[Sridharan et al., 2016] Mohan Sridharan, Prashanth Devarakonda, and Rashmica Gupta. Discovering Domain Axioms Using Relational Reinforcement Learning and Declarative Programming. In ICAPS Workshop on Planning and Robotics (PlanRob), London, UK, June 13-14, 2016.
[Zhang and Stone, 2015] Shiqi Zhang and Peter Stone. CORPP: Commonsense Reasoning and Probabilistic Planning, as Applied to Dialog with a Mobile Robot. In AAAI Conference on Artificial Intelligence (AAAI), pages 1394–1400, Austin, USA, 2015.
[Zhang et al., 2014] Shiqi Zhang, Mohan Sridharan, Michael Gelfond, and Jeremy Wyatt. Towards An Architecture for Knowledge Representation and Reasoning in Robotics. In International Conference on Social Robotics (ICSR), Sydney, Australia, 2014.
[Zhang et al., 2015] Shiqi Zhang, Mohan Sridharan, and Jeremy Wyatt. Mixed Logical Inference and Probabilistic Planning for Robots in Unreliable Worlds. IEEE Transactions on Robotics, 31(3):699–713, 2015.