<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Knowledge Representation and Reasoning Tools in the Design of Robots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohan Sridharan</string-name>
          <email>m.sridharan@auckland.ac.nz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Gelfond</string-name>
          <email>michael.gelfond@ttu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Texas Tech University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Electrical and Computer Engineering, The University of Auckland</institution>
          ,
          <country country="NZ">New Zealand</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>8</volume>
      <abstract>
        <p>The paper describes the authors' experience in using knowledge representation and reasoning tools in the design of robots. The focus is on the systematic construction of models of the robot's capabilities and its domain at different resolutions, and on establishing a clear relationship between the models at the different resolutions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>Our prior paper described an architecture for robots [Zhang et
al., 2014], whose reasoning system we view (in this paper) as
an interplay between a logician and a statistician. The former
has an abstract, coarse-resolution view of the world. The
logician perceives an office domain, for instance, as a collection
of connected rooms containing various types of objects, and
assumes that the robot is capable of successfully moving
between two adjacent rooms, and of finding an object located in
a room. If the robot’s goal is to find a specific book, the
logician can design a plan of abstract actions directing the robot
to go to the library, where books are normally stored. The
first action of this plan, e.g., move to the lab, which happens
to be the first room on the way to the library, will be passed to
a statistician, who has a rather different, fine-resolution view
of the world. According to the statistician, the same office
domain consists of small grid cells belonging to the different
rooms. The statistician believes that the robot is usually
capable of moving to a neighboring cell, and of checking a cell for
a target object, but these actions are non-deterministic—they
only succeed with some probability that the statistician knows
a priori or can learn by running some experiments. For the
coarse-resolution action of moving to the lab, the statistician
thus has the robot move from cell to cell, observe its position,
and revise its belief of its position. If the statistician has a
sufficiently high confidence that the abstract action passed by the
logician has been executed, i.e., that the robot has moved to
the lab, this information is reported back to the logician.
Otherwise, the statistician reports failure after some time, and the
logician has to diagnose and replan. Even this simple
scenario shows that our robot’s architecture should be capable of
representing and manipulating both logical and probabilistic
knowledge. In this paper we report our experience in the
systematic design of such a robot using knowledge
representation and reasoning tools tailored towards different reasoning
tasks. In comparison with our prior work [Zhang et al., 2014],
we introduce precise (and some new) definitions of basic
notions used to build mathematical models of the domain.</p>
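      <p>As a toy illustration of this interplay, the following self-contained sketch (all names and the room layout are ours, not the paper’s) has a logician plan over rooms while a statistician stand-in executes each abstract action and reports an outcome:</p>
      <preformat>
```python
import random
from collections import deque

ROOMS = ["kitchen", "lab", "library"]                 # lab lies between the other two
ADJACENT = {("kitchen", "lab"), ("lab", "library")}

def adjacent(a, b):
    return (a, b) in ADJACENT or (b, a) in ADJACENT

def logician_plan(start, goal):
    """Shortest room-to-room plan (BFS) of abstract move actions."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return [("move", room) for room in path[1:]]
        for room in ROOMS:
            if room not in seen and adjacent(path[-1], room):
                seen.add(room)
                frontier.append(path + [room])
    return None

def statistician_execute(action, success_prob=1.0):
    """Stand-in for fine-resolution execution; reports success or failure."""
    return "success" if random.random() < success_prob else "failure"

plan = logician_plan("kitchen", "library")
print(plan)                                   # [('move', 'lab'), ('move', 'library')]
outcomes = [statistician_execute(a) for a in plan]
```
      </preformat>
      <p>On failure, the reported outcome would be added to the history so that the logician can diagnose and replan, as described above.</p>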
      <p>We started by describing the state transition diagram modeling the possible trajectories of the robot’s domain in the action description language AL_d [Gelfond and Inclezan, 2013]. Using AL_d allowed us to concisely represent the diagram even for domains with complex relationships between fluents, and to represent recorded histories of the robot’s actions and observations. We expanded the standard notion of a recorded history to include the robot’s default knowledge about the initial situation, which simplifies the diagnostics task. Although there exist action languages that allow causal laws specifying default values of fluents at arbitrary time steps [Lee et al., 2013], such languages are too powerful for our purposes, and occasionally pose difficulties with representing all exceptions to such defaults when the domain is expanded. Reasoning with theories in AL_d is performed by reducing planning, diagnostics, and other robotic tasks to computing answer sets of a program in CR-Prolog, a variant of Answer Set Prolog (ASP) that supports representation of, and reasoning with, defaults and their direct and indirect exceptions [Balduccini and Gelfond, 2003]. This reduction, which considers histories with initial state defaults, is the first novel contribution, building on previous work that established a close relationship between action languages and logic programming with answer set semantics. Existing efficient solvers help automate the necessary reasoning [Leone et al., 2006].</p>
      <p>The second novel contribution is the precise definition of
the world view of the statistician as a refinement of the
transition diagram of the logician. The new diagram can be viewed
as the result of increasing the resolution of the robot’s ability
to see the world. In our example, the new world view includes
cells in rooms, which were so far invisible, and relations and
actions involving these cells, e.g. location of an object in a
cell, and the action of moving from a cell to its neighbor.
Construction of such a refinement may require some amount
of domain knowledge, but the statistician’s diagram is related
to the logician’s diagram in a precise way that is captured by
our novel mathematical definition of refinement. We provide
detailed guidelines for the construction of the refinement’s
action theory, which includes axioms establishing the
relationship between fluents of the coarse-resolution diagram and
their fine-resolution counterparts, as well as the axioms
describing the effects of observations.</p>
      <p>After the refinement of the domain is constructed, the fine-resolution description is randomized, i.e., modified to consider the non-deterministic effects of fine-resolution actions
and observations. The third novel contribution is to expand
the action language ALd by a construct that supports the
natural representation of such effects. Probabilities computed
experimentally by the statistician are then associated with the
purely logical diagram to obtain a probabilistic diagram.
Reasoning in this new diagram also considers belief states, i.e.,
probability distributions over states of the logical diagram.
Theoretically, the statistician can now use probabilistic
graphical models such as a partially observable Markov decision
process (POMDP) to select and execute the “best” possible
action, make observations, and update the belief state until
the abstract action provided by the logician is completed with
high probability, or all hope of doing so is lost. However,
POMDP algorithms need a representation of the diagram that
lists all possible combinations of physical states and actions,
which can become intractable even for a comparatively small
collection of fluents. To avoid this problem, the robot zooms
to the part of the fine-resolution diagram that is relevant to the
execution of the coarse-resolution action provided by the
logician. For instance, to execute the action “move from room
R1 to room R2”, with the two rooms being next to each other,
the fluents and actions related to other rooms are eliminated,
dramatically reducing the size of the corresponding diagram.
The fourth new contribution is a precise definition of zooming,
which helps automate this process. Finally, the zoomed part of the randomized fine-resolution diagram is represented in a format suitable for use with POMDP solvers. The corresponding policy is invoked to execute a sequence of concrete actions that implements the abstract action, with the action outcomes being added to the coarse-resolution history.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work and Example Domain</title>
      <p>Logic-based representations and probabilistic graphical
models have been used to control sensing, navigation and
interaction for robots [Bai et al., 2014; Hawes et al., 2010].
Formulations based on probabilistic representations (by themselves)
make it difficult to perform commonsense reasoning, whereas
approaches based on logic programming tend to require
considerable prior knowledge of the domain and the agent’s
capabilities, and make it difficult to merge new, unreliable
information with an existing knowledge base. Theories of
reasoning about actions and change, and the non-monotonic logical
reasoning ability of ASP have been used by an international
research community, e.g., for natural language human-robot
interaction [Chen et al., 2012], control of unmanned aerial
vehicles [Balduccini et al., 2014], and coordination of robot
teams [Saribatur et al., 2014]. However, the basic version of
ASP does not support probabilistic representation of
uncertainty, whereas a lot of information extracted from sensors
and actuators is represented probabilistically.</p>
      <p>Researchers have designed architectures for robots that combine logic-based and probabilistic algorithms for task and motion planning [Kaelbling and Lozano-Perez, 2013], couple declarative programming and continuous-time planners for path planning in teams [Saribatur et al., 2014], combine a probabilistic extension of ASP with POMDPs for human-robot dialog [Zhang and Stone, 2015], combine logic programming and reinforcement learning to discover domain axioms [Sridharan et al., 2016], or use a three-layered organization for knowledge and reasoning with first-order logic and probabilities in open worlds [Hanheide et al., 2015]. Some general formulations that combine logical and probabilistic reasoning include Markov logic networks [Richardson and Domingos, 2006], Bayesian logic [Milch et al., 2006], and probabilistic extensions to ASP [Baral et al., 2009; Lee and Wang, 2015]. However, algorithms based on first-order logic do not support non-monotonic logical reasoning and do not provide the desired expressiveness: it is not always possible to associate numbers with logic statements to express degrees of belief. Algorithms based on logic programming do not support one or more of the desired capabilities, such as incremental revision of (probabilistic) information and reasoning with large probabilistic components. As a step towards addressing these limitations, we have developed architectures that couple declarative programming and probabilistic graphical models [Zhang et al., 2014; 2015]. Here,
we expand on our prior work to explore the systematic
construction of robots with the desired knowledge representation
and reasoning capabilities. We illustrate these design steps
and our contributions using the following example.
Example 1. [Office Domain] Consider a robot assigned the
goal of moving specific objects to specific places in an office
domain. This domain contains:</p>
      <sec id="sec-2-1">
        <title>Sorts</title>
        <p>The sorts place, thing, robot, and object, with object and robot being subsorts of thing. The sorts textbook, printer, and kitchenware are subsorts of the sort object.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Places and objects</title>
        <p>Four places: office, main_library, aux_library, and kitchen, some of which are directly accessible from each other.</p>
        <p>An instance of the sort robot, called rob1, and a number of instances of subsorts of the sort object.</p>
        <p>Although this domain may appear simplistic, it illustrates many of the representation, reasoning, perception, and actuation challenges that arise in more complex robotics domains.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Logician’s Description</title>
      <sec id="sec-3-1">
        <title>We start with the logician’s view of the world.</title>
        <p>Action Language AL_d: The logician’s state transition diagram is specified as a system description (theory) in a variant of the action language AL_d, which allows statements of the form:
a causes f(x̄) = y if body
f(x̄) = y if body
impossible a0, ..., an if body
The first statement describes an action’s direct effect: if action a is executed in a state satisfying condition body, the value of fluent f in the resulting state will be y. For instance, the causal law:
move(R, Pl) causes loc(R) = Pl
says that a robot R moving to place Pl will end up in Pl.</p>
        <p>The second statement is a state constraint, which says that f(x̄) = y in any state that satisfies body. For instance:</p>
        <p>loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)
guarantees that an object grasped by a robot shares the robot’s location. The third statement prohibits the simultaneous execution of actions a0, ..., an in a state satisfying body. The functions in these statements are of two types: those whose values can be changed by actions (fluents) and those whose values cannot (statics). Fluents can be basic or defined; the former are subject to the inertia axioms, while the latter are defined in terms of other fluents. The formal semantics of the original AL_d is discussed in [Gelfond and Kahl, 2014]. Unfortunately, it only allows boolean fluents, a restriction that is removed in our variant of the language.</p>
        <p>Histories with defaults: In addition to an action theory of AL_d, the logician’s knowledge contains a recorded history, a collection of the robot’s observations and actions. In action languages, such a recorded history typically consists of statements of the form obs(f, y, true, i) (or obs(f, y, false, i)), i.e., at step i of the robot’s trajectory the value of fluent f is observed to be (or not to be) y; and hpd(a, i), i.e., action a was successfully executed (happened) at step i. The recorded history defines a collection of models, trajectories of the system compatible with this record. This syntax of histories was rather limited for our purposes, since the robot can benefit substantially from some forms of default knowledge. In Example 1, the robot can, for instance, be told that books are normally kept in the library. To address this limitation, we introduced an additional type of historical record:</p>
        <p>initial default f(x̄) = y if body
which says that if the initial state satisfies body, the value of f(x̄) in this state is assumed to be y. For instance, the default statement:
initial default loc(X) = main_library if textbook(X)
says that textbooks are normally kept in the main library, whereas the default statement:
initial default loc(X) = office if textbook(X), loc(X) ≠ main_library
gives the second most likely location for a textbook. Such defaults may substantially simplify the logician’s planning. For instance, the plan for finding a textbook tb will consist of going directly to the library and looking for the book there. However, defaults are also useful for diagnostics. For instance, if tb is not found in the library, i.e., the history includes obs(loc(tb), main_library, false, i), the robot would realize that this observation defeats the first default, apply the second default, and look for the book in the office. To ensure such behavior, we had to redefine the notion of a model of a history, and design a new algorithm for computing such models. To this end, given an AL_d theory D_H and a recorded history H, we construct the program P(D_H, H) consisting of the standard encoding of D_H into ASP, and the collection of atoms representing the observations and actions of H together with the initial state defaults. This encoding allows indirect exceptions to defaults (like the observation above) and uses CR-Prolog, which justifies our departure from standard ASP.</p>
        <p>Planning and diagnostics: The description of the logician’s knowledge, as provided above, is sufficient for adequately performing planning and diagnostics; these tasks are reduced to computing answer sets of the program P(D_H, H) combined with a standard encoding of the robot’s goal. This program is passed to an efficient ASP solver; we use SPARC, which expands CR-Prolog and provides explicit constructs to specify objects, relations, and their sorts [Balai et al., 2013]. Atoms of the form occurs(action, step) belonging to the answer set obtained by solving this program, e.g., occurs(a1, 1), ..., occurs(an, n), represent the shortest sequence of abstract actions for achieving the logician’s goal. Prior research results in the theory of action languages and ASP ensure that the plan is provably correct. In a similar manner, suitable atoms in the answer set can be used for diagnostics, e.g., to explain unexpected observations.</p>
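        <p>The effect of the prioritized initial-state defaults can be sketched as follows (a toy Python rendering with hypothetical helper names; the actual reasoning is performed by computing answer sets of P(D_H, H)):</p>
        <preformat>
```python
def initial_location(book, observations):
    """Apply the prioritized defaults: main_library first; if that default
    is defeated by an observation, fall back to the office.
    observations: set of records like (('loc', book, place), False),
    mirroring obs(loc(tb), place, false, 0)."""
    defaults = ["main_library", "office"]            # decreasing priority
    for place in defaults:
        defeated = (("loc", book, place), False) in observations
        if not defeated:
            return place
    return None                                      # all defaults defeated

print(initial_location("tb1", set()))                                      # main_library
print(initial_location("tb1", {(("loc", "tb1", "main_library"), False)}))  # office
```
        </preformat>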
        <sec id="sec-3-1-1">
          <title>Example 2. [Logician’s view of the world]</title>
          <p>The logician’s system description D_H of the domain in Example 1 consists of a sorted signature Σ_H and axioms describing the transition diagram τ_H. Σ_H defines the names of the objects and functions available for use; e.g., the sorts are place, thing, robot, and object, with object and robot being subsorts of thing, and textbook, printer, and kitchenware being subsorts of object. The statics include a relation next_to(place, place), which describes whether two places are next to each other. The domain’s fluents are loc : thing → place and in_hand : robot × object → boolean. These are basic fluents subject to the laws of inertia. The domain’s actions are move(robot, place), grasp(robot, object), and putdown(robot, object). The domain dynamics are defined using axioms that consist of causal laws:
move(R, Pl) causes loc(R) = Pl
grasp(R, Ob) causes in_hand(R, Ob)
putdown(R, Ob) causes ¬in_hand(R, Ob)
state constraints:</p>
          <p>loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)
and executability conditions such as:
impossible move(R, Pl2) if loc(R) = Pl1, ¬next_to(Pl1, Pl2)
impossible grasp(R, Ob) if loc(R) ≠ loc(Ob)
impossible putdown(R, Ob) if ¬in_hand(R, Ob)
For any given domain, the part of Σ_H described so far is unlikely to change substantially. However, the last step in the construction of Σ_H, which populates the basic sorts with specific objects, e.g., robot = {rob1}, place = {r1, ..., rn}, and textbook = {tb1, ..., tbn}, is likely to undergo frequent revisions. Ground instances of the axioms are obtained by replacing variables by ground terms from the corresponding sorts.</p>
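          <p>The deterministic coarse-resolution transitions of Example 2 can be rendered as a small interpreter (an illustration in Python, not the paper’s ASP encoding; the two-place layout is ours):</p>
          <preformat>
```python
NEXT_TO = {("office", "main_library"), ("main_library", "office")}

def next_to(a, b):
    return (a, b) in NEXT_TO

def transition(state, action):
    """state: {'loc': thing -> place, 'in_hand': set of (robot, object)}.
    Returns the successor state, or None if an executability condition fires."""
    loc, in_hand = dict(state["loc"]), set(state["in_hand"])
    kind, robot, arg = action
    if kind == "move":
        if not next_to(loc[robot], arg):
            return None                # impossible move(R,Pl2) if loc(R)=Pl1, -next_to(Pl1,Pl2)
        loc[robot] = arg               # move(R,Pl) causes loc(R)=Pl
        for (r, ob) in in_hand:        # state constraint: a held object moves too
            if r == robot:
                loc[ob] = arg
    elif kind == "grasp":
        if loc[robot] != loc[arg]:
            return None                # impossible grasp(R,Ob) if loc(R) != loc(Ob)
        in_hand.add((robot, arg))      # grasp(R,Ob) causes in_hand(R,Ob)
    elif kind == "putdown":
        if (robot, arg) not in in_hand:
            return None                # impossible putdown(R,Ob) if -in_hand(R,Ob)
        in_hand.discard((robot, arg))  # putdown(R,Ob) causes -in_hand(R,Ob)
    return {"loc": loc, "in_hand": in_hand}

s0 = {"loc": {"rob1": "office", "tb1": "office"}, "in_hand": set()}
s1 = transition(s0, ("grasp", "rob1", "tb1"))
s2 = transition(s1, ("move", "rob1", "main_library"))
print(s2["loc"])   # the held textbook moves with the robot
```
          </preformat>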
          <p>In the transition diagram τ_H described by D_H, actions are assumed to be deterministic, and the values of fluents are assumed to be observable, which aids in fast, tentative planning and diagnostics for achieving the assigned goals. This domain representation should ideally be tested extensively by including various recorded histories of the domain, which may include histories with prioritized defaults, and using the resulting programs to solve various reasoning tasks.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Statistician’s Description</title>
      <p>In this section we describe our design of the statistician.
Refinement: First, we specify the deterministic version of the statistician’s world. It is given by a system description D_L defining a transition diagram τ_L, which serves as a (deterministic) refinement of the logician’s diagram τ_H. Recall that τ_L is the result of increasing the resolution, which magnifies some objects in the signature Σ_H of the system description D_H of τ_H. Newly discovered parts of the magnified objects are referred to as their refined components. In our example, for instance, every room is magnified and viewed as a collection of its component cells.</p>
      <p>The refinement is represented by a system description D_L with signature Σ_L. We say that a signature Σ_L refines signature Σ_H if it is obtained by:</p>
      <p>Replacing every basic sort st_H of Σ_H whose elements were magnified by its coarse-resolution version st*_L = st_H, and a fine-resolution counterpart st_L = {o1, ..., om} consisting of the components of the magnified elements of st_H. For example, Σ_L replaces the sort place = {r1, ..., rn} of rooms in Σ_H by:
place* = {r1, ..., rn}
place = {c1, ..., cm}
where rooms are magnified and viewed as collections of newly discovered objects, their component cells c1, ..., cm.
Introducing a static relation component between magnified objects from st*_L and the corresponding newly discovered objects from st_L; e.g., the relation component(c, r) is defined to be true iff cell c is in room r.</p>
      <p>Replacing the sort fluent of Σ_H by its coarse-resolution copy fluent* (a defined fluent) and its fine-resolution counterpart fluent (a basic fluent). In our example, the new fluents obtained by refinement are:
loc* : thing → place*
loc : thing → place</p>
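      <p>The relationship between the fine fluent loc, the static component, and the defined coarse copy loc* can be sketched as follows (an illustrative Python encoding with a made-up cell layout):</p>
      <preformat>
```python
COMPONENT = {"c1": "kitchen", "c2": "kitchen", "c3": "office"}   # cell -> room

def loc_star(loc_fine):
    """Defined coarse fluent loc*: derived from the basic fine fluent
    loc (thing -> cell) via the static component relation."""
    return {thing: COMPONENT[cell] for thing, cell in loc_fine.items()}

place_star = sorted(set(COMPONENT.values()))   # place* = {kitchen, office}
place_fine = sorted(COMPONENT)                 # place  = {c1, c2, c3}
print(place_star, place_fine)
print(loc_star({"rob1": "c2", "tb1": "c3"}))   # {'rob1': 'kitchen', 'tb1': 'office'}
```
      </preformat>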
      <sec id="sec-4-1">
        <title>Actions and axioms of the refinement</title>
        <p>Other fluents (and their signatures) are unchanged. The actions of Σ_L are obtained by replacing the magnified parameters of the original actions from Σ_H by their fine-resolution counterparts. In our example, Σ_L will contain the original actions grasp and putdown, and a new action move(robot, cell) of a robot moving to an (adjacent) cell. Σ_L will also include a knowledge-producing action test(robot, fluent, value) that activates algorithms on the robot to check the value of an observable fluent of D_L in a given state, e.g., test(R, loc(Th), Cell). Note that test is a general action belonging to every refinement of Σ_H. We also add fluents that describe the result of testing, e.g., observed(robot, fluent, value) is true if the most recent (direct or indirect) observation of fluent returned value. The axioms of the refined system description D_L include the axioms of D_H, and domain-dependent axioms relating coarse-resolution fluents to their fine-resolution counterparts; in our illustrative domain we have:</p>
        <p>loc*(Th) = Rm if component(Cl, Rm), loc(Th) = Cl
which uses the new static component(c, r) for every cell c within every room r; next_to(c1, c2) is defined in a similar manner for every pair of adjacent cells accessible from each other. More importantly, we add to D_L basic knowledge fluents that model the direct and indirect knowledge effects of sensing, and introduce axioms relating these fluents to the test actions. For instance:</p>
        <p>test(R, F, Y) causes dir_obs(R, F, Y) = true if F = Y
In our example, robot rob1 testing whether the location of object o is cell c will make the basic knowledge fluent dir_obs(rob1, loc(o), c) true if the object is indeed there. Note that dir_obs may have three possible values, true, false, and undef; it is initially set to the third value. Another fluent, indir_obs(rob1, loc(o), R), holds if loc(o) is observed to be true in some component cell of room R. A fluent’s value is observed if it is observed directly or indirectly. The designers of the statistician should make sure that the diagram τ_L specified by our system description D_L is indeed a refinement, i.e., that it matches the following formal definitions of the refinement of a state and of a system description.</p>
        <sec id="sec-4-1-1">
          <title>Definition 1. [Refinement of a state]</title>
          <p>A state δ of τ_L is said to be a refinement of a state σ of τ_H if:
for every magnified fluent f from the signature of Σ_H, f(x̄) = y ∈ σ iff f*(x̄) = y ∈ δ; and
for every other fluent of Σ_H, f(x̄) = y ∈ σ iff f(x̄) = y ∈ δ.</p>
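          <p>Definition 1 suggests a simple mechanical check; a sketch follows (the dictionary representation of states and the starred-key convention are our choices, not the paper’s):</p>
          <preformat>
```python
def is_state_refinement(sigma, delta, magnified):
    """sigma, delta: dicts mapping fluent terms to values; magnified: the set
    of fluent terms of Sigma_H whose coarse copies appear starred in delta."""
    for f, y in sigma.items():
        if f in magnified:
            if delta.get(f + "*") != y:    # f(x)=y in sigma iff f*(x)=y in delta
                return False
        elif delta.get(f) != y:            # every other fluent must match exactly
            return False
    return True

sigma = {"loc(rob1)": "kitchen", "in_hand(rob1,tb1)": False}
delta = {"loc(rob1)*": "kitchen", "loc(rob1)": "c2", "in_hand(rob1,tb1)": False}
print(is_state_refinement(sigma, delta, {"loc(rob1)"}))   # True
```
          </preformat>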
        </sec>
        <sec id="sec-4-1-2">
          <title>Definition 2. [Refinement of a system description]</title>
          <p>Let D_L and D_H be system descriptions with transition diagrams τ_L and τ_H, respectively. D_L is a refinement of D_H if:
States of τ_L are the refinements of states of τ_H.</p>
          <p>For every transition ⟨σ1, aH, σ2⟩ of τ_H, every fluent f in a set F of simultaneously observable fluents, and every refinement δ1 of σ1, there is a path P in τ_L from δ1 to a refinement δ2 of σ2 such that:
– every action of P is executed by the robot that executes aH;
– every state of P is a refinement of σ1 or σ2, i.e., no unrelated fluents are changed;
– observed(R, f, Y) = true ∈ δ2 if (f = Y) ∈ δ2, and observed(R, f, Y) = false ∈ δ2 if (f = Y) ∉ δ2.</p>
          <p>Randomization: Our next step is to expand D_L to capture the non-determinism in action execution and observations on the robot, which is essential for the statistician. To do so, we extended AL_d with non-deterministic causal laws:
a causes f(x̄) : {Y : p(Y)} if body
a causes f(x̄) : sort_name if body
where the first statement says that if a is executed in a state satisfying body, f may take on any value from the set {Y : p(Y)} ∩ range(f) in the resulting state; the second statement says that f may take any value from sort_name ∩ range(f). The randomized fine-resolution system description D_LR is then obtained by replacing each action’s deterministic causal laws in D_L by non-deterministic ones, declaring the affected fluents to be random fluents. For instance, in our example, the non-deterministic causal law for move is:</p>
          <p>move(R, C2) causes loc(R) = {C : range(loc(R), C)}
where the defined fluent range is given by:
range(loc(R), C) if loc(R) = C
range(loc(R), C) if loc(R) = C1, next_to(C, C1)
i.e., a robot moving to a cell can end up in any cell within range: the robot’s current cell and its neighboring cells are all within range.</p>
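          <p>Sampling from this randomized causal law can be sketched as follows (the cell layout and probability values are illustrative; such probabilities come from the training phase described next):</p>
          <preformat>
```python
import random

NEXT_TO = {"c1": ["c2", "c3"], "c2": ["c1"], "c3": ["c1"]}

def move_range(current):
    """range(loc(R), C): the robot's current cell and its neighbors."""
    return [current] + NEXT_TO[current]

def sample_move(current, probs):
    """Sample the outcome of move according to learned outcome probabilities."""
    cells = move_range(current)
    weights = [probs.get(c, 0.0) for c in cells]
    return random.choices(cells, weights=weights)[0]

random.seed(0)
learned = {"c1": 0.1, "c2": 0.8, "c3": 0.1}     # illustrative values
samples = [sample_move("c1", learned) for _ in range(1000)]
print(move_range("c1"), samples.count("c2"))    # the move mostly succeeds
```
          </preformat>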
          <p>To complete our model of the statistician with probabilistic information, we run experiments that sample specific instances of each ground non-deterministic causal law in D_LR, have the robot execute the corresponding action multiple times, and collect statistics (e.g., counts) of the number of times each outcome of the corresponding fluent is obtained. The statistics collected in this initial training phase are used to compute the causal probabilities of action outcomes, and the probability of observations being correct. Local symmetry assumptions are used to simplify this collection of statistics; e.g., movement from a cell to one of its neighbors is assumed to behave the same for any cell, given a specific robot. The designer provides the required domain-specific information. In our example, if rob1 in cell c1 may reach {c1, c2, c3} when executing move(rob1, c2), the probabilities of these outcomes may be 0.1, 0.8, and 0.1, respectively. Similarly, the robot may compute that 0.85 is the probability with which it can recognize a specific object in a specific cell. Any prior beliefs about these probabilities (e.g., from a human) can be used as the initial belief that is then revised by the experimental trials.</p>
          <p>Zooming: The statistician uses D_LR and the computed probabilities for the fine-resolution execution of a transition T = ⟨σ1, aH, σ2⟩ ∈ τ_H. Since reasoning probabilistically about all of D_LR may result in incorrect behavior and can be computationally intractable, the statistician first identifies the part τ_LR(T) of the diagram τ_LR that is necessary for the fine-resolution execution of aH; we call this operation zooming. The size of τ_LR(T) is decreased with respect to τ_LR by only considering refinements of σ1 and σ2, the states of τ_LR relevant to T. Next, fluents and actions not relevant to the execution of aH are removed, based on the following definitions.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>Definition 3. [Direct relevance]</title>
          <p>An element y of a basic sort st_H of D_H is directly relevant to a transition T of τ_H if:</p>
          <p>element y occurs in aH; or</p>
          <p>for some f, f(x̄) = y belongs to σ1 or σ2 but not to both.
Consider the transition corresponding to robot rob1 moving from the kitchen to the office, i.e., aH = move(rob1, office). For this transition, the element rob1 of sort robot, and the elements office and kitchen of sort place, are directly relevant.</p>
          <p>Definition 4. [Zoom]
To construct D_LR(T), we need to determine the signature and the axioms describing the transition diagram τ_LR(T). The signature of D_LR(T) is constructed as follows:
1. If st_H is a sort of D_H with at least one element directly relevant to T:</p>
          <p>If the sort st_H is not magnified, it is its own zoomed counterpart st_L^z.</p>
          <p>If the sort st_H is magnified, st_L^z is the set of components of the elements of st_H directly relevant to T.</p>
          <p>The zoomed counterparts form a hierarchy of basic sorts of D_LR(T) (with the subclass relation inherited from D_LR). In our example, the sorts of D_LR(T) are robot_L^z = {rob1} and place_L^z = {ci : ci ∈ kitchen ∪ office}.
2. Functions of D_LR(T) are those of D_LR restricted to the identified sorts. The functions in our example are loc(rob1), taking values from place_L^z; next_to(place_L^z, place_L^z); range(loc(rob1), place_L^z); and properly restricted functions related to testing these functions’ values.
3. Actions of D_LR(T) are restricted to the sorts identified above. In our example, the relevant actions are of the form move(rob1, ci) and test(rob1, loc(rob1), ci), where the ci are individual elements of place_L^z.</p>
          <p>The axioms of D_LR(T) are those of D_LR that are restricted to the signature of D_LR(T). In our example, this restriction removes the causal laws for grasp and putdown, and removes the state constraint related to the fluent in_hand in D_LR. Furthermore, in the causal law and executability condition corresponding to the action move, C can only take values from place_L^z, i.e., any cell in the kitchen or the office.</p>
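          <p>Zooming for the transition aH = move(rob1, office) can be sketched as a pair of set restrictions (an illustrative encoding; the cell layout is ours):</p>
          <preformat>
```python
COMPONENT = {"c1": "kitchen", "c2": "kitchen", "c3": "office",
             "c4": "office", "c5": "main_library"}       # cell -> room

def directly_relevant(action, s1, s2):
    """Definition 3: elements occurring in aH, or values of fluents that
    differ between s1 and s2."""
    relevant = set(action[1:])                 # arguments of aH
    for f in set(s1) | set(s2):
        if s1.get(f) != s2.get(f):
            relevant.add(s1.get(f))
            relevant.add(s2.get(f))
    relevant.discard(None)
    return relevant

def zoom_place(relevant):
    """place_L^z: components of the relevant magnified places."""
    return sorted(c for c, room in COMPONENT.items() if room in relevant)

aH = ("move", "rob1", "office")
s1, s2 = {"loc(rob1)": "kitchen"}, {"loc(rob1)": "office"}
relevant = directly_relevant(aH, s1, s2)
print(sorted(relevant))        # ['kitchen', 'office', 'rob1']
print(zoom_place(relevant))    # ['c1', 'c2', 'c3', 'c4']  -- c5 is eliminated
```
          </preformat>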
        </sec>
        <sec id="sec-4-1-4">
          <title>Example 3. [Example of zoom]</title>
          <p>Assume that robot rob1 has to execute aH = grasp(rob1, tb1) to pick up textbook tb1; rob1 is, and tb1 is assumed to be, in the office. Zooming constructs the following signature:</p>
          <p>The relevant sorts of D_H are robot, object, and place, and st_L^z = {robot_L^z, object_L^z, place_L^z} are the zoomed counterparts in D_LR(T), with robot_L^z = {rob1}, object_L^z = {tb1}, and place_L^z = {ci : ci ∈ office}.</p>
          <p>Functions of D_LR(T) include (a) loc(robot_L^z) and loc(object_L^z), basic fluents that take values from place_L^z; (b) the static next_to(place_L^z, place_L^z); (c) the defined fluent range(loc(robot_L^z), place_L^z); and (d) the knowledge fluent dir_obs restricted to the zoomed sorts, e.g., dir_obs(robot_L^z, loc(robot_L^z), place_L^z) and dir_obs(robot_L^z, loc(object_L^z), place_L^z).</p>
          <p>Actions of D_LR(T) include (a) move(robot_L^z, place_L^z); (b) grasp(robot_L^z, object_L^z); (c) putdown(robot_L^z, object_L^z); and (d) knowledge-producing actions to test the locations of rob1 and tb1, e.g., test(robot_L^z, loc(object_L^z), place_L^z).</p>
          <p>The axioms of D_LR(T) are those of D_LR restricted to the signature of D_LR(T). For instance, these axioms include:
move(rob1, cj) causes loc(rob1) = {C : range(loc(rob1), C)}
grasp(rob1, tb1) causes in_hand(rob1, tb1) = {true, false}
test(rob1, loc(tb1), cj) causes dir_obs(rob1, loc(tb1), cj) = {true, false} if loc(tb1) = cj
impossible move(rob1, cj) if loc(rob1) = ci, ¬next_to(cj, ci)
where range(loc(rob1), C) may hold for values in {ci, cj, ck} within the range of the robot’s current location (ci), all of which are elements of place_L^z. The states of τ_LR(T) thus include atoms of the form loc(rob1) = ci and loc(tb1) = cj, where ci and cj are values in place_L^z, in_hand(rob1, tb1), direct observations of these atoms, and statics such as next_to(ci, cj). Specific actions include move(rob1, ci), grasp(rob1, tb1), putdown(rob1, tb1), and test actions.</p>
          <p>The policy is executed until the fluents relevant to aH have a high probability associated with them, or fail to acquire a high probability after invoking the policy several times. The latter case is interpreted as the failure to execute the coarse-resolution action. The observations and action outcomes obtained by executing the POMDP policy correspond to observations and fluents in D_LR(T). This information, in turn, revises H, to be used for subsequent reasoning by the logician. If τ_H is constructed correctly, and the statistics collected during the initial training phase correctly model the domain dynamics, following an optimal policy produced by an exact POMDP solver is most likely (among all possible policies) to take the robot to the desired goal state [Littman, 1996; Sondik, 1971]. Since we use an approximate solver for computational efficiency, we obtain a bound on the regret (i.e., loss in value) due to the computed policy [Ong et al., 2010]. Due to space constraints, we provide (below) a simplistic example of constructing a POMDP.</p>
          <p>POMDP construction: The statistician uses system
description DLR(T ), and the learned probabilities, to construct a
POMDP for the probabilistic implementation of aH in state
s1 of tH . It may be possible to use other (more
computationally efficient) probabilistic models for implementing specific
actions. Also, it may be possible to use specific
(computationally efficient) heuristic or probabilistic algorithms for specific
tasks such as path planning. However, POMDPs provide (a) a
principled and quantifiable trade-off between accuracy and
computational efficiency in the presence of uncertainty; and
(b) a near-optimal solution if the POMDP is modeled
accurately. Also, our architecture only constructs a POMDP for
a small (relevant) part of the domain, significantly reducing
the computational complexity of solving the POMDP.
Furthermore, many POMDPs for any given domain can be
precomputed, solved and used as needed.</p>
<p>A POMDP is described by a tuple ⟨SL, AL, ZL, TL, OL, RL⟩
for a specific goal state. Elements of this tuple correspond
to the set of states, set of actions, set of values of observable
fluents, the state transition function, the observation function,
and the reward specification, with the last three elements
being based on the statistics acquired during randomization—
for details about POMDPs and their use in AI and robotics,
see [Littman, 1996; Zhang et al., 2015].</p>
          <p>The POMDP formulation considers states to be partially
observable, and reasons with probability distributions over
the states, called belief states. Functions TL and OL
describe a probabilistic transition diagram over belief states.</p>
<p>The POMDP formulation also implicitly includes a history
of observations and actions—the current belief state is
assumed to be the result of all information obtained so far.</p>
<p>The POMDP tuple is used to compute a policy pL : bt ↦ at+1,
which maps belief states to actions, using an algorithm that
maximizes the reward over a planning horizon. The policy is
then used to choose an action in the current belief state,
revising the belief state through Bayesian updates after executing
the action at+1 and receiving an observation ot+1. The
belief update continues until policy execution is terminated. In
our case, a terminal action is executed when it has a higher
(expected cumulative) utility than continuing to execute
non-terminal actions. This action choice happens when the belief
in a specific state is high (e.g., 0.8), or none of the states
have a high probability associated with them after invoking
the policy several times. The latter case is interpreted as the
failure to execute the coarse-resolution action.</p>
          <p>The observations and action outcomes obtained by
executing the POMDP policy correspond to observations and
fluents in DLR(T). This information, in turn, revises H,
to be used for subsequent reasoning by the logician. If tH
is constructed correctly, and the statistics collected during
the initial training phase correctly model the domain
dynamics, following an optimal policy produced by an exact
POMDP solver is most likely (among all possible policies)
to take the robot to the desired goal state [Littman, 1996;
Sondik, 1971]. Since we use an approximate solver for
computational efficiency, we obtain a bound on the regret (i.e.,
loss in value) due to the computed policy [Ong et al., 2010].
Due to space constraints, we provide (below) a simplistic
example of constructing a POMDP.</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>Example 4. [POMDP construction]</title>
<p>Consider rob1 in the office and the implementation of aH =
move(rob1, kitchen), with one cell in each room. Due to
zooming, the robot only reasons about its location. Assume
that a move from a cell to a neighboring cell succeeds with
probability 0.85—otherwise, the robot remains where it is.
Assume that all non-terminal actions have unit cost.
Terminating the POMDP policy after reaching cell 1 (in kitchen)
receives a large positive reward (100), whereas termination
while in cell 0 (in office) receives a large negative reward
(-100). Elements of the POMDP are described (below) in
the format used by the POMDP solver [Ong et al., 2010].</p>
          <p>The states correspond to possible robot locations, and absb
is a terminal state. The actions correspond to the robot
moving to specific cells, testing its cell location, or terminating
policy execution (transitioning to absb). The robot observes
its cell location, or receives no observation.
Knowledge-producing actions do not cause a state transition, and actions
that change the state do not provide observations. Our
architecture constructs such data structures for complex domains
by zooming to the relevant part of the system description (as
described earlier). Also, for each state si and action a j, the
corresponding ASP program P(DLR(T ); si; a j) is solved to
(a) identify inconsistencies and eliminate impossible states,
e.g., the robot and an object cannot be in different locations
when the robot is holding the object; and (b) identify
possible state transitions, eliminating impossible transitions in the
construction of the transition function.
discount: 0.99
states: robot-0 robot-1 absb
actions: move-0 move-1 test-0 test-1 done
observations: rob-found rob-not-found none
% Transition function format
% T : action : S x S’ -&gt; [0, 1]
T: move-0
1 0 0
0.85 0.15 0
0 0 1
T: move-1
% Reward function format
% R : action : s_i : s_i’ : real value
R: * : * : * : -1
R: done : robot-0 : * : -100
R: done : robot-1 : * : 100</p>
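<p>The listing's entries can be checked programmatically. A small sketch, using the move-0 matrix from the listing (row order: source states robot-0, robot-1, absb) and the stated rewards; the expected-reward helper is an assumption for illustration:
```python
import math

# Sanity checks on the listing above: each row of the move-0 transition
# matrix must be a probability distribution over successor states, and
# the terminal state absb must be absorbing.
states = ["robot-0", "robot-1", "absb"]
T_move0 = [
    [1.0, 0.0, 0.0],    # from robot-0: move-0 leaves the robot in cell 0
    [0.85, 0.15, 0.0],  # from robot-1: reach cell 0 w.p. 0.85, else stay
    [0.0, 0.0, 1.0],    # absb is absorbing
]
for row in T_move0:
    assert math.isclose(sum(row), 1.0)
assert T_move0[2][2] == 1.0

# Expected one-step reward of the terminal action "done" under a belief b,
# from R(done, robot-0) = -100 and R(done, robot-1) = 100.
def expected_done_reward(b):
    return -100 * b["robot-0"] + 100 * b["robot-1"]

print(expected_done_reward({"robot-0": 0.2, "robot-1": 0.8}))  # 60.0
```
This also illustrates why the policy terminates only at high belief: with belief split 0.2/0.8, the expected reward of "done" (60) already dominates the unit cost of further actions.</p>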
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Reasoning System</title>
      <p>Algorithm 1 describes the reasoning loop. For any given goal,
the logician reasons with the system description DH and a
recorded history H with initial state defaults, accounting
for any discrepancies between observations and predictions
to compute a plan of abstract actions. For an abstract action
aH , the statistician zooms to the relevant part of the
randomized refinement of DH . The corresponding system description
DLR(T ) and probabilities are used to construct and solve a
POMDP. The POMDP policy is invoked repeatedly (until
termination) to execute a sequence of concrete actions. The
corresponding outcomes are reported to the logician to be used
for subsequent reasoning. Correctness of this control loop is
ensured by (1) applying the algorithm that reduces
coarse-resolution planning and diagnostics to computing answer sets
of the corresponding program; (2) using the refinement and
zoom operations as described above; and (3) using POMDPs
to probabilistically execute an action sequence for each
abstract action in the logician’s plan.</p>
<p>Algorithm 1:
while goal is not achieved do
    Logician uses tH and H to find a plan of abstract actions, a1H, ..., anH, to achieve the goal.
    if no plan exists then
        return failure
    end
    i := 1, continue1 := true
    while continue1 do
        Check pre-requisites of aiH.
        if pre-requisites not satisfied then
            continue1 := false
        else
            Statistician zooms to the relevant part of tLR for executing aiH and constructs a POMDP.
            Statistician solves POMDP to compute an action policy to implement aiH.
            continue2 := true
            while continue2 do
                Statistician invokes POMDP policy to select and execute an action, obtain an observation, and update the belief state.
                if terminal action invoked then
                    Statistician communicates success or failure of aiH to the logician, to be recorded in H.
                    continue2 := false
                    i := i + 1
                    continue1 := (i &lt; n + 1)
    end
end</p>
      <p>Experimental trials, both in simulation and on physical
robots assisting humans in finding and moving objects in an
office domain, indicate that our architecture provides
significantly higher efficiency and accuracy in comparison with
using just POMDPs with hierarchical decompositions, similar
to our prior work [Sridharan et al., 2015]. These results are
not described here because the focus is on describing the steps
in the design process, and due to space limitations.</p>
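<p>The control flow of Algorithm 1 can be rendered schematically in Python; the logician and statistician functions below are hypothetical stubs standing in for ASP planning and zoomed POMDP construction/solving, so only the loop structure mirrors the text:
```python
# Schematic rendering of the reasoning loop (Algorithm 1). The stubs are
# placeholders, not the architecture's actual components.

def logician_plan(goal, history):
    # Stub: coarse-resolution planning with DH and recorded history H.
    return ["move(rob1, kitchen)", "grasp(rob1, tb1)"]

def prerequisites_hold(action, history):
    return True  # stub: checked against the recorded history

def execute_via_pomdp(action, history):
    # Stub for zoom, POMDP construction, solving, and policy invocation;
    # the terminal action reports success or failure to the logician.
    history.append((action, "success"))

def reasoning_loop(goal, max_replans=10):
    history = []
    for _ in range(max_replans):          # replan until goal or give up
        plan = logician_plan(goal, history)
        if not plan:
            return "failure", history
        for action in plan:
            if not prerequisites_hold(action, history):
                break                     # discrepancy: replan with updated H
            execute_via_pomdp(action, history)
        else:
            return "success", history     # all abstract actions implemented
    return "failure", history

outcome, history = reasoning_loop("in_hand(rob1, tb1)")
```
The key structural point, as in the text, is that outcomes of fine-resolution execution are recorded in the history and fed back into coarse-resolution replanning.</p>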
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This paper described the systematic design of robots
capable of representing and reasoning with logic-based and
probabilistic descriptions of domain knowledge and uncertainty.
Our architecture uses tightly-coupled transition diagrams of
the domain at two levels of granularity, with a fine-resolution
diagram being a refinement of a coarse-resolution diagram.
For any given goal, non-monotonic logical reasoning at the
coarse-resolution plans a sequence of abstract actions. Each
abstract action is implemented probabilistically as a sequence
of concrete actions by zooming to the relevant part of the
fine-resolution description, constructing and solving a POMDP,
and invoking a policy until termination. The corresponding
outcomes revise the coarse-resolution history for subsequent
reasoning. The design steps are illustrated using examples,
which indicate that the architecture supports reasoning at the
sensorimotor level and the cognitive level in the presence of
violated defaults and unreliable observations and actions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors thank Jeremy Wyatt and Shiqi Zhang for
discussions related to the architecture described in this paper. The
first author was supported in part by the US Office of Naval
Research Science of Autonomy award N00014-13-1-0766.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bai et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Haoyu</given-names>
            <surname>Bai</surname>
          </string-name>
          , David Hsu, and Wee Sun Lee.
          <article-title>Integrated Perception and Planning in the Continuous Space: A POMDP Approach</article-title>
          .
          <source>International Journal of Robotics Research</source>
          ,
          <volume>33</volume>
          (
          <issue>8</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Balai et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Evgenii</given-names>
            <surname>Balai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yuanlin</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Towards Answer Set Programming with Sorts</article-title>
          .
          <source>In International Conference on Logic Programming and Nonmonotonic Reasoning</source>
          , Corunna, Spain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Balduccini and Gelfond</source>
          , 2003]
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Balduccini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          .
          <article-title>Logic Programs with Consistency-Restoring Rules</article-title>
          .
          <source>In AAAI Spring Symposium on Logical Formalization of Commonsense Reasoning</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Balduccini et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Balduccini</surname>
          </string-name>
          , William C. Regli, and
          <string-name>
            <given-names>Duc N.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>An ASP-Based Architecture for Autonomous UAVs in Dynamic Environments: Progress Report</article-title>
          . In
          <source>International Workshop on Non-Monotonic Reasoning (NMR)</source>
          , Vienna, Austria, July 17-19,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Baral et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Chitta</given-names>
            <surname>Baral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nelson</given-names>
            <surname>Rushton</surname>
          </string-name>
          .
          <article-title>Probabilistic Reasoning with Answer Sets</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>144</lpage>
          ,
          <year>January 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Chen et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Xiaoping</given-names>
            <surname>Chen</surname>
          </string-name>
          , Jiongkun Xie, Jianmin Ji, and
          <string-name>
            <given-names>Zhiqiang</given-names>
            <surname>Sui</surname>
          </string-name>
          .
          <article-title>Toward Open Knowledge Enabling for HumanRobot Interaction</article-title>
          .
          <source>Human-Robot Interaction</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>100</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Gelfond and Inclezan</source>
          , 2013]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniela</given-names>
            <surname>Inclezan</surname>
          </string-name>
          .
          <article-title>Some Properties of System Descriptions of ALd</article-title>
          .
          <source>Journal of Applied Non-Classical Logics, Special Issue on Equilibrium Logic and Answer Set Programming</source>
          ,
          <volume>23</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Gelfond and Kahl</source>
          , 2014]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yulia</given-names>
            <surname>Kahl</surname>
          </string-name>
          .
          <article-title>Knowledge Representation, Reasoning, and the Design of Intelligent Agents</article-title>
          . Cambridge University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Lee and Wang</source>
          , 2015]
          <string-name>
            <given-names>Joohyung</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>A Probabilistic Extension of the Stable Model Semantics</article-title>
          .
          <source>In AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning</source>
          , Stanford, USA, March
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Lee et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Joohyung</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fangkai</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <source>Action Language BC: Preliminary Report. In International Joint Conference on Artificial Intelligence (IJCAI)</source>
          , Beijing, China,
          <source>August 3-9</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Leone et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , G. Pfeifer,
          <string-name>
            <given-names>W.</given-names>
            <surname>Faber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , G. Gottlob,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarcello</surname>
          </string-name>
          .
          <article-title>The DLV System for Knowledge Representation and Reasoning</article-title>
          .
          <source>ACM Transactions on Computational Logic</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>499</fpage>
          -
          <lpage>562</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[Littman</source>
          ,
          <year>1996</year>
          ]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Littman</surname>
          </string-name>
          .
          <article-title>Algorithms for Sequential Decision Making</article-title>
          .
          <source>PhD thesis</source>
          , Brown University, Department of Computer Science, Providence, USA, March
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Milch et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>Brian</given-names>
            <surname>Milch</surname>
          </string-name>
          , Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kolobov</surname>
          </string-name>
          . BLOG:
          <article-title>Probabilistic Models with Unknown Objects. In Statistical Relational Learning</article-title>
          . MIT Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Ong et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Sylvie C.</given-names>
            <surname>Ong</surname>
          </string-name>
          , Shao Wei Png, David Hsu, and Wee Sun Lee.
          <article-title>Planning under Uncertainty for Robotic Tasks with Mixed Observability</article-title>
          . IJRR,
          <volume>29</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1053</fpage>
          -
          <lpage>1068</lpage>
          ,
          <year>July 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Richardson and Domingos</source>
          , 2006]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Markov Logic Networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>62</volume>
          (
          <issue>1</issue>
          ),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Saribatur et al.,
          <year>2014</year>
          ] Zeynep G. Saribatur, Esra Erdem, and
          <string-name>
            <given-names>Volkan</given-names>
            <surname>Patoglu</surname>
          </string-name>
          .
          <article-title>Cognitive Factories with Multiple Teams of Heterogeneous Robots: Hybrid Reasoning for Optimal Feasible Global Plans</article-title>
          .
          <source>In International Conference on Intelligent Robots and Systems (IROS)</source>
          , Chicago, USA,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Sondik</source>
          , 1971]
          <string-name>
            <given-names>Edward J.</given-names>
            <surname>Sondik</surname>
          </string-name>
          .
          <article-title>The Optimal Control of Partially Observable Markov Processes</article-title>
          .
          <source>PhD thesis</source>
          , Stanford University,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Sridharan et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Mohan</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , Michael Gelfond, Shiqi Zhang, and
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Wyatt</surname>
          </string-name>
          .
          <article-title>A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics</article-title>
          .
          <source>Technical report</source>
          , Unrefereed CoRR abstract: http://arxiv.org/ abs/1508.03891,
          <year>August 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Sridharan et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Mohan</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , Prashanth Devarakonda, and
          <string-name>
            <given-names>Rashmica</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>Discovering Domain Axioms Using Relational Reinforcement Learning and Declarative Programming</article-title>
          .
          <source>In ICAPS Workshop on Planning and Robotics (PlanRob)</source>
          , London, UK, June 13-14,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[Zhang and Stone</source>
          , 2015]
          <string-name>
            <given-names>Shiqi</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Stone</surname>
          </string-name>
          . CORPP:
          <article-title>Commonsense Reasoning and Probabilistic Planning, as Applied to Dialog with a Mobile Robot</article-title>
          .
          <source>In AAAI Conference on Artificial Intelligence (AAAI)</source>
          , pages
          <fpage>1394</fpage>
          -
          <lpage>1400</lpage>
          , Austin, USA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Zhang et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Shiqi</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Mohan Sridharan,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Gelfond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Wyatt</surname>
          </string-name>
          .
          <article-title>Towards An Architecture for Knowledge Representation and Reasoning in Robotics</article-title>
          .
          <source>In International Conference on Social Robotics (ICSR)</source>
          , Sydney, Australia,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Zhang et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Shiqi</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Mohan Sridharan, and
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Wyatt</surname>
          </string-name>
          .
          <article-title>Mixed Logical Inference and Probabilistic Planning for Robots in Unreliable Worlds</article-title>
          .
          <source>IEEE Transactions on Robotics</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>699</fpage>
          -
          <lpage>713</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>