Using Knowledge Representation and Reasoning Tools in the Design of Robots

Mohan Sridharan
Electrical and Computer Engineering
The University of Auckland, New Zealand
m.sridharan@auckland.ac.nz

Michael Gelfond
Department of Computer Science
Texas Tech University, USA
michael.gelfond@ttu.edu

Abstract

The paper describes the authors' experience in using knowledge representation and reasoning tools in the design of robots. The focus is on the systematic construction of models of the robot's capabilities and its domain at different resolutions, and on establishing a clear relationship between the models at the different resolutions.

1 Introduction

Our prior paper described an architecture for robots [Zhang et al., 2014], whose reasoning system we view (in this paper) as an interplay between a logician and a statistician. The former has an abstract, coarse-resolution view of the world. The logician perceives an office domain, for instance, as a collection of connected rooms containing various types of objects, and assumes that the robot is capable of successfully moving between two adjacent rooms, and of finding an object located in a room. If the robot's goal is to find a specific book, the logician can design a plan of abstract actions directing the robot to go to the library, where books are normally stored. The first action of this plan, e.g., move to the lab, which happens to be the first room on the way to the library, will be passed to a statistician, who has a rather different, fine-resolution view of the world. According to the statistician, the same office domain consists of small grid cells belonging to the different rooms. The statistician believes that the robot is usually capable of moving to a neighboring cell, and of checking a cell for a target object, but these actions are non-deterministic—they only succeed with some probability that the statistician knows a priori or can learn by running some experiments. For the coarse-resolution action of moving to the lab, the statistician thus has the robot move from cell to cell, observe its position, and revise its belief of its position. If the statistician has a sufficiently high confidence that the abstract action passed by the logician has been executed, i.e., that the robot has moved to the lab, this information is reported back to the logician. Otherwise, the statistician reports failure after some time, and the logician has to diagnose and replan. Even this simple scenario shows that our robot's architecture should be capable of representing and manipulating both logical and probabilistic knowledge. In this paper we report our experience in the systematic design of such a robot using knowledge representation and reasoning tools tailored towards different reasoning tasks. In comparison with our prior work [Zhang et al., 2014], we introduce precise (and some new) definitions of basic notions used to build mathematical models of the domain.

We started with describing the state transition diagram modeling possible trajectories of the robot's domain in action description language ALd [Gelfond and Inclezan, 2013]. Using ALd allowed us to concisely represent the diagram even for domains with complex relationships between fluents, and to represent recorded histories of the robot's actions and observations. We expanded the standard notion of recorded history to include the robot's default knowledge about the initial situation, which simplifies the diagnostics task. Although there exist action languages that allow causal laws specifying default values of fluents at arbitrary time steps [Lee et al., 2013], such languages are too powerful for our purposes, and occasionally pose difficulties with representing all exceptions to such defaults when the domain is expanded. Reasoning with theories in ALd is performed by reducing planning, diagnostics, and other robotic tasks to computing answer sets of a program in CR-Prolog, a variant of Answer Set Prolog (ASP) that supports representation and reasoning with defaults and their direct and indirect exceptions [Balduccini and Gelfond, 2003]. This reduction, which considers histories with initial state defaults, is the first novel contribution; it expands previous work that established a close relationship between action languages and logic programming with answer set semantics. Existing efficient solvers help automate the necessary reasoning [Leone et al., 2006].

The second novel contribution is the precise definition of the world view of the statistician as a refinement of the transition diagram of the logician. The new diagram can be viewed as the result of increasing the resolution of the robot's ability to see the world. In our example, the new world view includes cells in rooms, which were so far invisible, and relations and actions involving these cells, e.g., the location of an object in a cell, and the action of moving from a cell to its neighbor. Construction of such a refinement may require some amount of domain knowledge, but the statistician's diagram is related to the logician's diagram in a precise way that is captured by our novel mathematical definition of refinement. We provide detailed guidelines for the construction of the refinement's action theory, which includes axioms establishing the relationship between fluents of the coarse-resolution diagram and their fine-resolution counterparts, as well as the axioms describing the effects of observations.
After the refinement of the domain is constructed, the fine-resolution description is randomized, i.e., modified to consider the non-deterministic effects of fine-resolution actions and observations. The third novel contribution is to expand the action language ALd by a construct that supports the natural representation of such effects. Probabilities computed experimentally by the statistician are then associated with the purely logical diagram to obtain a probabilistic diagram. Reasoning in this new diagram also considers belief states, i.e., probability distributions over states of the logical diagram. Theoretically, the statistician can now use probabilistic graphical models such as a partially observable Markov decision process (POMDP) to select and execute the "best" possible action, make observations, and update the belief state until the abstract action provided by the logician is completed with high probability, or all hope of doing so is lost. However, POMDP algorithms need a representation of the diagram that lists all possible combinations of physical states and actions, which can become intractable even for a comparatively small collection of fluents. To avoid this problem, the robot zooms to the part of the fine-resolution diagram that is relevant to the execution of the coarse-resolution action provided by the logician. For instance, to execute the action "move from room R1 to room R2", with the two rooms being next to each other, the fluents and actions related to other rooms are eliminated, dramatically reducing the size of the corresponding diagram. The fourth novel contribution is a precise definition of zooming, which helps automate this process. Finally, the zoomed part of the randomized fine-resolution diagram is represented in the format suitable for use with POMDP solvers. The corresponding policy is invoked to execute a sequence of concrete actions that implements the abstract action, with the action outcomes being added to the coarse-resolution history.

2 Related Work and Example Domain

Logic-based representations and probabilistic graphical models have been used to control sensing, navigation and interaction for robots [Bai et al., 2014; Hawes et al., 2010]. Formulations based on probabilistic representations (by themselves) make it difficult to perform commonsense reasoning, whereas approaches based on logic programming tend to require considerable prior knowledge of the domain and the agent's capabilities, and make it difficult to merge new, unreliable information with an existing knowledge base. Theories of reasoning about actions and change, and the non-monotonic logical reasoning ability of ASP, have been used by an international research community, e.g., for natural language human-robot interaction [Chen et al., 2012], control of unmanned aerial vehicles [Balduccini et al., 2014], and coordination of robot teams [Saribatur et al., 2014]. However, the basic version of ASP does not support probabilistic representation of uncertainty, whereas a lot of the information extracted from sensors and actuators is represented probabilistically.

Researchers have designed architectures for robots that combine logic-based and probabilistic algorithms for task and motion planning [Kaelbling and Lozano-Perez, 2013], couple declarative programming and continuous-time planners for path planning in teams [Saribatur et al., 2014], combine a probabilistic extension of ASP with POMDPs for human-robot dialog [Zhang and Stone, 2015], combine logic programming and reinforcement learning to discover domain axioms [Sridharan et al., 2016], or use a three-layered organization for knowledge and reasoning with first-order logic and probabilities in open worlds [Hanheide et al., 2015]. Some general formulations that combine logical and probabilistic reasoning include Markov logic networks [Richardson and Domingos, 2006], Bayesian logic [Milch et al., 2006], and probabilistic extensions to ASP [Baral et al., 2009; Lee and Wang, 2015]. However, algorithms based on first-order logic do not support non-monotonic logical reasoning and do not provide the desired expressiveness—it is not always possible to associate numbers with logic statements to express degrees of belief. Algorithms based on logic programming do not support one or more of the desired capabilities, such as incremental revision of (probabilistic) information, or reasoning with large probabilistic components. As a step towards addressing these limitations, we have developed architectures that couple declarative programming and probabilistic graphical models [Zhang et al., 2014; 2015]. Here, we expand on our prior work to explore the systematic construction of robots with the desired knowledge representation and reasoning capabilities. We illustrate these design steps and our contributions using the following example.

Example 1. [Office Domain] Consider a robot assigned the goal of moving specific objects to specific places in an office domain. This domain contains:
• The sorts place, thing, robot, and object, with object and robot being subsorts of thing. Sorts textbook, printer, and kitchenware are subsorts of the sort object.
• Four places: office, main_library, aux_library, and kitchen, some of which are directly accessible from each other.
• An instance of the sort robot, called rob1, and a number of instances of subsorts of the sort object.

Although this domain may appear simplistic, it illustrates many of the representation, reasoning, perception, and actuation challenges that exist in more complex robotics domains.
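As a minimal illustration of Example 1's sort hierarchy, the subsort relation can be encoded as parent links and queried transitively. This Python sketch is our own illustrative encoding; the paper specifies sorts declaratively (e.g., in SPARC), not in Python.

```python
# Sort hierarchy of Example 1, encoded as parent links (illustrative
# encoding; the paper declares these sorts in its ASP-based representation).
PARENT = {
    "object": "thing",
    "robot": "thing",
    "textbook": "object",
    "printer": "object",
    "kitchenware": "object",
}

def is_subsort(s, t):
    """True if sort s equals t or is (transitively) a subsort of t."""
    while s is not None:
        if s == t:
            return True
        s = PARENT.get(s)
    return False

# Instances of the domain, grouped by their most specific sort
# (tb1, tb2 are hypothetical textbook names used only for illustration).
INSTANCES = {"robot": ["rob1"], "textbook": ["tb1", "tb2"],
             "place": ["office", "main_library", "aux_library", "kitchen"]}
```

With this encoding, any axiom stated for thing automatically applies to textbooks, mirroring how ground instances are obtained from the sort hierarchy.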
3 Logician's Description

We start with the logician's view of the world.

Action Language ALd: The logician's state transition diagram is specified as a system description (theory) in a variant of action language ALd, which allows statements of the form:

  a causes f(x̄) = y if body
  f(x̄) = y if body
  impossible a0, ..., an if body

The first statement describes an action's direct effect—if action a is executed in a state satisfying condition body, the value of fluent f in the resulting state will be y. For instance:

  move(R, Pl) causes loc(R) = Pl

says that a robot R moving to place Pl will end up in Pl. The second statement is a state constraint, which says that f(x̄) = y in any state that satisfies body. For instance:

  loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)

guarantees that an object grasped by a robot shares the robot's location. The third statement prohibits the simultaneous execution of actions a0, ..., an in a state satisfying body. The functions in these statements are of two types: those whose values can be changed by actions (fluents) and those whose values cannot (statics). Fluents can be basic or defined; the former are subject to the laws of inertia, while the latter are defined in terms of other fluents. The formal semantics of the original ALd is discussed in [Gelfond and Kahl, 2014]. Unfortunately, it only allows boolean fluents, a restriction that is removed in our variant of the language.

Histories with defaults: In addition to the action theory of ALd, the logician's knowledge contains a recorded history, a collection of the robot's observations and actions. In action languages, such a recorded history typically consists of statements of the form obs(f, y, true, i) (or obs(f, y, false, i))—at step i of the robot's trajectory, the value of fluent f is observed to be (or not to be) y; and hpd(a, i)—action a was successfully executed (i.e., happened) at step i. The recorded history defines a collection of models, trajectories of the system compatible with this record. This syntax of histories was rather limited for our purposes, and the robot can benefit substantially from some forms of default knowledge. In Example 1, the robot can, for instance, be told that books are normally kept in the library. To address this limitation, we introduced an additional type of historical record:

  initial default f(x̄) = y if body

to assume that if the initial state satisfies body, the value of f(x̄) in this state is y. For instance, the default statement:

  initial default loc(X) = main_library if textbook(X)

says that textbooks are normally kept in the library, whereas the default statement:

  initial default loc(X) = office if textbook(X), loc(X) ≠ main_library

gives the second most likely location for a textbook. Such defaults may substantially simplify the planning of the logician. For instance, the plan for finding a textbook tb will consist of going directly to the library and looking for the book there. However, defaults are also useful for diagnostics. For instance, if tb is not found in the library, i.e., the history includes obs(loc(tb), main_library, false, i), the robot would realize that this observation defeats the first default, try the second default, and look for the book in the office. To ensure such behavior, we had to redefine the notion of a model of a history, and design a new algorithm for computing such models. To this end, given an ALd theory DH and recorded history H, we construct the program Π(DH, H) consisting of the standard encoding of DH into ASP, and the collection of atoms representing the observations and actions of H together with the initial state defaults. This encoding allows indirect exceptions to defaults (like the observation above) and uses CR-Prolog, which justifies our departure from standard ASP.

Planning and diagnostics: The description of the logician's knowledge, as provided above, is sufficient for adequately performing planning and diagnostics—these tasks are reduced to computing answer sets of the program Π(DH, H) combined with a standard encoding of the robot's goal. This program is passed to an efficient ASP solver—we use SPARC, which expands CR-Prolog and provides explicit constructs to specify objects, relations, and their sorts [Balai et al., 2013]. Atoms of the form occurs(action, step) belonging to the answer set obtained by solving this program, e.g., occurs(a1, 1), ..., occurs(an, n), represent the shortest sequence of abstract actions for achieving the logician's goal. Prior research results in the theory of action languages and ASP ensure that the plan is provably correct. In a similar manner, suitable atoms in the answer set can be used for diagnostics, e.g., to explain unexpected observations.

Example 2. [Logician's view of the world]
The logician's system description DH of the domain in Example 1 consists of a sorted signature ΣH and axioms describing the transition diagram τH. ΣH defines the names of objects and functions available for use; e.g., the sorts are place, thing, robot, and object, with object and robot being subsorts of thing, and textbook, printer, and kitchenware being subsorts of object. The statics include a relation next_to(place, place), which describes whether two places are next to each other. The domain's fluents are loc: thing → place and in_hand: robot × object → boolean. These are basic fluents subject to the laws of inertia. The domain's actions are move(robot, place), grasp(robot, object), and putdown(robot, object). The domain dynamics are defined using axioms that consist of causal laws:

  move(R, Pl) causes loc(R) = Pl
  grasp(R, Ob) causes in_hand(R, Ob)
  putdown(R, Ob) causes ¬in_hand(R, Ob)

state constraints:

  loc(Ob) = Pl if loc(R) = Pl, in_hand(R, Ob)

and executability conditions such as:

  impossible move(R, Pl2) if loc(R) = Pl1, ¬next_to(Pl1, Pl2)
  impossible grasp(R, Ob) if loc(R) ≠ loc(Ob)
  impossible putdown(R, Ob) if ¬in_hand(R, Ob)

For any given domain, the part of ΣH described so far is unlikely to change substantially. However, the last step in the construction of ΣH, which populates the basic sorts with specific objects, e.g., robot = {rob1}, place = {r1, ..., rn}, and textbook = {tb1, ..., tbn}, is likely to undergo frequent revisions. Ground instances of axioms are obtained by replacing variables by ground terms from the corresponding sorts.

In the transition diagram τH described by DH, actions are assumed to be deterministic, and the values of fluents are assumed to be observable; these assumptions aid in fast, tentative planning and diagnostics for achieving the goals. This domain representation should ideally be tested extensively by including various recorded histories of the domain, which may include histories with prioritized defaults, and using the resulting programs to solve various reasoning tasks.
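The logician's planning task over the deterministic diagram τH can be illustrated with a small search. The Python sketch below is a stand-in of our own, not the paper's method: the paper reduces planning to computing answer sets (the occurs(action, step) atoms) with SPARC, whereas here we search the same transition diagram directly, with executability conditions mirroring Example 2's axioms.

```python
from collections import deque

# Illustrative stand-in for the logician's planner. A state is
# (robot_location, book_location, holding_book); tb1 is a hypothetical
# textbook instance. The successor relation mirrors Example 2's causal
# laws and executability conditions.
NEXT_TO = {("office", "main_library"), ("main_library", "office")}
PLACES = ("office", "main_library")

def successors(state):
    r, b, h = state
    for pl in PLACES:
        if (r, pl) in NEXT_TO:              # move only between adjacent places
            yield ("move", pl), (pl, pl if h else b, h)
    if not h and r == b:                    # grasp only if co-located
        yield ("grasp", "tb1"), (r, b, True)
    if h:                                   # putdown only if holding
        yield ("putdown", "tb1"), (r, b, False)

def plan(start, goal):
    """Breadth-first search: a shortest sequence of abstract actions."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, actions = queue.popleft()
        if goal(state):
            return actions
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

# Goal: textbook tb1 in the office, not in the robot's hand.
p = plan(("office", "main_library", False),
         lambda s: s[1] == "office" and not s[2])
```

The four-step result corresponds to the answer-set atoms occurs(move(rob1, main_library), 1), ..., occurs(putdown(rob1, tb1), 4) that the logician would obtain from the solver.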
4 Statistician's Description

In this section we describe our design of the statistician.

Refinement: First we specify the deterministic version of the world of the statistician. It is given by a system description DL defining a transition diagram τL, which serves as a (deterministic) refinement of the logician's diagram τH. Recall that τL is the result of an increase in resolution which magnifies some objects in the signature ΣH of the system description DH of τH. The newly discovered parts of a magnified object are referred to as its refined components. In our example, for instance, every room is magnified and is viewed as a collection of its component cells.

The refinement will be represented by a system description DL with signature ΣL. We say that a signature ΣL refines signature ΣH if it is obtained by:
• Replacing every basic sort stH of ΣH whose elements were magnified by its coarse-resolution version stL* = stH and its fine-resolution counterpart stL = {o1, ..., om} consisting of the components of the magnified elements of stH. For example, ΣL replaces sort place = {r1, ..., rn} of rooms in ΣH by:
  place* = {r1, ..., rn}
  place = {c1, ..., cm}
where rooms are magnified and viewed as collections of newly discovered objects, their component cells c1, ..., cm.
• Introducing a static relation component between magnified objects from stL* and the corresponding newly discovered objects from stL; e.g., relation component(c, r) is defined to be true iff cell c is in room r.
• Replacing sort fluent of ΣH by its coarse-resolution copy fluent* (a defined fluent) and its fine-resolution counterpart fluent (a basic fluent). In our example, the new fluents obtained by refinement are:
  loc*: thing → place*
  loc: thing → place
Other fluents (and their signatures) are unchanged.
• Obtaining the actions of ΣL by replacing the magnified parameters of the original actions from ΣH by their fine-resolution counterparts. In our example, ΣL will contain the original actions grasp and putdown, and a new action move(robot, cell) of a robot moving to an (adjacent) cell. ΣL will also include a knowledge-producing action test(robot, fluent, value) that activates algorithms on the robot to check the value of an observable fluent of DL in a given state, e.g., test(R, loc(Th), Cell). Note that test is a general action belonging to every refinement of ΣH. We also add fluents to describe the results of testing; e.g., observed(robot, fluent, value) is true if the most recent (direct/indirect) observation of fluent returned value.

The axioms of the refined system description DL include the axioms of DH, and domain-dependent axioms relating coarse-resolution fluents and their fine-resolution counterparts; e.g., in our illustrative domain we have:

  loc*(Th) = Rm if component(Cl, Rm), loc(Th) = Cl

DL also includes the new static component(c, r) for every cell c within every room r—next_to(c1, c2) is defined in a similar manner for every pair of adjacent cells accessible from each other. More importantly, we add to DL basic knowledge fluents that model the direct and indirect knowledge effects of sensing, and introduce axioms relating these fluents and the test actions. For instance:

  test(R, F, Y) causes dir_obs(R, F, Y) = true if F = Y

In our example, robot rob1 testing whether the location of object o is cell c will make the basic knowledge fluent dir_obs(rob1, loc(o), c) true if the object is indeed there. Note that dir_obs may have three possible values, true, false, and undef—it is initially set to the third value. Another fluent, indir_obs(rob1, loc(o), R), holds if loc(o) is observed to be true in some component cell of room R. A fluent's value is observed if it is observed directly or indirectly.

The designers of the statistician should make sure that the diagram τL specified by our system description DL is indeed a refinement, i.e., that it matches the following formal definitions of the refinement of a state and of a system description.

Definition 1. [Refinement of a state]
A state δ of τL is said to be a refinement of a state σ of τH if:
• For every magnified fluent f from the signature ΣH:
  f(x) = y ∈ σ iff f*(x) = y ∈ δ
• For every other fluent of ΣH:
  f(x) = y ∈ σ iff f(x) = y ∈ δ

Definition 2. [Refinement of a system description]
Let DL and DH be system descriptions with transition diagrams τL and τH respectively. DL is a refinement of DH if:
• The states of τL are the refinements of the states of τH.
• For every transition ⟨σ1, aH, σ2⟩ of τH, every fluent f in a set F of simultaneously observable fluents, and every refinement δ1 of σ1, there is a path P in τL from δ1 to a refinement δ2 of σ2 such that:
  – Every action of P is executed by the robot which executes aH.
  – Every state of P is a refinement of σ1 or σ2, i.e., no unrelated fluents are changed.
  – observed(R, f, Y) = true ∈ δ2 if (f = Y) ∈ δ2, and observed(R, f, Y) = false ∈ δ2 if (f = Y) ∉ δ2.
Randomization: Our next step is to expand DL to capture the non-determinism in action execution and observations on the robot, which is essential for the statistician. To do so, we extended ALd by non-deterministic causal laws:

  a causes f(x̄) : {Y : p(Y)} if body
  a causes f(x̄) : sort_name if body

The first statement says that if a is executed in a state satisfying body, f may take on any value from the set {Y : p(Y)} ∩ range(f) in the resulting state; the second statement says that f may take any value from sort_name ∩ range(f). The randomized fine-resolution system description DLR is then obtained by replacing each action's deterministic causal laws in DL by non-deterministic ones, declaring the affected fluent to be a random fluent. For instance, in our example, the non-deterministic causal law for move is:

  move(R, C2) causes loc(R) = {C : range(loc(R), C)}

where the defined fluent range is given by:

  range(loc(R), C) if loc(R) = C
  range(loc(R), C) if loc(R) = C1, next_to(C, C1)

so a robot moving to a cell can end up in any cell that is within range—the robot's current cell and its neighboring cells are all within range.

To complete our model of the statistician with probabilistic information, we run experiments that sample specific instances of each ground non-deterministic causal law in DLR, have the robot execute the corresponding action multiple times, and collect statistics (e.g., counts) of the number of times each outcome of the corresponding fluent is obtained. These statistics, collected in an initial training phase, are used to compute the causal probabilities of action outcomes, and the probability of observations being correct. Local symmetry assumptions are used to simplify this collection of statistics; e.g., movement from a cell to one of its neighbors is assumed to be the same for any cell, given a specific robot. The designer provides the required domain-specific information. In our example, if rob1 in cell c1 may reach {c1, c2, c3} when executing move(rob1, c2), the probabilities of these outcomes may be 0.1, 0.8, and 0.1 respectively. Similarly, the robot may compute that 0.85 is the probability with which it can recognize a specific object in a specific cell. Any prior beliefs about these probabilities (e.g., from a human) can be used as the initial belief that is revised by the experimental trials.

Zooming: The statistician uses DLR and the computed probabilities for the fine-resolution execution of a transition T = ⟨σ1, aH, σ2⟩ ∈ τH. Since reasoning probabilistically about all of DLR may result in incorrect behavior and can be computationally intractable, the statistician first identifies the part τLR(T) of the diagram τLR that is necessary for the fine-resolution execution of aH—we call this operation zooming. The size of τLR(T) is decreased with respect to τLR by only considering refinements of σ1 and σ2, the states of τLR relevant to T. Next, fluents and actions not relevant to the execution of aH are removed based on the following definitions.

Definition 3. [Direct relevance]
An element y of a basic sort stH of DH is directly relevant to a transition T of τH if:
• element y occurs in aH; or
• for some f, f(x̄) = y belongs to σ1 or σ2 but not both.

Consider the transition corresponding to robot rob1 moving from the kitchen to the office, i.e., aH = move(rob1, office). For this transition, element rob1 of sort robot, and elements office and kitchen of sort place, are relevant.

Definition 4. [Zoom]
To construct DLR(T), we need to determine the signature and the axioms describing the transition diagram τLR(T). The signature of DLR(T) is constructed as follows:
1. If stH is a sort of DH with at least one element directly relevant to T:
  • If sort stH is not magnified, it is its own zoomed counterpart stLz.
  • If sort stH is magnified, stLz is the set of components of the elements of stH directly relevant to T.
The zoomed counterparts form a hierarchy of basic sorts of DLR(T) (with the subclass relation inherited from DLR). In our example, the sorts of DLR(T) are robotLz = {rob1} and placeLz = {ci : ci ∈ kitchen ∪ office}.
2. The functions of DLR(T) are those of DLR restricted to the identified sorts. The functions in our example are loc(rob1), taking values from placeLz, next_to(placeLz, placeLz), range(loc(rob1), placeLz), and properly restricted functions related to testing these functions' values.
3. The actions of DLR(T) are restricted to the sorts identified above. In our example, the relevant actions are of the form move(rob1, ci) and test(rob1, loc(rob1), ci), where ci are individual elements of placeLz.

The axioms of DLR(T) are those of DLR that are restricted to the signature of DLR(T). In our example, this removes the causal laws for grasp and putdown, and removes the state constraint related to fluent in_hand in DLR. Furthermore, in the causal law and executability condition corresponding to the action move, C can only take values from placeLz, i.e., any cell in the kitchen or the office.

Example 3. [Example of zoom]
Assume that robot rob1 has to execute aH = grasp(rob1, tb1) to pick up textbook tb1—rob1 is, and tb1 is assumed to be, in the office. Zooming constructs the following signature:
• The relevant sorts of DH are robot, object, and place, and stLz = {robotLz, objectLz, placeLz} are the zoomed counterparts in DLR(T), with robotLz = {rob1}, objectLz = {tb1}, and placeLz = {ci : ci ∈ office}.
• The functions of DLR(T) include (a) loc(robotLz) and loc(objectLz), basic fluents that take values from placeLz; (b) the static next_to(placeLz, placeLz); (c) the defined fluent range(loc(robotLz), placeLz); and (d) the knowledge fluent dir_obs restricted to the zoomed sorts, e.g., dir_obs(robotLz, loc(robotLz), placeLz) and dir_obs(robotLz, loc(objectLz), placeLz).
• The actions of DLR(T) include (a) move(robotLz, placeLz); (b) grasp(robotLz, objectLz); (c) putdown(robotLz, objectLz); and (d) knowledge-producing actions to test the locations of rob1 and tb1, e.g., test(robotLz, loc(objectLz), placeLz).
grasp(rob1 ,tb1 ) causes in hand(rob1 ,tb1 ) = {true, f alse} The observations and action outcomes obtained by exe- test(rob1 , loc(tb1 ), c j ) causes dir obs(rob1 , loc(tb1 ), c j ) cuting the POMDP policy correspond to observations and = {true, f alse} if loc(tb1 ) = c j fluents in DLR (T ). This information, in turn, revises H , to be used for subsequent reasoning by the logician. If τH impossible move(rob1 , c j ) if loc(rob1 ) = ci , ¬next to(c j , ci ) is constructed correctly, and the statistics collected during where range(loc(rob1 ),C) may hold for values in {ci , c j , ck } the initial training phase correctly model the domain dy- within the range of the robot’s current location (ci ), and namics, following an optimal policy produced by an exact z are elements of placeL . The states of τLR (T ) thus include POMDP solver is most likely (among all possible policies) atoms of the form loc(rob1 ) = ci and loc(tb1 ) = c j , where to take the robot to the desired goal state [Littman, 1996; z ci and c j are values in placeL , in hand(rob1 ,tb1 ), direct ob- Sondik, 1971]. Since we use an approximate solver for com- servations of these atoms, and statics such as next to(ci , c j ). putational efficiency, we obtain a bound on the regret (i.e., Specific actions include move(rob1 , ci ), grasp(rob1 ,tb1 ), loss in value) due to the computed policy [Ong et al., 2010]. putdown(rob1 ,tb1 ) and test actions. Due to space constraints, we provide (below) a simplistic ex- ample of constructing a POMDP. POMDP construction: The statistician uses system descrip- tion DLR (T ), and the learned probabilities, to construct a Example 4. [POMDP construction] POMDP for the probabilistic implementation of a in state H Consider rob1 in the o f f ice and the implementation of aH = σ1 of τH . It may be possible to use other (more computation- move(rob1 , kitchen), with one cell in each room. 
Due to ally efficient) probabilistic models for implementing specific zooming, the robot only reasons about its location. Assume actions. Also, it may be possible to use specific (computation- that a move from a cell to a neighboring cell succeeds with ally efficient) heuristic or probabilistic algorithms for specific probability 0.85—otherwise, the robot remains where it is. tasks such as path planning. However, POMDPs provide (a) Assume that all non-terminal actions have unit cost. Termi- principled and quantifiable trade-off between accuracy and nating the POMDP policy after reaching cell 1 (in kitchen) computational efficiency in the presence of uncertainty; and receives a large positive reward (100), whereas termination (b) near-optimal solution if the POMDP is modeled accu- while in cell 0 (in o f f ice) receives a large negative reward rately. Also, our architecture only constructs a POMDP for (−100). Elements of the POMDP are described (below) in a small (relevant) part of the domain, significantly reducing the format used by the POMDP solver [Ong et al., 2010]. the computational complexity of solving the POMDP. Fur- The states correspond to possible robot locations, and absb thermore, many POMDPs for any given domain can be pre- is a terminal state. The actions correspond to the robot mov- computed, solved and used as needed. ing to specific cells, testing its cell location, or terminating L L L L A POMDP is described by a tuple hS , A , Z , T , O , R i L L policy execution (transitioning to absb). The robot observes for a specific goal state. Elements of this tuple correspond its cell location, or receives no observation. Knowledge- to the set of states, set of actions, set of values of observable producing actions do not cause a state transition, and actions fluents, the state transition function, the observation function, that change the state do not provide observations. 
The observations and action outcomes obtained by executing the POMDP policy correspond to observations and fluents in DLR(T). This information, in turn, revises H, to be used for subsequent reasoning by the logician. If τH is constructed correctly, and the statistics collected during the initial training phase correctly model the domain dynamics, following an optimal policy produced by an exact POMDP solver is most likely (among all possible policies) to take the robot to the desired goal state [Littman, 1996; Sondik, 1971]. Since we use an approximate solver for computational efficiency, we obtain a bound on the regret (i.e., loss in value) due to the computed policy [Ong et al., 2010]. Due to space constraints, we provide (below) a simple example of constructing a POMDP.

Example 4. [POMDP construction]
Consider rob1 in the office and the implementation of aH = move(rob1, kitchen), with one cell in each room. Due to zooming, the robot only reasons about its location. Assume that a move from a cell to a neighboring cell succeeds with probability 0.85; otherwise, the robot remains where it is. Assume that all non-terminal actions have unit cost. Terminating the POMDP policy after reaching cell 1 (in the kitchen) receives a large positive reward (100), whereas termination while in cell 0 (in the office) receives a large negative reward (−100). The elements of the POMDP are described (below) in the format used by the POMDP solver [Ong et al., 2010]. The states correspond to possible robot locations, and absb is a terminal state. The actions correspond to the robot moving to specific cells, testing its cell location, or terminating policy execution (transitioning to absb). The robot observes its cell location, or receives no observation. Knowledge-producing actions do not cause a state transition, and actions that change the state do not provide observations.

    discount: 0.99
    states: robot-0 robot-1 absb
    actions: move-0 move-1 test-robot-0 test-robot-1 done
    observations: rob-found rob-not-found none

    % Transition function format
    % T : action : S x S' -> [0, 1]
    T: move-0
    1 0 0
    0.85 0.15 0
    0 0 1

    T: move-1
    0.15 0.85 0
    0 1 0
    0 0 1

    T: test-robot-0
    identity

    T: test-robot-1
    identity

    T: done
    uniform

    % Observation function format
    % O : action : s_i : z_i -> [0, 1] (or)
    % O : action : S x Z -> [0, 1]
    O: move-0 : * : none 1
    O: move-1 : * : none 1

    O: test-robot-0
    0.95 0.05 0
    0.05 0.95 0
    0 0 1

    O: test-robot-1
    0.05 0.95 0
    0.95 0.05 0
    0 0 1

    O: done : * : none 1

    % Reward function format
    % R : action : s_i : s_i' : real value
    R: * : * : * : -1
    R: done : robot-0 : * : -100
    R: done : robot-1 : * : 100

Our architecture constructs such data structures for complex domains by zooming to the relevant part of the system description (as described earlier). Also, for each state s_i and action a_j, the corresponding ASP program Π(DLR(T), s_i, a_j) is solved to (a) identify inconsistencies and eliminate impossible states, e.g., the robot and an object cannot be in different locations when the robot is holding the object; and (b) identify possible state transitions, eliminating impossible transitions in the construction of the transition function.
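The state-elimination step can be illustrated with a small sketch. The Python predicate `consistent` below is a hypothetical stand-in for solving the ASP program Π(DLR(T), s_i, a_j), and the cell names are illustrative; it encodes the example rule that a held object must share the robot's location.

```python
from itertools import product

# Candidate states: (robot location, object location, in_hand flag).
# "c1"/"c2" are illustrative cell names, not the paper's sorts.
cells = ["c1", "c2"]

def consistent(state):
    """Stand-in for the ASP consistency check: a held object
    must be at the robot's location."""
    loc_rob, loc_obj, in_hand = state
    return (not in_hand) or (loc_rob == loc_obj)

candidates = list(product(cells, cells, [True, False]))
states = [s for s in candidates if consistent(s)]
# 2 of the 8 candidates (robot holding the object while the object
# is recorded in a different cell) are eliminated.
print(len(candidates), len(states))  # 8 6
```

The surviving states then index the rows and columns of the transition function, and the same check applied to (state, action, state) triples prunes impossible transitions.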
Algorithm 1: Control loop
Input: coarse-resolution system description DH and history H; randomized fine-resolution system description DLR; coarse-resolution goal.
Output: robot is in a state satisfying the goal; reports failure if this is impossible.
 1  while goal is not achieved do
 2      Logician uses τH and H to find a plan of abstract actions a^H_1, ..., a^H_n to achieve the goal.
 3      if no plan exists then
 4          return failure
 5      end
 6      i := 1, continue1 := true
 7      while continue1 do
 8          Check pre-requisites of a^H_i.
 9          if pre-requisites not satisfied then
10              continue1 := false
11          else
12              Statistician zooms to the relevant part of τLR for executing a^H_i and constructs a POMDP.
13              Statistician solves the POMDP to compute an action policy to implement a^H_i.
14              continue2 := true
15              while continue2 do
16                  Statistician invokes the POMDP policy to select and execute an action, obtain an observation, and update the belief state.
17                  if terminal action invoked then
18                      Statistician communicates success or failure of a^H_i to the logician, to be recorded in H.
19                      continue2 := false
20                      i := i + 1
21                      continue1 := (i < n + 1)
22                  end
23              end
24          end
25      end
26  end
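Algorithm 1's control flow can be skeletonized as follows. The callables (plan_abstract, zoom_and_solve, and so on) are hypothetical stand-ins for the logician's and statistician's components described in the text, not an actual API.

```python
# Skeleton of Algorithm 1; each callable stands in for a component
# of the architecture (logician's planner, statistician's POMDP
# machinery), which are assumptions for this sketch.
def control_loop(plan_abstract, prerequisites_hold, zoom_and_solve,
                 execute_policy, goal_achieved):
    while not goal_achieved():
        plan = plan_abstract()      # logician: abstract plan from tau_H and H
        if plan is None:
            return "failure"
        for action in plan:
            if not prerequisites_hold(action):
                break               # replan from the revised history
            policy = zoom_and_solve(action)  # zoomed POMDP -> action policy
            execute_policy(policy)  # run until a terminal action; the
                                    # outcome is recorded in the history
    return "success"
```

As in Algorithm 1, a failed pre-requisite check abandons the remaining abstract actions and triggers replanning with the revised history, and the loop terminates only when the coarse-resolution goal holds.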
5 Reasoning System
Algorithm 1 describes the reasoning loop. For any given goal, the logician reasons with the system description DH and a recorded history H with initial state defaults, accounting for any discrepancies between observations and predictions, to compute a plan of abstract actions. For an abstract action aH, the statistician zooms to the relevant part of the randomized refinement of DH. The corresponding system description DLR(T) and probabilities are used to construct and solve a POMDP. The POMDP policy is invoked repeatedly (until termination) to execute a sequence of concrete actions. The corresponding outcomes are reported to the logician to be used for subsequent reasoning. Correctness of this control loop is ensured by (1) applying the algorithm that reduces coarse-resolution planning and diagnostics to computing answer sets of the corresponding program; (2) using the refinement and zoom operations as described above; and (3) using POMDPs to probabilistically execute an action sequence for each abstract action in the logician's plan.

Experimental trials, both in simulation and on physical robots assisting humans in finding and moving objects in an office domain, indicate that our architecture provides significantly higher efficiency and accuracy in comparison with using just POMDPs with hierarchical decompositions, similar to our prior work [Sridharan et al., 2015]. These results are not described here because the focus is on describing the steps in the design process, and due to space limitations.
6 Conclusions
This paper described the systematic design of robots capable of representing and reasoning with logic-based and probabilistic descriptions of domain knowledge and uncertainty. Our architecture uses tightly-coupled transition diagrams of the domain at two levels of granularity, with a fine-resolution diagram being a refinement of a coarse-resolution diagram. For any given goal, non-monotonic logical reasoning at the coarse resolution plans a sequence of abstract actions. Each abstract action is implemented probabilistically as a sequence of concrete actions by zooming to the relevant part of the fine-resolution description, constructing and solving a POMDP, and invoking a policy until termination. The corresponding outcomes revise the coarse-resolution history for subsequent reasoning. The design steps are illustrated using examples, which indicate that the architecture supports reasoning at the sensorimotor and cognitive levels despite violations of defaults and unreliable observations and actions.

Acknowledgements
The authors thank Jeremy Wyatt and Shiqi Zhang for discussions related to the architecture described in this paper. The first author was supported in part by the US Office of Naval Research Science of Autonomy award N00014-13-1-0766.

References
[Bai et al., 2014] Haoyu Bai, David Hsu, and Wee Sun Lee. Integrated Perception and Planning in the Continuous Space: A POMDP Approach. International Journal of Robotics Research, 33(8), 2014.
[Balai et al., 2013] Evgenii Balai, Michael Gelfond, and Yuanlin Zhang. Towards Answer Set Programming with Sorts. In International Conference on Logic Programming and Nonmonotonic Reasoning, Corunna, Spain, 2013.
[Balduccini and Gelfond, 2003] Marcello Balduccini and Michael Gelfond. Logic Programs with Consistency-Restoring Rules. In AAAI Spring Symposium on Logical Formalization of Commonsense Reasoning, pages 9–18, 2003.
[Balduccini et al., 2014] Marcello Balduccini, William C. Regli, and Duc N. Nguyen. An ASP-Based Architecture for Autonomous UAVs in Dynamic Environments: Progress Report. In International Workshop on Non-Monotonic Reasoning (NMR), Vienna, Austria, July 17-19, 2014.
[Baral et al., 2009] Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic Reasoning with Answer Sets. Theory and Practice of Logic Programming, 9(1):57–144, January 2009.
[Chen et al., 2012] Xiaoping Chen, Jiongkun Xie, Jianmin Ji, and Zhiqiang Sui. Toward Open Knowledge Enabling for Human-Robot Interaction. Human-Robot Interaction, 1(2):100–117, 2012.
[Gelfond and Inclezan, 2013] Michael Gelfond and Daniela Inclezan. Some Properties of System Descriptions of ALd. Journal of Applied Non-Classical Logics, Special Issue on Equilibrium Logic and Answer Set Programming, 23(1-2):105–120, 2013.
[Gelfond and Kahl, 2014] Michael Gelfond and Yulia Kahl. Knowledge Representation, Reasoning and the Design of Intelligent Agents. Cambridge University Press, 2014.
[Hanheide et al., 2015] Marc Hanheide, Moritz Gobelbecker, Graham Horn, Andrzej Pronobis, Kristoffer Sjoo, Patric Jensfelt, Charles Gretton, Richard Dearden, Miroslav Janicek, Hendrik Zender, Geert-Jan Kruijff, Nick Hawes, and Jeremy Wyatt. Robot Task Planning and Explanation in Open and Uncertain Worlds. Artificial Intelligence, 2015.
[Hawes et al., 2010] Nick Hawes, Jeremy Wyatt, Mohan Sridharan, Henrik Jacobsson, Richard Dearden, Aaron Sloman, and Geert-Jan Kruijff. Architecture and Representations. In Cognitive Systems, volume 8 of Cognitive Systems Monographs, pages 51–93. Springer Berlin Heidelberg, April 2010.
[Kaelbling and Lozano-Perez, 2013] Leslie Kaelbling and Tomas Lozano-Perez. Integrated Task and Motion Planning in Belief Space. International Journal of Robotics Research, 32(9-10), 2013.
[Lee and Wang, 2015] Joohyung Lee and Yi Wang. A Probabilistic Extension of the Stable Model Semantics. In AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning, Stanford, USA, March 2015.
[Lee et al., 2013] Joohyung Lee, Vladimir Lifschitz, and Fangkai Yang. Action Language BC: Preliminary Report. In International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, August 3-9, 2013.
[Leone et al., 2006] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic, 7(3):499–562, 2006.
[Littman, 1996] Michael Littman. Algorithms for Sequential Decision Making. PhD thesis, Brown University, Department of Computer Science, Providence, USA, March 1996.
[Milch et al., 2006] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Statistical Relational Learning. MIT Press, 2006.
[Ong et al., 2010] Sylvie C. Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under Uncertainty for Robotic Tasks with Mixed Observability. International Journal of Robotics Research, 29(8):1053–1068, July 2010.
[Richardson and Domingos, 2006] Matthew Richardson and Pedro Domingos. Markov Logic Networks. Machine Learning, 62(1), 2006.
[Saribatur et al., 2014] Zeynep G. Saribatur, Esra Erdem, and Volkan Patoglu. Cognitive Factories with Multiple Teams of Heterogeneous Robots: Hybrid Reasoning for Optimal Feasible Global Plans. In International Conference on Intelligent Robots and Systems (IROS), Chicago, USA, 2014.
[Sondik, 1971] Edward J. Sondik. The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, 1971.
[Sridharan et al., 2015] Mohan Sridharan, Michael Gelfond, Shiqi Zhang, and Jeremy Wyatt. A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics. Technical report, unrefereed CoRR abstract: http://arxiv.org/abs/1508.03891, August 2015.
[Sridharan et al., 2016] Mohan Sridharan, Prashanth Devarakonda, and Rashmica Gupta. Discovering Domain Axioms Using Relational Reinforcement Learning and Declarative Programming. In ICAPS Workshop on Planning and Robotics (PlanRob), London, UK, June 13-14, 2016.
[Zhang and Stone, 2015] Shiqi Zhang and Peter Stone. CORPP: Commonsense Reasoning and Probabilistic Planning, as Applied to Dialog with a Mobile Robot. In AAAI Conference on Artificial Intelligence (AAAI), pages 1394–1400, Austin, USA, 2015.
[Zhang et al., 2014] Shiqi Zhang, Mohan Sridharan, Michael Gelfond, and Jeremy Wyatt. Towards An Architecture for Knowledge Representation and Reasoning in Robotics. In International Conference on Social Robotics (ICSR), Sydney, Australia, 2014.
[Zhang et al., 2015] Shiqi Zhang, Mohan Sridharan, and Jeremy Wyatt. Mixed Logical Inference and Probabilistic Planning for Robots in Unreliable Worlds. IEEE Transactions on Robotics, 31(3):699–713, 2015.