<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classical Planning in Deep Latent Space: From Unlabeled Images to PDDL (and back)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Masataro Asai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alex Fukunaga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>guicho2.71828@gmail.com, Graduate School of Arts and Sciences, University of Tokyo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although recent work in deep learning has achieved impressive results in many fields, the knowledge is encoded in a subsymbolic representation which cannot be directly used by symbolic systems such as planners. We propose LatPlan, an integrated architecture combining deep learning and a classical planner. Given a set of unlabeled training image pairs showing allowed actions in the problem domain, and a pair of images representing the start and goal states, LatPlan uses a Variational Autoencoder to generate a discrete latent vector from the images, based on which a PDDL model can be constructed and then solved by an off-the-shelf planner. We evaluate LatPlan using image-based versions of 3 planning domains: 8-puzzle, LightsOut, and Towers of Hanoi.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent advances in domain-independent planning have greatly enhanced the
capabilities of planners. However, planning problems need to be provided to the planner in
a structured, symbolic representation such as PDDL [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and in general, such
symbolic models need to be provided by a human, either directly in PDDL, or via
a compiler which transforms some other symbolic problem representation into
PDDL. This results in the knowledge-acquisition bottleneck, where the modeling
step can itself become the limiting factor in the problem-solving cycle. In addition, the
requirement for symbolic input poses a significant obstacle to applying planning
in new, unforeseen situations where no human is available to create such a model,
e.g., autonomous spacecraft exploration. This first requires generating symbols
from raw sensor input, i.e., the symbol grounding problem [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>
        Recently, significant advances have been made in neural network (NN)
approaches for cognitive tasks including image classification [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], object recognition
[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], speech recognition [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], machine translation, as well as NN-based
problem-solving systems [
        <xref ref-type="bibr" rid="ref10 ref23">23, 10</xref>
        ]. However, current state-of-the-art pure NN-based
systems do not yet provide the guarantees offered by symbolic planning systems,
such as deterministic completeness and solution optimality.
      </p>
      <p>Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.</p>
      <p>[Fig. 1: An image-based 8-puzzle instance: the initial state image (left, a scrambled, tiled version) and the goal state image (right, black/white), built from the original Mandrill image.]</p>
      <p>Using a NN-based perceptual system to automatically provide input
models for domain-independent planners could greatly expand the applicability of
planning technology and offer the benefits of both paradigms. We consider the
problem of robustly, automatically bridging the gap between such symbolic and
subsymbolic representations.</p>
      <p>Fig. 1 (left) shows a scrambled, 3x3 tiled version of the photograph on
the right, i.e., an image-based instance of the 8-puzzle. We seek a
domain-independent system which, given only a set of unlabeled images showing the
valid moves for this image-based puzzle, finds an optimal solution to the
puzzle. Although the 8-puzzle is trivial for symbolic planners, solving this
image-based problem with a domain-independent system which has no prior
assumptions/knowledge (e.g., "sliding objects", "tile arrangement", "a grid-like
structure") is nontrivial. The only assumption allowed about the nature of the task
is that it can be modeled and solved as a classical planning problem.</p>
      <p>We propose Latent-space Planner (LatPlan), a hybrid architecture which
uses NN-based image processing to completely automatically generate a
propositional, symbolic problem representation which can be used as the input for a
classical planner. LatPlan consists of 3 components: (1) a NN-based State
Autoencoder (SAE), which provides a bidirectional mapping between the raw input
of the world states and its symbolic/categorical representation, (2) an action
model generator which generates a PDDL model using the symbolic
representation acquired by the SAE, and (3) a symbolic planner. Given only a set of
unlabeled images from the domain as input, we train (unsupervised) the SAE
and use it to generate D, a PDDL representation of the image-based domain.</p>
      <p>Then, given a planning problem instance as a pair of initial and goal images such
as Fig. 1, LatPlan uses the SAE to map the problem to a symbolic planning
instance in D, and uses the planner to solve the problem.</p>
    </sec>
    <sec id="sec-2">
      <title>LatPlan: System Architecture</title>
      <p>This section describes the LatPlan architecture and its current implementation.
LatPlan works in 3 phases. In Phase 1 (symbol grounding), a State
AutoEncoder providing a bidirectional mapping between raw data (e.g., images)
and symbols is learned (unsupervised) from a set of unlabeled images of
representative states. In Phase 2 (action model generation), the operators available
in the domain are extracted from a set of pairs of unlabeled images, and a PDDL
domain model is generated. In Phase 3 (planning), a planning problem instance
is input as a pair of images (i, g), where i shows an initial state and g shows a goal
state. These are converted to symbolic form using the SAE, and the problem is
solved by the symbolic planner. For example, an 8-puzzle problem instance in our
system consists of an image of the start (scrambled) configuration of the puzzle
(i), and an image of the solved state (g). Finally, the symbolic, latent-space plan
is converted to a human-comprehensible visualization of the plan.</p>
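      <p>To make the three phases concrete, the following is a minimal, runnable sketch of the pipeline in Python. It is our own illustration, not the actual implementation: the SAE is replaced by a stand-in identity encoder over already-binary states, and the off-the-shelf planner by breadth-first search; all function names are hypothetical.</p>
      <preformat>
```python
from collections import deque

def encode(image):
    # Stand-in for SAE Encode(r): a real SAE maps raw pixels to an N-bit vector.
    return tuple(image)

def decode(bits):
    # Stand-in for SAE Decode(b): a real SAE maps the bit vector back to pixels.
    return list(bits)

def latplan_sketch(transitions, init_img, goal_img):
    """Phase 2 input: transitions, encoded (pre, post) pairs.
    Phase 3: breadth-first search in latent space, standing in for an
    off-the-shelf planner such as Fast Downward."""
    init, goal = encode(init_img), encode(goal_img)
    frontier, parent = deque([init]), {init: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:  # reconstruct and Decode the latent-space plan
            path = []
            while s is not None:
                path.append(decode(s))
                s = parent[s]
            return path[::-1]
        for pre, post in transitions:
            if s == pre and post not in parent:
                parent[post] = s
                frontier.append(post)
    return None

# Toy domain: a 2-bit world where each action flips a single bit.
T = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((0, 0), (0, 1)), ((0, 1), (1, 1))]
print(latplan_sketch(T, [0, 0], [1, 1]))  # prints [[0, 0], [1, 0], [1, 1]]
```
      </preformat>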
      <p>Symbol Grounding with a State Autoencoder The State Autoencoder
(SAE) provides a bidirectional mapping between images and a symbolic
representation.</p>
      <p>
        First, note that a direct 1-to-1 mapping between images and discrete objects
can be obtained trivially by using the array of discretized pixel values as
a "symbol". However, such a trivial SAE lacks the crucial properties of
generalization (the ability to encode/decode unforeseen world states to symbols) and
robustness (two similar images that represent "the same world state" should
map to the same symbolic representation). Thus, we need a mapping where the
symbolic representation captures the "essence" of the image, not merely the raw
pixel vector. The main technical contribution of this paper is the proposal of a
SAE which is implemented as a Variational Autoencoder [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] with a
Gumbel-Softmax (GS) activation function [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Gumbel-Softmax (GS) activation is a recently proposed reparametrization
trick [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for the categorical distribution. Using GS in the network in place of
standard activation functions (Sigmoid, Softmax, ReLU) forces the activation to
converge to a discrete one-hot vector. GS has a "temperature" parameter τ
which controls the magnitude of approximation. τ is annealed by the schedule
τ = max(0.1, exp(-rt)), where t is the current training epoch and r is an
annealing ratio [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We chose r so that τ = 0.1 when the training finishes.
      </p>
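      <p>As a hedged illustration (our own NumPy sketch, not the paper's code), the GS sampling step and the annealing schedule above can be written as follows; the annealing ratio r and the random seed are arbitrary choices.</p>
      <preformat>
```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    # Perturb logits with Gumbel(0, 1) noise, then apply a
    # temperature-scaled softmax; low tau pushes the result toward one-hot.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def tau_schedule(t, r):
    # The annealing schedule tau = max(0.1, exp(-r * t)) described above.
    return max(0.1, np.exp(-r * t))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1])
print(tau_schedule(0, 0.005))      # prints 1.0 at the start of training
sample = gumbel_softmax(logits, tau_schedule(1000, 0.005), rng)
print(sample)                       # a near-one-hot categorical sample
```
      </preformat>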
      <p>In our implementation, the SAE is comprised of multilayer perceptrons
combined with Dropout and Batch Normalization in both the encoder and the
decoder networks, with a GS layer in between. The input to the GS layer is the
flat, last layer of the encoder network. The output is an (N, M) matrix, where
N is the number of categorical variables and M is the number of categories.</p>
      <p>Our key observation is that these categorical variables can be used directly as
propositional symbols by a symbolic reasoning system, i.e., this provides a
solution to the symbol grounding problem in our architecture. We specify M = 2,
effectively obtaining N propositional state variables. It is possible to specify
a different M for each variable and represent the world using a multi-valued
representation as in SAS+ [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but we always use M = 2 for simplicity.
      </p>
      <p>[Fig. 2: The LatPlan architecture. Input 1: (a) training images for training the State AutoEncoder and (b) image pairs representing valid actions. Input 2: initial state and goal state images. The SAE encodes the subsymbolic images into symbols, a classical domain-independent planner finds a plan in PDDL/SAS, and the solution plan is decoded back into images.]</p>
      <p>The trained SAE provides a bidirectional mapping between the raw inputs
(subsymbolic representation) and their symbolic representations:
– b = Encode(r) maps an image r to a boolean vector b.</p>
      <p>– r~ = Decode(b) maps a boolean vector b to an image r~.</p>
      <p>Encode(r) maps raw input r to a symbolic representation by feeding the raw
input to the encoder network, extracting the activation in the GS layer, and taking
the first row in the N × 2 matrix, resulting in a binary vector of length N.
Similarly, Decode(b) maps a binary vector b back to an image by concatenating
b and its complement 1 − b to obtain an N × 2 matrix and feeding it to the decoder.</p>
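      <p>This bit-vector bookkeeping can be sketched as follows (a hypothetical NumPy illustration, under the reading that the GS layer yields one near-one-hot row of length M = 2 per variable; function names are ours, not the implementation's):</p>
      <preformat>
```python
import numpy as np

def to_bits(gs_output):
    # Encode: keep the first category of each of the N rows of the
    # (N, 2) GS activation, rounded to a binary vector of length N.
    return gs_output[:, 0].round().astype(int)

def to_matrix(bits):
    # Decode: rebuild the (N, 2) matrix from b and its complement 1 - b
    # before feeding it to the decoder network.
    b = np.asarray(bits)
    return np.stack([b, 1 - b], axis=1)

gs = np.array([[0.98, 0.02], [0.03, 0.97], [0.91, 0.09]])
b = to_bits(gs)
print(b)               # prints [1 0 1]
print(to_matrix(b))    # the (3, 2) matrix fed back to the decoder
```
      </preformat>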
      <p>
        It is not sufficient to use traditional activation functions such as softmax and
round the activation values to obtain discrete 0/1 values, because we need to map
the symbolic plan back to images. We need a decoding network trained for 0/1
values approximated by a smooth function, e.g., GS or a similar approach such
as [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. A rounding-based scheme would be unable to restore the images from
discrete values because the decoder is trained using continuous values. Also,
the rounding operation cannot be part of a backpropagated network because
rounding is non-differentiable.
      </p>
      <p>An SAE trained on a small fraction of the possible states successfully
generalizes so that it can Encode and Decode every possible state in that domain.
In all our experiments below, we train the SAE using randomly selected images
from the domain. For example, on the 8-puzzle, the SAE trained on 12000
randomly generated configurations out of 362880 possible configurations is used by
the domain model generator to Encode every 8-puzzle state.</p>
      <p>Domain Model Generation The model generator takes as input a trained
SAE, and a set R containing pairs of raw images. In each image pair (pre_i, post_i) ∈
R, pre_i and post_i are images representing the state of the world before and after
some action a_i is executed, respectively. In each ground action image pair, the
"action" is implied by the difference between pre_i and post_i. The output of
the model generator is a PDDL domain file for a grounded unit-cost STRIPS
planning problem. For each (pre_i, post_i) ∈ R we apply the learned SAE to pre_i
and post_i to obtain (Encode(pre_i), Encode(post_i)), the symbolic representations
(latent space vectors) of the state before and after action a_i is executed. This
results in a set of symbolic ground action instances A.</p>
      <p>
        Ideally, a model generation component would induce a complete action model
from a limited set of symbolic ground action instances. However, action model
learning from a limited set of action instances is a nontrivial area of active
research [
        <xref ref-type="bibr" rid="ref11 ref18 ref24 ref32 ref6 ref7">7, 11, 18, 24, 32, 6</xref>
        ]. Since the focus of this paper is on the overall LatPlan
architecture and the SAE, we leave model induction for future work. Instead,
the current implementation uses a trivial, baseline strategy which
generates a model based on all ground actions and which could easily be
replaced by an existing off-the-shelf action model learner. In this baseline method,
R contains image pairs representing all ground actions that are possible in this
domain, so A = {Encode(r) | r ∈ R} contains all symbolic ground actions possible
in the domain. In Sec. 5, we further discuss the implications and the impact of
this model. In the experiments (Sec. 3), we generate image pairs for all ground
actions using an external image generator. It is important to note that while R
contains all possible actions, R is not used for training the SAE. As explained
before, the SAE is trained using at most 12000 images, while the entire state
space is much larger.
      </p>
      <p>LatPlan compiles A directly into a PDDL model as follows. For each action
(Encode(pre_i), Encode(post_i)) ∈ A, each bit b_j (1 ≤ j ≤ N) in these boolean
vectors is mapped to propositions (bj-true) and (bj-false) when the encoded
value is 1 and 0 (resp.). Encode(pre_i) is directly used as the preconditions of
action a_i. The add/delete effects of action a_i are computed by taking the
bitwise difference between Encode(pre_i) and Encode(post_i). For example, when
b_j changes from 1 to 0, it compiles into (and (bj-false) (not (bj-true))).
The initial and the goal states are similarly created by applying the SAE to the
initial and goal images.</p>
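      <p>For illustration only, this compilation rule can be sketched as follows (simplified PDDL text; names such as compile_action are ours, not LatPlan's actual output format):</p>
      <preformat>
```python
def compile_action(name, pre_bits, post_bits):
    # Map bit j to (bj-true) or (bj-false) depending on its value.
    def prop(j, v):
        return "(b%d-true)" % j if v else "(b%d-false)" % j
    precond = " ".join(prop(j, v) for j, v in enumerate(pre_bits))
    effects = []
    for j, (p, q) in enumerate(zip(pre_bits, post_bits)):
        if p != q:  # only changed bits appear in the effect
            effects.append("(and %s (not %s))" % (prop(j, q), prop(j, p)))
    return ("(:action %s\n  :precondition (and %s)\n  :effect (and %s))"
            % (name, precond, " ".join(effects)))

# Bit b0 changes from 1 to 0, so the effect adds (b0-false), deletes (b0-true).
print(compile_action("a0", [1, 0, 1], [0, 0, 1]))
```
      </preformat>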
      <p>
        Planning with an Off-the-Shelf Planner The PDDL instance generated
in the previous step can be solved by an off-the-shelf planner. LatPlan uses the
Fast Downward planner [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. However, on the models generated by LatPlan,
the invariant detection routines in the Fast Downward PDDL-to-SAS converter
became a bottleneck, so we wrote a trivial replacement PDDL-to-SAS converter
without the invariant detection.
      </p>
      <p>LatPlan inherits all of the search-related properties of the planner which is
used. For example, if the planner is complete and optimal, LatPlan will find an
optimal plan for the given problem (if one exists), with respect to the portion
of the state-space graph captured by the acquired model. Domain-independent
heuristics developed in the planning literature are designed to exploit structure
in the domain model. Although the structure in models acquired by LatPlan
may not directly correspond to those in hand-coded models, intuitively, there
should be some exploitable structure. The search results in Sec. 3 suggest that
the domain-independent heuristics can reduce the search effort.</p>
      <p>Visualizing/Executing the Plans Since the actions comprising the plan
are SAE-generated latent bit vectors, the "meaning" of each symbol (and thus
the plan) is not necessarily clear to a human observer. However, we can obtain
a step-by-step visualization of the world (images) as the plan is executed (e.g.,
Fig. 4) by starting with the latent state representation of the initial state,
applying (simulating) actions step-by-step (according to the PDDL model acquired
above), and Decode'ing the latent bit vectors for each intermediate state to
images using the SAE. In this paper, a "mental image" of the solution (i.e., the
image sequence visualization) is sufficient. In a less simplified setting, mapping
the actions found by LatPlan (transitions between latent bit vector pairs) to
lower-level actuation would be necessary (future work).</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Evaluation</title>
      <p>All of the SAE networks used in the evaluation have the same network topology
except the input layer, which should fit the size of the input images. The
network consists of the following layers: [Input, GaussianNoise(0.1), fc(4000), relu,
bn, dropout(0.4), fc(4000), relu, bn, dropout(0.4), fc(49x2), GumbelSoftmax,
dropout(0.4), fc(4000), relu, bn, dropout(0.4), fc(4000), relu, bn, dropout(0.4),
fc(input), sigmoid]. Here, fc = fully connected layer, bn = Batch
Normalization, and tensors are reshaped accordingly. The last layers can be replaced with
[fc(input×2), GumbelSoftmax, TakeFirstRow] for better reconstruction when
we can assume that the input image is binarized. The network is trained using
the Adam optimizer (lr: 0.001) for 1000 epochs.</p>
      <p>The latent layer has 49 bits, which sufficiently covers the total number of
states in any of the problems that are used in the following experiments. This
could be reduced for each domain (made more compact) with further engineering.</p>
      <p>
        MNIST 8-puzzle This is an image-based version of the 8-puzzle, where tiles
contain hand-written digits (0-9) from the MNIST database [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Each digit is
shrunk to 14x14 pixels, so each state of the puzzle is a 42x42 image. Valid moves
in this domain swap the \0" tile with a neighboring tile, i.e., the \0" serves as
the \blank" tile in the classic 8-puzzle. The entire state space consists of 362880
states (9!). Note that the same image is used for each digit in all states, e.g., the
\1" digit is the same image in all states.
      </p>
      <p>Out of 362880 images, 12000 randomly selected images are used for training
the SAE. This set is further divided into a training set (11000) and a validation
set (1000). Training takes 40 minutes/1000 epochs on an NVIDIA GTX-1070.</p>
      <p>Scrambled Photograph 8-puzzle The MNIST 8-puzzle described
above consists of images where each digit is cleanly separated from the black
region. To show that LatPlan does not rely on cleanly separated objects, we
solve 8-puzzles generated by cutting and scrambling real photographs (similar to
sliding tile puzzle toys sold in stores). We used the "Mandrill" image, a standard
benchmark in the image processing literature; the right-eye tile corresponds to
the blank tile in the standard 8-puzzle. The image was first converted to
greyscale and then rounded to black/white (0/1) values. The same number of
images as in the MNIST 8-puzzle experiments are used.</p>
      <p>Towers of Hanoi (ToH) Disks of various sizes must be moved from one
peg to another, with the constraint that a larger disk can never be placed on top
of a smaller disk. Due to the smaller number of states (3^d states for d disks), we
used images of all states as the set of images for training the SAE. This is further
divided into the training set (90%) and the validation set (10%), and we verified
that the network has learned a generalized model without overfitting.</p>
      <p>3-disk ToH is solved successfully and optimally using the default
hyperparameters (Fig. 5, top). However, on 4 disks, the SAE trained with the default
hyperparameters (Fig. 5, middle) is confused, resulting in a flawed model which
causes the planner to choose suboptimal moves (dashed box). Sometimes, the
size/existence of disks is confused (red box). Tuning the hyperparameters to
reduce the SAE loss corrects this problem. After increasing the training epochs
(10000) and tuning the network shape (fc(6000), N = 29), the SAE generated a
correct model, resulting in the optimal 15-step plan (Fig. 5, bottom).</p>
      <p>Fig. 6. Output of solving 4x4 LightsOut (left) and its binarized result (right). Although
the goal state shows two blurred switches, they have low values (around 0.3) and
disappear after rounding.</p>
      <p>LightsOut A video game where a grid of lights is in some on/off
configuration (+: On), and pressing a light toggles its state (On/Off) as well as the
state of all of its neighbors. The goal is to turn all lights Off. Unlike the 8-puzzle, where
each move affects only two adjacent tiles, a single operator in 4x4 LightsOut can
simultaneously flip 5 of the 16 locations. Also, unlike the 8-puzzle and ToH, the
LightsOut game allows some "objects" (lights) to disappear. This demonstrates that
LatPlan is not limited to domains with highly local effects and static objects.</p>
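      <p>As a toy illustration (our own sketch, not from the paper), the press operator and its non-local effect can be written as:</p>
      <preformat>
```python
def press(board, i, j, n=4):
    # Pressing light (i, j) flips it and its orthogonal neighbors.
    out = [row[:] for row in board]
    for di, dj in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:
        r, c = i + di, j + dj
        if r in range(n) and c in range(n):
            out[r][c] ^= 1
    return out

all_off = [[0] * 4 for _ in range(4)]
lit = press(all_off, 1, 1)
print(sum(map(sum, lit)))           # prints 5: one press flips 5 of 16 lights
print(press(lit, 1, 1) == all_off)  # prints True: pressing again undoes it
```
      </preformat>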
      <p>Twisted LightsOut In all of the above domains, the "objects" correspond
to rectangles. To show that LatPlan does not rely on rectangular regions, we
demonstrate its result on "Twisted LightsOut", a distorted version of the game
where the original LightsOut image is twisted around the center. Unlike the previous
domains, the input images are not binarized.</p>
      <p>
        Robustness to Noisy Input We show the robustness of the system against
input noise. We corrupted the initial/goal state inputs by adding Gaussian
or salt noise, as shown in Fig. 8. The system is robust enough to successfully
solve the problem, because our SAE is a Denoising Autoencoder [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], which has
an internal GaussianNoise layer that adds Gaussian noise to the inputs (only
during training) and learns to reconstruct the original image from a corrupted
version of the image.
Fig. 8. SAE robustness vs. noise: Corrupted initial state image r and its reconstruction
Decode(Encode(r)) by the SAE on MNIST 8-puzzle and Twisted LightsOut. Images are
corrupted by Gaussian noise of up to 0.3 for both problems, and by salt noise of up to
p = 0.06 for Twisted LightsOut. LatPlan successfully solved the problems. The SAE
maps the noisy image to the correct symbolic vector b = Encode(r), conducts planning,
then maps b back to the de-noised image Decode(b).
      </p>
      <p>
        Are Domain-Independent Heuristics Effective in Latent Space? We
compare the numbers of nodes expanded by a search using a greedy merging PDB
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and the blind heuristic (i.e., breadth-first search) in Fast Downward:
– MNIST 8-puzzle (6 instances, mean(StdDev)): Blind 176658(25226), PDB
77811(32978)
– Mandrill 8-puzzle (1 instance with a 31-step optimal solution, corresponding
to the 8-puzzle instance of [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]): Blind 335378, PDB 88851
– ToH (4 disks, 1 instance): Blind 55, PDB 17
– 4x4 LightsOut (1 instance): Blind 952, PDB 27
– 3x3 Twisted LightsOut (1 instance): Blind 522, PDB 214
      </p>
      <p>
        The domain-independent PDB heuristic significantly reduced node
expansions. Search times (&lt; 3 seconds for all instances) were also faster for all
instances with the PDB. Although the total runtime including heuristic initialization
is slightly slower than blind search, in domains where goal states and operators
are the same for all instances (e.g., 8-puzzle) PDBs can be reused [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and PDB
generation time can be amortized across many instances. Although these results
show that existing heuristics for classical planning are able to reduce search effort
compared to blind search, much more work is required in order to understand
how the features in latent space interact with existing heuristics.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] propose a method for generating PDDL from a low-level, sensor/actuator
space of an agent characterized as a semi-MDP. The inputs to their system
are 33 variables representing accurate structured input (e.g., x/y distances) or
categorical states (e.g., the on/off state of a button), while LatPlan takes noisy,
unstructured images (e.g., for the 8-puzzle, 42x42 = 1764-dimensional arrays).
      </p>
      <p>
        Compared to learning from observation (LfO) in the robotics literature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
(1) LatPlan is trained based on image pairs showing individual actions, not
plan executions (sequences of actions); (2) LatPlan focuses on PDDL for
high-level (puzzle-like) tasks, not on motion planning tasks. This significantly affects
the data collection scheme: While LfO has an action segmentation issue because it
does not know when an action starts/ends in the plan traces (e.g., video clips),
LatPlan does not, because it assumes that a robot can explore the world by itself,
initiating/terminating its own actions and taking pictures with a camera. The robot
can perform a random walk under physical constraints and supervision, which
ensure legal moves (e.g., the physical tiles in the 8-puzzle). If we further assume
that it can "reset" the world (e.g., into a random configuration), then the robot
could eventually obtain images of the entire state space.
      </p>
      <p>
        A closely related line of work in LfO is learning of board game play from
videos [
        <xref ref-type="bibr" rid="ref15 ref17 ref4">4, 15, 17</xref>
        ]. Unlike LatPlan, these works make relatively strong assumptions
about the environment, e.g., that there is a grid-like environment.
      </p>
      <p>
        There is a large body of previous work using neural networks to directly solve
combinatorial tasks, such as TSP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or Tower of Hanoi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Although they use
NNs to solve search problems, they assume a fully symbolic representation of
the problem as input. Another line of hybrid systems embeds NNs inside a search
algorithm to provide search control knowledge [
        <xref ref-type="bibr" rid="ref1 ref27 ref29">29, 1, 27</xref>
        ]. In contrast, we use a
NN-based SAE for symbol grounding, not for search control.
      </p>
      <p>
        Deep Reinforcement Learning (DRL) has solved complex image-based
problems [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. For unit-action-cost planning, LatPlan does not require a
reinforcement signal (reward function). Also, it can provide guarantees of completeness
and solution cost optimality.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>
        We proposed LatPlan, an integrated architecture for planning which, given only
a set of unlabeled images and no prior knowledge, generates a classical planning
problem model, solves it with a symbolic planner, and presents the resulting plan
as a human-comprehensible sequence of images. We demonstrated its feasibility
using image-based versions of planning/state-space-search problems (8-puzzle,
Towers of Hanoi, Lights Out). The key technical contribution is the SAE, which
leverages the Gumbel-Softmax reparametrization technique [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and learns
(unsupervised) a bidirectional mapping between raw images and a propositional
representation usable by symbolic planners. Aside from the key assumptions of a
deterministic environment and sufficient training images, we avoid
assumptions about the input domain. Thus, we have shown that domains with
different characteristics can all be solved by the same system. In other words,
LatPlan is a domain-independent, image-based classical planner.
      </p>
      <p>
        To our knowledge, LatPlan is the first completely automated system of this
kind. However, as a proof of concept, it has significant limitations to be
addressed in future work. In particular, the domain model generator in LatPlan
does not perform action model learning from a small set of sample actions,
because the focus of this paper is not on action learning. Thus the current generator
requires the entire set of latent states and transitions, and in turn images. While this
is obviously impractical, this is not a fundamental limitation of the LatPlan
architecture. The primitive generator is merely a placeholder for investigating the
overall feasibility of an SAE-based end-to-end planning system (our major
contribution) and is meant to be easily replaced by more sophisticated ones
[
        <xref ref-type="bibr" rid="ref18 ref24 ref32 ref7">7, 18, 24, 32</xref>
        ]. To our knowledge, all previous domain learning methods require
the structured (e.g., propositional) representations of states.
      </p>
      <p>A related topic is how to specify a partial goal for LatPlan as
in IPC domains (e.g., "having tiles 0, 1, 2 in the correct places is the goal" in the
8-puzzle), rather than assuming a single goal state; this is an interesting direction for future work.</p>
      <p>Finally, we do not claim that the specific implementation of the SAE in this paper
works robustly on all images. Making a robust autoencoder is not a problem
unique to LatPlan, but rather a fundamental problem in deep learning. Our
contribution is the demonstration that it is possible to leverage some existing
deep learning techniques quite effectively in a planning system, and future work
will continue leveraging further improvements in image processing techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arfaee</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zilles</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holte</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          <article-title>: Learning Heuristic Functions for Large State Spaces</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>175</volume>
          (
          <issue>16-17</issue>
          ),
          <fpage>2075</fpage>
          -
          <lpage>2098</lpage>
          (
          <year>2011</year>
          ), http://dx.doi.org/10.1016/j.artint.2011.08.001; http://dblp.uni-trier.de/rec/bib/journals/ai/ArfaeeZH11
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <string-name><surname>Argall</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Chernova</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Veloso</surname>, <given-names>M.M.</given-names></string-name>, <string-name><surname>Browning</surname>, <given-names>B.</given-names></string-name>: <article-title>A Survey of Robot Learning from Demonstration</article-title>. <source>Robotics and Autonomous Systems</source> <volume>57</volume>(<issue>5</issue>), <fpage>469</fpage>-<lpage>483</lpage> (<year>2009</year>), http://dx.doi.org/10.1016/j.robot.2008.10.024</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. <string-name><surname>Bäckström</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Nebel</surname>, <given-names>B.</given-names></string-name>: <article-title>Complexity Results for SAS+ Planning</article-title>. <source>Computational Intelligence</source> <volume>11</volume>(<issue>4</issue>), <fpage>625</fpage>-<lpage>655</lpage> (<year>1995</year>)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. <string-name><surname>Barbu</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Narayanaswamy</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Siskind</surname>, <given-names>J.M.</given-names></string-name>: <article-title>Learning Physically-Instantiated Game Play through Visual Observation</article-title>. In: <source>ICRA</source>. pp. <fpage>1879</fpage>-<lpage>1886</lpage> (<year>2010</year>), http://dx.doi.org/10.1109/ROBOT.2010.5509925</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. <string-name><surname>Bieszczad</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kuchar</surname>, <given-names>S.</given-names></string-name>: <article-title>Neurosolver Learning to Solve Towers of Hanoi Puzzles</article-title>. In: <source>IJCCI</source>. vol. <volume>3</volume>, pp. <fpage>28</fpage>-<lpage>38</lpage> (<year>2015</year>)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. <string-name><surname>Celorrio</surname>, <given-names>S.J.</given-names></string-name>, <string-name><surname>de la Rosa</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Fernandez</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Fernandez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Borrajo</surname>, <given-names>D.</given-names></string-name>: <article-title>A Review of Machine Learning for Automated Planning</article-title>. <source>Knowledge Eng. Review</source> <volume>27</volume>(<issue>4</issue>), <fpage>433</fpage>-<lpage>467</lpage> (<year>2012</year>), http://dx.doi.org/10.1017/S026988891200001X</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. <string-name><surname>Cresswell</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>McCluskey</surname>, <given-names>T.L.</given-names></string-name>, <string-name><surname>West</surname>, <given-names>M.M.</given-names></string-name>: <article-title>Acquiring planning domain models using LOCM</article-title>. <source>Knowledge Eng. Review</source> <volume>28</volume>(<issue>2</issue>), <fpage>195</fpage>-<lpage>213</lpage> (<year>2013</year>)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. <string-name><surname>Deng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Dong</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Socher</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>L.J.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Fei-Fei</surname>, <given-names>L.</given-names></string-name>: <article-title>ImageNet: A Large-Scale Hierarchical Image Database</article-title>. In: <source>CVPR</source>. pp. <fpage>248</fpage>-<lpage>255</lpage>. IEEE (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. <string-name><surname>Deng</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Hinton</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Kingsbury</surname>, <given-names>B.</given-names></string-name>: <article-title>New Types of Deep Neural Network Learning for Speech Recognition and Related Applications: An Overview</article-title>. In: <source>ICASSP</source>. pp. <fpage>8599</fpage>-<lpage>8603</lpage>. IEEE (<year>2013</year>)</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. <string-name><surname>Graves</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Wayne</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Reynolds</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Harley</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Danihelka</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Grabska-Barwinska</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Colmenarejo</surname>, <given-names>S.G.</given-names></string-name>, <string-name><surname>Grefenstette</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Ramalho</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Agapiou</surname>, <given-names>J.</given-names></string-name>, et al.: <article-title>Hybrid Computing using a Neural Network with Dynamic External Memory</article-title>. <source>Nature</source> <volume>538</volume>(<issue>7626</issue>), <fpage>471</fpage>-<lpage>476</lpage> (<year>2016</year>)</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. <string-name><surname>Gregory</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Cresswell</surname>, <given-names>S.</given-names></string-name>: <article-title>Domain model acquisition in the presence of static relations in the LOP system</article-title>. In: <source>ICAPS</source>. pp. <fpage>97</fpage>-<lpage>105</lpage> (<year>2015</year>), http://www.aaai.org/ocs/index.php/ICAPS/ICAPS15/paper/view/10621</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. <string-name><surname>Helmert</surname>, <given-names>M.</given-names></string-name>: <article-title>The Fast Downward Planning System</article-title>. <source>J. Artif. Intell. Res. (JAIR)</source> <volume>26</volume>, <fpage>191</fpage>-<lpage>246</lpage> (<year>2006</year>), http://www.aaai.org/Papers/JAIR/Vol26/JAIR-2606.pdf</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. <string-name><surname>Hopfield</surname>, <given-names>J.J.</given-names></string-name>, <string-name><surname>Tank</surname>, <given-names>D.W.</given-names></string-name>: <article-title>"Neural" Computation of Decisions in Optimization Problems</article-title>. <source>Biological Cybernetics</source> <volume>52</volume>(<issue>3</issue>), <fpage>141</fpage>-<lpage>152</lpage> (<year>1985</year>)</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. <string-name><surname>Jang</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Gu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Poole</surname>, <given-names>B.</given-names></string-name>: <article-title>Categorical Reparameterization with Gumbel-Softmax</article-title>. In: <source>ICLR</source> (<year>2017</year>)</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. <string-name><surname>Kaiser</surname>, <given-names>L.</given-names></string-name>: <article-title>Learning Games from Videos Guided by Descriptive Complexity</article-title>. In: <source>AAAI</source> (<year>2012</year>), http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/5091; http://dblp.uni-trier.de/rec/bib/conf/aaai/Kaiser12</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. <string-name><surname>Kingma</surname>, <given-names>D.P.</given-names></string-name>, <string-name><surname>Mohamed</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Rezende</surname>, <given-names>D.J.</given-names></string-name>, <string-name><surname>Welling</surname>, <given-names>M.</given-names></string-name>: <article-title>Semi-supervised learning with deep generative models</article-title>. In: <source>NIPS</source>. pp. <fpage>3581</fpage>-<lpage>3589</lpage> (<year>2014</year>)</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. <string-name><surname>Kirk</surname>, <given-names>J.R.</given-names></string-name>, <string-name><surname>Laird</surname>, <given-names>J.E.</given-names></string-name>: <article-title>Learning General and Efficient Representations of Novel Games Through Interactive Instruction</article-title>. <source>Advances in Cognitive Systems</source> <volume>4</volume> (<year>2016</year>)</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. <string-name><surname>Konidaris</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Kaelbling</surname>, <given-names>L.P.</given-names></string-name>, <string-name><surname>Lozano-Perez</surname>, <given-names>T.</given-names></string-name>: <article-title>Constructing Symbolic Representations for High-Level Planning</article-title>. In: <source>AAAI</source>. pp. <fpage>1932</fpage>-<lpage>1938</lpage> (<year>2014</year>), http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8424</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. <string-name><surname>Korf</surname>, <given-names>R.E.</given-names></string-name>, <string-name><surname>Felner</surname>, <given-names>A.</given-names></string-name>: <article-title>Disjoint Pattern Database Heuristics</article-title>. <source>Artificial Intelligence</source> <volume>134</volume>(<issue>1-2</issue>), <fpage>9</fpage>-<lpage>22</lpage> (<year>2002</year>), http://dx.doi.org/10.1016/S0004-3702(01)00092-3; http://dblp.uni-trier.de/rec/bib/journals/ai/KorfF02</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. <string-name><surname>LeCun</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bottou</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Haffner</surname>, <given-names>P.</given-names></string-name>: <article-title>Gradient-Based Learning Applied to Document Recognition</article-title>. <source>Proc. of the IEEE</source> <volume>86</volume>(<issue>11</issue>), <fpage>2278</fpage>-<lpage>2324</lpage> (<year>1998</year>)</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. <string-name><surname>Maddison</surname>, <given-names>C.J.</given-names></string-name>, <string-name><surname>Mnih</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Teh</surname>, <given-names>Y.W.</given-names></string-name>: <article-title>The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables</article-title>. In: <source>ICLR</source> (<year>2017</year>)</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. <string-name><surname>McDermott</surname>, <given-names>D.V.</given-names></string-name>: <article-title>The 1998 AI Planning Systems Competition</article-title>. <source>AI Magazine</source> <volume>21</volume>(<issue>2</issue>), <fpage>35</fpage>-<lpage>55</lpage> (<year>2000</year>), http://www.aaai.org/ojs/index.php/aimagazine/article/view/1506</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. <string-name><surname>Mnih</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Kavukcuoglu</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Silver</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Rusu</surname>, <given-names>A.A.</given-names></string-name>, <string-name><surname>Veness</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bellemare</surname>, <given-names>M.G.</given-names></string-name>, <string-name><surname>Graves</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Riedmiller</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Fidjeland</surname>, <given-names>A.K.</given-names></string-name>, <string-name><surname>Ostrovski</surname>, <given-names>G.</given-names></string-name>, et al.: <article-title>Human-Level Control through Deep Reinforcement Learning</article-title>. <source>Nature</source> <volume>518</volume>(<issue>7540</issue>), <fpage>529</fpage>-<lpage>533</lpage> (<year>2015</year>)</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. <string-name><surname>Mourão</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Zettlemoyer</surname>, <given-names>L.S.</given-names></string-name>, <string-name><surname>Petrick</surname>, <given-names>R.P.A.</given-names></string-name>, <string-name><surname>Steedman</surname>, <given-names>M.</given-names></string-name>: <article-title>Learning STRIPS Operators from Noisy and Incomplete Observations</article-title>. In: <source>UAI</source>. pp. <fpage>614</fpage>-<lpage>623</lpage> (<year>2012</year>), https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&amp;smnu=2&amp;article_id=2322&amp;proceeding_id=28</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>25. <string-name><surname>Reinefeld</surname>, <given-names>A.</given-names></string-name>: <article-title>Complete Solution of the Eight-Puzzle and the Benefit of Node Ordering in IDA*</article-title>. In: <source>IJCAI</source>. pp. <fpage>248</fpage>-<lpage>253</lpage> (<year>1993</year>), http://ijcai.org/Proceedings/93-1/Papers/035.pdf</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>26. <string-name><surname>Ren</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Girshick</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>J.</given-names></string-name>: <article-title>Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks</article-title>. In: <source>NIPS</source>. pp. <fpage>91</fpage>-<lpage>99</lpage> (<year>2015</year>)</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>27. <string-name><surname>Satzger</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Kramer</surname>, <given-names>O.</given-names></string-name>: <article-title>Goal Distance Estimation for Automated Planning using Neural Networks and Support Vector Machines</article-title>. <source>Natural Computing</source> <volume>12</volume>(<issue>1</issue>), <fpage>87</fpage>-<lpage>100</lpage> (<year>2013</year>), http://dx.doi.org/10.1007/s11047-012-9332-y</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>28. <string-name><surname>Sievers</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Ortlieb</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Helmert</surname>, <given-names>M.</given-names></string-name>: <article-title>Efficient Implementation of Pattern Database Heuristics for Classical Planning</article-title>. In: <source>SOCS</source> (<year>2012</year>)</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>29. <string-name><surname>Silver</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Maddison</surname>, <given-names>C.J.</given-names></string-name>, <string-name><surname>Guez</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Sifre</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Van Den Driessche</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Schrittwieser</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Antonoglou</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Panneershelvam</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Lanctot</surname>, <given-names>M.</given-names></string-name>, et al.: <article-title>Mastering the Game of Go with Deep Neural Networks and Tree Search</article-title>. <source>Nature</source> <volume>529</volume>(<issue>7587</issue>), <fpage>484</fpage>-<lpage>489</lpage> (<year>2016</year>)</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>30. <string-name><surname>Steels</surname>, <given-names>L.</given-names></string-name>: <article-title>The Symbol Grounding Problem has been solved. So what's next?</article-title> In: <string-name><surname>de Vega</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Glenberg</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Graesser</surname>, <given-names>A.</given-names></string-name> (eds.) <source>Symbols and Embodiment</source>. Oxford University Press (<year>2008</year>)</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>31. <string-name><surname>Vincent</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Larochelle</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Manzagol</surname>, <given-names>P.A.</given-names></string-name>: <article-title>Extracting and Composing Robust Features with Denoising Autoencoders</article-title>. In: <source>ICML</source>. pp. <fpage>1096</fpage>-<lpage>1103</lpage>. ACM (<year>2008</year>)</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>32. <string-name><surname>Yang</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>Y.</given-names></string-name>: <article-title>Learning Action Models from Plan Examples using Weighted MAX-SAT</article-title>. <source>Artificial Intelligence</source> <volume>171</volume>(<issue>2-3</issue>), <fpage>107</fpage>-<lpage>143</lpage> (<year>2007</year>), http://dx.doi.org/10.1016/j.artint.2006.11.005; http://dblp.uni-trier.de/rec/bib/journals/ai/YangWJ07</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>