<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Jozefowicz R, Zaremba W, Sutskever I. An empirical exploration of recurrent network architectures[C]//
International Conference on Machine Learning. JMLR.org</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Learning Safety-Aware Policy with Imitation Learning for Context-Adaptive Navigation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bo Xiong</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fangshi Wang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chao Yu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fei Qiao</string-name>
          <email>qiaofei@tsinghua.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qi Wei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xin-Jun Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronic Engineering, Tsinghua University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mechanical Engineering, Tsinghua University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Software Engineering, Beijing Jiaotong University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>2342</volume>
      <fpage>2342</fpage>
      <lpage>2350</lpage>
      <abstract>
        <p>This paper presents an Imitation Learning (IL) based visual navigation system that guides a robot from a start position to a goal location without any explicit map. We pay close attention to the safety issues caused by partial observability and data distribution mismatch: when the robot encounters incomplete or unfamiliar states, it is likely to perform an unsafe action, which makes lifelong robot navigation difficult. A sequence-to-sequence (Seq2seq) deep neural network is built to enhance the agent's context-awareness under partially observable conditions and to improve the model's adaptability to unseen scenarios. Additionally, we propose Uncertainty-Aware Imitation Learning (UAIL), which explicitly estimates model uncertainty and, within on-policy IL, actively asks the expert to label samples according to that uncertainty. Simulations demonstrate that the combined method, Safety-Aware Imitation Learning (SAIL), achieves 35.6% fewer expected moving steps and 22% fewer collisions in goal-driven visual navigation than current counterparts. With the learned safer policy, SAIL was successfully adapted to unseen environments with minimal loss of navigation performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Consider the task of navigating from a current location to a specific goal. Classical geometry-based methods,
such as Simultaneous Localization and Mapping (SLAM) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] can be divided into two stages: one
stage builds a 3D map using image feature matching and geometric constraints, and the other performs global
or local path planning. SLAM-based approaches require carefully designed image features and work poorly
in texture-less environments. Recently, learning-based approaches have come to dominate robot learning, including
manipulation [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ], self-driving cars [
        <xref ref-type="bibr" rid="ref10 ref15 ref16">10, 15, 16</xref>
        ] and robot navigation [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ]. Compared with traditional
methods, learning-based navigation can work in an end-to-end fashion without any explicit map building. One
framework for doing this is Reinforcement Learning (RL) [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ]. In RL, a reward function is usually given, and
the agent learns a policy that maximizes the cumulative reward by interacting with the environment. However,
designing an appropriate reward function for real-world scenarios is difficult. Moreover, the
sparse rewards of goal-driven tasks can lead to poor convergence and low computational efficiency.
      </p>
      <p>
        Alternatively, Imitation Learning (IL) [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ] has been proposed to avoid the reward-design
problem of RL. Rather than relying on a hand-designed reward function, IL learns policies directly from
expert demonstrations and generalizes to new situations without any explicit interaction with the environment.
A common approach to IL is Behavior Cloning (BC) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where a robot observes a supervisor’s policy and
learns a mapping from states to actions directly with supervised learning. However, BC requires that
demonstrations satisfy the i.i.d. assumption of statistical learning; otherwise it suffers from several problems. As
the robot executes its own policy, its state distribution moves away from that of the teacher’s demonstrations
on which it was trained, so the robot drifts to dangerous states [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For instance, when a robot moves into an unfamiliar
state, collisions can easily happen. Moreover, the robot’s action estimation errors compound once its
states drift away from the supervisor’s demonstrations. Therefore, the model is hard to adapt to
context-changing environments, which prevents its application in lifelong navigation. On-policy approaches, such
as Data Aggregation (DAgger) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], can partially alleviate this issue by querying corrective samples online and iteratively
aggregating new data for training. However, DAgger requires a huge number of queries to update its policy, which
is tedious for human teachers to answer and adds unnecessary computation. Recent works have
paid more attention to safety issues in robot learning tasks, such as safe RL [43, 44, 45], but seldom consider
the safety of IL in specific applications. In addition, the model’s adaptability to unseen environments plays
a vital role in lifelong robot navigation and should be explicitly modeled in the deep neural network used for IL.
      </p>
      <p>This paper presents the Safety-Aware Imitation Learning (SAIL) framework, which addresses both partial
observability and data distribution mismatch in IL. First, to enhance context-awareness in partially observable
environments and to facilitate adaptation to unseen scenarios and goals, we build a sequence-to-sequence deep
neural network from Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). This
network takes both spatial relevance and temporal relevance into consideration, which can significantly
enhance the agent’s context-awareness and improve the model’s generalization performance on unseen
scenarios and goals. Second, we propose UAIL, which uses a Bayesian approximation with MC-Dropout [42]. For
potentially uncertain or unsafe actions that arise in unseen or unfamiliar scenarios, rather than
performing the action anyway, UAIL asks the expert for advice (similar to active learning) on whether to act or to label
the state. We estimate the model uncertainty with MC-Dropout, a Bayesian approximation for uncertainty estimation
in deep learning. This uncertainty is used to improve the safety and efficiency of on-policy imitation
learning methods such as DAgger. Extensive experiments in the simulator compare the combined
SAIL method with the vanilla IL approach. SAIL shows better navigation performance (35.6% fewer expected
moving steps) and a safer policy (22% lower collision rate). Additionally, with the learned safer policy, SAIL
was successfully adapted to unseen scenarios and goals with minimal loss of navigation performance. The main
contributions of our work are as follows:</p>
      <p>(1) We built a Seq2seq deep neural network to enhance the agent’s context-awareness in partially-observable
scenarios and facilitate the model’s adaptability to unseen scenarios and goals.</p>
      <p>(2) We presented the Uncertainty-Aware Imitation Learning (UAIL) approach, which estimates model uncertainty
with MC-Dropout and combines it with an on-policy IL method, significantly improving the safety of IL.</p>
      <p>(3) We proposed SAIL by combining UAIL and the Seq2seq network, and conducted extensive experiments and
evaluations covering navigation performance, safety, and adaptability to unseen environments and goals.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>This work is related to past literature in the domains of imitation learning, learning-based navigation and
uncertainty estimation. This section reviews related work in each of these areas in turn.</p>
      <sec id="sec-2-1">
        <title>Imitation Learning</title>
        <p>
          One of the commonly used solutions for IL is Behavioral Cloning (BC) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], where the robot passively observes
the expert’s full demonstrations and learns a policy mapping states to actions via purely supervised learning. However,
BC suffers from serious safety problems: when executing its policy, the robot can drift to dangerous states. For
example, when a self-driving car steers to the edge of the road, it may be unable to recover [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] points out
that this is because the robot’s state distribution differs from its demonstrator’s: once the robot drifts away
from the expert’s demonstrations, its errors compound, a known problem called compounding error
(or covariate shift). The DART algorithm [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], where noise is injected into the expert’s trajectory, can narrow the
distribution gap, but it does not work in scenarios where the gap is large. Inverse Reinforcement
Learning (IRL) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] recovers the reward function from the expert’s behavior trajectories, which in
turn enables RL. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] are two different branches of IRL. Generative Adversarial Imitation Learning (GAIL) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] builds
a generator and a discriminator to find a policy that matches the experts’ distribution of state-action pairs
and does not require any assumptions about the environment. The common problems of IRL and GAIL are
poor scalability in real-world settings and expensive computation. On-policy approaches have been proposed
to address the compounding error problem in off-policy IL. DAgger [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] (Data Aggregation) is one of the
classical on-policy solutions. By continuously querying the expert for new corrections during execution, it
brings the robot’s trajectory distribution closer to the supervisor’s, which has been demonstrated to
reduce compounding error and yield robust policies. However, DAgger suffers from several limitations: 1) it
is difficult for human experts to provide enough labels; 2) visiting highly sub-optimal states is potentially
dangerous for a robot in real settings [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]; 3) it is computationally expensive to iteratively update the policy.
To alleviate the computation burden, [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] trains a classifier to predict whether or not the robot is safe during
execution. However, this approach introduces additional challenges for computational efficiency and for human experts.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Learning-based Navigation</title>
        <p>
          Learning-based techniques, especially deep learning, approach the visual navigation problem in an end-to-end
fashion. There are mainly two types of learning-based visual navigation methods. 1) RL-based methods [
          <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
          ]:
these methods are divided into two stages: an exploration stage, in which the environment’s map information
is implicitly gathered, and an exploitation stage, in which the map information is used to navigate efficiently [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
The work in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] explored deep RL for target-driven navigation, that is, navigating from a current location to a target
position. The main objective of goal-driven navigation is to find the minimal sequence of actions from the current
location to a target. However, this method requires 100 million frames to converge, which makes it very costly to
train. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] applied an LSTM extension to DRL for 3D maze navigation. 2) IL-based methods: [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] proposed an
imitation learning method for autonomous control of an aerial vehicle. [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] explored 3D navigation tasks such as
2D grid navigation, target-reaching and line-following with deep IL. [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] applied the IL to autonomous navigation
in complex natural terrain. [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] proposed zero-shot IL for visual navigation. All of these IL-based methods suffer
from either safety problems caused by data distribution mismatch or poor computational efficiency.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Uncertainty Estimation in Deep Learning</title>
        <p>There are two types of uncertainty in deep learning. 1) Aleatoric uncertainty, or statistical uncertainty, results
from the intrinsic randomness of the data distribution itself [46,47]. Some approaches [48] have been used to model
aleatoric uncertainty, but it cannot be reduced by simply adding more samples or prior knowledge to the deep
learning system. 2) Epistemic uncertainty, also known as systematic uncertainty, comes from the model parameters
themselves due to limited training data or prior knowledge; in theory, it can be eliminated by providing additional training samples or
more prior knowledge. [50] provides a deep analysis of uncertainty estimation in deep neural networks. One of the
most popular frameworks for estimating epistemic uncertainty in deep learning is Bayesian approximation,
for example MC-Dropout [46, 49] and Bayesian ensembles [51]. In this work, we apply MC-Dropout to estimate
the model uncertainty because of its plug-and-play compatibility with existing deep network architectures. The only
thing we need to do is pass the same input through the network multiple times with dropout kept active, so that each pass uses a different random dropout mask, and
compute the entropy or variance of the outputs, which can be used to represent the uncertainty.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>SAIL METHOD</title>
      <p>This section describes the SAIL framework for context-adaptive navigation in detail. We first introduce
the context-aware deep neural network design for goal-driven navigation, and then present safety-aware imitation learning
based on uncertainty estimation with MC-Dropout.</p>
      <sec id="sec-3-1">
        <title>Context-Aware Neural Network Design</title>
        <p>
          For the network design, as shown in Fig. 2, we consider the spatial relevance between the observations and the goal image.
At each time step, both the current observation and the goal image are fed into the network. This design
is inspired by [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref38">40</xref>
          ], and it improves the model’s generalization performance on unseen goals.
When applying the IL approach to unseen goals, the network needs no re-training for each goal. Furthermore,
we also take into account the temporal relevance of the state-action pairs that precede the current observation. This
technique is especially useful in scenarios where the robot can hardly observe all of the details of the current state
(that is, a Partially Observable Markov Decision Process). A Recurrent Neural Network (RNN) is used to address this
problem. The goal of the network is to estimate the current action (such as moving forward or turning right)
from the full history of states together with the goal. Specifically, the network can be divided into three parts.
        </p>
        <p>
          CNN Feature Extractor: To extract the image features of the observation and the target, we use
two weight-sharing CNN streams based on ResNet-50 [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], which transform the two images into
the same embedding space. ResNet-50 is pre-trained on ImageNet [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] and fine-tuned on our dataset before the
training stage. The outputs of ResNet-50 are projected into a 512-dimensional space.
        </p>
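        <p>Embedding and FC Layer: This layer fuses the observation feature and the goal image feature into
a 1024-dimensional joint representation and then projects it to a new 512-dimensional vector with fully-connected
fusion layers. We use two separate fully-connected layers to learn the joint feature representation of the observation
and the goal image.</p>
        <p>RNN Context Modeling: To learn the temporal relevance of observation sequences, RNN layers are
added right after the embedding and FC layer. Vanishing gradients are a key problem for RNNs with long-term
dependencies. Although LSTMs [38] are more prevalent in past literature for addressing this problem, we
use GRUs [37], which have fewer parameters, are simpler to use, and are generally faster to train than
LSTMs.</p>
        <p>A sequence-to-sequence IL model is built as shown in Fig. 2. At time step t = 0, an initial goal g and a start state
s<sub>0</sub> are fed into the network. Then, at each time step t &gt; 0, the agent takes an action a<sub>t</sub> with its current policy π<sub>θ</sub>:</p>
        <disp-formula><label>(1)</label><tex-math><![CDATA[a_t = \pi_\theta(s_t, g)]]></tex-math></disp-formula>
        <p>The environment then changes to a new state s<sub>t+1</sub> according to the transition dynamics env:</p>
        <disp-formula><label>(2)</label><tex-math><![CDATA[s_{t+1} = \mathrm{env}(s_t, a_t, g)]]></tex-math></disp-formula>
        <p>The current hidden vector h<sub>t</sub> is a function f<sub>θ</sub> of the current observation s<sub>t</sub>, the current goal g and the previous hidden
vector h<sub>t−1</sub>, where θ denotes the parameters of f<sub>θ</sub>:</p>
        <disp-formula><label>(3)</label><tex-math><![CDATA[h_t = \begin{cases} s_0, & t = 0 \\ f_\theta(h_{t-1}, s_t), & t \geq 1 \end{cases}]]></tex-math></disp-formula>
        <p>We can predict an optimal action â<sub>t</sub> with P<sub>θ</sub>(a<sub>t</sub>|h<sub>t</sub>):</p>
        <disp-formula><label>(4)</label><tex-math><![CDATA[\hat{a}_t = \arg\max_{a_t} P_\theta(a_t \mid h_t)]]></tex-math></disp-formula>
        <p>where</p>
        <disp-formula><label>(5)</label><tex-math><![CDATA[P_\theta(a_t \mid h_t) = \mathrm{softmax}(h_t)]]></tex-math></disp-formula>
        <p>Given the training set D = {(s<sub>i</sub>, a<sub>i</sub>)}, the goal is to maximize the log-likelihood of the output action sequences:</p>
        <disp-formula><label>(6)</label><tex-math><![CDATA[\theta^{*} = \arg\max_{\theta} \sum_{(s_i, a_i) \in D} \log P_\theta(a_i \mid s_i)]]></tex-math></disp-formula>
        <p>where</p>
        <disp-formula><label>(7)</label><tex-math><![CDATA[\log P_\theta(a \mid s) = \sum_{t} \log P_\theta(a_t \mid s_t)]]></tex-math></disp-formula>
        <p>We minimize the cross-entropy loss between the predicted action and the corrective action given by the expert, using the
Adam optimization algorithm:</p>
        <disp-formula><label>(8)</label><tex-math><![CDATA[L(\theta) = -\sum_{\tau \in D} \sum_{a} \pi_E(a \mid s, g) \log \pi_\theta(a \mid s, g)]]></tex-math></disp-formula>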
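        <p>The recurrence described above (a hidden-state update from the previous hidden vector and the current observation, followed by a softmax over actions) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the network used in the paper: the class name, the feature sizes, the random weights, the zero initial hidden state, and the extra output projection before the softmax are all hypothetical choices made to keep the example self-contained.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


class TinyGRUPolicy:
    """Toy recurrent policy; all sizes and random weights are illustrative."""

    def __init__(self, feat_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * feat_dim + hidden_dim  # [observation, goal, previous hidden]
        self.hidden_dim = hidden_dim
        self.W_update = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_reset = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_cand = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden_dim))

    def step(self, h_prev, obs_feat, goal_feat):
        """One GRU update followed by a softmax over the action set."""
        x = np.concatenate([obs_feat, goal_feat])
        gates_in = np.concatenate([x, h_prev])
        z = sigmoid(self.W_update @ gates_in)   # update gate
        r = sigmoid(self.W_reset @ gates_in)    # reset gate
        cand = np.tanh(self.W_cand @ np.concatenate([x, r * h_prev]))
        h = (1.0 - z) * h_prev + z * cand       # new hidden state h_t
        return h, softmax(self.W_out @ h)       # action probabilities


def rollout(policy, obs_feats, goal_feat):
    """Greedy action sequence for a list of observation features."""
    h = np.zeros(policy.hidden_dim)  # assumption: zero initial hidden state
    actions = []
    for obs_feat in obs_feats:
        h, probs = policy.step(h, obs_feat, goal_feat)
        actions.append(int(np.argmax(probs)))
    return actions
```
]]></preformat>
        <p>In this sketch the goal feature is concatenated with the observation feature at every step, which mirrors the idea of feeding both images to the network so that no re-training is needed per goal.</p>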
      </sec>
      <sec id="sec-3-2">
        <title>Safety-Aware Imitation Learning</title>
        <p>
          To address the safety issues of off-policy IL caused by data distribution mismatch, UAIL
takes model uncertainty into account and combines it with the vanilla on-policy training method Data Aggregation
(DAgger) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. UAIL iteratively queries the expert for new corrective samples
to train the IL model, but less frequently than DAgger. The uncertainty of the current model guides whether or not UAIL
asks the expert to label a new sample. This combined on-policy IL method is therefore query-efficient: the expert is asked for a label only
when the uncertainty value is greater than a certain threshold.
When enough newly labeled data has accumulated to train a new policy (the number of newly labeled samples is
larger than a second threshold), we aggregate the new data with the old data and retrain the network
on the aggregated dataset. A basic way to estimate model uncertainty in a neural network is to use
the entropy of the softmax output of a classification network. This method is naïve, because it often
happens that the softmax entropy is large while the real uncertainty is small, and vice versa.
        </p>
        <p>Uncertainty Estimation with MC-Dropout: To bridge the gap between uncertainty estimation and
deep neural networks, we use MC-Dropout, a Bayesian approximation for uncertainty estimation in deep learning,
to represent the confidence of the action to be executed. It has been shown that applying dropout
at inference time in a deep neural network can be interpreted as approximate Bayesian inference. The key idea
is to keep dropout active at both training and test time. At test
time, we pass the same input through the network β times, each time with a randomly sampled dropout mask.
MC-Dropout provides an efficient way to estimate uncertainty with minimal changes to most existing deep networks,
acting as a plug-and-play module for uncertainty estimation.</p>
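        <p>A minimal sketch of the MC-Dropout procedure is given here, assuming a toy two-layer classifier. The weight matrices W1 and W2, the dropout probability of 0.5, and the function name are hypothetical; n_samples plays the role of the number of repeated passes. The sketch reports the entropy of the averaged action distribution and the mean per-action variance across passes, the two summary statistics mentioned earlier for representing uncertainty.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def mc_dropout_predict(x, W1, W2, n_samples=20, drop_prob=0.5, rng=None):
    """Repeat stochastic forward passes with dropout kept active at test time."""
    rng = np.random.default_rng() if rng is None else rng
    sampled = []
    for _ in range(n_samples):
        hidden = np.maximum(0.0, W1 @ x)                # ReLU hidden layer
        keep = rng.random(hidden.shape) >= drop_prob    # fresh dropout mask
        hidden = hidden * keep / (1.0 - drop_prob)      # inverted dropout scaling
        sampled.append(softmax(W2 @ hidden))
    sampled = np.stack(sampled)                         # (n_samples, n_actions)
    mean_probs = sampled.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    variance = sampled.var(axis=0).mean()
    return mean_probs, entropy, variance
```
]]></preformat>
        <p>A policy would then compare the returned entropy (or variance) against a threshold to decide whether the predicted action can be trusted.</p>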
        <p>Safety-Aware Imitation Learning: To improve the training efficiency of the on-policy approach,
we aim to reduce the number of expert queries and the number of data aggregation rounds. We propose an
uncertainty-aware imitation learning method (similar to active learning). It initializes the model’s weights on an initial dataset with
supervised learning, then executes the current policy until the learner’s confidence falls below a fixed threshold, at
which point it queries the expert for a corrective action. The uncertainty-based approach can stop
querying once the confidence exceeds the threshold in all states, which makes use of the complementary advantages
of humans and robots. Algorithm 1 summarizes the training procedure of Safe IL with uncertainty estimation.</p>
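        <p>The query rule described in this section can be sketched as follows. This is an illustrative reading rather than the paper's Algorithm 1: the function names, the callback signatures, and the choice to execute the expert's action whenever the expert is queried are assumptions made for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def uail_collect(start_state, step, act_with_uncertainty, expert_action,
                 uncertainty_threshold, max_new_labels, max_steps=100):
    """Run the learner and query the expert only in uncertain states.

    step(state, action) returns (next_state, done).
    act_with_uncertainty(state) returns (action, uncertainty).
    """
    new_labels = []
    state = start_state
    for _ in range(max_steps):
        action, uncertainty = act_with_uncertainty(state)
        if uncertainty > uncertainty_threshold:
            # Assumption: the expert's corrective action is both stored
            # as a label and executed in place of the learner's action.
            action = expert_action(state)
            new_labels.append((state, action))
        state, done = step(state, action)
        if done or len(new_labels) >= max_new_labels:
            break
    return new_labels


def aggregate_if_enough(dataset, new_labels, min_new_labels):
    """Merge new labels into the dataset once enough have accumulated.

    Returns the (possibly unchanged) dataset and a flag that tells the
    caller whether the policy should be retrained.
    """
    if len(new_labels) >= min_new_labels:
        return dataset + new_labels, True
    return dataset, False
```
]]></preformat>
        <p>Compared with plain DAgger, which would label every visited state, the expert here is consulted only where the uncertainty estimate exceeds the threshold, and retraining is triggered only when the second threshold on the number of new labels is met.</p>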
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>
          In this section, we evaluate the SAIL approach in the simulation environment AI2-THOR [
          <xref ref-type="bibr" rid="ref37">39</xref>
          ]. We start
by introducing the dataset and experiment setup, then present experimental results on several metrics.
The AI2-THOR framework is used to collect the training data. AI2-THOR is a photo-realistic interactive
3D simulator for AI agents that provides 3D environments resembling real-world scenes. There
are 120 scenes in total in the AI2-THOR environment, covering 4 environment types (kitchens,
living rooms, bedrooms, and bathrooms), with 30 scenes per type. In AI2-THOR, the
agent’s positions are sampled from a discrete grid, and the action space consists of 4 actions (move forward, move back, turn right
and turn left). Each image is the agent’s first-person view of the room.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiment Setup</title>
        <p>In this experiment, we use 4 different scene types to train our models (bathrooms, bedrooms, kitchens, and
living rooms). To generate the expert’s policy, we use the A* search algorithm to find the shortest path from the
start location to the target location while avoiding collisions. To narrow the
gap between different room types, we train our model on each scene separately. In each scene, we
set 10 distinctive goal locations, such as laptops, chairs and table lamps. For each goal, we evaluate
performance from 50 random starting points. To validate the proposed approach, we compare
and evaluate the following 4 baselines and 2 proposed methods.</p>
        <p>(1) BC (off-policy): an off-policy IL algorithm that learns a mapping from states to actions directly by
supervised learning without any demonstrations sampling.</p>
        <p>(2) DART (off-policy): an improved BC method that randomly injects noise into the expert’s trajectory.</p>
        <p>(3) DAgger (on-policy): an on-policy training approach that iteratively queries the expert for new corrective
samples and aggregates them into the dataset for training.</p>
        <p>(4) SaferDAgger (on-policy): an extension of the vanilla DAgger method that minimizes the number of queries to
a reference policy during both the training and testing stages.</p>
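        <p>The A* expert mentioned in the setup can be illustrated with a small grid search. The sketch below is an assumption-laden simplification: the grid encoding (a list of strings with '#' marking obstacles), the 4-connected moves, and the Manhattan heuristic are hypothetical choices, and the agent's heading and turn actions are ignored.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import heapq


def astar(grid, start, goal):
    """Shortest obstacle-free path on a 4-connected grid, or None if unreachable.

    grid is a list of equal-length strings; '#' marks an obstacle cell.
    start and goal are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```
]]></preformat>
        <p>Consecutive cells of the returned path can be converted into expert action labels for the states along the trajectory.</p>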
      </sec>
      <sec id="sec-4-3">
        <title>Evaluations and Results</title>
        <p>The ultimate goal of goal-driven navigation is to reach a given target in the minimum number of steps while avoiding collisions.
Although these two factors affect each other, taking fewer steps does not necessarily mean
having fewer collisions in real scenarios. Therefore, we evaluate SAIL’s performance using several different
metrics.</p>
        <p>Expected Trajectory Steps. We evaluate navigation performance by the expected trajectory steps,
defined as the total number of steps taken to navigate from an initial start position
to a given target location. The goal of this task is to minimize the expected trajectory steps: the closer the
expected number of navigation steps is to the shortest path, the better the navigation performance. For the 4
scene types, we compare the results with the shortest-path steps, as shown in Table
1. SAIL with MC-Dropout achieves fewer expected moving steps than the vanilla off-policy and
on-policy IL approaches.</p>
        <p>Navigation Safety Evaluation. To evaluate the robot’s safety awareness in goal-directed
navigation, the expected collision rate is defined as the mean percentage of the robot’s collisions with obstacles when
navigating from different start locations to different target positions. In this work, the agent is allowed to collide
with the environment, and each collision is counted as one navigation step. As
shown in Table 2, SAIL with MC-Dropout significantly reduces the collision rate of goal-driven navigation.</p>
        <p>Navigation Success Rate in Unseen Scenes. Another metric for navigation performance is the
navigation success rate in unseen scenes. Since there are 120 scenes in total across the environment types, we
use 100 scenes for training and 20 scenes for testing. The experimental results (see Table 3) show that
the performance of SAIL in unseen scenes is markedly better than that of the baseline IL model. This can
be explained by the safer policy learned by SAIL and the improved perception ability provided by the context-aware neural
network design.</p>
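        <p>The step and collision metrics defined in this section can be computed as sketched below. The episode encoding is an assumption introduced for the example: each episode is a list of booleans with one entry per navigation step, where True marks a step that ended in a collision (so a collision counts as one step, as stated in the text). The per-episode averaging is one plausible reading of "mean percentage".</p>
        <preformat preformat-type="code"><![CDATA[
```python
def expected_trajectory_steps(episodes):
    """Mean number of steps per episode."""
    return sum(len(episode) for episode in episodes) / len(episodes)


def expected_collision_rate(episodes):
    """Mean over episodes of the percentage of steps that were collisions."""
    rates = [sum(episode) / len(episode) for episode in episodes if episode]
    return 100.0 * sum(rates) / len(rates)
```
]]></preformat>
        <p>For example, two episodes of 4 and 2 steps with a single collision in the first give an expected trajectory length of 3 steps and an expected collision rate of 12.5%.</p>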
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This work proposes the SAIL approach for goal-driven, context-adaptive visual navigation. The approach
addresses the robot’s safety issues caused by data distribution mismatch, as well as poor performance in partially observable
visual navigation. To alleviate the safety issue, we model the policy uncertainty in deep learning with
MC-Dropout and combine it with the on-policy IL approach. To enhance the agent’s perception capability and
context awareness in partially observable environments, we build a sequence-to-sequence deep neural network
that takes both spatial and temporal relevance into consideration. Experiments in the simulator demonstrate that
the proposed SAIL method achieves better navigation performance on several evaluation metrics than
state-of-the-art IL methods. With the safer policy, SAIL was successfully adapted to unseen scenarios
and goals with fewer collisions and minimal loss of navigation performance.</p>
      <sec id="sec-5-1">
        <title>Acknowledgement</title>
        <p>We thank Ai Shi, who moderated this paper and thereby improved the manuscript significantly.</p>
        <p>[38] Sak H, Senior A, Beaufays F. Long short-term memory recurrent neural network architectures for large scale
acoustic modeling[C]//Fifteenth Annual Conference of the International Speech Communication Association.
2014.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Mur-Artal</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tardós</surname>
            <given-names>J D</given-names>
          </string-name>
          .
          <article-title>ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras[J]</article-title>
          .
          <source>IEEE Transactions on Robotics</source>
          ,
          <year>2017</year>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1255</fpage>
          -
          <lpage>1262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Davison</surname>
            <given-names>A J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reid</surname>
            <given-names>I D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molton</surname>
            <given-names>N D</given-names>
          </string-name>
          , et al.
          <article-title>MonoSLAM: Real-time single camera SLAM[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          ,
          <year>2007</year>
          (
          <issue>6</issue>
          ):
          <fpage>1052</fpage>
          -
          <lpage>1067</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Engel</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schöps</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cremers</surname>
            <given-names>D</given-names>
          </string-name>
          .
          <article-title>LSD-SLAM: Large-scale direct monocular SLAM</article-title>
          [C]//
          <source>European Conference on Computer Vision</source>
          . Springer, Cham,
          <year>2014</year>
          :
          <fpage>834</fpage>
          -
          <lpage>849</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Endres</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hess</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturm</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <article-title>3-D mapping with an RGB-D camera[J]</article-title>
          .
          <source>IEEE Transactions on Robotics</source>
          ,
          <year>2014</year>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <fpage>177</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
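          <string-name>
            <surname>Yu</surname>
            <given-names>Chao</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Zuxin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Xinjun</given-names>
          </string-name>
          , et al.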
          <article-title>DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments</article-title>
          [J].
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Zhu</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mottaghi</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolve</surname>
            <given-names>E</given-names>
          </string-name>
          , et al.
          <article-title>Target-driven visual navigation in indoor scenes using deep reinforcement learning</article-title>
          [C]//
          <source>2017 IEEE International Conference on Robotics and Automation (ICRA)</source>
          . IEEE
          ,
          <year>2017</year>
          :
          <fpage>3357</fpage>
          -
          <lpage>3364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Badia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Harley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          .
          <article-title>Asynchronous methods for deep reinforcement learning</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boedecker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Burgard</surname>
          </string-name>
          .
          <article-title>Deep reinforcement learning with successor features for navigation across similar environments</article-title>
          .
          <source>arXiv preprint arXiv:1612.05533</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boedecker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Burgard</surname>
          </string-name>
          .
          <article-title>Vr goggles for robots: Real-to-sim domain adaptation for visual control</article-title>
          .
          <source>arXiv preprint arXiv:1802.00265</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bojarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Del Testa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dworakowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Firner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Flepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Jackel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Monfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.
          <article-title>End to end learning for self-driving cars</article-title>
          .
          <source>arXiv preprint arXiv:1604.07316</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Giusti</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzzi</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciresan</surname>
            <given-names>D C</given-names>
          </string-name>
          , et al.
          <article-title>A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots[J]</article-title>
          .
          <source>IEEE Robotics and Automation Letters</source>
          ,
          <year>2016</year>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>661</fpage>
          -
          <lpage>667</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Ratliff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Srinivasa</surname>
          </string-name>
          .
          <article-title>Imitation learning for locomotion and manipulation</article-title>
          .
          <source>In International Conference on Humanoid Robots</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bain</surname>
          </string-name>
          and
          <string-name>
            <given-names>Claude</given-names>
            <surname>Sammut</surname>
          </string-name>
          .
          <article-title>A framework for behavioural cloning</article-title>
          .
          <source>Machine Intelligence 15</source>
          ,
          <volume>15</volume>
          :
          <fpage>103</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Gordon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          .
          <article-title>A reduction of imitation learning and structured prediction to no-regret online learning</article-title>
          .
          <source>arXiv preprint arXiv:1011.0686</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Codevilla</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dosovitskiy</surname>
            <given-names>A</given-names>
          </string-name>
          , et al.
          <article-title>End-to-end driving via conditional imitation learning</article-title>
          [J].
          <source>arXiv preprint arXiv:1710.02410</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Pan</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            <given-names>C A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saigol</surname>
            <given-names>K</given-names>
          </string-name>
          , et al.
          <article-title>Agile Off-Road Autonomous Driving Using End-to-End Deep Imitation Learning</article-title>
          [J].
          <source>arXiv preprint arXiv:1709.07174</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Giusti</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzzi</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciresan</surname>
            <given-names>D C</given-names>
          </string-name>
          , et al.
          <article-title>A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots[J]</article-title>
          .
          <source>IEEE Robotics and Automation Letters</source>
          ,
          <year>2016</year>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>661</fpage>
          -
          <lpage>667</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Melik-Barkhudarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hebert</surname>
          </string-name>
          .
          <article-title>Learning monocular reactive UAV control in cluttered natural environments</article-title>
          .
          <source>In ICRA</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Pathak</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahmoudieh</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            <given-names>G</given-names>
          </string-name>
          , et al.
          <article-title>Zero-shot visual imitation</article-title>
          [C]//International Conference on Learning Representations.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Tai</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Socially-compliant navigation through raw depth inputs with generative adversarial imitation learning</article-title>
          [J].
          <source>arXiv preprint arXiv:1710.02543</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Ratliff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Srinivasa</surname>
          </string-name>
          .
          <article-title>Imitation learning for locomotion and manipulation</article-title>
          .
          <source>In International Conference on Humanoid Robots</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Englert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paraschos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Deisenroth</surname>
          </string-name>
          .
          <article-title>Model-based imitation learning by probabilistic trajectory matching</article-title>
          .
          <source>In ICRA</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ross</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          .
          <article-title>Efficient reductions for imitation learning</article-title>
          .
          <source>In International Conference on Artificial Intelligence and Statistics</source>
          , pages
          <fpage>661</fpage>
          -
          <lpage>668</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Laskey</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>DART: Noise injection for robust imitation learning</article-title>
          [J].
          <source>arXiv preprint arXiv:1703.09327</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Abbeel</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            <given-names>A Y</given-names>
          </string-name>
          .
          <article-title>Inverse reinforcement learning</article-title>
          [M]//
          <source>Encyclopedia of Machine Learning</source>
          . Springer, Boston, MA,
          <year>2011</year>
          :
          <fpage>554</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Ziebart</surname>
            <given-names>B D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            <given-names>A L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagnell</surname>
            <given-names>J A</given-names>
          </string-name>
          , et al.
          <article-title>Maximum Entropy Inverse Reinforcement Learning</article-title>
          [C]//
          <source>AAAI</source>
          .
          <year>2008</year>
          ,
          <volume>8</volume>
          :
          <fpage>1433</fpage>
          -
          <lpage>1438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Ramachandran</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amir</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Bayesian inverse reinforcement learning</article-title>
          [C]//
          <source>International Joint Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <year>2007</year>
          :
          <fpage>2586</fpage>
          -
          <lpage>2591</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Ho</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ermon</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Generative adversarial imitation learning</article-title>
          [C]//
          <source>Advances in Neural Information Processing Systems</source>
          .
          <year>2016</year>
          :
          <fpage>4565</fpage>
          -
          <lpage>4573</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          .
          <article-title>Query-efficient imitation learning for end-to-end autonomous driving</article-title>
          .
          <source>arXiv preprint arXiv:1605.06450</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Dhiman</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffin</surname>
            <given-names>B</given-names>
          </string-name>
          , et al.
          <article-title>A Critical Investigation of Deep Reinforcement Learning for Navigation[J]</article-title>
          .
          <source>arXiv preprint arXiv:1802.02274</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Mirowski</surname>
          </string-name>
          , Razvan Pascanu, Fabio Viola, Hubert Soyer,
          <string-name>
            <given-names>Andrew J.</given-names>
            <surname>Ballard</surname>
          </string-name>
          , Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, and
          <string-name>
            <given-names>Raia</given-names>
            <surname>Hadsell</surname>
          </string-name>
          .
          <article-title>Learning to navigate in complex environments</article-title>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Sammut</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurst</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kedzier</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michie</surname>
            <given-names>D</given-names>
          </string-name>
          et al (
          <year>1992</year>
          )
          <article-title>Learning to fly</article-title>
          .
          <source>In: Proceedings of the ninth international workshop on machine learning</source>
          , pp
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Hussein</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elyan</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaber</surname>
            <given-names>M M</given-names>
          </string-name>
          , et al.
          <article-title>Deep imitation learning for 3D navigation tasks</article-title>
          [J].
          <source>Neural computing and applications</source>
          ,
          <year>2018</year>
          ,
          <volume>29</volume>
          (
          <issue>7</issue>
          ):
          <fpage>389</fpage>
          -
          <lpage>404</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Silver</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagnell</surname>
            <given-names>J A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stentz</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Applied imitation learning for autonomous navigation in complex natural terrain</article-title>
          [C]//Field and Service Robotics. Springer, Berlin, Heidelberg,
          <year>2010</year>
          :
          <fpage>249</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Szegedy</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            <given-names>V</given-names>
          </string-name>
          , et al.
          <article-title>Inception-v4, inception-resnet and the impact of residual connections on learning</article-title>
          [C]//AAAI.
          <year>2017</year>
          ,
          <volume>4</volume>
          :
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hao</given-names>
            <surname>Su</surname>
          </string-name>
          , et al.
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          [J].
          <source>International Journal of Computer Vision</source>
          ,
          <year>2015</year>
          ,
          <volume>115</volume>
          (
          <issue>3</issue>
          ):
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [39]
          <string-name>
            <surname>Kolve</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mottaghi</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            <given-names>D</given-names>
          </string-name>
          , et al.
          <article-title>AI2-THOR: An interactive 3D environment for visual AI[J]</article-title>
          .
          <source>arXiv preprint arXiv:1712.05474</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [40]
          <string-name>
            <surname>Sadeghi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Levine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          (pp.
          <fpage>4691</fpage>
          -
          <lpage>4699</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>