<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Intelligent Battery Management for aquatic drones based on Task Difficulty driven POMDPs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Castellini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jason J. Blum</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico D. Bloisi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Farinelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Verona University, Department of Computer Science</institution>
          ,
          <addr-line>Strada Le Grazie 15, 37134 Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Autonomous drones typically have limited battery capacity, which presents a major challenge for persistent, long-term deployments. In the context of water quality monitoring, aquatic drones must navigate rivers and lakes to collect real-time data, and their battery consumption is heavily influenced by dynamic environmental factors such as water currents and wind. Intelligent battery management is therefore a requirement for the success of these missions. We propose a formalization of the battery management problem in terms of Partially Observable Markov Decision Processes (POMDPs). We model the problem as a POMDP in which the agent modulates the energy provided to its propellers. The drone can estimate the “difficulty” of traversing an interval by observing the environment. An important aspect of this formalization is the prediction of the difficulty of future intervals from current observations by exploiting a priori geometric information, which we call Task Difficulty Propagation (TDP). We investigate variations of this approach and analyze their performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction and Method</title>
      <p>
        Autonomous aquatic drones are increasingly used for water monitoring. A key factor
for the success of data acquisition campaigns is mission awareness, which is composed
of three main elements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: knowledge of mission objectives, internal self-situational
awareness, and external self-situational awareness. Due to the limited energy on-board
and the requirements for long-term autonomy and reliability, the ability to forecast
the mission-specific energy consumption is necessary to complete missions reliably,
smoothly, and efficiently. In this work we focus on battery management, which is an
aspect of self-situational awareness.
      </p>
      <p>
        Existing approaches for estimating the battery consumption in autonomous systems
can be divided into two main categories, namely decision models [
        <xref ref-type="bibr" rid="ref2 ref3 ref8">3,8,2</xref>
        ] and predictive models
[
        <xref ref-type="bibr" rid="ref11 ref4 ref9">4,11,9</xref>
        ]. All these approaches deal with different aspects of a common problem:
making optimal decisions for battery management in uncertain environments. In this paper,
we propose a methodology based on Partially Observable Markov Decision Processes
(POMDPs) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a decisional model that combines the strength of Hidden Markov
Models with that of Markov Decision Processes by capturing both dynamics that depend on
unobserved states and the effects of decisions over time.
      </p>
      <p>
        We apply our method to the following case study: given i) a path made of four
perpendicular segments (Figure 1.c), in which each segment has a specific degree of
difficulty (i.e., a measure of the forward speed achieved for a given amount of energy
consumption), and ii) an initial amount of battery energy, the agent has to decide the
amount of engine power to apply at each time instant so as to minimize the time
spent to reach the end of the path and maximize the probability of completing the mission
before battery depletion. To this end, POMDPs are extended with a novel simulation
strategy called Task Difficulty Propagation (TDP), which is based on Markov Random
Fields (MRFs) and allows us to represent the dependencies between segment difficulties
and to propagate this information to improve the belief over the difficulties of the
entire path. Initial experiments show promising improvements in mission performance.
Future developments aim at using the method on the INTCATCH [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] drones (Fig. 1.a).
      </p>
      <p>
        A POMDP is a tuple (S, A, O, T, Z, R, γ), where S is a finite set of states, A is a finite
set of actions, Z is a finite set of observations, T : S × A → Π(S) is the state-transition
model, O : S × A → Π(Z) is the observation model, R : S × A → ℝ is the reward
function and γ ∈ [0, 1) is the discount factor.
      </p>
      <p>The goal of an agent based on POMDPs is to maximize its expected total reward
E[Σ_{t=0}^{∞} γ^t R(s_t, a_t)], by choosing the best action a_t in each state s_t at time t.</p>
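      <p>As a concrete illustration (ours, not part of the paper), the discounted return being maximized can be computed from a reward trace as follows; the example trace is a placeholder:</p>
      <preformat preformat-type="code">
# Minimal sketch: discounted return sum_t gamma^t * r_t for a finite reward trace.
def discounted_return(rewards, gamma=0.95):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: 100 steps at medium engine power (-0.02 each) followed by an arrival bonus.
trace = [-0.02] * 100 + [2000.0]
print(discounted_return(trace))  # the bonus is discounted by gamma**100
      </preformat>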
      <p>Figure 1.d shows the proposed POMDP for battery management as a graph, where
states are represented by circles, actions by rectangles and conditional dependencies
between states by arrows. The states are: i) battery voltage interval B ∈ {b0, b1, b2, b3},
where b0 represents empty battery, ii) speed interval V ∈ {v1, v2, v3}, iii) distance to
path end D ∈ {d0, d1, …, d8}, where d8 is the starting interval of the path (i.e., the
maximal distance) and d0 is the end point of the path (i.e., the goal), iv) time interval T ∈
{t1, t2, t3, t4, t5}, where t1 is the time interval in which the path starts, v) current
segment S ∈ {s1, s2, s3, s4}, vi) difficulty of segments F1, F2, F3, F4, with Fj ∈ {L, M, H},
where L=low, M=medium, H=high, and difficulties may depend on water flow or other
environmental factors, vii) difficulty of current segment Fc ∈ {L, M, H}, viii) engine
power P ∈ {L, M, H}, which corresponds to a uniform discretization of the combined
effort of the drone’s propellers.</p>
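      <p>For illustration only (the actual model is encoded in a POMDPX file and this listing is not taken from it), the discretized variables and their domains can be laid out as follows:</p>
      <preformat preformat-type="code">
# Illustrative domains of the POMDP variables described above (names are ours).
domains = {
    "B":  ["b0", "b1", "b2", "b3"],        # battery voltage interval (b0 = empty)
    "V":  ["v1", "v2", "v3"],              # speed interval
    "D":  ["d%d" % i for i in range(9)],   # distance to path end (d8 = start, d0 = goal)
    "T":  ["t1", "t2", "t3", "t4", "t5"],  # time interval
    "S":  ["s1", "s2", "s3", "s4"],        # current segment
    "F1": ["L", "M", "H"], "F2": ["L", "M", "H"],
    "F3": ["L", "M", "H"], "F4": ["L", "M", "H"],
    "Fc": ["L", "M", "H"],                 # difficulty of the current segment
    "P":  ["L", "M", "H"],                 # engine power
}

size = 1
for values in domains.values():
    size *= len(values)
print(size)  # size of the full joint space
      </preformat>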
      <p>
        States B, V, D, T and S are observable, while states F1, …, F4 and Fc are hidden;
hence the model has mixed observability [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Rewards depend on engine power (i.e.,
-0.01 for low power, -0.02 for medium power and -0.05 for high power), time of arrival
(i.e., +500 for arriving at time t5, +1000 for arriving at time t4, +1500 for arriving
at time t3, +2000 for arriving at time t2, +2500 for arriving at time t1) and battery
depletion before arrival (i.e., -5000). The discount factor is γ = 0.95. The five hidden
variables can assume three possible values each (i.e., L, M or H), for a total of 3^5 = 243
configurations of the hidden state (e.g., f = (L, M, H, L, L)). The belief b^t provides the
probability over all possible configurations at time t. It is a vector (b^t_1, …, b^t_243), such
that b^t_j ≥ 0 and Σ_{j=1}^{243} b^t_j = 1. We set the initial belief b^0 to the uniform
distribution (1/243, …, 1/243).
      </p>
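      <p>The following sketch (ours) enumerates the 243 hidden configurations, builds the uniform initial belief, and collects the reward terms listed above:</p>
      <preformat preformat-type="code">
import itertools

# Five hidden difficulty variables (F1..F4 and Fc), each taking values in {L, M, H}.
LEVELS = ("L", "M", "H")
configs = list(itertools.product(LEVELS, repeat=5))  # 3**5 = 243 configurations
assert len(configs) == 243

# Uniform initial belief b0 = (1/243, ..., 1/243).
b0 = [1.0 / len(configs)] * len(configs)

# Reward terms of the model.
POWER_COST = {"L": -0.01, "M": -0.02, "H": -0.05}
ARRIVAL_BONUS = {"t1": 2500, "t2": 2000, "t3": 1500, "t4": 1000, "t5": 500}
DEPLETION_PENALTY = -5000
GAMMA = 0.95
      </preformat>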
      <p>In order to represent dependencies between segment difficulties we define two
elements: i) a cosine similarity matrix which provides a measure of geometric similarity
between pairs of segments in the path (see matrix K in Figure 1.c), ii) a pairwise MRF
to compute a joint probability function over configurations of segment difficulties (see
the graph in Figure 1.c). The MRF is used to factorize the joint probability of
segment difficulties as a product of potential functions over the maximal cliques of the
graph. The joint probability is then written as p(f | θ) = (1/Z(θ)) ∏_{q∈Q} ψq(fq | θq), where
f is a vector of difficulties (e.g., (H, M, L, L)), θ is a parametrization of the MRF,
Q is the set of (maximal) cliques of the graph, ψq(fq | θq) is a potential function for
clique q and Z(θ) is a normalization factor called partition function. We use a
simplified version of MRF called pairwise MRF, in which the parametrization is restricted
to the edges of the graph rather than to the maximal cliques, and we conveniently
express potentials as exponentials of negative energies (Boltzmann distribution). In this way the product of
potentials in p(f | θ) can be computed from the sum of the energies of the maximal
cliques. Since our difficulty variables are discrete, we represent the energy functions
as tables of (non-negative) numbers representing the relative “compatibility” between
the different assignments of segment difficulties. In particular, given a pair of segments
(si, sj), i, j ∈ {1, 2, 3, 4}, having cosine similarity equal to 1, 0 or -1, the energy
functions of the corresponding pair of difficulties (fi, fj), fi, fj ∈ {L, M, H}, are defined
as E1(v, w), E0(v, w) and E-1(v, w), where v = I(fi), w = I(fj) and I(·) is the
index function. To compute the energy function of couples of segments having generic
cosine similarity k ∈ [-1, 1] we interpolate the parameters in E1(v, w), E0(v, w) and
E-1(v, w) by quadratic polynomials.</p>
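      <p>A minimal sketch of this construction follows; the energy tables, the cosine similarity matrix K and the MRF edges are placeholders chosen by us, not the parameters used in the paper:</p>
      <preformat preformat-type="code">
import itertools
import numpy as np

LEVELS = ("L", "M", "H")
IDX = {d: i for i, d in enumerate(LEVELS)}  # index function I(.)

# Placeholder 3x3 energy tables for cosine similarity 1, 0 and -1
# (lower energy = higher compatibility of the two difficulty values).
E1 = np.array([[0., 1., 2.],
               [1., 0., 1.],
               [2., 1., 0.]])   # similar segments favour similar difficulties
E0 = np.ones((3, 3))            # perpendicular segments are uninformative
Em1 = E1[:, ::-1]               # opposite segments favour opposite difficulties

def energy(k, v, w):
    """Interpolate the three reference tables with a quadratic polynomial in k."""
    # Lagrange basis through the nodes k = -1, 0, 1.
    return (0.5 * k * (k - 1)) * Em1[v, w] + (1 - k * k) * E0[v, w] + (0.5 * k * (k + 1)) * E1[v, w]

# Placeholder cosine similarities between the four segments of the square path.
K = np.array([[ 1.,  0., -1.,  0.],
              [ 0.,  1.,  0., -1.],
              [-1.,  0.,  1.,  0.],
              [ 0., -1.,  0.,  1.]])

def mrf_joint(edges=((0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3))):
    """Pairwise-MRF joint p(f) over the 3**4 = 81 segment-difficulty configurations."""
    unnorm = {}
    for f in itertools.product(LEVELS, repeat=4):
        total_energy = sum(energy(K[i, j], IDX[f[i]], IDX[f[j]]) for i, j in edges)
        unnorm[f] = np.exp(-total_energy)   # Boltzmann potential
    Z = sum(unnorm.values())                # partition function Z(theta)
    return {f: p / Z for f, p in unnorm.items()}

p_f = mrf_joint()
print(max(p_f, key=p_f.get))  # most probable configuration under these placeholders
      </preformat>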
      <p>The joint probability computed by the MRF is finally used to update the beliefs
generated step-by-step by the TDP simulator according to the formula b̃^t_j = α b^t_j +
(1 − α) b^t_j p(f_j | θ), where t ∈ N is the time step, j ∈ {1, …, 243} is the index of
the difficulty configuration (i.e., hidden state), α ∈ [0, 1] is a weighting factor, b^t_j is the
probability of the j-th difficulty configuration provided by the standard simulator, p(f_j | θ)
is the joint probability of the j-th difficulty configuration provided by the MRF, and b̃^t_j
is the probability of the j-th difficulty configuration provided by the TDP simulator.</p>
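      <p>A minimal sketch of this update (ours), assuming p(f_j | θ) has already been evaluated for each of the 243 hidden configurations; the renormalization step is our addition, implied by the requirement that the belief sums to 1:</p>
      <preformat preformat-type="code">
import numpy as np

def tdp_belief_update(b_std, p_mrf, alpha=0.5):
    """Blend the standard-simulator belief with the MRF joint probability (alpha is a placeholder)."""
    b_tdp = alpha * b_std + (1.0 - alpha) * b_std * p_mrf
    return b_tdp / b_tdp.sum()  # renormalize so that the belief sums to 1

# Example: start from the uniform belief over the 243 configurations.
b0 = np.full(243, 1.0 / 243)
p_mrf = np.random.dirichlet(np.ones(243))    # stand-in for the MRF probabilities
print(tdp_belief_update(b0, p_mrf).sum())    # 1.0
      </preformat>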
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>
        We implemented the POMDP model in a POMDPX XML file [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and computed a
policy for this model using function pomdpsol of the SARSOP solver [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in the APPL
Toolkit [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Then we simulated the model 100 times using the function pomdpsim of
the APPL toolkit, thus obtaining 100 time-evolutions of the system. In all these
simulations the true hidden state was manually fixed to f = (H, H, L, L). We call these
time evolutions standard (STD) simulations since they were computed by the standard
simulator and do not consider the path geometry to improve the belief over segment
difficulties. Each simulation has 300 time steps and stores information about the evolution
of observable state variables, belief over hidden state, actions and rewards.
      </p>
      <p>Subsequently, we modified the pomdpsim simulator in two ways: i) by forcing the
belief to the true hidden state (i.e., the probability distribution has all its mass on the
real state and zero probability on the other states); we call this simulator oracle (ORC)
since it has complete information about the true hidden state; ii) by implementing the
TDP approach presented above; we call this simulator TDP since it considers path
geometry to improve the belief update over segment (i.e., task) difficulties. We performed
100 simulations with the ORC and TDP simulators, respectively, and compared the results with
those achieved by the STD simulator, in terms of belief evolution, average number of steps
to reach the goal, and average number of battery failures.</p>
      <p>
        Figure 2.a shows the evolution of beliefs in the three simulation cases: columns
represent different simulation strategies, namely, STD, TDP and ORC, while rows
represent segments 1, 2, 3 and 4 of the square path. Each chart shows the average
belief (over 100 simulations) for a particular simulation strategy in a specific segment.
A consequence of the improved belief update strategy in TDP is the improvement of
performance in terms of (reduced) number of steps to reach the end of the path. This is
observable in Figure 2.b which shows the average number of steps required by each
approach to reach the end of segments s1, s2, s3 and s4. We observe that using TDP (blue
line) the agent is able to reach the end of the path with fewer steps than using STD (red
line) and with a similar failure rate (i.e., the percentage of runs in which the battery
is depleted before the end of the path). ORC has the best performance, with 96.4 steps on
average to reach the end of the square and a failure rate of 24%. The second best performance
is that of TDP, with 102.1 steps and a failure rate of 22%. Both approaches outperform
STD, which needs 111.6 steps and has a failure rate of 22%. A key extension of the
proposed approach concerns scaling to more complex scenarios. To this end we are
currently exploring the use of Partially Observable Monte Carlo Planning (POMCP)
methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work is partially funded by the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 689341.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>APPL Toolkit</source>
          . http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>V.</given-names>
            <surname>Berenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          .
          <article-title>Autonomous battery management for mobile robots based on risk and gain assessment</article-title>
          .
          <source>Artificial Intelligence Review</source>
          ,
          <volume>37</volume>
          (
          <issue>3</issue>
          ):
          <fpage>217</fpage>
          -
          <lpage>237</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Dressler</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          .
          <article-title>Energy-aware operation and task allocation of autonomous robots</article-title>
          .
          <source>In Proc. of the 5th Int. Workshop on Robot Motion and Control</source>
          , pages
          <fpage>163</fpage>
          -
          <lpage>168</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamza</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ayanian</surname>
          </string-name>
          .
          <article-title>Forecasting battery state of charge for robot missions</article-title>
          .
          <source>In SAC</source>
          , pages
          <fpage>249</fpage>
          -
          <lpage>255</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <source>INTCATCH H2020 EU project website</source>
          . http://www.intcatch.eu/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Kaelbling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Littman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Cassandra</surname>
          </string-name>
          .
          <article-title>Planning and acting in partially observable stochastic domains</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>101</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>99</fpage>
          -
          <lpage>134</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>H.</given-names>
            <surname>Kurniawati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces</article-title>
          .
          <source>In Robotics: Science and Systems</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Malrey</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mahmoud</given-names>
            <surname>Tarokh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Cross</surname>
          </string-name>
          .
          <article-title>Fuzzy logic decision making for multirobot security systems</article-title>
          .
          <source>Artificial Intelligence Review</source>
          ,
          <volume>34</volume>
          (
          <issue>2</issue>
          ):
          <fpage>177</fpage>
          -
          <lpage>194</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>LeSage</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Longoria</surname>
          </string-name>
          .
          <article-title>Characterization of load uncertainty in unstructured terrains and applications to battery remaining run-time prediction</article-title>
          .
          <source>J. Field Rob</source>
          .,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <fpage>472</fpage>
          -
          <lpage>487</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S. C. W.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Png</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Planning under uncertainty for robotic tasks with mixed observability</article-title>
          .
          <source>Int. J. of Robotics Research</source>
          ,
          <volume>29</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1053</fpage>
          -
          <lpage>1068</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadrpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Ulsoy</surname>
          </string-name>
          .
          <article-title>Mission energy prediction for unmanned ground vehicles</article-title>
          .
          <source>In ICRA</source>
          , pages
          <fpage>2229</fpage>
          -
          <lpage>2234</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Veness</surname>
          </string-name>
          .
          <article-title>Monte-Carlo planning in large POMDPs</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          <volume>23</volume>
          , pages
          <fpage>2164</fpage>
          -
          <lpage>2172</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>