<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Guaranteeing Deep Neural Network Outputs in a Feasible Region</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroshi Maruyama</string-name>
          <email>hm@preferred.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Preferred Networks, Inc., Tokyo, Japan</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>40</fpage>
      <lpage>42</lpage>
      <abstract>
        <p>A deep neural network (DNN) maps a point in the R<sup>m</sup> input space to a point in the R<sup>n</sup> output space. Any point in the output space may appear, depending on the combination of the training data set, the input data point, and the hyperparameters of the DNN. In real applications, some of these output points may not be feasible solutions and should not be used, for example, as an input to a safety-critical system. We propose a post-DNN transformation that maps R<sup>n</sup> into the feasible region. Because this transformation is differentiable, teacher signals can be applied in the transformed space. Index Terms: machine learning, safety, feasible solutions</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>II. POLICY FILTER</title>
      <p>In our examples below we assume n = 2, that is, the output
of our DNN lies in the 2-dimensional Euclidean space, but the
same argument applies to any n &gt; 1. We further assume that
the feasible region is a convex set.</p>
      <p>Figure 2 shows an example. The blue convex set shows the
feasible region, and the red dots represent possible outputs
from our DNN (either at training time or at inference time).
Note that some red points fall outside the feasible region.</p>
      <p>The simplest way to guarantee that the output is always
feasible is to remove infeasible solutions with a policy filter,
as shown in Fig. 3.</p>
      <p>Let the DNN output be ŷ. The filtered output y is defined
as follows:</p>
      <p>y = ŷ if P(ŷ); ⊥ otherwise.</p>
      <p>Here ⊥ represents that no output is generated. This strategy
works when there is a back-up mechanism that generates an
alternative solution (e.g., there is a default position where the
drone has to go, or a human operator supplies the destination
in case the DNN fails to do so).</p>
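<p>As a sketch, the policy filter with a back-up value can be written in a few lines of Python. The unit-disk feasible region, the predicate P, and the default position below are hypothetical stand-ins, not part of the paper's setting.</p>

```python
import numpy as np

def policy_filter(y_hat, P, backup=None):
    """Return the DNN output y_hat only if the feasibility predicate
    P(y_hat) holds; otherwise return the back-up value. None plays the
    role of the "no output is generated" symbol."""
    return y_hat if P(y_hat) else backup

# Hypothetical feasible region: the closed unit disk in R^2.
P = lambda y: float(np.dot(y, y)) <= 1.0

default_position = np.array([0.0, 0.0])   # e.g. a default drone position

print(policy_filter(np.array([0.3, 0.4]), P, default_position))  # feasible: passed through
print(policy_filter(np.array([2.0, 0.0]), P, default_position))  # infeasible: back-up used
```

If no back-up value is supplied, the filter returns None, and a human operator (or other fallback mechanism) must take over.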
      <p>If no such back-up mechanism is available, we may be able
to use the nearest point in the feasible region as the output:</p>
      <p>y = ŷ if P(ŷ); g(ŷ) otherwise.</p>
      <p>Here, g(ŷ) is a function that returns the nearest feasible
solution. Fig. 5 shows an infeasible solution that is to be
moved into the feasible region.</p>
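<p>For the hypothetical unit-disk region used in the sketch above, the nearest-point function g has a closed form (radial projection onto the boundary); for a general convex set it would be a small convex optimization problem. A minimal sketch:</p>

```python
import numpy as np

def g(y_hat):
    """Nearest feasible point to y_hat when the feasible region is the
    closed unit disk: points outside are projected radially onto the
    boundary; points inside are already feasible."""
    r = np.linalg.norm(y_hat)
    return y_hat if r <= 1.0 else y_hat / r

def filtered_output(y_hat):
    """y = y_hat if P(y_hat); g(y_hat) otherwise."""
    return y_hat if np.linalg.norm(y_hat) <= 1.0 else g(y_hat)
```

For example, `filtered_output(np.array([3.0, 4.0]))` lands on the disk boundary at `[0.6, 0.8]`, while feasible inputs pass through unchanged.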
      <p>Filtering is simple and effective, but it fails to reflect the
fact that the DNN proposed a non-feasible solution. The
DNN should have inferred some patterns (i.e., a mathematical
model) during the training, and there must be reasons why the
DNN generated such outputs even though they are undesirable
or unsafe. For example, the drone may travel faster if the
given reference point is further away. A human operator may
give such an out-of-bound reference point with an intention
to move the reference point back to the feasible region in
the subsequent control inputs. For a safety-critical, fully
autonomous system, however, it may be required that all the
reference points are within the feasible region.</p>
    </sec>
    <sec id="sec-2">
      <title>III. TRANSFORMATION TO FEASIBLE SPACE</title>
      <p>We propose a transformation from the R<sup>n</sup> space to the
feasible region. This transformation is done in the following
two steps. First, the unbounded R<sup>n</sup> space is transformed into
the n-dimensional hypercube (0, 1)<sup>n</sup>. One way to do this is to
use the sigmoid function σ(x):</p>
      <p>σ(x) = 1 / (1 + e<sup>−x</sup>)</p>
      <p>The results of this bounding transformation are shown in
Fig. 6.</p>
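<p>This bounding step is elementwise and independent of the feasible region; in Python it is essentially a one-liner:</p>

```python
import numpy as np

def sigmoid(x):
    """Map the unbounded R^n output elementwise into the open
    hypercube (0, 1)^n."""
    return 1.0 / (1.0 + np.exp(-x))

y_hat = np.array([-5.0, 0.0, 4.2])
p = sigmoid(y_hat)
# Every component of p now lies strictly between 0 and 1.
```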
      <p>Next, we select one interior point in the feasible region. We
call it the pivot. Without loss of generality, we can assume
that the pivot coincides with the origin of the hypercube.</p>
      <p>For every bounded point P = σ(ŷ), we draw a half line
from the origin O through P, and let the intersections of this
half line with the boundary of the feasible region and with the
hypercube be Q and R, respectively (see Fig. 7). We move P to
another point S on this half line so that |OP| : |OR| = |OS| : |OQ|.</p>
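<p>The rescaling step can be sketched as follows, assuming the bounded coordinates have been shifted so that the pivot sits at the origin and that we can query the distances |OQ| and |OR| along any direction. The disk-in-square geometry below is a hypothetical stand-in for a concrete feasible region.</p>

```python
import numpy as np

def rescale_to_feasible(P_pt, feasible_dist, cube_dist):
    """Move P along the half line from the pivot O (the origin here) to
    the point S satisfying |OP| : |OR| = |OS| : |OQ|, where Q and R are
    the intersections of the half line with the feasible-region boundary
    and the hypercube boundary, respectively."""
    d = np.linalg.norm(P_pt)             # |OP|
    if d == 0.0:
        return P_pt                      # the pivot maps to itself
    u = P_pt / d                         # direction of the half line OP
    OR = cube_dist(u)                    # |OR|
    OQ = feasible_dist(u)                # |OQ|
    return (d / OR) * OQ * u             # S; d < OR implies |OS| < OQ

# Hypothetical geometry: hypercube (-1, 1)^2 centred on the pivot,
# feasible region a disk of radius 0.8 around the pivot.
cube_dist = lambda u: 1.0 / np.max(np.abs(u))
feasible_dist = lambda u: 0.8

S = rescale_to_feasible(np.array([0.5, 0.5]), feasible_dist, cube_dist)
```

Because |OP| &lt; |OR| holds for every point inside the hypercube, the scaled point S always lands strictly inside the feasible region.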
      <p>The results of this two-step transformation are shown in
Fig. 8. With this transformation, every transformed point is in
the interior of the feasible region for any combination of the
input point, the training data set, and the hyperparameters.</p>
      <p>Note that our transformation is continuous and almost
everywhere differentiable. Thus, we can supply the teacher
signal in the transformed space and back-propagate the error
through this transformation.</p>
      <p>The discussion so far assumed that the feasible region is
a convex set. Actually, the same discussion holds when the
feasible region is star-shaped as in Fig. 9. A star-shaped region
has at least one interior point O such that for any interior point
X, the line segment OX is included in the region. In this case,
we use O as the pivot in our transformation.</p>
    </sec>
    <sec id="sec-3">
      <title>IV. RELATED WORK</title>
      <p>
        The general AI safety issues have been discussed, especially
in the context of reinforcement learning for controlling
real-world machines (e.g., robots and drones). There are many
sources of unsafe behaviours, such as having a wrong reward
function and exploring unsafe regions during the learning
process; these are extensively discussed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Some approaches such as [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] express the constraints within
the reward function (e.g., giving exponentially large penalties
when the control input gets closer to the safety boundary).
Other approaches include searching the worst-case inputs
(formulated as an optimization problem) for a given DNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
and estimating possible behaviour of the current DNN using
a Bayesian validation method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>These approaches give probabilistic guarantees of safety,
which may not be acceptable for highly critical systems. They
also assume that a pretrained model is given, which means that
if the training data set or the hyperparameters are changed, the
verification results can no longer be used. Our method
guarantees that no infeasible solutions are produced no matter
how the training is done. In addition, our method is very
simple to implement, which is a very important factor for
safety-critical systems.</p>
      <p>
        Statistical machine learning can be seen as a new paradigm
in programming: instead of building a software system by the
traditional top-down approach (i.e., starting from the
requirements and gradually breaking them down into smaller
components), machine learning enables a bottom-up, inductive
approach to building a software system. Just as we studied how
traditional software development can be done in a safe and
effective way and organized that knowledge as Software
Engineering, we believe that now is the time to start a new
engineering discipline, Machine Learning Systems Engineering
(MLSE), to organize the knowledge on how ML-based
systems can be safely and effectively developed and maintained
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The work reported here will be one of the first attempts
in the upcoming MLSE studies.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mané</surname>
          </string-name>
          , “
          <article-title>Concrete problems in AI safety</article-title>
          ,
          <source>” arXiv preprint arXiv:1606.06565</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Geibel</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wysotzki</surname>
          </string-name>
          , “
          <article-title>Risk-sensitive reinforcement learning applied to control under constraints</article-title>
          ,
          <source>” Journal of Artificial Intelligence Research</source>
          , vol.
          <volume>24</volume>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dvijotham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stanforth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gowal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kohli</surname>
          </string-name>
          , “
          <article-title>A dual approach to scalable verification of deep networks</article-title>
          ,
          <source>” arXiv preprint arXiv:1803.06567</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Fisac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Akametalu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Zeilinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaynama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gillula</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Tomlin</surname>
          </string-name>
          , “
          <article-title>A general safety framework for learning-based control in uncertain robotic systems</article-title>
          ,
          <source>” arXiv preprint arXiv:1705.01292</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Maruyama</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kido</surname>
          </string-name>
          , “
          <article-title>Machine learning engineering and reuse of AI work products</article-title>
          ,
          <source>” in The First International Workshop on Sharing and Reuse of AI Work Products</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>