<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Testing of Deep Learning Systems</article-title>
      </title-group>
      <contrib-group>
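        <contrib contrib-type="author">
          <name>
            <surname>Zhao</surname>
            <given-names>Jianjun</given-names>
          </name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kyushu University</institution>
        </aff>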
      </contrib-group>
      <abstract>
        <p>Deep learning has achieved great success in many application domains such as image processing, speech recognition, and autonomous vehicles. However, how to ensure the reliability of deep learning systems remains an open problem. In this keynote, I introduce several automated testing techniques to ensure the reliability of deep learning systems.</p>
        <p>Index Terms: deep learning, software testing, system reliability</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        Deep learning (DL) has achieved great success in many
application domains such as image processing, speech
recognition, and autonomous vehicles. However, how to ensure the
reliability and security of DL systems remains an
open problem. For example, an attacker can add adversarial
perturbations, often imperceptible to the human eye, to an image
and cause a deep neural network (DNN) to misclassify it.
Traditional software represents its logic as control
flows crafted from human knowledge, whereas the behavior of a DNN
is characterized by the weights of its neuron connections and its
nonlinear activation functions, and is determined by the training data.
Detecting erroneous behaviors in DNNs therefore differs in nature
from detecting them in traditional software, which calls for
effective analysis, testing, and verification approaches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We
plan to take a multi-pronged approach to gain a deeper
understanding of defects and adversarial examples in DL
systems and to propose methods that guarantee their reliability
and security. We next briefly introduce several automated
testing techniques to ensure the reliability of DL systems.
      </p>
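      <p>As a minimal sketch of the adversarial perturbation described above, the following example applies a sign-based perturbation to a toy linear classifier. The model, its weights, the input, and all function names are hypothetical and chosen only for illustration. A real attack on a DNN would typically follow the gradient of the network's loss rather than raw weights, and the text above does not prescribe any particular attack.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Toy linear classifier: predicts class 1 when w.x + b > 0, otherwise class 0.
# All values below are made up for illustration; this is not a trained model.

def score(w, b, x):
    """Weighted sum of the input features plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class label (0 or 1) assigned by the toy classifier."""
    return 1 if score(w, b, x) > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(w, x, label, eps):
    """Shift every feature by at most eps so the score moves away from `label`."""
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]


w = [0.5, -0.25, 0.75, -0.5]   # hypothetical weights
b = 0.0                        # hypothetical bias
x = [0.5, 0.5, 0.5, 0.5]       # hypothetical input, classified as 1

x_adv = perturb(w, x, predict(w, b, x), eps=0.25)

print("original prediction: ", predict(w, b, x))
print("perturbed prediction:", predict(w, b, x_adv))
print("largest feature change:", max(abs(a - c) for a, c in zip(x, x_adv)))
```
]]></preformat>
      <p>In this toy setting no feature changes by more than eps, yet the predicted class flips because the small shifts all push the score in the same direction. With high-dimensional inputs such as images, many more features contribute, so a much smaller per-feature change can have the same effect.</p>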
      <p>
        Test Coverage Criteria for DL Systems. Currently, the
testing adequacy of a DL system is usually measured by its
accuracy on test data. Because accessible
high-quality test data is limited, good accuracy on test data
provides little confidence in the testing adequacy and
generality of DL systems. Unlike traditional software systems,
which have clear and controllable logic and functionality, a
DL system lacks interpretability, which makes system analysis
and defect detection difficult and could hinder
its real-world deployment. To this end, we propose
DeepGauge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a set of multi-granularity testing criteria for DL
systems that aims to render a multi-faceted portrayal of
the testbed. We evaluate the proposed testing
criteria in depth on two well-known datasets, five
DL systems, and four state-of-the-art adversarial attack
techniques against DL. The potential usefulness of DeepGauge
sheds light on how to construct more generic and robust DL
systems.
      </p>
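      <p>To make the idea of a neuron-level coverage criterion concrete, the sketch below computes a simplified coverage measure in the spirit of the k-multisection neuron coverage described in the DeepGauge paper: the activation range each neuron shows on training data is split into k equal sections, and coverage is the fraction of sections reached by the test inputs. This is an illustrative approximation under stated assumptions, not the DeepGauge implementation. The single ReLU layer, its weights, the data, and all function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relu(v):
    """Rectified linear activation."""
    return v if v > 0.0 else 0.0


def hidden_activations(weights, biases, x):
    """Outputs of one fully connected ReLU layer for the input vector x."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def neuron_ranges(weights, biases, train_inputs):
    """Per-neuron (low, high) activation bounds observed on the training inputs."""
    acts = [hidden_activations(weights, biases, x) for x in train_inputs]
    return [(min(col), max(col)) for col in zip(*acts)]


def multisection_coverage(weights, biases, ranges, test_inputs, k):
    """Fraction of the k sections per neuron hit by at least one test input."""
    covered = set()
    for x in test_inputs:
        for n, a in enumerate(hidden_activations(weights, biases, x)):
            low, high = ranges[n]
            if high == low or a < low or a > high:
                continue  # degenerate range or out-of-range value: not counted here
            section = min(int((a - low) / (high - low) * k), k - 1)
            covered.add((n, section))
    return len(covered) / (k * len(ranges))


# Hypothetical two-neuron layer and data, chosen only for illustration.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
train_inputs = [[0.0, 0.0], [1.0, 1.0]]
test_inputs = [[0.125, 0.875], [0.625, 0.375]]

ranges = neuron_ranges(weights, biases, train_inputs)
coverage = multisection_coverage(weights, biases, ranges, test_inputs, k=4)
print("k-multisection coverage:", coverage)
```
]]></preformat>
      <p>Two test inputs with the same accuracy can yield different coverage values, which is why a coverage criterion of this kind can complement accuracy as an adequacy measure. Activations that fall outside the training range are ignored in this sketch; treating them as a separate case is a design choice left open here.</p>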
      <p>ACKNOWLEDGMENT</p>
      <p>This work was partially supported by the 973 Program of China
(No. 2015CB352203) and JSPS KAKENHI Grant 18H04097.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jana</surname>
          </string-name>
          , “
          <article-title>DeepXplore: Automated whitebox testing of deep learning systems</article-title>
          ,” in
          <source>Proc. 26th Symposium on Operating Systems Principles</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Juefei-Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>DeepGauge: Comprehensive and multi-granularity testing criteria for gauging the robustness of deep learning systems</article-title>
          ,” in
          <source>Proc. 33rd IEEE/ACM Conference on Automated Software Engineering (ASE 2018)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>120</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Juefei-Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>See</surname>
          </string-name>
          , “
          <article-title>DeepHunter: Hunting deep neural network defects via coverage-guided fuzzing</article-title>
          ,” arXiv:1809.01266,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Juefei-Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>DeepMutation: Mutation testing of deep learning systems</article-title>
          ,” in
          <source>Proc. 29th IEEE International Symposium on Software Reliability Engineering (ISSRE 2018)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>120</fpage>
          -
          <lpage>131</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>