<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Capability of Static Analysis Tools to Detect Security Weaknesses in Mobile Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tosin Daniel Oyetoyan</string-name>
          <email>tosin.oyetoyan@sintef.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Lordello Chaim</string-name>
          <email>chaim@usp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Safety and Security SINTEF Digital</institution>
          ,
          <addr-line>Trondheim</addr-line>
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Software Analysis and Experimentation Group (SAEG) School of Arts, Sciences and Humanities University of Sao Paulo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Smartphones are prevalent today and store sensitive and private data. Malicious applications are constant threats to user data on smartphones as they could sniff or manipulate them by exploiting software weaknesses in legitimate mobile applications. Static analysis tools can be used to reduce these risks during development. However, it is important to know the capability of these tools in order to make informed decisions and avoid false-sense of security. In this preliminary study we investigate the detection capability of mainstream vs. Android-specific tools to guide decision-making during tools' selection.</p>
      </abstract>
      <kwd-group>
        <kwd>Security</kwd>
        <kwd>Android</kwd>
        <kwd>Static analysis tools</kwd>
        <kwd>Mobile</kwd>
        <kwd>CWE</kwd>
        <kwd>OWASP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Smartphone devices are very popular today. These devices aggregate personal data
related to our lifestyle, relationships, finances, professions, locations, recordings,
conversations, preferences, videos and photos [ ]. These are very sensitive and
private data. A breach as a result of vulnerabilities in the mobile software
could have devastating impact on the user. Malicious mobile applications could
sniff and manipulate sensitive user data [ ] or even launch a denial-of-service
attacks [ ]. Despite these challenges, developers often do not code with a mindset
Copyright ©
purposes.</p>
      <p>by the paper’s authors. Copying permitted for private and academic
of attackers because they care more about functionalities. As a result, common
and inadvertent mistakes become exploitable vulnerabilities [ ].</p>
      <p>Static analysis of the application’s source or object code has been advocated
as a strategy to detect weaknesses [ ] during implementation. The goal is to
detect part of the code that could become vulnerable. Static analysis tools (SATs)
are utilized to support developers to identify security risks in their code. The
goal of this research is to assess tools that detect security-related weaknesses in
Android applications. We choose Android because of its open platform and market
dominance. Data from the third quarter of show Android with . % of
marketshare followed by Apple’s iOS with . % and others (e.g., Windows phone,
Symbian) with . % [ ]. In addition, other smartphone platforms have similar
security model, however, Android is claimed to have the most sophisticated
application communication system [ ].</p>
      <p>In Android, user-installed applications are sandboxed, each runs in a dedicated
process, each has its own private data directory, and employs the least privilege
principle [ ]. Android defines four types of components: Activity (user interface),
Service that executes processes in the background, Content Provider for data
sharing, and Broadcast Receiver that responds asynchronously to
systemwide messages. Communication between applications are achieved through a
message passing mechanism (Intent messages). Configuration of application
components are done in the mandatory manifest file. In order to protect applications,
Android defines four types of permissions: Normal, Dangerous, Signature, and
SignatureOrSystem.</p>
      <p>Specific challenges in Androids make static analysis different from regular
Java applications [ ]. Android apps run in a special virtual machine named
Dalvik that generate bytecodes differently from regular Java virtual machine.
As a result, static analysis tools must be able to analyze the Dalvik bytecode
when Java source code is not provided. Further, Android apps could have many
entry (Main) points which make them different from regular Java applications.
Additionally, in Android apps, different components have their own lifecycle.
Because these lifecycle methods are not directly linked to the execution flow,
they limit the soundness of some analysis scenarios.</p>
      <p>Organizations develop both standard desktop and mobile applications, and
also manage them in a similar Software Configuration Management environment.
Moreover, in agile development and DevOp environments, tools are success factors
that ensure continuous deployment and fast delivery [ ]. The tendency is to run
one type of SAT across the code base during a build operation. In our experience,
a common question that practioners have asked us is whether mainstream SATs
are good enough for scanning mobile applications. We are thus interested to
compare non-specific and Android-specific SATs in their capability in terms of
strengths and limitations to detect relevant mobile-related weaknesses. This is
relevant to allow users make informed decisions about what tools to use, how to
use them, and what results to expect. Our mainstream tools are chosen from the
open source community based on availability and accessibility. In this preliminary
study, we concern ourselves with the scope of weaknesses that can be found
by Android-specific SATs and mainstream SATs. The following two research
questions summarize the problems we partly address in this paper:
RQ . What are the similarities and differences between mainstream SATs and</p>
      <p>Android SATs in the type of weaknesses they detect?
RQ . What are the runtime costs of executing SATs in mobile apps?</p>
      <p>We have used the combination of common weaknesses and enumeration (CWE)
dictionary by MITRE [ ] and OWASP top data for our assessment.</p>
      <p>The remainder of the paper is organized as follows: In Section , we discuss
the approach we have used in this study. Section presents our preliminary
results and provides some discussions of the results. In Section , we present an
overview of related studies. Section discusses the limitations and threat to the
validity of our work. Finally, we conclude the paper in Section .</p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>Based on OWASP top , the most common security risks in mobile
applications are: ( ) Improper platform use, ( ) Insecure data storage, ( ) Insecure
communications, ( ) Insecure authentication, ( ) Insuficient cryptography, ( )
Insecure authorization, ( ) Client code quality issues, ( ) Code tampering, ( )
Reverse engineering, and ( ) Extraneous functionality [ ]. Many empirical
studies have as well validated the existent of these risks in many real-world
Android applications. (see [ , , , ])</p>
      <p>In this preliminary assessment, we have used weaknesses [ ] categories to
assess the selected static analysis tools. Three categories are specific to Android
applications. The rest are general quality weaknesses applicable to all applications.
The rationale behind this choice is to investigate how the tools could detect
weaknesses in the different categories. Additionally, we mapped the selected
CWEs to the OWASP’s top security risk categories.</p>
      <p>CWE- : Use of Implicit Intent for Sensitive Communication
(# ) An implicit intent can be used to transmit data without specifying the
receiver. It is possible for any application to process the intent by using an Intent
Filter for the intent.</p>
      <p>CWE- : Improper Export of Android Application Components
(# ) Android application components (Activity, Service, or Content Provider)
are exported through the manifest file. Exporting components without proper
restriction as to which applications can launch or access the data could result
into integrity, confidentiality and availability issues.</p>
      <p>CWE- : Unencrypted Socket (# ) The study by Enck et al. [ ]
shows that certain Android applications include code that use the Socket class</p>
      <p>OWASP – Open Web Application Security Project (https://www.owasp.org).
directly. Java sockets are potential attack surface as they represent an open
interface to external services.</p>
      <p>CWE- : Storage of Sensititve Data in a Mechanism without
Access Control (# ) This weakness occurs when applications store sensitive
information in file systems or devices that are not protected. Examples include
memory cards or USB devices.</p>
      <p>CWE- : Exposure of Private Information (‘Privacy Violation’)
(# ) Accessing private data such as passwords or credit card numbers need
explicit authorization. Privacy violation could occur when unauthorized entities
have access to data.</p>
      <p>CWE- : Missing Default Case in Switch Statement (# ) This
weakness occurs when code that uses switch statement omit the default case.
Execution logic may be altered when the system encounters variable value not
handled in the logic. Security issues may happen, if switch logic is used to handle
security decision or is linked to other aspects of code where security decision
happens.</p>
      <p>CWE- : Improper Restriction of XML External Entity
Reference (’XXE’) (# ) Applications that process XML documents could be
vulnerable to XXE-attacks if proper validations and sanitations are not put in place.
An example is the CVE- - XML External Entity(XXE) attack in the
SAP Business One Android Application.</p>
      <p>Debug Mode Activated (DMA) (# ) There are cases where production
code is shipped with developer’s configuration. An example is when debug option
is enabled which can lead to disclosure of confidential and senstitive data.</p>
      <p>Selection of tools and applications
Our tool selection was guided by the tools’ availability and ease of use. Both
Emanuelsson and Nilsson [ ] and Hofer [ ] report on installation as a seemingly
important metric when choosing a static analysis tool. Practitioners can be wary
of tools that are very complicated to set up and use. As a result, the selected
tools are open-source or those available for use without cost and are also easy to
install and use.</p>
      <p>We selected FindBugs and FindSecBugs as mainstream tools as they are
widely available and used to assess code weaknesses at industrial settings. We
selected Android SATs that have pre-built libraries and can be easily configured
and executed. Table lists the Android SATs with their URLs. The techniques
utilized by the tools to scrutinize a mobile app are listed in column “Technique”.
The idea of selecting tools using different techniques is to assess their ability to
identify the CWEs related to OWASP’s top risk categories and also evaluate the
runtime costs of each technique.</p>
      <p>We choose open-source real mobile applications for assessment. In Table ,
we present the apps, a short description, the size of the object code, and the
volume of downloads. We selected apps from different domains (e.g., secure
communication, content management, graphics manipulation), and with fairly
large size ( . M to . M), to expose the tools to a variety of contexts. Moreover,
they are largely used apps: two of them have more than M, one more than K,
and two more than M downloads. Thus, they are real-world apps which are
being used by users.
We run each tool against the selected Android applications. The results of the
tools are generated in different formats. This presents enormous challenge for
tools’ comparison. In addition, there is no pre-CWE mappings for the
Androidspecific tools. As a result, we manually inspect the tools’ messages and map
them to an appropriate CWE wherever applicable. We did not check whether
the result is false positive or not in this study as we are concerned only with
the identification of weakness types identified by each tool. Lastly, we manually
search for the occurence of each weakness categories in the tools’ result for each
application.</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminary Results and Discussion</title>
      <p>We summarise the initial results of our assessments in Table . The first column
describes the CWE that is investigated. The second column (merged) lists the
CWE
CWE- : Use of
Implicit Intent for
Sensitive Communication
CWE- : Improper
Export of Android
Application Components
CWE- :
Unencrypted Socket
CWE- : Storage of
Sensititve Data in a
Mechanism without
Access Control
CWE- : Exposure
of Private Information
(’Privacy Violation’)
CWE- : Missing
Default Case in Switch
Statement
CWE- : Improper
Restriction of XML
External Entity Reference
(’XXE’)
Debug Mode Activated
(DMA)
5
X
X
X
X
X
5
. CWE Detected by Tools</p>
      <p>Tools
FindSecBugs FindBugs AndroidLint Amandroid AndroBugs JAADS</p>
      <p>X 5 5 5 5 X
5
5
5
5
X
5
5</p>
      <p>X
5
5
5
5
5
5</p>
      <p>X
5
X
5
5
5
5</p>
      <p>X
5
X
5
5
5
X
5
5
X
5
5
5
X</p>
      <p>Apps
iFixit,
AntennaPod,
Conversations
AntennaPod,
iFixit, Zxing
iFixit
All apps
Conversations
Zxing
Keepassdroid,
Zxing
All apps</p>
      <p>FindSecBugs FindBugs AndroidLint Amandroid
min sec min sec min sec h min
min sec min sec min sec h min
min sec min sec min sec h min
min sec min sec min sec h min
min sec min sec min sec h min
min sec min sec min sec h min
min sec min sec min sec h min
tools and indicate whether the tool finds the stated CWE. The third column lists
the applications where the stated CWE is found. For example, only FindSecBugs
found CWE- in iFixit whereas none of the other tools found this weakness in
iFixit. In addition, the weakness was not spotted in the rest of the apps.</p>
      <p>From the results, we make the following observations: AndroBugs checks
intercomponent communication-based, configuration and deployment weaknesses.
JAADS checks inter-component, communication-based and configuration
weaknesses. Amandroid analyses inter-component communication; however, users have
to reason about the results. FindSecBugs is tailored for security audit in general
with limited extension to Android applications. For example, it does not analyse
the AndroidManifest.xml file. AndroidLint reports on many quality issues from
its report but miss many specific security issues.</p>
      <p>RQ . What are the similarities and differences between mainstream
SATs and Android SATs in the type of weaknesses they detect?</p>
      <p>In this preliminary study, we found FindSecBugs to cover a wide range of the
weakeness categories but missed the topmost important risk (CWE- ) and the
OWASP top # (Debug Mode Activated). Mainstream SATs are therefore useful
and necessary to uncover relevant mobile-specific weaknesses but they are not
sufficient. Furthermore, general quality issues can sometimes be very important
when they occur where security decision is being taken (e.g. Missing Default in
Switch). Android-specific tools could not detect the above weakness. In addition,
the Android SATs did not detect CWE- (Unencrypted Socket),
CWE(Exposure of Private Information), and CWE- (Improper Restriction of XML
External Entity Reference). Both OWASP top # and # are not detected by
any of the mainstream tools but are detected by some Android-specific tools. In
addition, to check the weakness categories requires at least combination of
tools from the mainstream and Android SATs, as a result, we conclude that one
tool is not enough to catch the whole range of weaknesses.</p>
      <p>Nevertheless, it would be possible for FindSecBugs and FindBugs to detect
some Android-specific CWEs if the manifest file were analyzed and patterns
particular to Android applications were supported. These relatively simple
modifications would have a beneficial impact on the development of more secure
Android mobile apps because FindSecBugs and FindBugs are widely known and
used at industrial settings.</p>
      <p>RQ .What are the costs of running SATs in mobile apps? Tools’
performance depends on the technique utilized. Taint analysis is more costly than
code scanning for bug patterns. The time to run the selected SATs are presented
in Table . We have used a computer running Ubuntu . LTS equipped with
Intel Core i - , GHz CPU, and . GBytes of RAM. All tools were run
three times and the average time are reported in Table . The data for Amandroid
represents the time to run the five different taint analysis provided by the tool.</p>
      <p>We report the user value of the Linux time command for all SATs, which
represents the user CPU time. The exception is AndroindLint for which we
used a stop watch. For AntennaPod (in row one of Table ), on the average,
FindSecBugs took five minutes and seconds, FindBugs took four minutes and
one second, AndroidLint, one minute and seven seconds, Amandroid, five hours,
minutes and seconds, AndroBugs, seconds, and JAADS, minutes
and seconds.</p>
      <p>The mainstream tools (FindBugs and FindSecBugs) and AndroidLint take at
most tens of minutes to analyze the code because they scan it for patterns of
possible vulnerabilities. AndroBugs scans the apk for particular patterns, but it
does not scan the whole code. As a result, it requires few seconds to obtain its
report. The most costly tools are those that utilize taint analysis. Amandroid
provides a thorough analysis, but it demands a high runtime cost to obtain the
data. JAADS taint analysis is much faster than Amandroid’s, but its report is
not as comprehensive.</p>
      <p>This preliminary data suggest that tools that scan the code for bug patterns
and perform light taint analysis can be utilized during development time. On the
other hand, thorough taint analysis is only fitting in a continuous integration
environment, especially, during overnight builds.</p>
    </sec>
    <sec id="sec-4">
      <title>Related work</title>
      <p>Empirical studies have been conducted to compare the strengths and shortcomings
of SATs [ , , , , ]. In general, they run SAT against a set of programs
with known vulnerabilities. Most of the studies assess performance such as the
precision, recall, true negative rate and accuracy of tools [ , ]; others assess
also the cost of running the tools, e.g. [ , ]. There are also efforts that have
quantitatively evaluated static analysis tools with regards to their performances
to detect security weaknesses in benchmark synthetic code. The Center for
Assured Software (CAS) [ ] developed a benchmark test cases with “good code”
and “flawed code” across different languages to evaluate the performance of
static analysis tools and assessed commercial tools. Goseva-Popstojanova and
Perhinschi [ ] investigated the capabilities of commercial tools. Their findings
show that the capability of the tools to detect vulnerabilities was close to or
worse than random guessing. Díaz and Bermejo [ ] compares the performance
of nine tools mostly commercial tools using the SAMATE security benchmark
test suites. They found an average recall of . and average precision of . .
They found also that the tools detected different kinds of weaknesses. Charest [ ]
compared tools against out of the CWEs in the SAMATE Juliet test case.
The best average performance in terms of recall is . for CWE with .
average precision. All these studies have used real or synthetic code with known
vulnerabilities to detect the performance of the tools. In this study, we have only
investigated whether the tools can detect certain weaknesses with mappings to
MITRE CWEs in the mobile apps.</p>
      <p>Android apps have been empirical studied [ , , , ] and various program
analysis techniques for security assessment in Android have been investigated [ ].
To the best of our knowledge, there are not studies that investigate similarities
and differences between mainstream and Android-specific SATs. We present the
first step of study to assess how mainstream vs. mobile-specific tools compare in
detecting top security risks in mobile apps. Currently, we are not focusing on
tools’ performance such as the recall or precision of tools but rather on whether
they are able to detect specific top risks vulnerabilities relevant to Android apps.
Additionally, we are interested in investigating their runtime costs.</p>
    </sec>
    <sec id="sec-5">
      <title>Limitations and Threats to Validity</title>
      <p>Our assessment could not cover the whole spectrum of CWEs that could map
to the OWASP top- . We have used a selected set of CWEs from MITRE
dictionary and map them to the OWASP top- lists. Possibilities exist that
other CWEs not in the list we assessed could map to any of the OWASP top- .</p>
      <p>This phase of our study did not focus on identifying false positives from the
results of the tools. In addition, information about performance metrics such as
recall or precision are not addressed in this study. In our future study, we plan
to identify real weaknesses and also seed artificial weaknesses in the apps to be
able to compute the performance metrics.</p>
      <p>We have performed only manual assessment of the tools. This limit the
precision and the scope of analysis we could perform. Our plan includes automatic
and statistic analysis in the next phase.</p>
      <p>The CWE we selected did not cover the entire spectrum of weaknesses relevant
for mobile applications beyond the OWASP top- . Our future work plans to
expand the scope of the CWE for our analysis.</p>
      <p>Finally, our preliminary result does not offer a strong conclusion regarding
any of the tools we have assessed. This is a limitation but also a cautious one
because we have not provided the actual performance of the tools but rather their
detection capabilities. However, the result does provide useful advice regarding
the possibilities of Android SATs and mainstream SATs for detecting weaknesses
in mobile applications.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>We report the initial assessment of the SATs capability to detect top security risks
in mobile applications. The verification of the CWEs detected by the tools were
carried out manually which constitutes a threat to the internal validity of the
results. Although we have selected apps with different characteristics, we caution
the reader not to expand the conclusions beyond the set of the selected apps.
In our future work, we plan to automate the collection and analysis of the data
from the apps to reduce the risks to internal and external validity. Additionally,
we intend to conduct statistical analysis of the results to support the conclusions.</p>
      <p>We presented the first step of a research on the capability of mainstream and
Android-specific static analysis tools to detect security weaknesses in mobile apps.
The results of a preliminary assessment of two mainstream tools (FindBugs and
FindSecBugs) and Android-specific tools (Amandroid, AndroBugs, AndroidLint,
and JAADS) are presented. These tools were run against real-world mobile
apps.</p>
      <p>In this preliminary study, we found that mainstream tools can cover a wide
range of the weakeness categories; however, important risks may go undetected if
the practitioner rely only on these tools. On the other hand, Android-specific
tools were able to detect top risk weakeness but also miss some general security
and quality issues. The runtime cost of the tools is dependent on the analysis
technique. As expected, data-flow based techniques (e.g., taint analysis) are
more costly than scanning for bug patterns. Our initial assessment indicates that
practitioner cannot prescind from the mainstream tools when developing mobile
apps. Nevertheless, she or he should consider adding Android-specific tools to
cover significant risk categories. In our future work, we aim to conduct a large
scale study of many Android applications and many static analysis tools. We are
also interested in assessing the quality of the tools’ results. For example, what
percentage of the detected OWASP Top risks are false positives.
Acknowledgments This research was carried out within the project “SoS-Agile:
Science of Security in Agile Software Development”, funded by the Research
Council of Norway, under the grant /O . Marcos L. Chaim’s was on a
research stay in Norway and was funded by a personal guest researcher scholarship
from the IKTPLUSS program.
. Avancini, A., Ceccato, M.: Security testing of the communication among android
applications. In: th International Workshop on Automation of Software Test
(AST). pp. – (May )
. Chan, P.P., Hui, L.C., Yiu, S.M.: Droidchecker: Analyzing android applications
for capability leak. In: Proceedings of the Fifth ACM Conference on Security and
Privacy in Wireless and Mobile Networks. pp. – . WISEC ’ , ACM, New
York, NY, USA ( ), http://doi.acm.org/ . / .
. Charest, N.R.T., Wu, Y.: Comparison of static analysis tools for java using the
juliet test suite. In: th International Conference on Cyber Warfare and Security.
pp. – ( )
. Chess, B., McGraw, G.: Static analysis for security. IEEE Security &amp; Privacy ( ),
– ( )
. Chin, E., Felt, A.P., Greenwood, K., Wagner, D.: Analyzing inter-application
communication in android. In: Proceedings of the th international conference on
Mobile systems, applications, and services. pp. – . ACM ( )
. Corporation, I.D.: Smartphone os market share, q ( ), http://www.idc.</p>
      <p>com/promo/smartphone-market-share/os, visited on June,
. Díaz, G., Bermejo, J.R.: Static analysis of source code security: Assessment of tools
against samate tests. Information and software technology ( ), – ( )
. Dybå, T., Dingsøyr, T.: Empirical studies of agile software development: A
systematic review. Information and software technology ( ), – ( )
. Elenkov, N.: Android security internals: An in-depth guide to Android’s security
architecture. No Starch Press ( )
. Emanuelsson, P., Nilsson, U.: A comparative study of industrial static analysis
tools. Electronic notes in theoretical computer science , – ( )</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>