
Comparing Capability of Static Analysis Tools to Detect Security Weaknesses in Mobile Applications

Tosin Daniel Oyetoyan (0)
tosin.oyetoyan@sintef.no

Marcos Lordello Chaim (1)
chaim@usp.br

(0) Department of Software Engineering, Safety and Security, SINTEF Digital, Trondheim, Norway
(1) Software Analysis and Experimentation Group (SAEG), School of Arts, Sciences and Humanities, University of Sao Paulo

Smartphones are prevalent today and store sensitive and private data. Malicious applications are a constant threat to user data on smartphones, as they can sniff or manipulate that data by exploiting software weaknesses in legitimate mobile applications. Static analysis tools can be used to reduce these risks during development. However, it is important to know the capability of these tools in order to make informed decisions and avoid a false sense of security. In this preliminary study, we investigate the detection capability of mainstream vs. Android-specific tools to guide decision-making during tool selection.

Keywords: Security, Android, Static analysis tools, Mobile, CWE, OWASP

Introduction

Smartphone devices are very popular today. These devices aggregate personal data related to our lifestyle, relationships, finances, professions, locations, recordings, conversations, preferences, videos and photos [ ]. These are very sensitive and private data. A breach resulting from vulnerabilities in the mobile software could have a devastating impact on the user. Malicious mobile applications could sniff and manipulate sensitive user data [ ] or even launch denial-of-service attacks [ ]. Despite these challenges, developers often do not code with the mindset of attackers because they care more about functionality. As a result, common and inadvertent mistakes become exploitable vulnerabilities [ ].

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.

Static analysis of the application’s source or object code has been advocated as a strategy to detect weaknesses [ ] during implementation. The goal is to detect parts of the code that could become vulnerable. Static analysis tools (SATs) support developers in identifying security risks in their code. The goal of this research is to assess tools that detect security-related weaknesses in Android applications. We choose Android because of its open platform and market dominance. Data from the third quarter of show Android with . % of market share, followed by Apple’s iOS with . % and others (e.g., Windows Phone, Symbian) with . % [ ]. In addition, other smartphone platforms have a similar security model; however, Android is claimed to have the most sophisticated application communication system [ ].

In Android, user-installed applications are sandboxed: each runs in a dedicated process, has its own private data directory, and follows the principle of least privilege [ ]. Android defines four types of components: Activity (user interface), Service (executes processes in the background), Content Provider (data sharing), and Broadcast Receiver (responds asynchronously to system-wide messages). Communication between applications is achieved through a message-passing mechanism (Intent messages). Application components are configured in the mandatory manifest file. In order to protect applications, Android defines four types of permissions: Normal, Dangerous, Signature, and SignatureOrSystem.

Specific challenges in Android make static analysis different from that of regular Java applications [ ]. Android apps run in a special virtual machine named Dalvik, whose bytecode differs from that of the regular Java virtual machine. As a result, static analysis tools must be able to analyze the Dalvik bytecode when Java source code is not provided. Further, Android apps can have many entry (Main) points, which makes them different from regular Java applications. Additionally, in Android apps, different components have their own lifecycle. Because these lifecycle methods are not directly linked to the execution flow, they limit the soundness of some analysis scenarios.

Organizations develop both standard desktop and mobile applications, and also manage them in a similar Software Configuration Management environment. Moreover, in agile development and DevOps environments, tools are success factors that ensure continuous deployment and fast delivery [ ]. The tendency is to run one type of SAT across the code base during a build operation. In our experience, a common question from practitioners is whether mainstream SATs are good enough for scanning mobile applications. We are thus interested in comparing the strengths and limitations of non-specific and Android-specific SATs in detecting relevant mobile-related weaknesses. This allows users to make informed decisions about what tools to use, how to use them, and what results to expect. Our mainstream tools are chosen from the open source community based on availability and accessibility. In this preliminary study, we concern ourselves with the scope of weaknesses that can be found by Android-specific SATs and mainstream SATs. The following two research questions summarize the problems we partly address in this paper:

RQ . What are the similarities and differences between mainstream SATs and Android SATs in the type of weaknesses they detect?

RQ . What are the runtime costs of executing SATs in mobile apps?

We have used a combination of the Common Weakness Enumeration (CWE) dictionary by MITRE [ ] and the OWASP top data for our assessment.

The remainder of the paper is organized as follows: In Section , we discuss the approach we have used in this study. Section presents our preliminary results and discusses them. In Section , we present an overview of related studies. Section discusses the limitations and threats to the validity of our work. Finally, we conclude the paper in Section .

Approach

Based on OWASP top , the most common security risks in mobile applications are: ( ) Improper platform use, ( ) Insecure data storage, ( ) Insecure communications, ( ) Insecure authentication, ( ) Insufficient cryptography, ( ) Insecure authorization, ( ) Client code quality issues, ( ) Code tampering, ( ) Reverse engineering, and ( ) Extraneous functionality [ ]. Many empirical studies have also validated the existence of these risks in real-world Android applications (see [ , , , ]).

In this preliminary assessment, we have used weakness categories [ ] to assess the selected static analysis tools. Three categories are specific to Android applications. The rest are general quality weaknesses applicable to all applications. The rationale behind this choice is to investigate how well the tools detect weaknesses in the different categories. Additionally, we mapped the selected CWEs to the OWASP top security risk categories.

CWE- : Use of Implicit Intent for Sensitive Communication (# ) An implicit intent can be used to transmit data without specifying the receiver. It is possible for any application to process the intent by using an Intent Filter for the intent.

CWE- : Improper Export of Android Application Components (# ) Android application components (Activity, Service, or Content Provider) are exported through the manifest file. Exporting components without proper restrictions on which applications can launch them or access their data could result in integrity, confidentiality and availability issues.

CWE- : Unencrypted Socket (# ) The study by Enck et al. [ ] shows that certain Android applications include code that uses the Socket class directly. Java sockets are a potential attack surface as they represent an open interface to external services.

OWASP – Open Web Application Security Project (https://www.owasp.org).

CWE- : Storage of Sensitive Data in a Mechanism without Access Control (# ) This weakness occurs when applications store sensitive information in file systems or devices that are not protected. Examples include memory cards or USB devices.

CWE- : Exposure of Private Information (‘Privacy Violation’) (# ) Accessing private data such as passwords or credit card numbers needs explicit authorization. A privacy violation could occur when unauthorized entities have access to such data.

CWE- : Missing Default Case in Switch Statement (# ) This weakness occurs when code that uses a switch statement omits the default case. Execution logic may be altered when the system encounters a variable value not handled in the logic. Security issues may arise if the switch logic is used to handle a security decision or is linked to other parts of the code where security decisions are made.

CWE- : Improper Restriction of XML External Entity Reference (‘XXE’) (# ) Applications that process XML documents could be vulnerable to XXE attacks if proper validation and sanitization are not put in place. An example is the CVE- - XML External Entity (XXE) attack in the SAP Business One Android Application.

Debug Mode Activated (DMA) (# ) There are cases where production code is shipped with the developer’s configuration. An example is when the debug option is enabled, which can lead to disclosure of confidential and sensitive data.
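Two of the weaknesses above, improper export of components and an activated debug mode, are visible directly in the manifest file. The following is a minimal sketch of such a manifest check, not the method of any tool assessed in this study. It assumes a decoded (plain XML) manifest and deliberately simplifies the export rule: a component counts as exported when `android:exported="true"` is set, or when the attribute is absent and an intent filter is declared. The sample manifest, package name and component names are hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
COMPONENT_TAGS = ("activity", "service", "provider", "receiver")


def attr(element, name):
    """Read an android:-namespaced attribute, or None if it is absent."""
    return element.get(f"{{{ANDROID_NS}}}{name}")


def scan_manifest(manifest_xml):
    """Return a list of (weakness, detail) findings for a manifest string."""
    findings = []
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        return findings
    # Debug Mode Activated: the application is shipped as debuggable.
    if attr(app, "debuggable") == "true":
        findings.append(("Debug Mode Activated", "application"))
    for tag in COMPONENT_TAGS:
        for comp in app.findall(tag):
            exported = attr(comp, "exported")
            has_filter = comp.find("intent-filter") is not None
            # Simplified rule (assumption): explicit export, or an intent
            # filter with no explicit android:exported attribute.
            is_exported = exported == "true" or (exported is None and has_filter)
            # Simplification: only component-level permissions are considered.
            if is_exported and attr(comp, "permission") is None:
                findings.append(("Improper Export", attr(comp, "name")))
    return findings


# Hypothetical manifest used only to exercise the sketch.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <application android:debuggable="true">
    <activity android:name=".NoteActivity">
      <intent-filter><action android:name="com.example.demo.VIEW_NOTE"/></intent-filter>
    </activity>
    <service android:name=".SyncService" android:exported="true" android:permission="com.example.demo.SYNC"/>
    <receiver android:name=".BootReceiver" android:exported="false"/>
  </application>
</manifest>"""

for weakness, detail in scan_manifest(SAMPLE):
    print(weakness, detail)
```

A check of this kind only flags candidates; whether an exported component is actually a problem still has to be judged by a reviewer.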

Selection of tools and applications

Our tool selection was guided by the tools’ availability and ease of use. Both Emanuelsson and Nilsson [ ] and Hofer [ ] report installation as a seemingly important metric when choosing a static analysis tool. Practitioners can be wary of tools that are very complicated to set up and use. As a result, the selected tools are open source or otherwise available without cost, and are easy to install and use.

We selected FindBugs and FindSecBugs as mainstream tools as they are widely available and used to assess code weaknesses in industrial settings. We selected Android SATs that have pre-built libraries and can be easily configured and executed. Table lists the Android SATs with their URLs. The techniques used by the tools to scrutinize a mobile app are listed in the column “Technique”. The idea of selecting tools that use different techniques is to assess their ability to identify the CWEs related to OWASP’s top risk categories and also to evaluate the runtime cost of each technique.

We chose open-source, real-world mobile applications for assessment. In Table , we present the apps, a short description, the size of the object code, and the volume of downloads. We selected apps from different domains (e.g., secure communication, content management, graphics manipulation) and of fairly large size ( . M to . M) to expose the tools to a variety of contexts. Moreover, they are widely used apps: two of them have more than M, one more than K, and two more than M downloads. Thus, they are real-world apps in active use.

We ran each tool against the selected Android applications. The tools generate their results in different formats, which makes comparing the tools very challenging. In addition, there are no predefined CWE mappings for the Android-specific tools. As a result, we manually inspected the tools’ messages and mapped them to an appropriate CWE wherever applicable. We did not check whether a result is a false positive in this study, as we are concerned only with the weakness types identified by each tool. Lastly, we manually searched for occurrences of each weakness category in the tools’ results for each application.
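The mapping described above was done by hand in this study. As an illustration only, the sketch below shows how tool messages could be mapped to weakness categories and collected into a table of which tool found which weakness in which app. The keyword rules, tool names (`ToolA`, `ToolB`), app names and messages are all hypothetical and do not reproduce the output of any real tool.

```python
from collections import defaultdict

# Hypothetical keyword-to-weakness rules (assumption, not the study's mapping).
RULES = [
    ("implicit intent", "Implicit Intent"),
    ("exported", "Improper Export"),
    ("debuggable", "Debug Mode Activated"),
    ("xxe", "XXE"),
    ("external entity", "XXE"),
    ("socket", "Unencrypted Socket"),
]


def classify(message):
    """Map one tool message to a weakness label, or None when no rule matches."""
    text = message.lower()
    for keyword, weakness in RULES:
        if keyword in text:
            return weakness
    return None


def detection_matrix(reports):
    """Build {weakness: {tool: set(apps)}} from (tool, app, message) triples."""
    matrix = defaultdict(lambda: defaultdict(set))
    for tool, app, message in reports:
        weakness = classify(message)
        if weakness is not None:
            matrix[weakness][tool].add(app)
    return matrix


# Invented reports used only to exercise the sketch.
REPORTS = [
    ("ToolA", "app1", "Component is exported without permission"),
    ("ToolA", "app2", "Unencrypted socket opened"),
    ("ToolB", "app1", "android:debuggable is set to true"),
    ("ToolB", "app2", "Unused variable"),
]

matrix = detection_matrix(REPORTS)
for weakness in sorted(matrix):
    for tool in sorted(matrix[weakness]):
        print(weakness, tool, sorted(matrix[weakness][tool]))
```

Keyword rules like these are brittle, which is one reason a manual inspection of the messages remains necessary when the tools provide no CWE mapping of their own.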

Preliminary Results and Discussion

We summarise the initial results of our assessments in Table . The first column describes the CWE that is investigated. The second column (merged) lists the tools and indicates whether the tool finds the stated CWE. The third column lists the applications where the stated CWE is found. For example, only FindSecBugs found CWE- in iFixit whereas none of the other tools found this weakness in iFixit. In addition, the weakness was not spotted in the rest of the apps.

Table . CWE Detected by Tools
Columns: CWE | Tools (FindSecBugs, FindBugs, AndroidLint, Amandroid, AndroBugs, JAADS) | Apps
[Per-tool detection marks are not recoverable from the extracted layout.]

CWE | Apps
CWE- : Use of Implicit Intent for Sensitive Communication | iFixit, AntennaPod, Conversations
CWE- : Improper Export of Android Application Components | AntennaPod, iFixit, Zxing
CWE- : Unencrypted Socket | iFixit
CWE- : Storage of Sensitive Data in a Mechanism without Access Control | All apps
CWE- : Exposure of Private Information (‘Privacy Violation’) | Conversations
CWE- : Missing Default Case in Switch Statement | Zxing
CWE- : Improper Restriction of XML External Entity Reference (‘XXE’) | Keepassdroid, Zxing
Debug Mode Activated (DMA) | All apps

[Table : time to run each tool on each app (listed tools: FindSecBugs, FindBugs, AndroidLint, Amandroid), reported in hours, minutes and seconds; the values are not recoverable from the extracted layout.]

From the results, we make the following observations: AndroBugs checks inter-component communication-based, configuration and deployment weaknesses. JAADS checks inter-component communication-based and configuration weaknesses. Amandroid analyses inter-component communication; however, users have to reason about the results. FindSecBugs is tailored for security audits in general, with limited extension to Android applications. For example, it does not analyse the AndroidManifest.xml file. AndroidLint reports many quality issues but misses many specific security issues.

RQ . What are the similarities and differences between mainstream SATs and Android SATs in the type of weaknesses they detect?

In this preliminary study, we found that FindSecBugs covers a wide range of the weakness categories but misses the topmost risk (CWE- ) and the OWASP top # (Debug Mode Activated). Mainstream SATs are therefore useful and necessary to uncover relevant mobile-specific weaknesses, but they are not sufficient. Furthermore, general quality issues can sometimes be very important when they occur where a security decision is being taken (e.g., Missing Default Case in Switch Statement). Android-specific tools could not detect this weakness. In addition, the Android SATs did not detect CWE- (Unencrypted Socket), CWE- (Exposure of Private Information), and CWE- (Improper Restriction of XML External Entity Reference). Both OWASP top # and # are not detected by any of the mainstream tools but are detected by some Android-specific tools. Checking all the weakness categories requires at least a combination of tools from the mainstream and Android SATs; as a result, we conclude that one tool is not enough to catch the whole range of weaknesses.
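The observation that no single tool covers every category can be phrased as a small set-cover problem. The sketch below finds the smallest tool combinations whose detected weaknesses together cover a target set. The tool names and the detection sets are hypothetical placeholders, not the results of this study, and the brute-force search is only reasonable for a handful of tools.

```python
from itertools import combinations


def smallest_covering_sets(detects, weaknesses):
    """Return all smallest tool combinations whose union covers `weaknesses`.

    detects: dict mapping tool name -> set of weakness labels it detects.
    Returns an empty list when even all tools together do not cover the target.
    """
    tools = sorted(detects)
    target = set(weaknesses)
    for size in range(1, len(tools) + 1):
        hits = [
            combo
            for combo in combinations(tools, size)
            if target <= set().union(*(detects[t] for t in combo))
        ]
        if hits:
            return hits
    return []


# Hypothetical detection sets used only to exercise the sketch.
DETECTS = {
    "MainstreamA": {"W3", "W4", "W5", "W6"},
    "AndroidB": {"W1", "W2"},
    "AndroidC": {"W2", "W7"},
}
ALL_WEAKNESSES = {"W1", "W2", "W3", "W4", "W5", "W6", "W7"}

print(smallest_covering_sets(DETECTS, ALL_WEAKNESSES))
```

With these placeholder sets, every tool contributes at least one weakness that no other tool finds, so the only covering combination uses all three.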

Nevertheless, it would be possible for FindSecBugs and FindBugs to detect some Android-specific CWEs if the manifest file were analyzed and patterns particular to Android applications were supported. These relatively simple modifications would have a beneficial impact on the development of more secure Android mobile apps because FindSecBugs and FindBugs are widely known and used in industrial settings.

RQ . What are the costs of running SATs in mobile apps?

Tools’ performance depends on the technique utilized. Taint analysis is more costly than code scanning for bug patterns. The time to run the selected SATs is presented in Table . We have used a computer running Ubuntu . LTS equipped with Intel Core i - , GHz CPU, and . GBytes of RAM. All tools were run three times and the average times are reported in Table . The data for Amandroid represent the time to run the five different taint analyses provided by the tool.

We report the user value of the Linux time command for all SATs, which represents the user CPU time. The exception is AndroidLint, for which we used a stopwatch. For AntennaPod (in row one of Table ), on average, FindSecBugs took five minutes and seconds, FindBugs took four minutes and one second, AndroidLint one minute and seven seconds, Amandroid five hours, minutes and seconds, AndroBugs seconds, and JAADS minutes and seconds.
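The measurement procedure (user CPU time, averaged over three runs) can be approximated in a script. The sketch below is an assumption about how such a measurement could be automated, not the procedure used in this study, which relied on the `time` command. It reads the user CPU time of child processes from `os.times()`, which is meaningful on Unix-like systems; the measured command is a placeholder that would be replaced by the tool invocation.

```python
import os
import subprocess
import sys


def average_user_cpu_seconds(command, runs=3):
    """Run `command` `runs` times; return the mean user CPU time of the children."""
    samples = []
    for _ in range(runs):
        before = os.times().children_user
        # The child is waited for, so its CPU time is added to children_user.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        after = os.times().children_user
        samples.append(after - before)
    return sum(samples) / len(samples)


# Placeholder workload; a real measurement would invoke the analysis tool here.
DEMO_COMMAND = [sys.executable, "-c", "sum(range(10**6))"]
print(average_user_cpu_seconds(DEMO_COMMAND))
```

User CPU time excludes time spent waiting on I/O, so it can differ noticeably from the wall-clock time a stopwatch would show.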

The mainstream tools (FindBugs and FindSecBugs) and AndroidLint take at most tens of minutes to analyze the code because they scan it for patterns of possible vulnerabilities. AndroBugs scans the apk for particular patterns, but it does not scan the whole code. As a result, it requires only a few seconds to produce its report. The most costly tools are those that use taint analysis. Amandroid provides a thorough analysis, but at a high runtime cost. JAADS’s taint analysis is much faster than Amandroid’s, but its report is not as comprehensive.

These preliminary data suggest that tools that scan the code for bug patterns and perform light taint analysis can be used during development. On the other hand, thorough taint analysis is only suitable in a continuous integration environment, especially during overnight builds.

Related work

Empirical studies have been conducted to compare the strengths and shortcomings of SATs [ , , , , ]. In general, they run SATs against a set of programs with known vulnerabilities. Most of the studies assess performance measures such as the precision, recall, true negative rate and accuracy of tools [ , ]; others also assess the cost of running the tools, e.g. [ , ]. There are also efforts that have quantitatively evaluated how well static analysis tools detect security weaknesses in synthetic benchmark code. The Center for Assured Software (CAS) [ ] developed benchmark test cases with “good code” and “flawed code” across different languages to evaluate the performance of static analysis tools, and assessed commercial tools. Goseva-Popstojanova and Perhinschi [ ] investigated the capabilities of commercial tools. Their findings show that the capability of the tools to detect vulnerabilities was close to or worse than random guessing. Díaz and Bermejo [ ] compare the performance of nine tools, mostly commercial, using the SAMATE security benchmark test suites. They found an average recall of . and an average precision of . . They also found that the tools detected different kinds of weaknesses. Charest [ ] compared tools against out of the CWEs in the SAMATE Juliet test case. The best average performance in terms of recall is . for CWE with . average precision. All these studies have used real or synthetic code with known vulnerabilities to measure the performance of the tools. In this study, we have only investigated whether the tools can detect certain weaknesses with mappings to MITRE CWEs in the mobile apps.

Android apps have been empirically studied [ , , , ] and various program analysis techniques for security assessment in Android have been investigated [ ]. To the best of our knowledge, there are no studies that investigate similarities and differences between mainstream and Android-specific SATs. We present the first step of a study to assess how mainstream vs. mobile-specific tools compare in detecting top security risks in mobile apps. Currently, we are not focusing on tools’ performance, such as recall or precision, but rather on whether they are able to detect specific top-risk vulnerabilities relevant to Android apps. Additionally, we are interested in investigating their runtime costs.

Limitations and Threats to Validity

Our assessment could not cover the whole spectrum of CWEs that could map to the OWASP top- . We have used a selected set of CWEs from MITRE dictionary and map them to the OWASP top- lists. Possibilities exist that other CWEs not in the list we assessed could map to any of the OWASP top- .

This phase of our study did not focus on identifying false positives from the results of the tools. In addition, information about performance metrics such as recall or precision are not addressed in this study. In our future study, we plan to identify real weaknesses and also seed artificial weaknesses in the apps to be able to compute the performance metrics.
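Once real and seeded weaknesses are known, the performance metrics mentioned above follow from comparing the set of reported findings with the set of actual weaknesses. The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The finding identifiers are hypothetical, and returning 0.0 when a set is empty is a convention chosen here, since the ratio is undefined in that case.

```python
def precision_recall(reported, actual):
    """Compute (precision, recall) for reported findings against known weaknesses.

    Both arguments are iterables of hashable finding identifiers, for example
    (file, line, weakness) tuples. Returns 0.0 for an undefined ratio.
    """
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical identifiers: five seeded weaknesses, four tool reports.
SEEDED = {"w1", "w2", "w3", "w4", "w5"}
REPORTED = {"w1", "w2", "w3", "x9"}

print(precision_recall(REPORTED, SEEDED))
```

In this placeholder example, three of the four reports are correct and three of the five seeded weaknesses are found.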

We have performed only a manual assessment of the tools. This limits the precision and the scope of the analysis we could perform. Our plan includes automated and statistical analysis in the next phase.

The CWEs we selected do not cover the entire spectrum of weaknesses relevant for mobile applications beyond the OWASP top- . In future work, we plan to expand the set of CWEs used in our analysis.

Finally, our preliminary result does not offer a strong conclusion regarding any of the tools we have assessed. This is a limitation but also a cautious one because we have not provided the actual performance of the tools but rather their detection capabilities. However, the result does provide useful advice regarding the possibilities of Android SATs and mainstream SATs for detecting weaknesses in mobile applications.

Conclusions and Future Work

We report the initial assessment of the SATs’ capability to detect top security risks in mobile applications. The verification of the CWEs detected by the tools was carried out manually, which constitutes a threat to the internal validity of the results. Although we have selected apps with different characteristics, we caution the reader not to extend the conclusions beyond the set of selected apps. In our future work, we plan to automate the collection and analysis of the data from the apps to reduce the risks to internal and external validity. Additionally, we intend to conduct statistical analysis of the results to support the conclusions.

We presented the first step of research on the capability of mainstream and Android-specific static analysis tools to detect security weaknesses in mobile apps. The results of a preliminary assessment of two mainstream tools (FindBugs and FindSecBugs) and Android-specific tools (Amandroid, AndroBugs, AndroidLint, and JAADS) are presented. These tools were run against real-world mobile apps.

In this preliminary study, we found that mainstream tools can cover a wide range of the weakness categories; however, important risks may go undetected if the practitioner relies only on these tools. On the other hand, Android-specific tools were able to detect top-risk weaknesses but also missed some general security and quality issues. The runtime cost of the tools depends on the analysis technique. As expected, data-flow-based techniques (e.g., taint analysis) are more costly than scanning for bug patterns. Our initial assessment indicates that practitioners cannot dispense with the mainstream tools when developing mobile apps. Nevertheless, they should consider adding Android-specific tools to cover significant risk categories. In our future work, we aim to conduct a large-scale study of many Android applications and many static analysis tools. We are also interested in assessing the quality of the tools’ results, for example, what percentage of the detected OWASP Top risks are false positives.

Acknowledgments

This research was carried out within the project “SoS-Agile: Science of Security in Agile Software Development”, funded by the Research Council of Norway, under the grant /O . Marcos L. Chaim was on a research stay in Norway and was funded by a personal guest researcher scholarship from the IKTPLUSS program.

References

Avancini, A., Ceccato, M.: Security testing of the communication among android applications. In: th International Workshop on Automation of Software Test (AST). pp. – (May )

Chan, P.P., Hui, L.C., Yiu, S.M.: Droidchecker: Analyzing android applications for capability leak. In: Proceedings of the Fifth ACM Conference on Security and Privacy in Wireless and Mobile Networks. pp. – . WISEC ’ , ACM, New York, NY, USA ( ), http://doi.acm.org/ . / .

Charest, N.R.T., Wu, Y.: Comparison of static analysis tools for java using the juliet test suite. In: th International Conference on Cyber Warfare and Security. pp. – ( )

Chess, B., McGraw, G.: Static analysis for security. IEEE Security & Privacy ( ), – ( )

Chin, E., Felt, A.P., Greenwood, K., Wagner, D.: Analyzing inter-application communication in android. In: Proceedings of the th international conference on Mobile systems, applications, and services. pp. – . ACM ( )

Corporation, I.D.: Smartphone os market share, q ( ), http://www.idc.com/promo/smartphone-market-share/os, visited on June,

Díaz, G., Bermejo, J.R.: Static analysis of source code security: Assessment of tools against samate tests. Information and software technology ( ), – ( )

Dybå, T., Dingsøyr, T.: Empirical studies of agile software development: A systematic review. Information and software technology ( ), – ( )

Elenkov, N.: Android security internals: An in-depth guide to Android’s security architecture. No Starch Press ( )

Emanuelsson, P., Nilsson, U.: A comparative study of industrial static analysis tools. Electronic notes in theoretical computer science , – ( )