Reverse Engineering Android Apps With CodeInspect

Siegfried Rasthofer¹, Steven Arzt¹, Marc Miltenberger¹, and Eric Bodden²
¹ Fraunhofer SIT & TU Darmstadt, Darmstadt, Germany
² Paderborn University & Fraunhofer IEM, Paderborn, Germany

Abstract

While the Android operating system is popular among users, it has also attracted a broad variety of miscreants and malware. New samples are discovered every day. Purely automatic analysis is often not enough for understanding current state-of-the-art Android malware, though. Miscreants obfuscate and encrypt their code, or hide secrets in native code. Precisely identifying the malware’s behavior and finding information about its potential authors requires tools that assist human experts in a manual investigation. In this paper, we present CodeInspect, a novel reverse engineering tool for Android apps that optimally supports investigators and analysts in that task.

Copyright © by the paper’s authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: D. Aspinall, L. Cavallaro, M. N. Seghir, M. Volkamer (eds.): Proceedings of the Workshop on Innovations in Mobile Privacy and Security IMPS at ESSoS’16, London, UK, 06-April-2016, published at http://ceur-ws.org

1 Introduction

Mobile devices such as smartphones and tablets are increasingly used in everyday life and have long since become essential tools. This success is primarily due to the availability of apps for almost every need. While this abundance is helpful for users, it also attracts miscreants. Stealing sensitive user information or directly incurring charges on users is a profitable, albeit illegal, business model. As Android has the largest market share among mobile operating systems [8], most malware is developed for Android as well. The rate at which new malware appears in the wild increases by the year [12].

Many approaches for automatically detecting Android malware have been proposed in the academic literature [1, 6, 5] and implemented in practical tools such as Drebin [1] or Chabada [6]. While automation is crucial for mass analysis, these tools face challenges with highly obfuscated state-of-the-art malware and are usually completely ineffective against novel or targeted attacks. In these cases, to understand the behavior of a given sample, the analyst must resort to manual labor. Furthermore, she usually needs to gather additional information such as potential hints on the miscreants behind the malware. Remote URLs, telephone numbers, e-mail addresses, or even coding patterns can give valuable insights to defenders and prosecutors alike. Though approaches exist to extract information from apps automatically [11, 10, 14, 7], gaining a complete understanding of a malware sample usually requires manual inspection.

With today’s numbers of new samples arriving every day, it has become of utmost importance to make manual investigations as efficient as possible. The analysis tool should thus support the human expert by reducing the mechanical parts of the investigation, allowing the human to focus on understanding the threat.

In this paper, we present CodeInspect, a novel reverse-engineering tool for Android applications. CodeInspect features an expressive intermediate language with type information for local variables, an interactive debugger, and various Android-specific analyses such as data-flow tracking and permissions-usage scanning. We show how CodeInspect can be used to analyze a complex real-world malware sample [13] in less than one hour.

The remainder of this paper is structured as follows. In Section 2, we introduce the malware that will serve as a running example in this paper. Afterwards, we give an overview of CodeInspect in Section 3. In Section 4, we show how we used CodeInspect to reverse engineer the malware, before we conclude in Section 5.

2 Android/BadAccents Malware

The Android/BadAccents malware family was discovered in 2015 [13] and is a banking trojan that uses different obfuscation techniques. Its design is modular with several features such as SMS stealing, social engineering, and uninstalling AV apps. Furthermore, the malware is designed to evade automatic detection approaches. To completely understand the behavior of the malware, a manual investigation is necessary. Out of the various components, we focus on the SMS Interception component, which intercepts incoming SMS messages and forwards them to the attacker. This is done in an attempt to obtain mobile transaction numbers (mTAN), which can then be used to conduct fraudulent transactions at the user’s expense. For the investigator, it is important to understand where the stolen information is sent, as any target address may give clues on the identity of the miscreant running the scam. From a previous investigation, we knew that the malware sends some information via e-mail. It was, however, unclear where the e-mails were sent to, and whether additional channels existed. Therefore, finding the target mail address and possible other channels was the focus of the manual investigation at hand.

3 CodeInspect Overview

As seen in the introduction, some situations require binary software to be inspected manually. Analysts can use existing command-line tools such as APKtool to decompile the binary APK file into readable text source. This tool, however, creates smali code. Smali is an untyped assembly language, leaving the analyst with the challenging task of making sense of register operations and explicit reference management on the heap. Filling data structures, for instance, is a complex set of heap navigation instructions in smali. Furthermore, disassembly files only give a static look at the malware. They do not easily allow for runtime inspection.

A powerful IDE such as CodeInspect, on the other hand, is much more convenient to use. The tool is based on the Eclipse [4] IDE, so that developers usually have an intuition on how to use it. CodeInspect converts the APK file into a typed, higher-level intermediate representation that is much more convenient to read than smali. The code editor provides the analyst with syntax highlighting and navigation capabilities that allow her to, e.g., jump to the definition of a symbol of interest. The CodeInspect IDE allows the analyst to work with the decompiled code on a semantic, rather than just a textual level. If the analyst, e.g., searches for a specific method, she will only find occurrences of that method name, not arbitrary strings that happen to have the same name. CodeInspect can import APK files either directly, if they are available on the analyst’s machine, or load them from a real tablet or phone on which the respective apps are installed.

Besides reverse engineering for malware analysis, CodeInspect can also be used to analyze benign apps in detail. Bundled third party libraries such as advertisement libraries are usually considered safe, although they might pose a security or privacy risk to the user of the application. The library code is often not available and, thus, cannot be checked by the app developers. CodeInspect, however, enables developers to validate the behavior of the compiled application including the actions performed by the libraries. Similar challenges arise when outsourcing app development to third parties that only deliver the binaries of the developed app, but not the source code. In that case, the purchasing company also needs powerful analysis tools to look into the delivered black box. Otherwise, that black box development could contain serious security flaws or even malicious code that goes undetected.

3.1 Jimple Intermediate Representation

CodeInspect relies on the Soot framework for program analysis and transformation [9]. The Soot framework takes an Android application as input and converts it into a human readable type-based intermediate representation called Jimple [15]. From that point on, all code analyses and transformations are performed on this intermediate representation rather than the original bytecode. Soot also offers the possibility to convert the (potentially modified) Jimple code back into an Android binary. CodeInspect inherits this feature and allows the analyst to modify the app, for instance to remove emulator checks or other challenges to the analysis. The human expert can also refactor the app to integrate conclusions she has already drawn about the app, e.g., by renaming methods to reflect their actual task instead of some obfuscated name. The analyst can also merge additional Jimple or Java code into the app. With this feature, she can, for instance, implement a decryption method for some obfuscated strings in Java and use it during a dynamic analysis to better understand the original data processed inside the app. CodeInspect automatically merges the original app code and the new additions at compile time.

Although actual Java source code might be even easier to understand than Jimple, it is not always possible to decompile an app’s bytecode back into valid Java code. The Dalvik bytecode language that Android uses allows for constructs that have no equivalent in Java, such as unconditional nested jumps (goto instructions). Existing obfuscators [3] make it easy to transform an app into such a non-reversible form. As long as the app still contains valid bytecode (i.e., runs on the device), it can, however, be represented in Jimple. This makes Jimple the ideal middle ground between bytecode and Java source code. It also makes sure that CodeInspect can re-compile every app (with potential changes from the analyst) and inspect its behavior dynamically. For those cases in which a Java decompilation is possible, CodeInspect integrates an existing state-of-the-art Java decompiler in a best-effort approach.

CodeInspect’s Jimple editor behaves similarly to a normal Eclipse code editor for Java. The user may modify the code as she wishes. When she saves her changes, the code is automatically recompiled. As a result, she receives a modified application as a new APK file, which behaves like the original application, but with the changed code.

In the same way, due to Soot’s class file feature, a main method can be written in Java, which calls the aforementioned “decryption” method from Java code, which runs on the JVM on the computer. In case the app loads a dex file at runtime, one can also inspect its code and debug it via the Dex file merge feature. It merges the code from an available dex file and adds it to an existing project.

The user may also search for specific fields, methods, and classes as well as their usages using the Jimple search, as shown in Figure 1.

Figure 1: Jimple Search

Figure 2 shows the Jimple representation of a method taken from a real-world malware app. The body of a Jimple method can be divided into two parts. The first part contains the variable declarations. The second part contains the actual Jimple instructions. In this example, we see a read access to a field (urlServer) in the first line and a method call (DownloadFile) in the second line. In total, this example loads a file from a server on the web and specifies the user name and password required to access the file.

Figure 2: Jimple Code Snippet

3.2 Project Explorer

In the normal Eclipse project explorer, CodeInspect lists all parts of the decompiled Android app. This includes not only the code, but also the manifest XML file (in human-readable form), the assets bundled with the app, the native libraries, and the layout XML files. The user is free to inspect and modify all of these files. For opening the manifest or the layout XML files, CodeInspect contains the Android ADT components for Eclipse. A layout file will thus be shown in a graphical UI editor in addition to the plain XML representation. All the different editors are linked; if the user clicks on a class name in one editor (e.g., the manifest file), she can directly jump to the respective code in the Jimple code editor. These links are also automatically updated when the code is changed. If the analyst, for instance, renames an activity, the respective manifest entry will also be adapted automatically.

3.3 Debugging

To debug the application, it is not required to root the phone. Debugging may be performed on an emulator or a real device; the only requirement is that the Developer mode is activated on the device. The Auto Stepper view automates stepping through the application under analysis. If activated, it steps through the code in a predefined frequency. Of course, it is also possible to manually set breakpoints, jump into or over method calls, jump back from a method to its caller, or drop the execution pointer to the current stack frame. CodeInspect’s debugger is as powerful as the original Android debugger that app developers use on their source code.

3.4 Android-Specific Analyses

CodeInspect extends the normal Eclipse IDE with various Android-specific analyses that give a human expert more information on the current app. These analyses are implemented as additional views. The Permission Usage View (Figure 3) lists all the permissions requested by the app. For every permission, it also reports all locations in the code where the respective permission is required. More precisely, CodeInspect identifies all API calls that would fail if the respective permission were not present. This information can give the analyst a first hint to potentially malicious behavior in the app, for instance in the case of SMS trojans.

Figure 3: Permission Usage View

Furthermore, the normal Java-based code analyses known from Eclipse are also available in CodeInspect. Similar to the Java Eclipse IDE Plug-In (JDT), CodeInspect contains a type hierarchy view, which can be used to examine inheritance and interface implementation relationships. Figure 4 shows the type hierarchy of the Service class, thus showing all Services of an application. Besides, we have implemented a call hierarchy view. Figure 5 shows all potential callers of method getSDPath1. This might give clues on how the respective method is used. In case of the getSDPath1 method, the view shows that this method gets called from deleteFoder1, hinting that the app is trying to delete a folder.

Figure 4: Type Hierarchy View

Figure 5: Call Hierarchy View

3.5 Plugins

As CodeInspect has been developed at an academic institution, we have also compiled our research results into CodeInspect plugins. FlowDroid [2] is a popular static information flow tracker for Java and Android, which can be used to detect unwanted or potentially dangerous data flows. FlowDroid is tightly integrated into CodeInspect through CodeInspect’s extensible plug-in interface. This allows the analyst to configure and conduct data flow analyses easily and efficiently. The results are graphically displayed inside CodeInspect as shown in Figure 6. The user can select different data flows, represented by data sources and sinks. After selecting a particular data flow, the plug-in highlights the corresponding statements that are responsible for the information flow and shows all relevant statements in the Calltrace View. With a click, the user can jump from the flow results directly into the source code.

Figure 6: FlowDroid Plug-In

To fully understand an Android app, it is often important to know certain runtime values. If the analyst has found out that a malware application leaks data via SMS, she must then find the target telephone number. For communication with a remote command&control server, she is interested in the URL of that server. Such values are, however, often not available in the app as plain text, but are obfuscated. Their final value only gets decrypted or computed at runtime. Manually undoing obfuscations is a cumbersome and inefficient task. We therefore provide Harvester [11], an approach that fully automatically extracts such runtime values from Android applications. It can also be used for, e.g., deobfuscating reflective method calls by first finding the correct target method signature and then replacing the reflective call with a direct one. Harvester is well-integrated into CodeInspect. The user can select code positions (e.g., method arguments) from which she wants to extract runtime values. Harvester will then fully automatically extract possible runtime values for this particular set of variables. Similarly, she can select the reflective method calls she wants to simplify. Harvester then looks for the receiver method and injects a direct call.

CodeInspect’s plugin interface is also open to other developers who want to extend the tool with additional functionality. As CodeInspect is based on Eclipse, normal Eclipse plugins can be used to provide new features such as support for more version control systems or specific file type editors.

4 Reverse Engineering Android/BadAccents Malware With CodeInspect

In this section, we explain how CodeInspect can be used in a malware investigation. Such a manual reverse engineering task is usually important if automated approaches, such as a behavior analysis in a sandbox [10, 14], provide no or only little evidence on questions such as “which data is stolen?” or “where is the data sent to?”. In such cases, a manual inspection is usually the only solution. We take the malware investigation [13] of the Android/BadAccents malware as an example to demonstrate the need for manual reverse engineering. An automated pre-analysis of this malware family hinted at a banking trojan, but it was not clear to the malware analyst where the stolen data from the SMS intercepting component of the malware was sent to. In the following, we explain in detail how one could use CodeInspect during investigations such as the one on Android/BadAccents. The goal is to answer detailed questions, such as who receives the stolen data, on potentially obfuscated code.

4.1 APK Overview

The Android manifest provides a good starting point in a manual malware investigation since it contains information about the different components of the app along with additional meta-information. In the context of the Android/BadAccents malware, one can identify that com.shit.MainActivity is the class of the main activity, which gets called first when the user opens an application. Furthermore, the manifest also contains a broadcast receiver called com.a.a.AR which, in turn, has an onReceive() method that gets executed whenever the device receives a new SMS message. Note that the receiver also reacts to other actions besides SMS_RECEIVED, but these actions are less important. The main activity and the receiver class are two interesting parts, which need to be analyzed in more detail.

Besides the manifest, one can also use CodeInspect’s search function. Since we are looking for user e-mail credentials, a simple lookup for “password” or “user” might return interesting code positions. Indeed, Android/BadAccents contains a few code statements that contain the two search words. The most interesting code location is the one shown in the lower part of Figure 7. It contains two API calls, stringUser() and stringPassword(), which call native methods. These methods take no parameters and return strings, i.e., they are simple native getters. The code section was executed in the onCreate() method of the main activity and the API calls were directly executed without any special triggering.

Figure 7: Access to Email Credentials in the Main Activity

Now, one could either reverse engineer the native implementation of the two API calls or proceed with a dynamic analysis. We could have extracted the native code into a new app and called the two methods there to find the returned string values. Debugging the original malware app, however, required even less effort, as we explain in the next section.

4.2 Detailed Manual Analysis

One of the most powerful features of CodeInspect is the interactive debugger. It allows a human analyst to perform single-step debugging on the Jimple code. Whenever the execution is halted, she can examine the current contents of all variables in scope in the live variables view. This was very useful for extracting the concrete username and password which are returned from the two native API calls. A breakpoint in line 264 in Figure 7 stops the program at that point and gives the analyst the possibility to view the runtime values of the variables $String and $String2. These two values are temporarily stored on the file system and are later used for sending the stolen data to the attacker’s email account via the SMTP protocol. This answers the first question in our investigation. We have information about email credentials, which were hidden in native code. However, we do not yet have proof that these credentials are actually used for authentication.

Figure 8: Runtime Values of Variables

A quick search for “mail” in the Jimple code shows that the application contains a method called “MailSend”, which sounds like the method that is responsible for sending emails to the attacker. We use the “Open Call Hierarchy” feature of CodeInspect (see Figure 9) and discover that the “MailSend” method gets triggered by the “onReceive” method, which gets executed once the application receives an SMS (see Section 4.1). This shows that SMS data is indeed stolen and leaked via e-mail.

Figure 9: Call Hierarchy View for the MailSend Method

As a next step, we send an SMS message (number “+861111” and text “Hello World!”) to the emulator as shown in Figure 10. After single stepping through the Jimple code, we hit an interesting code section (see Figure 11), which checks whether the incoming number starts with “+86” or “+82”. This shows that the malware is expecting SMS messages from China or South Korea, an interesting result which indicates that the malware is especially targeting users from these two countries.

Figure 10: Sending an SMS Message to the Emulator via CodeInspect

Figure 11: Incoming SMS Number Check

Further stepping through the code leads to another interesting code section as shown in Figure 12. Here we can see that the malware expects a certain SMS text “ak40 1”. The purpose of this special command is to activate and deactivate stealing the incoming SMS messages. After sending the “ak40 1” command to the emulator, all further incoming SMS messages are intercepted and sent to the attacker. Figure 13 shows the usage of the “MailSend” method, which leaks the incoming SMS message to the attacker with the credentials that we have previously identified. This concludes the investigation on the malware’s e-mail interface. We have all the information necessary to prove that Android/BadAccents steals incoming SMS messages and leaks them to the attacker via email.

Figure 12: Activation Command for Stealing Incoming SMS Messages

Figure 13: Variables View of the Debugger at the MailSend API Call

5 Conclusion

In this paper, we have presented CodeInspect, a novel tool for manually reverse engineering malicious Android apps. The tool supports the human expert with an expressive, typed intermediate representation, an interactive debugger, and various Android-specific analyses. It greatly reduces the effort of the investigation. As future work, we plan to integrate more analysis techniques and views into the tool.

References

[1] Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, and Konrad Rieck. Drebin: Effective and explainable detection of android malware in your pocket. In NDSS. The Internet Society, 2014.

[2] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In Proceedings of the 35th ACM SIGPLAN conference on Programming language design and implementation (PLDI). ACM, June 2014.

[3] Michael Batchelder and Laurie J. Hendren. Obfuscating java: The most pain for the least gain. In Shriram Krishnamurthi and Martin Odersky, editors, CC, volume 4420 of Lecture Notes in Computer Science, pages 96–110. Springer, 2007.

[4] W. Beaton and J. d. Rivieres. Eclipse platform technical overview. Technical report, The Eclipse Foundation, 2006.

[5] Saurabh Chakradeo, Bradley Reaves, Patrick Traynor, and William Enck. Mast: Triage for market-scale mobile malware analysis. In Proceedings of the Sixth ACM Conference on Security and Privacy in Wireless and Mobile Networks, WiSec ’13, pages 13–24, New York, NY, USA, 2013. ACM.

[6] Alessandra Gorla, Ilaria Tavecchia, Florian Gross, and Andreas Zeller. Checking app behavior against app descriptions. In ICSE’14: Proceedings of the 36th International Conference on Software Engineering, 2014.

[7] Johannes Hoffmann, Martin Ussath, Thorsten Holz, and Michael Spreitzenbarth. Slicing droids: Program slicing for smali code. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pages 1844–1851, New York, NY, USA, 2013. ACM.

[8] Kantar. Android returns to growth in europe’s big five markets. Whitepaper, 2015.

[9] Patrick Lam, Eric Bodden, Ondrej Lhotak, and Laurie Hendren. The soot framework for java program analysis: a retrospective. In Cetus Users and Compiler Infrastructure Workshop (CETUS 2011), October 2011.

[10] Martina Lindorfer, Matthias Neugschwandner, Lukas Weichselbaum, Yanick Fratantonio, Victor van der Veen, and Christian Platzer. Andrubis - 1,000,000 Apps Later: A View on Current Android Malware Behaviors. In Proceedings of the International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Wroclaw, Poland, September 2014.

[11] Siegfried Rasthofer, Steven Arzt, Marc Miltenberger, and Eric Bodden. Harvesting runtime values in android applications that feature anti-analysis techniques. In 2016 Network and Distributed System Security Symposium (NDSS), 2016.

[12] Pulse Secure. Mobile threat report 2015. Whitepaper, 2015.

[13] Siegfried Rasthofer, Irfan Asrar, Stephan Huber, and Eric Bodden. How current android malware seeks to evade automated code analysis. In 9th International Conference on Information Security Theory and Practice (WISTP’2015), 2015.

[14] Michael Spreitzenbarth, Felix Freiling, Florian Echtler, Thomas Schreck, and Johannes Hoffmann. Mobile-sandbox: Having a deeper look into android applications. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pages 1808–1815, New York, NY, USA, 2013. ACM.

[15] Raja Vallee-Rai and Laurie J. Hendren. Jimple: Simplifying java bytecode for analyses and transformations, 1998.
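As a supplementary illustration of the SMS handling uncovered in Section 4.2, the following Java sketch models the two checks observed while stepping through the sample: the sender prefix test for “+86” or “+82” and the activation text “ak40 1”. It is not the malware's code and uses no Android APIs. The class and method names, the in-memory list standing in for the MailSend channel, and the assumption that the prefix check gates all further processing are ours and purely hypothetical. The deactivating variant of the command is not spelled out in the text and is therefore left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the SMS handling described in Section 4.2.
 * All identifiers are illustrative and do not come from the sample.
 */
final class SmsInterceptSketch {
    // Sender prefixes the sample was observed to check (China, South Korea).
    private static final String[] TARGET_PREFIXES = {"+86", "+82"};
    // SMS text that was observed to switch interception on.
    private static final String ACTIVATION_COMMAND = "ak40 1";

    private boolean intercepting = false;
    // Stand-in for the e-mail channel; the sample uses a "MailSend" method.
    private final List<String> forwarded = new ArrayList<>();

    /** Returns true if the sender number starts with one of the prefixes. */
    static boolean hasTargetPrefix(String sender) {
        if (sender == null) {
            return false;
        }
        for (String prefix : TARGET_PREFIXES) {
            if (sender.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Handles one incoming message.
     * Assumption: the prefix check gates all further processing.
     */
    void onSmsReceived(String sender, String body) {
        if (!hasTargetPrefix(sender)) {
            return; // sender outside the targeted number ranges
        }
        if (ACTIVATION_COMMAND.equals(body)) {
            intercepting = true; // command message itself is not forwarded
            return;
        }
        if (intercepting) {
            forwarded.add(sender + ": " + body);
        }
    }

    boolean isIntercepting() {
        return intercepting;
    }

    List<String> forwardedMessages() {
        return Collections.unmodifiableList(forwarded);
    }

    public static void main(String[] args) {
        SmsInterceptSketch sketch = new SmsInterceptSketch();
        sketch.onSmsReceived("+861111", "Hello World!"); // not yet activated
        sketch.onSmsReceived("+861111", "ak40 1");       // activates stealing
        sketch.onSmsReceived("+861111", "made-up follow-up message");
        System.out.println(sketch.forwardedMessages());
    }
}
```

In this model, the first message is dropped because interception has not been activated, which mirrors the order of the experiments in Section 4.2: the test message was sent first, and interception was only observed after the “ak40 1” command.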