=Paper= {{Paper |id=Vol-1575/invited_paper_2 |storemode=property |title=Reverse Engineering Android Apps With CodeInspect (invited paper) |pdfUrl=https://ceur-ws.org/Vol-1575/invited_paper_2.pdf |volume=Vol-1575 |authors=Siegfried Rasthofer,Steven Arzt,Marc Miltenberger,Eric Bodden |dblpUrl=https://dblp.org/rec/conf/essos/RasthoferAMB16 }} ==Reverse Engineering Android Apps With CodeInspect (invited paper)== https://ceur-ws.org/Vol-1575/invited_paper_2.pdf
    Reverse Engineering Android Apps With CodeInspect

              Siegfried Rasthofer¹, Steven Arzt¹, Marc Miltenberger¹, and Eric Bodden²
                            ¹ Fraunhofer SIT & TU Darmstadt, Darmstadt, Germany
                         ² Paderborn University & Fraunhofer IEM, Paderborn, Germany

Abstract

While the Android operating system is popular among users, it has also attracted a broad variety of miscreants and malware. New samples are discovered every day. Purely automatic analysis is often not enough for understanding current state-of-the-art Android malware, though. Miscreants obfuscate and encrypt their code, or hide secrets in native code. Precisely identifying the malware’s behavior and finding information about its potential authors requires tools that assist human experts in a manual investigation. In this paper, we present CodeInspect, a novel reverse engineering tool for Android apps that optimally supports investigators and analysts in that task.

1    Introduction

Mobile devices such as smartphones and tablets are increasingly used in everyday life and have long since become essential tools. This success is primarily due to the availability of apps for almost every need. While this abundance is helpful for users, it also attracts miscreants. Stealing sensitive user information or directly incurring charges on users is a profitable, albeit illegal business model. As Android has the largest market share among mobile operating systems [8], most malware is developed for Android as well. The rate at which new malware appears in the wild increases by the year [12].

Many approaches for automatically detecting Android malware have been proposed in the academic literature [1, 6, 5] and implemented in practical tools such as Drebin [1] or Chabada [6]. While automation is crucial for mass analysis, these tools face challenges with highly obfuscated state-of-the-art malware and are usually completely ineffective against novel or targeted attacks. In these cases, to understand the behavior of a given sample the analyst must resort to manual labor. Furthermore, she usually needs to gather additional information such as potential hints on the miscreants behind the malware. Remote URLs, telephone numbers, e-mail addresses, or even coding patterns can give valuable insights to defenders and prosecutors alike. Though approaches exist to extract information from apps automatically [11, 10, 14, 7], gaining a complete understanding of a malware sample usually requires manual inspection.

With today’s numbers of new samples arriving every day, it has become of utmost importance to make manual investigations as efficient as possible. The analysis tool should thus support the human expert by reducing the mechanical parts of the investigation, allowing the human to focus on understanding the threat. In this paper, we present CodeInspect, a novel reverse-engineering tool for Android applications. CodeInspect features an expressive intermediate language with type information for local variables, an interactive debugger, and various Android-specific analyses such as data-flow tracking and permissions-usage scanning. We show how CodeInspect can be used to analyze a complex real-world malware sample [13] within less than one hour.

The remainder of this paper is structured as follows. In Section 2, we introduce the malware that will serve as a running example in this paper. Afterwards, we give an overview of CodeInspect in Section 3. In Section 4, we show how we used CodeInspect to reverse engineer the malware, before we conclude in Section 5.

2   Android/BadAccents Malware
the malware is designed to evade automatic detection approaches. To completely understand the behavior of the malware, a manual investigation is necessary. Out of the various components, we focus on the SMS Interception component, which intercepts incoming SMS messages and forwards them to the attacker. This is done in an attempt to obtain mobile transaction numbers (mTAN), which can then be used to conduct fraudulent transactions at the user’s expense. For the investigator, it is important to understand where the stolen information is sent, as any target address may give clues on the identity of the miscreant running the scam. From a previous investigation, we knew that the malware sends some information via e-mail. It was, however, unclear where the e-mails were sent to, and whether additional channels existed. Therefore, finding the target mail address and possible other channels was the focus of the manual investigation at hand.

3   CodeInspect Overview

As seen in the introduction, some situations require binary software to be inspected manually. Analysts can use existing command-line tools such as APKtool to decompile the binary APK file into readable text source. This tool, however, creates smali code. Smali is an untyped assembly language, leaving the analyst with the challenging task of making sense of register operations and explicit reference management on the heap. Filling data structures, for instance, is a complex set of heap navigation instructions in smali. Furthermore, disassembly files only give a static look at the malware. They do not easily allow for runtime inspection.

A powerful IDE such as CodeInspect, on the other hand, is much more convenient to use. The tool is based on the Eclipse [4] IDE, so that developers usually have an intuition on how to use it. CodeInspect converts the APK file into a typed, higher-level intermediate representation that is much more convenient to read than smali. The code editor provides the analyst with syntax highlighting and navigation capabilities that allow her to, e.g., jump to the definition of a symbol of interest. The CodeInspect IDE allows the analyst to work with the decompiled code on a semantic, rather than just a textual level. If the analyst, e.g., searches for a specific method, she will only find occurrences of that method name, not arbitrary strings that happen to have the same name. CodeInspect can import APK files either directly if they are available on the analyst’s machine, or it can load them from a real tablet or phone on which the respective apps are installed.

Besides reverse engineering for malware analysis, CodeInspect can also be used to analyze benign apps in detail. Bundled third party libraries such as advertisement libraries are usually considered safe, although they might pose a security or privacy risk to the user of the application. The library code is often not available and, thus, cannot be checked by the app developers. CodeInspect, however, enables developers to validate the behavior of the compiled application including the actions performed by the libraries. Similar challenges arise when outsourcing app development to third parties that only deliver the binaries of the developed app, but not the source code. In that case, the purchasing company also needs powerful analysis tools to look into the delivered black box. Otherwise, that black box development could contain serious security flaws or even malicious code that goes undetected.

3.1   Jimple Intermediate Representation

CodeInspect relies on the Soot framework for program analysis and transformation [9]. The Soot framework takes an Android application as input and converts it into a human readable type-based intermediate representation called Jimple [15]. From then on, all code analyses and transformations are performed on this intermediate representation rather than the original bytecode. Soot also offers the possibility to convert the (potentially modified) Jimple code back into an Android binary. CodeInspect inherits this feature and allows the analyst to modify the app, for instance to remove emulator checks or other challenges to the analysis. The human expert can also refactor the app to integrate conclusions she has already drawn about the app, e.g., by renaming methods to what their actual task is instead of some obfuscated name. The analyst can also merge additional Jimple or Java code into the app. With this feature, she can, for instance, implement a decryption method for some obfuscated strings in Java and use it during a dynamic analysis to better understand the original data processed inside the app. CodeInspect automatically merges the original app code and the new additions at compile time.

Although actual Java source code might be even easier to understand than Jimple, it is not always possible to decompile an app’s bytecode back into valid Java code. The Dalvik bytecode language that Android uses allows for constructs that have no equivalent in Java, such as unconditional nested jumps (goto instructions). Existing obfuscators [3] make it easy to transform an app into such a non-reversible form. As long as the app still contains valid bytecode (i.e., runs on the device), it can, however, be represented in Jimple. This makes Jimple the ideal middle ground between bytecode and Java source code. It also makes sure that CodeInspect can re-compile every app (with potential changes from the analyst) and inspect its behavior dynamically. For those cases in which a Java decompilation is possible, CodeInspect integrates an existing state-of-the-art Java decompiler in a best-effort approach.

CodeInspect’s Jimple editor behaves similarly to a normal Eclipse code editor for Java. The user may modify the code as she wishes. When she saves her changes, the code is automatically recompiled. As a result, she receives a modified application as a new APK file, which behaves like the original application, but with the changed code.

In the same way, due to Soot’s class file feature, a main method can be written in Java, which calls the aforementioned “decryption” method from Java code, which runs on the JVM on the computer. In case the app loads a dex file at runtime, one can also inspect its code and debug it via the Dex file merge feature. It merges the code from an available dex file and adds it to an existing project.

The user may also search for specific fields, methods, and classes as well as their usages using the Jimple search, as shown in Figure 1.

Figure 1: Jimple Search

Figure 2 shows the Jimple representation of a method taken from a real-world malware app. The body of a Jimple method can be divided into two parts. The first part contains the variable declarations. The second part contains the actual Jimple instructions. In this example, we see a read access to a field (urlServer) in the first line and a method call (DownloadFile) in the second line. In total, this example loads a file from a server on the web and specifies the user name and password required to access the file.

Figure 2: Jimple Code Snippet

3.2   Project Explorer

In the normal Eclipse project explorer, CodeInspect lists all parts of the decompiled Android app. This includes not only the code, but also the manifest XML file (in human-readable form), the assets bundled with the app, the native libraries, and the layout XML files. The user is free to inspect and modify all of these files. For opening the manifest or the layout XML files, CodeInspect contains the Android ADT components for Eclipse. A layout file will thus be shown in a graphical UI editor in addition to the plain XML representation. All the different editors are linked; if the user clicks on a class name in one editor (e.g., the manifest file), she can directly jump to the respective code in the Jimple code editor. These links are also automatically updated when the code is changed. If the analyst, for instance, renames an activity, the respective manifest entry will also be adapted automatically.

3.3   Debugging

To debug the application, it is not required to root the phone. Debugging may be performed on an emulator or a real device; the only requirement is that the Developer mode is activated on the device. The Auto Stepper view automates stepping through the application under analysis. If activated, it steps through the code at a predefined frequency. Of course, it is also possible to manually set breakpoints, jump into or over method calls, jump back from a method to its caller, or drop the execution pointer to the current stack frame. CodeInspect’s debugger is as powerful as the original Android debugger that app developers use on their source code.

3.4   Android-Specific Analyses

CodeInspect extends the normal Eclipse IDE with various Android-specific analyses that give a human expert more information on the current app. These analyses are implemented as additional views. The Permission Usage View (Figure 3) lists all the permissions requested by the app. For every permission, it also reports all locations in the code where the respective permission is required. More precisely, CodeInspect identifies all API calls that would fail if the respective permission were not present. This information
can give the analyst a first hint to potentially malicious behavior in the app, for instance in the case of SMS trojans.

Figure 3: Permission Usage View

Furthermore, the normal Java-based code analyses known from Eclipse are also available in CodeInspect. Similar to the Java Eclipse IDE Plug-In (JDT), CodeInspect contains a type hierarchy view, which can be used to examine inheritance and interface implementation relationships. Figure 4 shows the type hierarchy of the Service class, thus showing all Services of an application. Besides, we have implemented a call hierarchy view. Figure 5 shows all potential callers of method getSDPath1. This might give clues on how the respective method is used. In case of the getSDPath1 method, the view shows that this method gets called from deleteFoder1, hinting that the app is trying to delete a folder.

Figure 4: Type Hierarchy View

Figure 5: Call Hierarchy View

3.5   Plugins

As CodeInspect has been developed at an academic institution, we have also compiled our research results into CodeInspect plugins. FlowDroid [2] is a popular static information flow tracker for Java and Android, which can be used to detect unwanted or potentially dangerous data flows. FlowDroid is tightly integrated into CodeInspect through CodeInspect’s extensible plug-in interface. This allows the analyst to configure and conduct data flow analyses easily and efficiently. The results are graphically displayed inside CodeInspect as shown in Figure 6. The user can select different data flows, represented by data sources and sinks. After selecting a particular data flow, the plug-in highlights the corresponding statements that are responsible for the information flow and shows all relevant statements in the Calltrace View. With a click, the user can jump from the flow results directly into the source code.

To fully understand an Android app, it is often important to know certain runtime values. If the analyst has found out that a malware application leaks data via SMS, she must then find the target telephone number. For communication with a remote command-and-control server, she is interested in the URL of that server. Such values are, however, often not available in the app as plain text, but are obfuscated. Their final value only gets decrypted or computed at runtime. Manually undoing obfuscations is a cumbersome and inefficient task. We therefore provide Harvester [11], an approach that fully automatically extracts such runtime values from Android applications. It can also be used for, e.g., deobfuscating reflective method calls by first finding the correct target method signature and then replacing the reflective call with a direct one. Harvester is well-integrated into CodeInspect. The user can select code positions (e.g., method arguments) from which she wants to extract runtime values. Harvester will then fully automatically extract possible runtime values for this particular set of variables. Similarly, she can select the reflective method calls she wants to simplify. Harvester then looks for the receiver method and injects a direct call.

CodeInspect’s plugin interface is also open to other developers who want to extend the tool with additional functionality. As CodeInspect is based on Eclipse, normal Eclipse plugins can be used to provide new features such as support for more version control systems or specific file type editors.
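The reflective-call simplification attributed to Harvester in Section 3.5 can be pictured with a small Java sketch. The class name ReflectionSketch, the method sendSms, and the string-reversal trick are hypothetical and serve only as an illustration; they are not taken from Harvester or from any malware sample. The first variant resolves its target only at runtime from an obfuscated string, the second is the direct call an analyst would rather read once the target is known.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {

    // Hypothetical sensitive operation that an obfuscated app wants to hide.
    public static String sendSms(String number, String text) {
        return "SMS to " + number + ": " + text;
    }

    // Obfuscated variant: the method name is assembled at runtime, so a
    // purely textual search for "sendSms" does not find this call site.
    public static String obfuscatedCall(String number, String text) {
        try {
            String hiddenName = new StringBuilder("smSdnes").reverse().toString();
            Method target = ReflectionSketch.class.getMethod(hiddenName, String.class, String.class);
            return (String) target.invoke(null, number, text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective target not found", e);
        }
    }

    // Simplified variant: once the runtime value of hiddenName is known,
    // the reflective call can be replaced by an equivalent direct call.
    public static String directCall(String number, String text) {
        return sendSms(number, text);
    }

    public static void main(String[] args) {
        System.out.println(obfuscatedCall("+861111", "Hello World!"));
        System.out.println(directCall("+861111", "Hello World!"));
    }
}
```

Both variants return the same string, which is the property a rewrite of this kind has to preserve. Only the second one exposes the call target to static tools such as a call hierarchy view.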

                                               Figure 6: FlowDroid Plug-In
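A data flow of the kind FlowDroid reports connects a source of sensitive data to a sink through which the data leaves the app. The following Java sketch is a toy forward taint propagation over a hand-written list of straight-line statements. It is not FlowDroid's algorithm, and the statement model as well as the source and sink names (getMessageBody, sendMail) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TaintSketch {

    // One simplified statement of the form "target = callee(argument)".
    // target is null for calls whose result is ignored.
    public static final class Stmt {
        final String target;
        final String callee;
        final String argument;

        public Stmt(String target, String callee, String argument) {
            this.target = target;
            this.callee = callee;
            this.argument = argument;
        }
    }

    // Returns the indices of all sink calls that receive tainted data.
    public static List<Integer> findLeaks(List<Stmt> body, Set<String> sources, Set<String> sinks) {
        Set<String> tainted = new HashSet<>();
        List<Integer> leaks = new ArrayList<>();
        for (int i = 0; i < body.size(); i++) {
            Stmt s = body.get(i);
            boolean argTainted = s.argument != null && tainted.contains(s.argument);
            if (sinks.contains(s.callee) && argTainted) {
                leaks.add(i);
            }
            if (s.target != null) {
                if (sources.contains(s.callee) || argTainted) {
                    tainted.add(s.target);     // taint is introduced or propagated
                } else {
                    tainted.remove(s.target);  // variable is overwritten with clean data
                }
            }
        }
        return leaks;
    }

    public static void main(String[] args) {
        List<Stmt> body = List.of(
            new Stmt("body", "getMessageBody", "sms"),  // source call
            new Stmt("text", "trim", "body"),           // propagates taint
            new Stmt("clean", "toString", "constant"),  // untainted value
            new Stmt(null, "sendMail", "clean"),        // sink call with clean data
            new Stmt(null, "sendMail", "text"));        // sink call with tainted data
        System.out.println(findLeaks(body, Set.of("getMessageBody"), Set.of("sendMail")));
    }
}
```

A real tracker additionally has to handle branches, method calls, fields, and the Android lifecycle, which is why a dedicated tool is used for that step.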
4   Reverse Engineering Android/BadAccents Malware With CodeInspect

In this section, we explain how CodeInspect can be used in a malware investigation. Such a manual reverse engineering task is usually important if automated approaches, such as a behavior analysis in a sandbox [10, 14], provide no or only little evidence on questions such as “which data is stolen?” or “where is the data sent to?”. In such cases, a manual inspection is usually the only solution. We take the malware investigation [13] of the Android/BadAccents malware as an example to demonstrate the need for manual reverse engineering. An automated pre-analysis of this malware family hinted at a banking trojan, but it was not clear to the malware analyst where the stolen data from the SMS intercepting component of the malware was sent to. In the following we will explain in detail how one could use CodeInspect during investigations such as the one on Android/BadAccents. The goal is to answer detailed questions, such as who the receivers of stolen data are, on potentially obfuscated code.

4.1 APK Overview

The Android manifest provides a good starting point in a manual malware investigation since it contains information about the different components of the app along with additional meta-information. In the context of the Android/BadAccents malware, one can identify that com.shit.MainActivity is the class of the main activity, which gets called first when the user opens an application. Furthermore, the manifest also contains a broadcast receiver called com.a.a.AR which, in turn, has an onReceive() method that gets executed whenever the device receives a new SMS message. Note that the receiver also reacts to other actions besides SMS_RECEIVED, but these actions are less important. The main activity and the receiver class are two interesting parts, which need to be analyzed in more detail.

Besides the manifest, one can also use CodeInspect’s search function. Since we are looking for user e-mail credentials, a simple lookup for “password” or “user” might return interesting code positions. Indeed, Android/BadAccents contains a few code statements that contain the two search words. The most interesting code location is the one shown in the lower part of Figure 7. It contains two API calls, stringUser() and stringPassword(), which are calls to native methods. These methods take no parameters and return strings, i.e., are simple native getters. The code section was executed in the onCreate() method of the main activity and the API calls were directly executed without any special triggering.

Now, one could either reverse engineer the native implementation of the two API calls or proceed with a dynamic analysis. We could have extracted the native code into a new app and called the two methods there to find the returned string values. Debugging the original malware app, however, required even less effort as we explain in the next section.
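The manifest inspection of Section 4.1 can also be scripted once the manifest is available as plain XML. The following Java sketch uses the JDK's DOM parser to list each broadcast receiver together with the intent actions it registers for. The embedded manifest is a hand-written stand-in and not the real Android/BadAccents manifest: only the component names com.shit.MainActivity and com.a.a.AR come from the text, while the package attribute and the second action are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ManifestSketch {

    // Hand-written stand-in for a decoded AndroidManifest.xml (assumption).
    public static final String MANIFEST =
        "<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\" package=\"com.example\">"
      + "<application>"
      + "<activity android:name=\"com.shit.MainActivity\"/>"
      + "<receiver android:name=\"com.a.a.AR\">"
      + "<intent-filter>"
      + "<action android:name=\"android.provider.Telephony.SMS_RECEIVED\"/>"
      + "<action android:name=\"android.intent.action.BOOT_COMPLETED\"/>"
      + "</intent-filter>"
      + "</receiver>"
      + "</application>"
      + "</manifest>";

    // Maps each receiver class name to the intent actions it listens for.
    public static Map<String, List<String>> receiverActions(String manifestXml) {
        try {
            // The default factory is not namespace-aware, so attributes are
            // looked up by their qualified name "android:name".
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList receivers = doc.getElementsByTagName("receiver");
            for (int i = 0; i < receivers.getLength(); i++) {
                Element receiver = (Element) receivers.item(i);
                List<String> actions = new ArrayList<>();
                NodeList actionNodes = receiver.getElementsByTagName("action");
                for (int j = 0; j < actionNodes.getLength(); j++) {
                    actions.add(((Element) actionNodes.item(j)).getAttribute("android:name"));
                }
                result.put(receiver.getAttribute("android:name"), actions);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("manifest could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        receiverActions(MANIFEST).forEach(
            (name, actions) -> System.out.println(name + " -> " + actions));
    }
}
```

A receiver that registers for SMS_RECEIVED is a natural first place to set a breakpoint, since its onReceive() method is where incoming messages are handled.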

Figure 7: Access to Email Credentials in the Main

4.2   Detailed Manual Analysis

One of the most powerful features of CodeInspect is the interactive debugger. It allows a human analyst to perform single-step debugging on the Jimple code. Whenever the execution is halted, she can examine the current contents of all variables in scope in the live variables view. This was very useful for extracting the concrete username and password which are returned from the two native API calls. A breakpoint in line 264 in Figure 7 stops the program at that point and gives the analyst the possibility to view the runtime values of the variables $String and $String2. These two values are temporarily stored on the file system and are later used for sending the stolen data to the attacker’s email account via the SMTP protocol. This answers the first question in our investigation. We have information about email credentials, which were hidden in native code. However, we do not have proof that these credentials are actually used for authentication.

Figure 8: Runtime Values of Variables

A quick search for “mail” in the Jimple code shows that the application contains a method called “MailSend”, which sounds like the method that is responsible for sending emails to the attacker. We use the “Open Call Hierarchy” feature of CodeInspect (see Figure 9) and discover that the “MailSend” method gets triggered by the “onReceive” method, which gets executed once the application receives an SMS (see Section 4.1). This shows that SMS data is indeed stolen and leaked via e-mail.

Figure 9: Call Hierarchy View for the MailSend

As a next step, we send an SMS message (number “+861111” and text “Hello World!”) to the emulator as shown in Figure 10. After single stepping through the Jimple code, we hit an interesting code section (see Figure 11), which checks whether the incoming number starts with “+86” or “+82”. This shows that the malware is expecting SMS messages from China or South Korea. This is an interesting result from the investigation, as it indicates that the malware is especially targeting users from China or South Korea.

Figure 10: Sending an SMS Message to the Emulator via CodeInspect

Figure 11: Incoming SMS Number Check

Further stepping through the code leads to another interesting code section as shown in Figure 12. Here we can see that the malware expects a certain SMS text “ak40 1”. The purpose of this special command is to activate and deactivate stealing the incoming SMS messages. After sending the “ak40 1” command to the emulator, all further incoming SMS messages are intercepted and sent to the attacker. Figure 13 shows the usage of the “MailSend” method which leaks the incoming SMS message to the attacker with the cre-

