Getting Started
This article walks through extracting Android applications from an un-rooted device, reverse engineering those applications, and using auxiliary tools such as automation frameworks and malware analysis tools to determine whether an application’s configuration suggests malware. The following tools are showcased and used:
DroidDetective
DroidDetective is a Python tool for analysing Android applications (APKs) for potential malware-related behaviour and configurations. When provided with a path to an application (APK file), DroidDetective uses its ML model to predict whether the application is malicious.
AutoDroid
AutoDroid is a Python tool for programmatically scripting bulk interactions with one or more Android devices. It is useful for downloading and extracting all APKs from all connected devices, testing a developed application on multiple devices at once, and more.
ADB
Android Debug Bridge (adb) is a command-line interface tool for communicating with Android devices. The adb command allows for a plethora of device interaction types – including acquiring a shell, installing / uninstalling apps, and interacting with the screen and other hardware accessories.
Structure of an Application (APK)
To get started, it’s important to understand the structure of Android applications. Android applications are commonly written in either Java or Kotlin. To create an APK (Android Package), which contains the code and resources that run on an Android device, a software engineer compiles the Java or Kotlin source code to Dalvik bytecode, packaged as a Dalvik executable. This Dalvik executable is the binary that runs on the Android device. Each process on the device uses its own virtual machine (VM), which segregates applications from one another. Prior to Android API level 21 (Android 5), this was the Dalvik Virtual Machine; later versions instead use the Android Runtime (ART). Both operate in a similar fashion: they simulate a device’s CPUs, registers, and other features while running an application’s compiled Dalvik bytecode.
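An APK is a ZIP archive, so the Dalvik executables described above can be located with standard tooling. The following is a minimal sketch, not part of any tool in this article; the function name list_dex_files and the example path are hypothetical.

```python
import sys
import zipfile


def list_dex_files(apk):
    """Return the sorted names of the Dalvik executable (.dex) files in an APK.

    `apk` may be a file path or a binary file-like object.
    """
    # An APK is a ZIP archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(apk) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".dex"))


if __name__ == "__main__":
    # Hypothetical usage: python list_dex.py example.apk
    if len(sys.argv) == 2:
        print(list_dex_files(sys.argv[1]))
```

Larger applications often ship more than one executable (classes.dex, classes2.dex, and so on), which is why the sketch returns a list rather than a single name.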
While it is the Dalvik bytecode that runs on a device, it is not human-readable, so to reverse engineer an application we need to decompile it back into a human-readable form. This is where Jadx comes in. Using Jadx we can decompile the Dalvik bytecode back into Java. The result is often called pseudo-Java, as it is not a one-to-one representation of the original source code and is instead the decompiler’s best guess.
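As a rough illustration of the decompilation step, the sketch below drives the jadx command-line tool from Python. It assumes jadx is installed and on the PATH; the helper names build_jadx_command and decompile are hypothetical and not part of Jadx itself.

```python
import shutil
import subprocess


def build_jadx_command(apk_path, output_dir):
    """Build the jadx CLI invocation that decompiles an APK into output_dir."""
    # -d sets the directory that jadx writes decompiled sources and resources to.
    return ["jadx", "-d", output_dir, apk_path]


def decompile(apk_path, output_dir):
    """Run jadx and return True when it exits with status 0."""
    if shutil.which("jadx") is None:
        raise FileNotFoundError("jadx was not found on PATH")
    result = subprocess.run(build_jadx_command(apk_path, output_dir))
    # A non-zero exit does not necessarily mean nothing was produced, so the
    # output directory may still be worth inspecting after a failure.
    return result.returncode == 0
```

The command is built as a list rather than a single string so that APK paths containing spaces are passed to jadx unchanged.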
Android APK files also include a file detailing the application configuration, called AndroidManifest.xml. The Android manifest includes information such as:
- Package name and application ID
- Application components
- Intent filters
- Icons and labels information
- Permissions
- Device compatibility information
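To make the list above concrete, here is a minimal sketch that reads a few of these fields from a manifest. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format); the sample manifest, package name, and function name are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = f"{{{ANDROID_NS}}}name"

# Hypothetical decoded manifest used only for this example.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_SMS"/>
    <application android:label="Example">
        <activity android:name=".MainActivity"/>
        <service android:name=".SyncService"/>
    </application>
</manifest>
"""


def summarise_manifest(xml_text):
    """Extract the package name, permissions, and components from a manifest."""
    root = ET.fromstring(xml_text)
    permissions = [el.get(NAME_ATTR) for el in root.iter("uses-permission")]
    components = {}
    application = root.find("application")
    for tag in ("activity", "service", "receiver", "provider"):
        found = application.findall(tag) if application is not None else []
        components[tag] = [el.get(NAME_ATTR) for el in found]
    return {
        "package": root.get("package"),
        "permissions": permissions,
        "components": components,
    }


if __name__ == "__main__":
    print(summarise_manifest(SAMPLE_MANIFEST))
```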
Retrieving an Application From A Device
Being able to retrieve applications from a device is key to identifying whether one of those applications is potentially malware. Before continuing, ensure that ADB access (USB debugging) is enabled on the device being tested. To do this, go to Settings, then About phone, and tap Build number seven times. After this, open the developer settings and enable USB debugging. Now connect the device and accept any prompts that are displayed.
Android applications are not encrypted at rest, so if an APK’s location on a device can be identified, it can be retrieved. Two shell commands, run on the device via adb shell, help with this: pm list packages, which lists all packages on the target device, and pm path <package name>, which returns the path to that package’s APK file on the device. Once the path has been located, the adb pull <path to apk> command can be used to retrieve the APK from the device.
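The manual steps above can be scripted. The sketch below is one possible approach and is not how AutoDroid is implemented: it assumes adb is on the PATH with a single authorised device connected, and relies on both pm commands printing one "package:"-prefixed entry per line. The function names and output file naming are hypothetical.

```python
import os
import subprocess

PREFIX = "package:"


def parse_pm_output(output):
    """Strip the 'package:' prefix from each line of pm output."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            entries.append(line[len(PREFIX):])
    return entries


def adb(*args):
    """Run an adb command and return its standard output as text."""
    completed = subprocess.run(
        ["adb", *args], capture_output=True, text=True, check=True
    )
    return completed.stdout


def pull_all_apks(dest_dir="."):
    """Pull every APK that pm reports for every installed package."""
    for package in parse_pm_output(adb("shell", "pm", "list", "packages")):
        # A package can report several paths, so each one is pulled separately.
        for index, path in enumerate(parse_pm_output(adb("shell", "pm", "path", package))):
            adb("pull", path, os.path.join(dest_dir, f"{package}_{index}.apk"))


if __name__ == "__main__":
    pull_all_apks()
```

Because check=True is set, any adb failure stops the run with an exception; a more forgiving script could catch subprocess.CalledProcessError and continue with the next package.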
Automating Retrieving and Reverse Engineering APKs
AutoDroid wraps the ability to retrieve Android applications from a device, along with other functionality, to allow configurable bulk interaction with Android devices. AutoDroid is configured with a JSON file. Using the configuration below, all applications are extracted from all connected devices, and their manifest files are extracted and saved locally in XML format.
{
  "devices": ["*"],
  "apps": ["*"],
  "commands": {
    "get_app": ["!adb_connect pull !app_path !app_id.apk"],
    "reverse_app": ["reverse: !app_id.apk;manifest"]
  }
}
Ensure all AutoDroid dependencies are installed by running the below installation command:
pip install -r REQUIREMENTS.txt
After creating a JSON config file, AutoDroid can be run by providing the path to the config file as a command line parameter:
python AutoDroid.py <JSON config path>
It would now be possible to iterate through these manifest files one by one to identify trends and malicious configurations commonly seen in malware. However, as most users have, on average, upwards of 80 applications on a single device, this would take a considerable amount of time. In the next section, machine learning is used to combat this issue and automate the analysis of these APKs.
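For a sense of what iterating over the extracted manifests could look like, the sketch below counts how many applications request each permission. It assumes the manifests are decoded XML files collected in one directory; the directory name and function name are hypothetical, as AutoDroid's actual output layout is not covered here.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Namespaced form of the android:name attribute.
NAME_ATTR = "{http://schemas.android.com/apk/res/android}name"


def tally_permissions(manifest_paths):
    """Count how many manifests request each permission."""
    counts = Counter()
    for path in manifest_paths:
        root = ET.parse(str(path)).getroot()
        # A set ensures each permission is counted once per application.
        requested = {
            el.get(NAME_ATTR)
            for el in root.iter("uses-permission")
            if el.get(NAME_ATTR)
        }
        counts.update(requested)
    return counts


if __name__ == "__main__":
    # "manifests" is a hypothetical directory of decoded manifest files.
    for permission, total in tally_permissions(Path("manifests").glob("*.xml")).most_common(10):
        print(total, permission)
```

A tally like this highlights rarely requested permissions, but it still leaves the judgement about each application to the analyst.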
Using machine learning to identify malware
As mentioned previously, manually reviewing every single APK on an Android device can be tedious. This article proposes using machine learning as a first pass to help save some of this analysis time. DroidDetective is a Python tool for analysing Android applications (APKs) for potentially malicious configurations in the AndroidManifest.xml file.
Dependencies for DroidDetective are installed in the same fashion as for AutoDroid. DroidDetective also requires an apk_malware.model file (the pre-trained ML model) at the execution root.
pip install -r REQUIREMENTS.txt
After this, DroidDetective can be run as follows:
python DroidDetective.py <path to APK> <optional JSON output file>
DroidDetective works by training a Random Forest binary classifier on information derived from both known malware APKs and legitimate APKs. The tool comes pre-trained; however, the model can be re-trained on a new dataset at any time. The model currently uses the permissions from an APK’s AndroidManifest.xml file as its feature set. It creates a dictionary of each standard Android permission and sets the feature to 1 if the permission is present in the APK. Similarly, a feature is added for the number of permissions in use in the manifest and for the number of unidentified permissions found in the manifest.
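The feature set described above can be sketched as follows. This illustrates the idea only and is not DroidDetective's actual code: the permission list is a short illustrative subset of the standard Android permissions, and the two count feature names are invented for this example.

```python
# Abbreviated, illustrative subset of the standard Android permissions.
STANDARD_PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]


def build_features(manifest_permissions):
    """Turn the permissions found in one manifest into a feature dictionary."""
    present = set(manifest_permissions)
    # One binary feature per standard permission: 1 if requested, else 0.
    features = {perm: int(perm in present) for perm in STANDARD_PERMISSIONS}
    # Hypothetical names for the two count features.
    features["permission_count"] = len(present)
    features["unidentified_permission_count"] = len(present - set(STANDARD_PERMISSIONS))
    return features


if __name__ == "__main__":
    print(build_features(["android.permission.INTERNET", "com.example.CUSTOM"]))
```

Because the dictionary is always built in the same key order, list(features.values()) gives a fixed-length vector that could be passed to a classifier such as scikit-learn's RandomForestClassifier.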
Putting it all together
Using what we’ve covered so far in this article, DroidDetective can be used alongside AutoDroid to automatically retrieve applications from a device and identify whether they contain malicious configurations in their manifest file.
For this, ensure that all requirements and required files are present, and run AutoDroid with the following configuration:
{
  "devices": ["*"],
  "apps": ["*"],
  "commands": [
    "!adb_connect pull !app_path !app_id.apk",
    "python DroidDetective.py !app_id.apk output.json"
  ]
}
This results in all APKs on the target device(s) being analysed by DroidDetective. As the optional JSON output file is provided, all of these results are appended to output.json. An example of this can be seen below:
{
  "com.google.android.uvexposurereporter": false,
  "com.google.android.networkstack.tethering": false,
  "com.amazon.mShop.android.shopping": false,
  "com.google.omadm.trigger": false,
  ...
  "com.google.android.apps.cultural": false,
  "com.android.companiondevicemanager": false,
  "com.verizon.obdm_permissions": false,
  "com.android.mms.service": false,
  "com.google.android.apps.docs.editors.sheets": false
}
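With many applications in the results, it helps to filter the output down to the packages worth a manual look. The sketch below assumes output.json is a single JSON object mapping package IDs to booleans, as in the example above, and that a value of true means the application was predicted to be malicious. The function names are hypothetical.

```python
import json


def flagged_packages(results):
    """Return the sorted package IDs whose prediction is True."""
    # Assumption: True marks an application predicted to be malicious.
    return sorted(pkg for pkg, is_malware in results.items() if is_malware is True)


def load_results(path="output.json"):
    """Load the package-to-prediction mapping from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    for package in flagged_packages(load_results()):
        print(package)
```

Any package this prints is a candidate for manual review with Jadx, since the classifier is intended as a first pass rather than a final verdict.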
Learn More On Android Internals
In 2021 I released my first book with Apress publishing, Android Software Internals Quick Reference. If you work with or find programming and Android interesting please consider picking the book up for yourself!
10% off Android Malware Reverse Engineering Cheat Sheets
Free and premium resources, available on everything from Android and iOS security fundamentals, reverse engineering basics, and study guides for my Udemy courses. Use code ‘MALWARE-ARTICLE’ for 10% off on the Android Malware Reverse Engineering Cheat Sheet.