Accessing local variables in ProGuarded Android apps

Debugging applications without access to the source code always has its problems, especially with debuggers that were built with developers in mind, who obviously don’t have this restriction. In one of our Android app security projects, we had to attach a debugger to the app to step through heavily obfuscated code.

For the sake of simplicity (and because of the NDAs involved), I’ll demonstrate the case with a simple application. Skipping some parts of the reverse engineering process, we’ll start with the knowledge that the method hu.silentsignal.pgdebug.poc.a.doSecret is something we’re interested in. To be precise, we’d like to know what parameters are passed to the method, which should be pretty easy in JDB; in the case of Android, you just need to forward the app’s JDWP endpoint to a local TCP port.

$ adb shell ps | grep silentsignal
u0_a131   31275 3711  1493396 43620 SyS_epoll_ 0000000000 S hu.silentsignal.pgdebug.poc
$ adb forward tcp:7777 jdwp:31275
$ ss -lntp |fgrep 7777
LISTEN     0      0      127.0.0.1:7777                     *:*                   users:(("adb",pid=15778,fd=11))
$ jdb -attach localhost:7777
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
> stop in hu.silentsignal.pgdebug.poc.a.doSecret
Set breakpoint hu.silentsignal.pgdebug.poc.a.doSecret
>
Breakpoint hit: "thread=main", hu.silentsignal.pgdebug.poc.a.doSecret(), line=-1 bci=0

main[1] locals
No local variables

As can be seen above, JDB couldn’t find any variables (which include parameters as well). I had to dig deeper, so I went to see how the Dalvik implementation of JDWP worked. Around line 840, the implementation states (emphasis mine):

We could return ERR_ABSENT_INFORMATION here if the DEX file was built without local variable information. That will cause Eclipse to make a best-effort attempt at displaying local variables anonymously. However, the attempt isn’t very good, so we’re probably better off just not showing anything.

Diffing the low-level Smali textual representation (the assembly of Dalvik) revealed that such information contains only one thing: the name of the parameter. Since reverse engineers hardly need to know the exact names of symbols, I wrote a simple Python script that reads each Smali file and populates this metadata.

You might ask how the script knows how many parameters belong to a method. The answer is simple: since Java allows overloading, the bytecode stores each method’s name together with a descriptor that encodes its parameter types (a bit like name mangling in C++), so the type and number of parameters can be extracted from the method declaration found in the bytecode.
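As a minimal sketch of this idea, the snippet below splits the parameter part of a declaration into individual types, using the same pattern as the script’s PARAM_RE. The helper name split_params and the sample parameter string are made up for illustration.

```python
import re

# Same pattern as the script's PARAM_RE: optional array markers ("["),
# followed by either an object type ("L...;") or a one-letter primitive.
PARAM_RE = re.compile(br'\[*(?:L.+?;|[^L])')


def split_params(param_string):
    """Split the text found between the parentheses of a method declaration."""
    return PARAM_RE.findall(param_string)


# Hypothetical parameter list: long, String, int[], String[][]
sample_types = split_params(b'JLjava/lang/String;[I[[Ljava/lang/String;')
print(sample_types)
```

The number of parameters is simply the length of the returned list, and each element keeps its array markers and class name, which is what the script later writes into the comment next to each annotation.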

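import re
from contextlib import closing
from mmap import mmap, ACCESS_READ
from pathlib import Path

# Groups: (1) modifiers, (2) method name, (3) parameter types; the match
# ends right after the ".locals N" line that follows the declaration.
METHOD_RE = re.compile(br'^\.method([a-z ]*) ([^ (]+)\((.*?)\).*\n\s*\.locals \d+$', re.MULTILINE)
# One parameter type: optional array markers, then an object or a primitive.
PARAM_RE = re.compile(br'\[*(?:L.+?;|[^L])')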
 
def find_methods(smali):
    for method in METHOD_RE.finditer(smali):
        is_static = b'static' in method.group(1)
        print('[-] |- Method:', method.group(2))
        params = PARAM_RE.findall(method.group(3))
        if params:
            yield method.end(), params, is_static

The above function is a so-called generator function, which means that each time the execution reaches the yield in the last line, a tuple describing a Dalvik method is given to the caller. The tuple contains the byte offset of the end of the method declaration (more precisely, the end of the line declaring the number of locals), the list of parameters and whether the Dalvik method is static or not. As can be seen below, we use it to build a list, as we’d like to

1. check whether there are any methods to annotate at all and
2. serialize the process by reading through the file in one step and writing the output after reading has finished in a second step.
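The first point is the reason a list is needed at all: a generator object is always truthy, even when it would yield nothing, so an emptiness check only works on a materialized list. A small sketch, with a made-up generator named evens standing in for find_methods:

```python
def evens(numbers):
    """Hypothetical stand-in for find_methods: yields only some inputs."""
    for n in numbers:
        if n % 2 == 0:
            yield n


lazy = evens([1, 3, 5])                  # nothing will ever be yielded
lazy_is_truthy = bool(lazy)              # a generator object is always truthy

materialized = list(evens([1, 3, 5]))
list_is_truthy = bool(materialized)      # an empty list is falsy

print(lazy_is_truthy, list_is_truthy)
```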

The code below uses pathlib from Python 3 to simply iterate over all Smali files, which would be much messier in Python 2 (also, note the lack of direct path manipulations, such as os.path.join and friends). Each file is mapped to memory using the (also built-in) mmap module for simplicity and improved performance (the matching engine of the re library is implemented in C, so it reads straight from the mapped file, without involving any of the Python I/O library). After reading through the file, it’s renamed with a backup suffix (~); however, the file is still open, so the rename doesn’t affect us reading from it.
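To illustrate the memory-mapping part in isolation, here is a minimal sketch that runs a bytes pattern directly over a mapped file. The temporary file and its contents are invented for the example; only the mmap, re and contextlib calls mirror what the script does.

```python
import os
import re
import tempfile
from contextlib import closing
from mmap import mmap, ACCESS_READ

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix='.smali', delete=False) as tmp:
    tmp.write(b'.method public static a(J)V\n    .locals 1\n')
    tmp_path = tmp.name

with open(tmp_path, 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    with closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as mapped:
        # A bytes pattern can search the mapping like any bytes-like object.
        match = re.search(br'\.locals (\d+)', mapped)
        locals_count = int(match.group(1))
        header = mapped[:match.start()]   # slicing returns a bytes copy

os.remove(tmp_path)
print(locals_count, header)
```

Slices and match groups are copied out as bytes while the mapping is still open, which is why the script can keep writing from the mapping until the enclosing with block ends.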

The file is written in a more traditional way, copying the original contents from one offset to the next, and injecting parameter declarations in between. The encode/decode dance is required only since bytes instances (str in Python 2) have no format method. Since each iteration of the loop over the list of parameters does one offset-to-offset copy and an injection, the rest of the file from the last offset has to be copied after the loop, in a similar manner to joining a list of strings with a separator. The built-in function enumerate adds an index to an iterable, which needs to be offset depending on whether the method is static or not, as non-static (also known as instance) methods take the instance (this) as a “hidden” first (index = 0) parameter in Dalvik.
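The two ideas in this paragraph can be sketched separately from the file handling. The helper names param_lines and splice are hypothetical and not part of the published script; they only mirror the logic of its inner loops.

```python
def param_lines(params, is_static):
    """Build .param directives; instance methods reserve p0 for `this`."""
    first = 0 if is_static else 1
    return [
        '\n    .param p{0}, "p{0}"    # {1}'.format(n, t.decode('ascii')).encode('ascii')
        for n, t in enumerate(params, first)
    ]


def splice(data, inserts):
    """Copy `data` piecewise, injecting each payload at its offset."""
    out, last_pos = [], 0
    for offset, payload in inserts:
        out.append(data[last_pos:offset])  # original bytes up to the offset
        out.append(payload)                # injected bytes
        last_pos = offset
    out.append(data[last_pos:])            # the tail after the last offset
    return b''.join(out)


instance_lines = param_lines([b'I', b'Ljava/lang/String;'], is_static=False)
spliced = splice(b'AAABBBCCC', [(3, b'-'), (6, b'+')])
print(instance_lines)
print(spliced)
```

One caveat worth keeping in mind, stated here as an assumption about Dalvik rather than something the article covers: long (J) and double (D) values occupy two registers, so a method that has such a parameter followed by further parameters may need the index advanced by two instead of one.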

def process_dir(path):
    for entry in Path(path).rglob('*.smali'):
        # str() gives the textual path on any Python 3 version
        original_path = str(entry)
        print('[-] File name:', original_path)
        with entry.open('rb') as f:
            with closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as smali:
                if re.search(br'\.(?:param|local) ', smali):
                    raise RuntimeError('Parameters are already annotated')
                param_inserts = list(find_methods(smali))
                if not param_inserts:
                    continue
                # Keep the original as a backup; the open mapping stays readable
                entry.rename(original_path + '~')
                last_pos = 0
                with open(original_path, 'wb') as output:
                    for offset, params, is_static in param_inserts:
                        # Copy everything up to the end of the .locals line
                        output.write(smali[last_pos:offset])
                        # p0 is `this` for instance methods, so start at 1
                        for n, t in enumerate(params, 0 if is_static else 1):
                            output.write('\n    .param p{0}, "p{0}"    # {1}'.format(n,
                                t.decode('ascii')).encode('ascii'))
                        last_pos = offset
                    # Copy the remainder after the last insertion point
                    output.write(smali[last_pos:])
        print('[+] Closed', original_path)

Running the script produces some verbose output:

$ python3 annotate.py smali
[-] File name: smali/hu/silentsignal/pgdebug/poc/a.smali
[-] |- Method: b'doSecret'
[+] Closed smali/hu/silentsignal/pgdebug/poc/a.smali
[-] File name: smali/hu/silentsignal/pgdebug/poc/Main.smali
[-] |- Method: b'<init>'
[-] |- Method: b'doSecret'
[-] |- Method: b'onCreate'
[+] Closed smali/hu/silentsignal/pgdebug/poc/Main.smali

Repackaging the app (with something like apktool, which is usually used to get the Smali disassembly in the first place) and installing it on the device now results in a much better debugging experience, as can be seen below.

$ jdb -attach localhost:7777
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
> stop in hu.silentsignal.pgdebug.poc.a.doSecret
Set breakpoint hu.silentsignal.pgdebug.poc.a.doSecret
>
Breakpoint hit: "thread=main", hu.silentsignal.pgdebug.poc.a.doSecret(), line=-1 bci=0

main[1] locals
Method arguments:
p0 = 1456926831238
Local variables:
main[1]

Of course, similar results could be achieved by decompiling straight to Java (our favorite tool for this is JADX) and recompiling from scratch. However, this usually involves more risk, as decompilers are far from perfect, so apart from simple demo applications, working at the assembly-like Smali level results in a more robust solution. The source code is available on GitHub under the MIT license; pull requests are welcome!

Featured image is Android firewall by Uncalno Tekno, licensed under CC-BY 2.0.