Tag Archives: md5

An update on MD5 poisoning

Last year we published a proof-of-concept tool to demonstrate bypasses against security products that still rely on the obsolete MD5 cryptographic hash function.

Summary: The method allows bypassing malicious executable detection and whitelists by creating two executables with colliding MD5 hashes. One of the executables (“sheep”) is harmless and can even perform some useful task and is expected to be categorized as goodware by the victim. After the sheep is accepted by the victim, the colliding malicious version (“wolf”) is sent. Because affected products rely solely on the MD5 fingerprints to identify known good executables, wolf is already whitelisted and can run.

Although the reception of the research was generally positive, some were skeptical about the extent and even the validity of the issue. Although in the meantime we received information about more affected products, NDA’s prevented us from further demonstrating that the problem indeed exists and affects multiple vendors. 

Today we are able to share a demonstration of the problem affecting Panda Adaptive Defense 360. The issue is demonstrated against the stricter “Lock mode” of the product meaning that the Panda agent only allows known good executables to run (application whitelisting). For the sake of this video we manually unblock the harmless executable version (sheep4.exe) to speed up the process, as otherwise the analysis could take several hours to complete (it was confirmed that the “sheep” executables aren’t detected as malicious by the cloud scanner in case they are not manually unblocked):

(You can skip 01:00-01:55 if you are not interested in the policy update)

We notified Panda Security about this issue through their Hungarian partner (see the timeline at the end of this post). Panda responded that this is a known issue that is expected to be fixed in the next major version, but no ETA was provided. Panda stated that MD5 was used because of performance reasons. We informed Panda that the BLAKE2 hash function can provide higher level of security at better performance than MD5 (thanks to Tony Arcieri for this update!).

We’d like to stress that this research is not about individual vendors but about bad practices prevalent in the security industry. We now know of at least four vendors affected by the above problem and several others still provide MD5 fingerprints only in their tools and public reports. It is shameful that while hard work is put into phasing out SHA-1, in the security industry it is still generally accepted to use MD5, even after it was exploited in a real-world incident. We understand that there are more straightforward ways for evasion, but think that this issue is a good indicator of how security product development is often approached.

We should do better than this!


2016.08.30: Sending technical information to vendor.
2016.09.05: Vendor requests more information, including PCOPInfo logs collected during retest.
2016.09.06: Sending demo video and identification information about product instance. Requesting more information about PCOPInfo usage.
2016.09.06: Vendor responds with instructions about PCOPInfo.
2016.09.08: Sending PCOPInfo logs to vendor.
2016.09.19: Vendor responds that this is a known issue, replacement algorithm is expected to be implemented in the next version.
2016.09.27: Requesting negotiation about issue publication date.
2016.10.12: Requesting negotiation about issue publication date. Including notification about 90-day disclosure deadline in case no agreement would be achieved.
2016.10.19: Vendor responds, internal discussion is still in progress.
2016.11.16: Requesting information about acceptance of publication date.
2016.11.28: Public release.

Poisonous MD5 – Wolves Among the Sheep

MD5 is known to be broken for more than a decade now. Practical attacks have been shown since 2006, and public collision generator tools are also available since that time. The dangers of the developed collision attacks were demonstrated by academia and white-hat hackers too, but in case of the Flame malware we’ve also seen malicious parties exploiting the weaknesses in the wild.

And while most have already moved away from MD5, there is still a notable group that heavily uses this obsolete algorithm: security vendors. It seems that MD5 became the de-facto standard of fingerprinting malware samples and the industry doesn’t seem to be willing to move away from this practice. Our friend Zoltán Balázs collected a surprisingly long list of security vendors using MD5, including the biggest names of the field.

The list includes for example Kaspersky, the discoverer of Flame who just recently reminded us that MD5 is dead, but just a few weeks earlier released a report including  MD5 fingerprints only – ironically even the malware they analysed uses SHA-1 internally…

And in case you think that MD5 “good enough” for malware identification let’s take another example. The following picture shows the management console of a FireEye MAS – take a good look at the MD5 hases, the time delays and the status indicators:

fireeye_duplicateDownload sample files

As you can see, binaries submitted for analysis are identified by their MD5 sums and no sandboxed execution is recorded if there is a duplicate (thus the shorter time delay). This means that if I can create two files with the same MD5 sum – one that behaves in a malicious way while the other doesn’t – I can “poison” the database of the product so that it won’t even try to analyze the malicious sample!

After reading the post of Nat McHugh about creating colliding binaries I decided to create a proof-of-concept for this “attack”. Although Nat demonstrated the issue with ELF binaries, the concept is basically the same with Windows (PE) binaries that security products mostly target. The original example works by diverting the program execution flow based on the comparison of two string constants. The collision is achieved by adjusting these constants so that they match in one case, but not in the other.

My goal was to create two binaries with the same MD5 hash; one that executes arbitrary shellcode (wolf) and another that does something completely different (sheep). My implementation is based on the earlier work of Peter Selinger (the PHP script by Nat turned out to be unreliable across platforms…), with some useful additions:

  • A general template for shellcode hiding and execution;
  • RC4 encryption of the shellcode so that the real payload only appears in the memory of the wolf but not on the disk or in the memory of the sheep;
  • Simplified toolchain for Windows, making use of Marc Stevens fastcoll (Peter used a much slower attack, fastcoll reduces collision generation from hours to minutes);

The approach may work with traditional AV software too as many of these also use fingerprinting (not necessarily MD5) to avoid wasting resources on scanning the same files over and over (although the RC4 encryption results in VT 0/57 anyway…). It would be also interesting to see if “threat intelligence” feeds or reputation databases can be poisoned this way.

The code is available on GitHub. Please use it to test the security solutions in your reach and persuade vendors to implement up-to-date algorithms before compiling their next marketing APT report!

For the affected vendors: Stop using MD5 now! Even if you need MD5 as a common denominator, include stronger hashes in your reports, and don’t rely solely on MD5 for fingerprinting!