An update on MD5 poisoning

Author: b

Last year we published a proof-of-concept tool to demonstrate bypasses against security products that still rely on the obsolete MD5 cryptographic hash function.

Summary: The method allows bypassing malicious executable detection and whitelists by creating two executables with colliding MD5 hashes. One of the executables (“sheep”) is harmless and can even perform some useful task and is expected to be categorized as goodware by the victim. After the sheep is accepted by the victim, the colliding malicious version (“wolf”) is sent. Because affected products rely solely on the MD5 fingerprints to identify known good executables, wolf is already whitelisted and can run.

Although the reception of the research was generally positive, some were skeptical about the extent and even the validity of the issue. Although in the meantime we received information about more affected products, NDA’s prevented us from further demonstrating that the problem indeed exists and affects multiple vendors. 

Today we are able to share a demonstration of the problem affecting Panda Adaptive Defense 360. The issue is demonstrated against the stricter “Lock mode” of the product meaning that the Panda agent only allows known good executables to run (application whitelisting). For the sake of this video we manually unblock the harmless executable version (sheep4.exe) to speed up the process, as otherwise the analysis could take several hours to complete (it was confirmed that the “sheep” executables aren’t detected as malicious by the cloud scanner in case they are not manually unblocked):


(You can skip 01:00-01:55 if you are not interested in the policy update)

We notified Panda Security about this issue through their Hungarian partner (see the timeline at the end of this post). Panda responded that this is a known issue that is expected to be fixed in the next major version, but no ETA was provided. Panda stated that MD5 was used because of performance reasons. We informed Panda that the BLAKE2 hash function can provide higher level of security at better performance than MD5 (thanks to Tony Arcieri for this update!).

We’d like to stress that this research is not about individual vendors but about bad practices prevalent in the security industry. We now know of at least four vendors affected by the above problem and several others still provide MD5 fingerprints only in their tools and public reports. It is shameful that while hard work is put into phasing out SHA-1, in the security industry it is still generally accepted to use MD5, even after it was exploited in a real-world incident. We understand that there are more straightforward ways for evasion, but think that this issue is a good indicator of how security product development is often approached.

We should do better than this!

Timeline

2016.08.30: Sending technical information to vendor.
2016.09.05: Vendor requests more information, including PCOPInfo logs collected during retest.
2016.09.06: Sending demo video and identification information about product instance. Requesting more information about PCOPInfo usage.
2016.09.06: Vendor responds with instructions about PCOPInfo.
2016.09.08: Sending PCOPInfo logs to vendor.
2016.09.19: Vendor responds that this is a known issue, replacement algorithm is expected to be implemented in the next version.
2016.09.27: Requesting negotiation about issue publication date.
2016.10.12: Requesting negotiation about issue publication date. Including notification about 90-day disclosure deadline in case no agreement would be achieved.
2016.10.19: Vendor responds, internal discussion is still in progress.
2016.11.16: Requesting information about acceptance of publication date.
2016.11.28: Public release.


Sanitizing input with regex considered harmful

Author: dnet

Sanitizing input (as in trying to remove a subset of user input so that the remaining parts become “safe”) is hard to get right in itself. However, many developers doom their protection in the first place by choosing the wrong tool to get it done, in this case, regular expressions (regex for short). While they’re powerful for quite a few purposes, as the old saying goes,

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

During a recent pentest, we found an application that did this by stripping HTML tags from a string by replacing the regular expression <.*?> with an empty string. (Apparently, they haven’t read the best reaction to processing HTML with regexes.) For those wondering about the question mark after the star, it disables the default greedy behavior of the engine, so the expression matches a less-than sign, as few characters as possible of any kind, and a greater-than sign. At first sight, one might think that’s the definition of an HTML tag, and for a minute we also believed it was the case.

In regexes, the dot matches any character. However, the definition of any excludes newlines (ASCII 0x0a, \n) by default in most implementations, while the HTML standard allows for such characters inside tags, which gives us a specific class of tags that are valid in browsers but are not stripped by the above algorithm. Below are some examples of platforms used to implement web applications and their behavior regarding this “challenge”. Some libraries have similar solutions, but only one thing was common in these five languages; by default, the above expression fails the test. For the sake of brevity and readability, examples were produced in interactive shells (REPLs); in case of Java and .NET, Jython and IronPython were used, respectively.

Java

>>> from java.util.regex import Pattern
>>> p = Pattern.compile('<.*?>')
>>> p.matcher('<foobar>').replaceAll('')
u''
>>> p.matcher('<foo\nbar>').replaceAll('')
u'<foo\nbar>'

The official documentation states that dot matches “any character (may or may not match line terminators)”. The link points to a section that says “The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.” [emphasis added] Adding the flag solves the problem, as it can be seen below.

>>> p = Pattern.compile('<.*?>', Pattern.DOTALL)
>>> p.matcher('<foobar>').replaceAll('')
u''
>>> p.matcher('<foo\nbar>').replaceAll('')
u''

Python

>>> import re
>>> re.sub('<.*?>', '', '<foobar>')
''
>>> re.sub('<.*?>', '', '<foo\nbar>')
'<foo\nbar>'

Python follows a similar path, even the flag is called the same: “In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline” [emphasis added].

>>> re.sub('<.*?>', '', '<foobar>', flags=re.DOTALL)
''
>>> re.sub('<.*?>', '', '<foo\nbar>', flags=re.DOTALL)
''

PHP

php > var_dump(preg_replace("/<.*?>/", "", "<foobar>"));
string(0) ""
php > var_dump(preg_replace("/<.*?>/", "", "<foo\nbar>"));
string(9) "<foo
bar>"

Although PHP has an interactive mode (php -a), return values are silently discarded, and var_dump doesn’t escape newlines. However, it clearly illustrates that it behaves just like the others, but PHP doesn’t mention this behavior in the official manual for preg_replace (even though a user comment points it out, it lacks the solution). The PCRE modifiers page has the answer, the s modifier should be used, and it even shows the longer name for it (PCRE_DOTALL), although there’s no way to use it, in contrast with Python’s solution (re.S is equivalent to re.DOTALL).

php > var_dump(preg_replace("/<.*?>/s", "", "<foobar>"));
string(0) ""
php > var_dump(preg_replace("/<.*?>/s", "", "<foo\nbar>"));
string(0) ""

.NET

>>> from System.Text.RegularExpressions import Regex
>>> Regex.Replace('<foobar>', '<.*?>', '')
''
>>> Regex.Replace('<foo\nbar>', '<.*?>', '')
'<foo\nbar>'

Of course, Microsoft surprises noone by having its own solution for the problem. In their documentation on regexes, they also mention that dot “matches any single character except \n”, but you have to figure it out yourself; there’s no link to the Singleline member of RegexOptions.

>>> from System.Text.RegularExpressions import RegexOptions
>>> r = Regex('<.*?>', RegexOptions.Singleline)
>>> r.Replace('<foobar>', '')
''
>>> r.Replace('<foo\nbar>', '')
''

Ruby

irb(main):001:0> "<foobar>".sub!(/<.*?>/, "")
=> ""
irb(main):002:0> "<foo\nbar>".sub!(/<.*?>/, "")
=> nil

Ruby performs as usual, having easy-to-write/hard-to-read shorthands, however, its solution is almost as dumbfounding as the above. Like PHP, it expects modifiers as lowercase characters after the trailing slash (/), but it interprets s as a signal to interpret the regex as SJIS encoding (I never knew it even existed), and wants you to use m (called MULTILINE by the official documentation, adding to the confusion), which is used for other purposes in other regular expression engines.

irb(main):005:0> "<foobar>".sub!(/<.*?>/m, "")
=> ""
irb(main):004:0> "<foo\nbar>".sub!(/<.*?>/m, "")
=> ""

Javascript

> "<foobar>".replace(/<.*?>/, "")
''
> "<foo\nbar>".replace(/<.*?>/, "")
'<foo\nbar>'
> "<foo\nbar>".replace(/<.*?>/m, "")
'<foo\nbar>'

JavaScript has three modifiers (igm), none of them useful for making dot match literally any character. The only solution is to do it explicitly, the best one of these seems to be matching the union of whitespace and non-whitespace characters.

> "<foo\nbar>".replace(/<[\s\S]*?>/, "")
''
> "<foobar>".replace(/<[\s\S]*?>/, "")
''

Conclusion

The above solutions address a single problem only (stripping HTML tags having line breaks), processing untrusted input is much more than this. If you build a web application that must display such content, use a proper library for this purpose, preferably a templating language that performs escaping by default. For other purposes, use a DOM and don’t forget to test for corner cases, including both valid and broken HTML.


WAF bypass made easy

Author: pz

In this post I will share my testing experiences about a web application protected by a web application firewall (WAF). The investigation of the parameters of web interfaces revealed that I can perform XSS attacks in some limited ways. The target implemented blacklist-based filtering that provided some HTML tag and event handler restriction. Since this restriction appeared at quite unusual places I suspected that there might be a WAF in front of the application. To verify my suspicion:

  • I tested the interfaces by random HTTP parameters which contained XSS payload. Most of the time a WAF checks all of the input parameters but the server side validation only checks the expected ones;
  • I used the Wafit tool.

These tests underpinned my assumption: there was indeed some application level firewall in front of my target.

In order to map the filter rules I created a list of HTML4 and HTML5 tags and event handlers based on the information of w3.org. After this I could easily test the allowed tags and event handlers with Burp Suite Professional:

HTML eventsHTML events (click to enlarge)

Untitled2

HTML tags (click to enlarge)

Based on the results we can perform only some browser specific attacks typically against IE/6.0. However, many web applications handle parameters passed by GET and POST methods in the same way. I performed the tests again, now submitting the crafted parameters in the HTTP POST body and found that the WAF didn’t interrupt my request and the application processed my input through the same vulnerable code path. This way I could perform XSS without limitation and surprisingly was able to find an SQL injection issue also.

When you try implement a WAF in your infrastructure the most important key concepts are:

  • Know your system: Collect the applications and HTTP parameters to be protected and understand how they are used (data types, expected format, etc.);
  • Consult somebody with an offensive attitude (like a penetration tester) before you implement the defense;
  • Be aware of the limits of blacklist-based filtering;

Improper WAF implementation can lead to false sense of security that can result in more damage than operating applications with known security issues.