Proxying nonstandard HTTPS traffic

Depending on the time spent in IT, most professionals have seen an instance of two where developers based their implementations on specific quirks and other non-standard behaviors, a well-known example is greylisting, another oft-used but less-known one is Wi-Fi band steering. In all these cases, the solution works within a range of implementations, which usually covers most client needs. However, just one step outside that range can result in lengthy investigations regarding how such a simple thing like sending an e-mail or joining a Wi-Fi network can go wrong.

During one of our application security assessments, we found an implementation which abused the way HTTP worked. The functionality required server push, but they probably decided to use HTTP anyway to get through corporate firewalls, and the application came from the age before WebSockets and HTTP2. So they came up with the idea of sending headers like the one below, then keeping the connection open, without sending any data.

HTTP/1.0 200 OK
Server: IDealRelay
Date: Wed, 16 Sep 2015 18:03:24 GMT
Content-length: 10000000
Connection: close
Content-type: text/plain

Googling the value of the Server header revealed mostly false positives, however as it turned out, there’s even a patent US 7734791 B2 with the title Asynchronous hypertext messaging that describes this behavior. Sending data to the server happened over separate HTTP channels, which themselves returned worthless responses, and the real response arrived in the first HTTP channel promising to deliver 10 megabytes in its Content-length header.

Some corporate proxies might have handled this well, however the Proxy module of Burp Suite just waited for the full 10 MB to arrive and just hung there, waiting for all eternity – and even if it managed to receive some data, its Scanner module couldn’t have handled the correlation between sending a payload in one request and getting response in another. In order to solve the problem, I decided to throw together a simple HTTPS proxy that doesn’t assume that much about the contents, just accepts CONNECT requests, performs a Man-in-the-Middle (MITM) attack and dumps the plaintext traffic into a file.

Multiple streams needed to be handled asynchronously, so I chose Erlang for its unique properties, and used built-in modules for everything except generating fake certificates for MITM. Since such certificates are cached, I chose to sign them using the command line version of OpenSSL for the sake of simplicity, so a whitelist is applied to the hostname to avoid command injection attacks – even though it’s supposed to be a tool used to debug “well behaving” applications, it never hurts to protect and setting good example. As an output format, PCAP was chosen as it’s simple (the PCAP writer itself is 32 lines of Erlang code) and widely supported by tools such as Wireshark.

The proxy logic fits into 124 lines of code, and waits for new TCP connections. By setting the socket into HTTP mode, the standard library does all the parsing for the CONNECT request. A standard response is sent, and from this point, the mode gets changed back to raw, and both the server and the client gets an SSL handshake from the proxy. Server certificates used for MITM are cached in an ETS (built-in high-performance in-memory database) table, shared between the lightweight thread-like Erlang processes, so certificates are signed only once per hostname, and the private key is the same for the CA and all the certificates.

After the handshake traffic is simply forwarded between the peers, and simultaneously written into the PCAP file. TCP/IP headers are added with the same port/address tuple as the original stream, while sequence/acknowledgement values are adjusted to reflect the plaintext content. This way, Wireshark doesn’t have any problems with reassembling the stream and can even detect and dissect known application protocols. Source code of the SSL proxy is available in our GitHub repository under MIT license, pull requests are welcome.

With the plaintext traffic in our hands, we managed to develop another tool to experiment with the service and as a result, we found several vulnerabilities, including critical ones. So the lesson to learn is that just because tools cannot intercept traffic out of the box, it doesn’t mean that the application is secure – existing tools are great for lots of purposes but one big difference between hackers and script kiddies is the ability of former to develop their own tools.