Code Review on the Cheap

b 2015-01-06    

At the 31. Chaos Communication Congress I had the pleasure to watch the presentation of Fabian Yamaguchi about the code analysis platform Joern. I’ve heard about this tool before at Hacktivity but this time I could have deeper view on the internals and the capabilities of the software that inspired me to clean up and release some piece of code I used for some years in code review projects.

My idea was quite similar to the one of Fabs: Recognize potentially vulnerable code patterns, and create a tool that is aware of the sytnax in order to look for the anomalies in the source efficiently. Yes, I could use regular expressions and grep, but eventually I faced several problems like:

  • Ugly source code
  • Keywords inside comments or strings
  • The complexity of the regex itself

Since I have to deal with several different languages I also had to come up with a solution that is easily applicable to new grammars. In the end I settled with a solution somehow between grep and Joern and implemented it in CROTCH (Code Review On The CHeap):

Instead of full-blown parsers CROTCH only uses lexers, which are readily available in all flavors in the Pygments Python package (intended to use for source code highlighting), providing support for almost every language one can think of. Other lexers can also be created very easily – although installing them requires some setuptools-fu. With these lexers we can tokenize the source code, splitting it to meaningful parts and assigning types (like scalar, string, comment, keyword, etc.) to them. This is of course far from having an AST for example, but allows us to start creating small, targeted “mini-parsers” which focus only to the relevant parts of the source.

These “mini-parsers” can be implemented through a general purpose state machine, usually in a few dozen lines of Python. The example provided on GitHub implements a state machine similar to the following to identify simple format string bugs without troubling with most of the grammar:

printf
printf

We successfully utilized this approach to find bugs in scale from enterprise Java applications to large PL/SQL projects, hope you’ll find CROTCH useful too! As always, issues, feature- and pull requests are welcome on our GitHub!