Dissecting LLVM Obfuscator Part 1

| Kareem El-Faramawi | Toshi Piazza

LLVM Obfuscator is an industry-grade obfuscator which we have encountered frequently in the past few years of CTFing. This blog post documents our work in understanding the design of the obfuscator itself, as well as any possible weaknesses in the implementations of the obfuscation passes. We use this work to automate the task of emitting cleaned and working binaries via Binary Ninja.


The open source LLVM Obfuscator manifests as 3 relatively disjoint LLVM passes, each implementing some sort of obfuscation that obscures the CFG or arithmetic computation of the original program in some way.

Also note that, due to the fact that these passes operate over LLVM IR, it supports almost every architecture under the sun.

Information for each of the three passes can be found here:

These are simply the documentation pages maintained by the authors for each respective pass. However, if the documentation is deficient, the source is also an obvious ground truth.

Unfortunately, the llvm-obfuscator repo maintains multiple branches, one for each version of LLVM that that branch targets, with a clone of the entire LLVM repository, so it’s easy to get lost in the code. The passes for the 4.0 branch can be found here, in the lib/Transforms folder, as is customary for LLVM passes.

This blog post will focus on the control flow flattening pass, as we find it to be one of the most interesting passes in the LLVM Obfuscator ecosystem, as well as the most effective.

Control Flow Flattening

We can visualize the effects of the Control Flow Flattening pass as the following CFG transformation:

  1. Collect all of the original basic blocks in a CFG
  2. Lay them down flat “at the bottom” of the CFG, and remove all original edges
  3. Add …
...Read more

Google CTF Quals 2017 - Moon

| Toshi Piazza

This writeup is for the reversing challenge “Moon” we solved during 2017 Google CTF Quals. This writeup and 3 others were also submitted to the Google CTF Writeup Competition.

Dealing with GLEW

A big problem we noticed early on was the use of GL3W, which generates code to lazily load all OpenGL functions at runtime—at all places where an OpenGL function was used we would simply see a call to some offset in the data section. Unfortunately we couldn’t find a script to re-symbolize function calls, but we plan to make one soon-ish :)

We can see the huge routine which calls LoadLibrary on every OpenGL function, at sub_4032c0.

GLEW lazy loading functions

Instead of symbolizing (by hand) all of the symbols, we only bothered to load ones which had valid XREFs to them, saving a bit of time.

Running the Program

Unfortunately all the RPI-sold computers from our year are not yet reported to support OpenGL 4.3, so first and foremost we had to patch the OpenGL verification check from 4.3 to 4.2. Surprisingly, this “just worked” despite the code making use of Compute Shaders which I had thought to be introduced in OpenGL 4.3.

The program simply opens up a window, and asks for a password. After we’ve entered 32 characters, the program either responds “good” (presumably), or “Nope”.

Running the program

When we XREF the string Nope, we see that it is used when constructing the texture to be printed for this SDL event loop iteration. Not too far from “Nope” do we find “Good”, and we notice that “Good” is only selected if a particular global variable is set. We trace this back to the following code in main:

We want should_compute here …

...Read more

Google CTF Quals 2017 - Food

| Toshi Piazza

This writeup is for the “Food” challenge found in Google CTF Quals 2017, from the reversing category. This writeup and others were also submitted to the Google CTF Writeup Competition.

JNI-Native Reversing

We are only given a food.apk, and from there we immediately unzip it and run dex2jar, followed by jd-gui to view the source code. Fortunately, the source is very succinct:

Although I’m not well versed in Android development, it looks like it’s loading a library libcook.so from either one of lib/{armeabi,x86}. We turn our attention to the arm library, and symbolize it using a script we found here. This resolves certain awkward indirect function calls that otherwise would appear as unnamed constant offsets into the JNI struct (e.g. JNI_FindClass).

The bizarre disassembly in JNI_Onload seems to indicate that all useful strings have been encrypted, and are only decrypted at runtime. One such example of this is shown below:


Note that decrypt_string here, as we learn later is a variadic function with the following signature: char *decrypt_string(int a1, ...). We implement the decrypt_string function in python so that we can determine all the strings used by the application:

We can now resolve many …

...Read more

NorthSec 2017 rao_bash

| Toshi Piazza

This is a posthumous writeup of the rao_bash challenge hosted by NorthSec 2017. It features a recompiled and backdoored bash with a broken ELF header. This was solved after the CTF, because a small hint was dropped to us afterwards.

Fixing the ELF Header

First and foremost, we’d like to fix the ELF header of the executable. This ELF header interferes with all of our tools, and even the venerable file is having trouble with it:

$ file ./rao_bash_orig
./rao_bash_orig: ELF 64-bit MSB *unknown arch 0x3e00* (SYSV)

If that’s not enough of a hint, readelf seems to think that the entrypoint is some very large number,

$ readelf -a ./rao_bash_orig
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, big endian
  Machine:                           <unknown>: 0x3e00
  Version:                           0x1000000
  Entry point address:               0x300c420000000000

At this point, it should be obvious that the entry point is correct, if interpreted in big endian, and readelf is so nice to show as much that the data itself is all big endian. We simply revert this by flipping the “\x02” to a “\x01” in the 5th byte of the ELF header (see the magic line). Now, all of our tools work as appropriate.


Here is where the hint comes into place: bash is a huge executable, and unfortunately we could not find anything largely different between rao_bash and a bash we compiled ourselves with a similar compiler version, i.e. by using diaphora. However, it seems we missed a very quick-and-dirty xor, as seen below in the main function.

Quick and dirty xor

It looks like argv is checksummed, and if the checksum is valid it moves to sh_login_init(). Obviously, we don’t quite care about the checksum, as it’s very lossy …

...Read more