Dissecting LLVM Obfuscator Part 1

Kareem El-Faramawi, Toshi Piazza

LLVM Obfuscator is an industry-grade obfuscator that we have encountered frequently in the past few years of CTFing. This blog post documents our work to understand the design of the obfuscator itself, as well as any weaknesses in the implementations of its obfuscation passes. We use this work to automate the task of emitting cleaned, working binaries via Binary Ninja.

Introduction

The open source LLVM Obfuscator is implemented as three relatively disjoint LLVM passes, each of which obscures either the CFG or the arithmetic computation of the original program.

Also note that, because these passes operate over LLVM IR, the obfuscator supports almost every architecture under the sun.

Information for each of the three passes can be found here:

These are simply the documentation pages maintained by the authors for each respective pass. However, if the documentation is deficient, the source is also an obvious ground truth.

Unfortunately, the llvm-obfuscator repo maintains one branch per targeted LLVM version, and each branch carries a clone of the entire LLVM repository, so it’s easy to get lost in the code. The passes for the 4.0 branch can be found here, in the lib/Transforms folder, as is customary for LLVM passes.

This blog post focuses on the control flow flattening pass, which we find to be both one of the most interesting passes in the LLVM Obfuscator ecosystem and the most effective.

Control Flow Flattening

We can visualize the effects of the Control Flow Flattening pass as the following CFG transformation:

  1. Collect all of the original basic blocks in a CFG
  2. Lay them down flat “at the bottom” of the CFG, and remove all original edges
  3. Add …
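
To make the effect concrete, here is a minimal Python sketch of a function flattened by hand. It is not output from the obfuscator: the function `collatz_steps`, the `state` variable, and the block numbering are all made up for illustration, and the dispatcher loop that routes between blocks is our assumption about how flattened blocks are typically wired back together, not a reproduction of the pass itself.

```python
def collatz_steps(n):
    """Original control flow: a loop with a branch inside it."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


def collatz_steps_flat(n):
    """Same logic, flattened by hand (illustrative only).

    Every original block becomes one arm of the dispatcher below.
    No block jumps to another block directly; it only updates the
    hypothetical `state` variable and returns to the dispatcher.
    """
    steps = 0
    state = 0  # assumed encoding: 0 is the entry block
    while True:  # dispatcher loop
        if state == 0:      # loop header: decide whether to keep going
            state = 1 if n != 1 else 4
        elif state == 1:    # parity check
            state = 2 if n % 2 == 0 else 3
        elif state == 2:    # even case
            n //= 2
            steps += 1
            state = 0
        elif state == 3:    # odd case
            n = 3 * n + 1
            steps += 1
            state = 0
        elif state == 4:    # exit block
            return steps
```

Both functions compute the same result, but in the flattened version the successor of each block is only visible by tracking the values assigned to `state`, which is the information a deobfuscator has to recover.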