Dissecting LLVM Obfuscator Part 1

| Kareem El-Faramawi | Toshi Piazza

LLVM Obfuscator is an industry-grade obfuscator which we have encountered frequently in the past few years of CTFing. This blog post documents our work in understanding the design of the obfuscator itself, as well as any possible weaknesses in the implementations of the obfuscation passes. We use this work to automate the task of emitting cleaned and working binaries via Binary Ninja.

Introduction

The open source LLVM Obfuscator manifests as 3 relatively disjoint LLVM passes, each implementing some sort of obfuscation that obscures the CFG or arithmetic computation of the original program in some way.

Also note that, due to the fact that these passes operate over LLVM IR, it supports almost every architecture under the sun.

Information for each of the three passes can be found here:

These are simply the documentation pages maintained by the authors for each respective pass. However, if the documentation is deficient, the source is also an obvious ground truth.

Unfortunately, the llvm-obfuscator repo maintains multiple branches, one for each version of LLVM that that branch targets, with a clone of the entire LLVM repository, so it’s easy to get lost in the code. The passes for the 4.0 branch can be found here, in the lib/Transforms folder, as is customary for LLVM passes.

This blog post will focus on the control flow flattening pass, as we find it to be one of the most interesting passes in the LLVM Obfuscator ecosystem, as well as the most effective.

Control Flow Flattening

We can visualize the effects of the Control Flow Flattening pass as the following CFG transformation:

  1. Collect all of the original basic blocks in a CFG
  2. Lay them down flat “at the bottom” of the CFG, and remove all original edges
  3. Add …
...Read more

DEFCON Finals 2017 - Intro & Rubix

| Toshi Piazza

Over the weekend, July 28-31 2017 RPISEC competed with Lab RATs and Techsec in DEFCON Finals, one of the most important CTFs of every year. Only 15 teams in the world get to qualify for the event each year, and our team under Lab RATs was able to earn the right to compete among 14 other globally professional teams.

This writeup covers the first challenge presented at DEFCON Finals, which kept us on our toes throughout the competition. It was deceptively simple, and also served as an introduction to cLEMENCy, a terrifying 9-bit middle endian architecture, as well as to the toolchain that was used to create all future binaries.

The majority of this writeup will involve getting our feet wet with cLEMENCy while setting up a comfortable environment for reverse engineering the later challenges released at DEFCON.

DEFCON Challenge Format

A link to the challenge binary can be found here

For the CTF we are given only an emulator and a debugger to solve each of the challenges; these challenges are compiled for a 9-bit middle endian architecture, cLEMENCy. When we first run the rubix challenge under this emulator, we note that only garbage gets printed to the terminal. This output doesn’t change each run, and it’s puzzling what exactly this is. What if it’s the emulator printing out 9-bit ascii instead of 8? Indeed, when we transform from 9-bit to 8-bit between the debugger and our terminal, we end up with the following:

This service implements a rubix cube. Solve the cube and win.

Give me 54 Bytes(ie: 53,12,5,etc): <input 54 comma-delimited chars>

                Top

             R6 R7 B0
             L1 T4 R5
             A0 B7 R2

  Left         Front       Right       Back

L8 L7 L0     T0 A7 F0    R8 F3 L6    B6 T3 T6
T5 L4 …
...Read more

Google CTF Quals 2017 - Moon

| Toshi Piazza

This writeup is for the reversing challenge “Moon” we solved during 2017 Google CTF Quals. This writeup and 3 others were also submitted to the Google CTF Writeup Competition.

Dealing with GLEW

A big problem we noticed early on was the use of GL3W, which generates code to lazily load all OpenGL functions at runtime—at all places where an OpenGL function was used we would simply see a call to some offset in the data section. Unfortunately we couldn’t find a script to re-symbolize function calls, but we plan to make one soon-ish :)

We can see the huge routine which calls LoadLibrary on every OpenGL function, at sub_4032c0.

GLEW lazy loading functions

Instead of symbolizing (by hand) all of the symbols, we only bothered to load ones which had valid XREFs to them, saving a bit of time.

Running the Program

Unfortunately all the RPI-sold computers from our year are not yet reported to support OpenGL 4.3, so first and foremost we had to patch the OpenGL verification check from 4.3 to 4.2. Surprisingly, this “just worked” despite the code making use of Compute Shaders which I had thought to be introduced in OpenGL 4.3.

The program simply opens up a window, and asks for a password. After we’ve entered 32 characters, the program either responds “good” (presumably), or “Nope”.

Running the program

When we XREF the string Nope, we see that it is used when constructing the texture to be printed for this SDL event loop iteration. Not too far from “Nope” do we find “Good”, and we notice that “Good” is only selected if a particular global variable is set. We trace this back to the following code in main:

We want should_compute here …

...Read more

Google CTF Quals 2017 - Food

| Toshi Piazza

This writeup is for the “Food” challenge found in Google CTF Quals 2017, from the reversing category. This writeup and others were also submitted to the Google CTF Writeup Competition.

JNI-Native Reversing

We are only given a food.apk, and from there we immediately unzip it and run dex2jar, followed by jd-gui to view the source code. Fortunately, the source is very succinct:

Although I’m not well versed in Android development, it looks like it’s loading a library libcook.so from either one of lib/{armeabi,x86}. We turn our attention to the arm library, and symbolize it using a script we found here. This resolves certain awkward indirect function calls that otherwise would appear as unnamed constant offsets into the JNI struct (e.g. JNI_FindClass).

The bizarre disassembly in JNI_Onload seems to indicate that all useful strings have been encrypted, and are only decrypted at runtime. One such example of this is shown below:

decrypt_string

Note that decrypt_string here, as we learn later is a variadic function with the following signature: char *decrypt_string(int a1, ...). We implement the decrypt_string function in python so that we can determine all the strings used by the application:

We can now resolve many …

...Read more

NorthSec 2017 rao_bash

| Toshi Piazza

This is a posthumous writeup of the rao_bash challenge hosted by NorthSec 2017. It features a recompiled and backdoored bash with a broken ELF header. This was solved after the CTF, because a small hint was dropped to us afterwards.

Fixing the ELF Header

First and foremost, we’d like to fix the ELF header of the executable. This ELF header interferes with all of our tools, and even the venerable file is having trouble with it:

$ file ./rao_bash_orig
./rao_bash_orig: ELF 64-bit MSB *unknown arch 0x3e00* (SYSV)

If that’s not enough of a hint, readelf seems to think that the entrypoint is some very large number,

$ readelf -a ./rao_bash_orig
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, big endian
 ...
  Machine:                           <unknown>: 0x3e00
  Version:                           0x1000000
  Entry point address:               0x300c420000000000
...

At this point, it should be obvious that the entry point is correct, if interpreted in big endian, and readelf is so nice to show as much that the data itself is all big endian. We simply revert this by flipping the “\x02” to a “\x01” in the 5th byte of the ELF header (see the magic line). Now, all of our tools work as appropriate.

Reversing

Here is where the hint comes into place: bash is a huge executable, and unfortunately we could not find anything largely different between rao_bash and a bash we compiled ourselves with a similar compiler version, i.e. by using diaphora. However, it seems we missed a very quick-and-dirty xor, as seen below in the main function.

Quick and dirty xor

It looks like argv is checksummed, and if the checksum is valid it moves to sh_login_init(). Obviously, we don’t quite care about the checksum, as it’s very lossy …

...Read more