Thursday, July 6, 2017

Google CTF Quals 2017 - Web Assembly

This is a writeup on Web Assembly challenge from Google CTF Quals 2017, which took place on 6/16/17-6/18/17, and was categorized as a web challenge. This writeup was also submitted to the Google CTF Writeup Contest, and earned us $500. Thanks to Nick Burnett (irc: itszn) for solving the challenge and for authoring the writeup!

Challenge Description

This challenge takes place on a very retro looking page the lets you drag assembly instructions from the sidebar into the main page. One button lets us compile the code, and another lets us run it. There are also a list of test cases. A quick glance at the source shows that they have implemented a simple assembly architecture and vm on the client side.

We are given several unminimized javascript files. I will quickly list what each is responsible for:
  1. asm.js - This file parses the input, and converts it into a "bytecode", which encodes the instructions as raw bytes.
  2. vm.js - This file contains the VM implementation that decodes the bytecode and runs the instructions.
  3. test.js - This file contains code to run a webworker with the VM. It also gives it the testcase input, and compares the worker's output to the expected output.
  4. worker.js - This is run by the webworker. It takes the input, runs the VM, and then responds with the output.
One of the testcases checks our code's output against the flag. If it matches, it will print the flag out to us. Since we cannot know the flag to output it, we can assume that we need to find a bug in the VM and gain javscript execution.

Bytecode Compilation

The first step is in asm.js, where our data is parsed and compiled to a byte code. This process is fairly straight forward. There are three datatypes:
  • int is simply a 32 bit integer value
  • float is simply a 64 bit float value
  • string is a set of bytes, which is prefixed by the length as an integer. Strings are decoded back to normal javascript strings later one.
If a label is used, it looks up the location as an integer. However, it also sets the high bit of the byte that represents what data type it is. This is important for later, so I modified the code to let me set the bit by prepending the type with a *.

The actual instructions are also encoded as a byte. Finally, the 'data section' is encoded similarly to the data types, and stored at the beginning of the output byte array.

VM Implementation

The virtual machine first decodes all of the instructions. Each one is replaced by a function calls the action with the given arguments. Each argument is decoded, and passed to this function. If the high bit is set (marking it as a pointer), the argument is replaced by a function that offset of memory: 

Here is the code for two important opcodes:
  • mov - Moves the second value into some offset of memory

  • get - Call the function associated with a file descriptor, and store the value in memory

getValue() will recursively call its input until an error is thrown. This is used to either return the non-function for the normal values, or call the function for the pointer values: 

The other opcodes are fairly straight forward, but we will only need these two for the final exploit anyway. All these functions are put into an array called memory, with the data section starting at index 1.

Breaking out of the VM

The bug in this implementation is pretty simple. When we access some value in memory, we are not limited to numbers, since we can use force a string to be a pointer by setting the high bit. When we do this we gain access to all the attributes of the javascript array such as __proto__.

Doing something like mov int 0 *string __proto__ ultimately performs the operation memory[0] = memory['__proto__'].

Background: __proto__ 

In Javascript pretty much every type is an Object. Objects have attributes that define what they do, many of which are backed by the native interpreter, depending on the object's type. These attributes can be accessed with either the . operator, or ['key'] notation.

Objects also have __proto__ attribute (which is also an object), that defines all attributes for the class of the object. When you access an attribute that is not a direct property of the instance of the object, Javscript will try to access it on the object's __proto__. Of course if it isn't a direct attribute of the
__proto__, it will check the __proto__'s __proto__ (remember, __proto__ is just an object too!). This is how Javascript does inheritance. If the attribute is not found anywhere, and a null __proto__ is reached, then it returns it as undefined.

Note that the same
__proto__ is shared for a given class, so if you modify it, objects of the same type will also be affected by the changes.

Accessing a Function Constructor

In javascript there are many ways to try and escape sandboxes. Our eventual goal will be to call eval('our data') or Function('our data')().

If our goal is to run Function('our data')(), we need to be able to arbitrarily call Function, however we don't actually have a reference to it anywhere. Luckily, you can also use constructor, as long as you have the constructor reference from a function.

Unfortunately for our current situation, memory is an array, not a function, so memory['constructor'] will only ever create an array. To bypass this, we can change the __proto__ of memory. As I said above, javascript will recursively search __proto__ until it finds the attribute you are looking for. If we are asking for constructor, it will search memory.__proto__ for constructor, and if not found look for it in memory.__proto__.__proto__.

So what if we replace memory.__proto__ with some function? Well constructor will be found in memory.__proto__.__proto__ which will happen to be the function's original __proto__!

If so many __proto__s confuse you, the TL;DR is that we can turn memory into a function object temporally, allowing us to access a function constructor.

All we need to do is mov string __proto__ *string someArrayFunction which hopefully become memory['__proto__'] = memory['someArrayFunction'].

The only problem now, is getValue(). As we recall, getValue()
will continue to call what ever we try to access. If we want to store a function, we need getValue() to return a function. The only way to do that is to cause an exception. Looking at the array's functions, I found __defineGetter__, which takes two arguments. If only one is given, such as in getValue(), it throws an exception. Perfect! So far our exploit is:

Calling the Function's Constructor

First we want to grab the constructor from the now-function memory object with mov int 0 *string constructor, which will do memory[0] = memory['constructor'].

The next challenge is to actually call it with our payload. It is easy to call, all we need to do is mov int 0 *int 0 but this will end up doing memory['constructor'](memory) thanks to getValue(). Unfortunately this throws an exception as it tries to call memory.toString(), but toString() is function's toString(), which does not expect an array.

We can fix this by restoring memory's __proto__ with an array, much like how we made it a function before.

However, where do we get an array? We can't even call memory's constructor to make one, since it is a function now... Luckily we recall the get opcode, mentioned earlier.  fds is an array with a normal __proto__, so we can do get string __proto__ string constructor which will run memory['__proto__'] = fds['__constructor'](). This makes memory an array again.

memory.toString() works again, but what does it actually produce? For an array, it functions like memory.join(','). This will give us our data separated by commas.

For this to be valid javascript, we can stick our payload at the start, and comment out the rest:

To do this, we can simply stick our payload in the data section, and move it to index 0, while moving a */ to a very far off index. Here is our payload now:

All this to was done in order to call Function('PAYLOAD/*,,,,,,*/')()!

Passing the Flag Test

Now that we have arbitrary javascript running, we need to figure out how to get the flag. The code is running in a webworker, which is somewhat sandboxed. It cannot access the dom, nor the location of the old page, which is where the flag is located.

So now we can look at how the parent is reading the response from the worker. We can send any responses we want now, so there may be a bug there too.

test.js sets the worker, with a callback for onmessage

We can see that if the answer attribute of the returned data object is not equal to the expected output, it will reject, and terminate our worker. However, if we are able to cause an exception before worker.terminate();, we will be able to continue sending guesses.

Looking at TestCaseError we can see that data.test is appended to a string, meaning toString() will be called:

It is easy to cause an exception here, by making test.toString not a function: 

Now we can guess as many times as we want. As long as we get it right once this code will be called:

We can do this easily:

At this point we would have to make this code both small enough to make <400 bytes encoded, and also make it play nicly with the parser. I decided not to do this, and instead use a nice feature of the webworker, importScripts. importScripts will synchronously load and run a javascript file, which is nice, because we can make our payload as long as we want now. Here is the final payload (remember the `*` syntax is something I modified myself to set the pointer bit): 

Running this with the 'Guess The Flag' test causes all the test cases to pass, and have it print the flag.

Now we just need to submit it so it will run on the remote server. It took a few tries, because I kept getting 500 errors (although I knew it was working because I was getting requests for the payload file). Finally it went though:

The final flag is CTF{_r3m0v3_th3_c0mm4s_plz_kthxbye_}