Thursday, July 6, 2017

Google CTF Quals 2017 - Web Assembly


This is a writeup on the Web Assembly challenge from Google CTF Quals 2017, which took place on 6/16/17-6/18/17 and was categorized as a web challenge. This writeup was also submitted to the Google CTF Writeup Contest and earned us $500. Thanks to Nick Burnett (irc: itszn) for solving the challenge and for authoring the writeup!

Challenge Description

https://i.imgur.com/tprIixZ.png


This challenge takes place on a very retro-looking page that lets you drag assembly instructions from the sidebar into the main page. One button compiles the code, and another runs it. There is also a list of test cases. A quick glance at the source shows that they have implemented a simple assembly architecture and VM on the client side.

We are given several unminified javascript files. Here is a quick rundown of what each is responsible for:
  1. asm.js - This file parses the input, and converts it into a "bytecode", which encodes the instructions as raw bytes.
  2. vm.js - This file contains the VM implementation that decodes the bytecode and runs the instructions.
  3. test.js - This file contains code to run a webworker with the VM. It also gives it the testcase input, and compares the worker's output to the expected output.
  4. worker.js - This is run by the webworker. It takes the input, runs the VM, and then responds with the output.
One of the test cases checks our code's output against the flag. If it matches, it will print the flag out to us. Since we don't know the flag, we cannot simply output it, so we can assume that we need to find a bug in the VM and gain javascript execution.

Bytecode Compilation


The first step is in asm.js, where our data is parsed and compiled to bytecode. This process is fairly straightforward. There are three datatypes:
  • int is simply a 32 bit integer value
  • float is simply a 64 bit float value
  • string is a set of bytes, prefixed by its length as an integer. Strings are decoded back to normal javascript strings later on.
If a label is used, the assembler looks up its location as an integer. However, it also sets the high bit of the byte that represents the data type. This is important for later, so I modified the code to let me set the bit myself by prepending the type with a *.

The actual instructions are also encoded as a byte. Finally, the 'data section' is encoded similarly to the data types, and stored at the beginning of the output byte array.
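The value layout described above can be sketched as a small encoder. The type codes (0x01, 0x02, 0x03), the raw 4-byte length prefix for strings, and the little-endian byte order are our assumptions for illustration, not values taken from asm.js. The pointer flag is the high bit (0x80) of the type byte, as described above.

```javascript
// Hypothetical encoder for the value format; type codes, length prefix and
// byte order are assumptions, not the challenge's actual asm.js constants.
const TYPE = { int: 0x01, float: 0x02, string: 0x03 };
const POINTER_BIT = 0x80; // high bit of the type byte marks a pointer

function encodeValue(type, value, isPointer = false) {
  const tag = TYPE[type] | (isPointer ? POINTER_BIT : 0);
  let body;
  if (type === "int") {
    body = new Uint8Array(4); // 32-bit integer
    new DataView(body.buffer).setInt32(0, value, true);
  } else if (type === "float") {
    body = new Uint8Array(8); // 64-bit float
    new DataView(body.buffer).setFloat64(0, value, true);
  } else if (type === "string") {
    const bytes = new TextEncoder().encode(value);
    body = new Uint8Array(4 + bytes.length); // length prefix, then the bytes
    new DataView(body.buffer).setInt32(0, bytes.length, true);
    body.set(bytes, 4);
  } else {
    throw new Error(`unknown type ${type}`);
  }
  return Uint8Array.of(tag, ...body);
}
```

Under these assumptions, the modified `*string __proto__` syntax corresponds to `encodeValue("string", "__proto__", true)`, whose first byte is 0x83.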

VM Implementation


The virtual machine first decodes all of the instructions. Each one is replaced by a function that calls the action with the given arguments. Each argument is decoded and passed to this function. If the high bit is set (marking it as a pointer), the argument is replaced by a function that reads that offset of memory: 


Here is the code for two important opcodes:
  • mov - Moves the second value into some offset of memory

  • get - Calls the function associated with a file descriptor and stores the value in memory

getValue() will recursively call its input until an error is thrown. This either returns plain (non-function) values as-is, or calls the function for pointer values: 


The other opcodes are fairly straightforward, and we will only need these two for the final exploit anyway. All these functions are put into an array called memory, with the data section starting at index 1.
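The evaluation rules above fit in a few lines. This is a hypothetical, heavily simplified reconstruction: the names ptr, getValue, mov and get and the fds contents are ours, not vm.js's. It does reproduce the behaviour the exploit depends on. A pointer becomes a closure over memory, and getValue keeps calling, passing memory in, until a call throws.

```javascript
// Hypothetical simplified model of the VM; not the challenge's vm.js.
const memory = [];
const fds = [() => 1337]; // fd 0: a made-up input function

// A pointer argument (high bit set) becomes a closure that reads memory[key].
const ptr = (key) => (mem) => mem[key];

// Keep calling the value with `memory` until a call throws, then return the
// last value. Numbers and strings are not callable, so they throw at once.
function getValue(v) {
  try {
    for (;;) v = v(memory);
  } catch (err) {
    return v;
  }
}

// mov dst src  =>  memory[dst] = value of src
function mov(dst, src) {
  memory[getValue(dst)] = getValue(src);
}

// get dst fd  =>  memory[dst] = fds[fd]()
function get(dst, fd) {
  memory[getValue(dst)] = fds[getValue(fd)]();
}

mov(1, 42);               // mov int 1 int 42
mov(2, ptr(1));           // mov int 2 *int 1            -> 42
mov(0, ptr("__proto__")); // mov int 0 *string __proto__ -> Array.prototype
get(3, 0);                // get int 3 int 0             -> 1337

// A function survives getValue only if calling it throws.
const kept = getValue(ptr("__defineGetter__"));
```

The last line shows why `__defineGetter__` is useful later: calling it this way throws, so getValue hands back the function itself instead of a result.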

Breaking out of the VM

The bug in this implementation is pretty simple. When we access some value in memory, we are not limited to numbers, since we can force a string to be a pointer by setting the high bit. When we do this, we gain access to all the attributes of the javascript array, such as __proto__.

Doing something like mov int 0 *string __proto__ ultimately performs the operation memory[0] = memory['__proto__'].


Background: __proto__ 


In Javascript pretty much every type is an Object. Objects have attributes that define what they do, many of which are backed by the native interpreter, depending on the object's type. These attributes can be accessed with either the . operator, or ['key'] notation.

Objects also have a __proto__ attribute (which is also an object) that defines the attributes for the object's class. When you access an attribute that is not a direct property of the object instance, Javascript will try to access it on the object's __proto__. Of course, if it isn't a direct attribute of the __proto__ either, it will check the __proto__'s __proto__ (remember, __proto__ is just an object too!). This is how Javascript does inheritance. If the attribute is not found anywhere and a null __proto__ is reached, the lookup returns undefined.

Note that the same __proto__ is shared for a given class, so if you modify it, objects of the same type will also be affected by the changes.
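Here is a small sketch of this lookup behaviour. The objects (animal, dog, Point) are made up purely for illustration.

```javascript
// Lookup falls back along the __proto__ chain.
const animal = { speak() { return "..."; } };
const dog = Object.create(animal); // dog.__proto__ === animal
dog.name = "Rex";                  // own property

// A prototype shared by a class: changing it affects every instance.
class Point {}
const p = new Point();
const q = new Point();
Point.prototype.label = "shared";
```

`dog.speak()` is found on `animal`, `dog.missing` walks the chain up to `Object.prototype`, whose `__proto__` is null, and ends as `undefined`. `p.label` and `q.label` both see the value added after they were created.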


Accessing a Function Constructor


In javascript there are many ways to try to escape sandboxes. Our eventual goal will be to call eval('our data') or Function('our data')().

If our goal is to run Function('our data')(), we need to be able to call Function arbitrarily; however, we don't actually have a reference to it anywhere. Luckily, we can also reach it through constructor, as long as we read the constructor property from a function (every function's constructor is Function).

Unfortunately for our current situation, memory is an array, not a function, so memory['constructor'] is Array and will only ever create an array. To bypass this, we can change the __proto__ of memory. As I said above, javascript will recursively search __proto__ until it finds the attribute you are looking for. If we are asking for constructor, it will search memory.__proto__ for constructor, and if it is not found, look for it in memory.__proto__.__proto__.

So what if we replace memory.__proto__ with some function? Then constructor will be found in memory.__proto__.__proto__, which happens to be the function's original __proto__ (Function.prototype), whose constructor is Function!

If so many __proto__s confuse you, the TL;DR is that we can turn memory into a function object temporarily, allowing us to access a function constructor.

All we need to do is mov string __proto__ *string someArrayFunction, which should become memory['__proto__'] = memory['someArrayFunction'].

The only problem now is getValue(). As we recall, getValue() will keep calling whatever we try to access. If we want to store a function, we need getValue() to return a function, and the only way to do that is for the call to throw an exception. Looking at the array's functions, I found __defineGetter__, which takes two arguments. If only one is given, as in getValue(), it throws an exception. Perfect! So far our exploit is:


Calling the Function's Constructor


First we want to grab the constructor from the now-function memory object with mov int 0 *string constructor, which will do memory[0] = memory['constructor'].

The next challenge is to actually call it with our payload. It is easy to call: all we need to do is mov int 0 *int 0, but this will end up doing memory['constructor'](memory) thanks to getValue(). Unfortunately this throws an exception because it tries to call memory.toString(), and that toString() is now the function's toString(), which does not expect an array.

We can fix this by restoring memory's __proto__ with an array, much like how we made it a function before.

However, where do we get an array? We can't even call memory's constructor to make one, since it is a function now... Luckily, we can use the get opcode mentioned earlier. fds is an array with a normal __proto__, so we can do get string __proto__ string constructor, which will run memory['__proto__'] = fds['constructor'](). This makes memory an array again.

memory.toString() works again, but what does it actually produce? For an array, it behaves like memory.join(','), so it gives us our data separated by commas.

For this to be valid javascript, we can stick our payload at the start, and comment out the rest:
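The comment trick can be checked in isolation. Below is a small sketch with a made-up payload at index 0 and the closing `*/` at index 6. In the real exploit the far index is much larger, but the principle is the same: empty slots join to empty strings, so all the commas land inside the comment.

```javascript
// Illustrative only: the payload opens a comment, a far index closes it.
const memory = ["globalThis.ran = true;/*"];
memory[6] = "*/";

const source = memory.toString(); // same as memory.join(",")
const fn = Function(source);      // "globalThis.ran = true;/*,,,,,,*/"
fn();
```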



To do this, we can simply stick our payload in the data section, and move it to index 0, while moving a */ to a very far off index. Here is our payload now:



All this was done in order to call Function('PAYLOAD/*,,,,,,*/')()!
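Without the VM, the whole chain reduces to a handful of property accesses. The sketch below replays it as plain javascript (runnable in Node) with a made-up payload and an arbitrary index of 64 for the closing `*/`. The comments map each line back to the VM instruction it stands for. It is not the exact payload from the challenge.

```javascript
const memory = [];
memory[0] = "globalThis.pwned = 42;/*"; // payload, opens a comment
memory[64] = "*/";                      // closes it far away
const fds = [];                         // any ordinary array

// mov string __proto__ *string __defineGetter__
memory["__proto__"] = memory["__defineGetter__"];

// mov ... *string constructor: lookup now goes through Function.prototype.
// (In the VM, getValue also tries F(memory) here; that throws, so F is kept.)
const F = memory["constructor"];

// get string __proto__ string constructor: Array() returns a fresh []
memory["__proto__"] = fds["constructor"]();

// mov ... *int ...: getValue calls F(memory), then calls the result
const compiled = F(memory); // Function(memory.toString())
compiled(memory);
```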


Passing the Flag Test

Now that we have arbitrary javascript running, we need to figure out how to get the flag. The code is running in a webworker, which is somewhat sandboxed. It cannot access the DOM or the location of the parent page, which is where the flag is located.

So now we can look at how the parent is reading the response from the worker. We can send any responses we want now, so there may be a bug there too.

test.js sets up the worker with a callback for onmessage



We can see that if the answer attribute of the returned data object is not equal to the expected output, it will reject, and terminate our worker. However, if we are able to cause an exception before worker.terminate();, we will be able to continue sending guesses.

Looking at TestCaseError we can see that data.test is appended to a string, meaning toString() will be called:



It is easy to cause an exception here, by making test.toString not a function: 
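Here is a hypothetical model of that handler showing why the trick works. Only TestCaseError, data.answer and data.test come from the writeup. The makeOnMessage wrapper and the resolve/reject/terminate callbacks are ours. String concatenation falls back to toString; if that is not callable, a TypeError is thrown before terminate() is ever reached.

```javascript
// Hypothetical model of the onmessage handler in test.js.
class TestCaseError extends Error {
  constructor(test) {
    super("Test case failed: " + test); // `test` is converted to a string here
  }
}

function makeOnMessage(expected, { resolve, reject, terminate }) {
  return ({ data }) => {
    if (data.answer !== expected) {
      reject(new TestCaseError(data.test)); // can throw before terminate()
      terminate();
      return;
    }
    resolve(data.answer);
  };
}
```

A message like `{ answer: "guess", test: { toString: 1 } }` survives postMessage's structured cloning as plain data, so the worker can send it as-is.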


Now we can guess as many times as we want. As long as we get it right once, this code will be called:


We can do this easily:


At this point we would have to make this code small enough to stay under 400 bytes once encoded, and also make it play nicely with the parser. I decided not to do this and instead used a nice feature of webworkers, importScripts. importScripts synchronously loads and runs a javascript file, which is nice because we can now make our payload as long as we want. Here is the final payload (remember the `*` syntax is something I modified myself to set the pointer bit): 



Running this with the 'Guess The Flag' test causes all the test cases to pass and prints the flag.

https://i.imgur.com/7QLH69I.png

Now we just need to submit it so it will run on the remote server. It took a few tries because I kept getting 500 errors (although I knew it was working because I was getting requests for the payload file). Finally it went through:

https://i.imgur.com/RsvYLNS.png

The final flag is CTF{_r3m0v3_th3_c0mm4s_plz_kthxbye_}

Google CTF Quals 2017 - Moon

This writeup is for the reversing challenge "Moon" we solved during 2017 Google CTF Quals. This writeup and 3 others were also submitted to the Google CTF Writeup Competition.

Dealing with GL3W


A big problem we noticed early on was the use of GL3W, which generates code to lazily load all OpenGL functions at runtime: everywhere an OpenGL function was used, we would simply see a call through some offset in the data section. Unfortunately we couldn't find a script to re-symbolize these function calls, but we plan to make one soon-ish :)

We can see the huge routine which calls LoadLibrary on every OpenGL function, at sub_4032c0.


Instead of symbolizing all of the functions by hand, we only bothered with the ones that had valid XREFs, saving a bit of time.

Running the Program


Unfortunately, none of the RPI-sold computers from our year are reported to support OpenGL 4.3 yet, so first and foremost we had to patch the OpenGL version check from 4.3 to 4.2. Surprisingly, this "just worked" despite the code making use of compute shaders, which I had thought were introduced in OpenGL 4.3.

The program simply opens up a window and asks for a password. After we've entered 32 characters, the program responds with either "good" (presumably) or "Nope".


When we XREF the string Nope, we see that it is used when constructing the texture to be printed for this SDL event loop iteration. Not too far from "Nope" we find "Good", and we notice that "Good" is only selected if a particular global variable is set. We trace this back to the following code in main:


We want should_compute to be 2 here, meaning the memcmp succeeded. Buf2 is the following string:

30c7ead97107775969be4ba00cf5578f1048ab1375113631dbb6871dbe35162b1
c62e982eb6a7512f3274743fb2e55c818912779ef7a34169a838666ff3994bb4d
3c6e14ba2d732f14414f2c1cb5d3844935aebbbe3fb206343a004e18a092daba0
2e3c0969871548ed2c372eb68d1af41152cb3b61f300e3c1a8246108010d282e1
6df8ae7bff6cb6314d4ad38b5f9779ef23208efe3e1b699700429eae1fa93c036
e5dcbe87d32be1ecfac2452ddfdc704a00ea24fbc2161b7824a968e9da1db7567
12be3e7b3d3420c8f33c37dba42072a941d799ba2eebbf86191cb59aa49a80ebe
0b61a79741888cb62341259f62848aad44df2b809383e09437928980f

So, once our input is hashed, it must match the above value. The hashing of our input seems to occur at sub_401BF0, and with windbg we can confirm that our input is the first argument and the hash is written out to the second argument. Not much else happens there, though we do see references to glUseProgram and glDispatchCompute, as seen below:


Although we can see the fragment and vertex shaders in clear view in the strings, we can't see any reference to the compute shader glsl source. We assume it's encrypted somehow, so we break on glCompileShader and wait until the glsl compute shader is decrypted. This is a wild guess, as the shader could have been precompiled somehow but we punt on this.

The dumped source is as follows, after adding the proper formatting:

Reversing the Compute Shader


We note a few properties of the hashing function below:

  1. The h variable is constant for all iterations of this compute shader. It is dependent only on the password.
  2. The only place where the index idx affects final is where we compute final ^= idx << i in a loop. This is completely reversible.
  3. Other than these 2 conditions, final is completely dependent on the current character.

Our goal here is to exploit these characteristics to find an idx-independent and password-independent hash for each character. This would allow us to brute force the password character by character. We can find the h of the real password like so:

hash_C  = reverse_idx(final_C) ⊕ hash_h
hash_C' = reverse_idx(final_C) ⊕ hash_h'

Here, hash_C is the value of the first index corresponding to the 'C' character in 'CTF' (we're assuming the password starts thus because of the flag format), for the real password, or the password we'd like to find. hash_C' is the value of the same index in our dummy string, say CTF{AAAAAAAAAAAAAAAAAAAAAAAAAAAA which also has a 'C' at the same index. hash_h represents the h value used to xor with the output of the hash function for the real password, and likewise hash_h' is this value of h for our dummy password.

Note that final_C is the same for both the real and the dummy password; it's passed out of the hash function. The reverse_idx function removes final_C's dependence on idx, reproduced below:

reverse_idx(final,idx) = final ⊕ (idx<<0) 
                               ⊕ (idx<<6) 
                               ⊕ ... 
                               ⊕ (idx<<26)
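XOR is its own inverse, so applying the same idx mask twice cancels it out. Below is a minimal sketch of reverse_idx that takes the shift amounts as a parameter. The writeup elides the middle shifts, so the list used here is a placeholder, not the shader's real one.

```javascript
// XOR `final` with idx shifted by each amount in `shifts`.
// Running it twice with the same idx and shifts restores the input.
function reverseIdx(final, idx, shifts) {
  let out = final;
  for (const s of shifts) out ^= idx << s;
  return out >>> 0; // keep the result as an unsigned 32-bit value
}

const shifts = [0, 6, 26]; // placeholder shift list, not the real one
```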

Finally, to compute hash_h, we simply need to perform the following:
hash_h = hash_C' ⊕ hash_C ⊕ hash_h'

Although it took a lot of work, we now have h of the target password, since we know hash_C', hash_C, and hash_h'. With this, we can figure out the hash value of every character in the password, independent of the character's index, and brute force them character by character.
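The cancellation can be checked with toy numbers. At the same index, the idx mask and the per-character value are identical for both passwords, so XORing the two outputs leaves only hash_h ⊕ hash_h'. All constants below are invented for the example, except 0x6f6f6f6f, which is the h the writeup reports for the real password.

```javascript
// hash_h = hash_C' ^ hash_C ^ hash_h'
function recoverH(hashC, hashCPrime, hPrime) {
  return (hashCPrime ^ hashC ^ hPrime) >>> 0;
}

// Toy values: g = per-character value for 'C', m = idx mask at that index.
const g = 0x12345678;
const m = 0x00000abc;
const hReal = 0x6f6f6f6f;
const hDummy = 0x41414141;
const hashC = (g ^ hReal ^ m) >>> 0;       // output for the real password
const hashCPrime = (g ^ hDummy ^ m) >>> 0; // output for the dummy password
```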

Brute Forcing the Password


Unfortunately we were unable to replicate this algorithm forwards in C++ for unknown reasons (probably something to do with the calc function). Since we were running out of time, we instead opted to compute a lexicon of characters up front with the debugger. We then took the hash of each character, made it idx-independent, and un-xor'd its h term, like so:


Now, all we need to do is take Buf2's characters and render them also idx-independent, un-xor them with the known h for this password (which turns out to be 0x6f6f6f6f, computed in the previous section) and then match each hash with the known values in our lexicon, as below:
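Decoding Buf2 is then one table lookup per index. Below is a sketch assuming each output word is (per-character value) ⊕ h ⊕ idx mask. Here `lexicon` stands in for the table dumped with the debugger. The shift list and the `toyHash` used in the check are made-up stand-ins, not the shader's real values.

```javascript
// Mask the shader XORs in for a given index (shift list is a placeholder).
function idxMask(idx, shifts) {
  let m = 0;
  for (const s of shifts) m ^= idx << s;
  return m >>> 0;
}

// outputs: one 32-bit word per password character (taken from Buf2)
// lexicon: Map from idx- and h-free value to the character producing it
function recoverPassword(outputs, h, lexicon, shifts) {
  return outputs
    .map((out, idx) => {
      const key = (out ^ idxMask(idx, shifts) ^ h) >>> 0;
      const ch = lexicon.get(key);
      if (ch === undefined) throw new Error(`no lexicon entry for index ${idx}`);
      return ch;
    })
    .join("");
}
```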


The resulting flag is: CTF{OpenGLMoonMoonG0esT0TheMoon}

Google CTF Quals 2017 - The X Sanitizer

This is a writeup on The X Sanitizer challenge from Google CTF Quals 2017, which took place on 6/16/17-6/18/17, and was categorized as a web challenge. This writeup was also submitted to the Google CTF Writeup Contest. Thanks to Nick Burnett (itszn on IRC) for solving the challenge and for authoring the writeup!

Challenge Description


Investigation: Index page


The site contains a text box that we can enter html into. When the button is clicked, it runs some kind of sanitization program and finally renders the output back to the screen. The page claims that the entire process is client side and that there is no hidden server logic. From this and the description, I would guess that the goal is to perform a Cross Site Scripting (XSS) attack on the page.

Background: Cross Site Scripting

Browsers try to protect users from malicious websites by using something called the Same Origin Policy (SOP). This policy controls what a website can and cannot do. For example, a website can access its own cookies and read its own web pages, but it cannot read the cookies or data of another webpage. To define what a webpage is, we use the term origin. A page's origin is made up of its scheme, host name, and port, so in most cases it comes down to the domain name: google.com is one origin, while facebook.com is another.

The fact that SOP blocks cookies is a good thing for the user, because most websites use cookies to tell if you are logged in. Reading another site's cookie would allow an attacker to log in as you.

However, I mentioned that websites can access their own cookies. Here is where XSS comes into play. If an attacker can run javascript on a website, they will have all the same permissions as the website, even if the script was not originally from the website (hence the name cross site scripting). Executing javascript on this origin will be our goal for this challenge.

Investigation: Sanitization system


Included from the index page is the javascript file sanitize.js. We can see that it first takes our input in the Sanitize function. The code then spawns a service worker. Service workers are a browser feature (supported in Chrome) that lets a script generate the responses to the page's own requests. Below we can see the responses it sends as part of its fetch handler:

  • /sandbox will append the contents of the url parameter html to this html which loads the sanitize script:

  • /sanitize will respond with a script that sets up a 1 second timer to respond to the parent, as well as a Content Security Policy (more on that in a second). It also creates a remove function that will either delete a given html node or remove the document's contents:

  • Any other request gets a page that is designed to work as either html or javascript. Either way, the javascript runs (loaded as html, the page requests x, which just returns this same page as a script). The purpose of this code is to delete whatever requested it, either the script tag or the HTML import (we'll look at this more soon too):

The sanitize function first tries to remove a few blacklisted words from our input:

Since they run it with a loop, we cannot bypass it by simply doing something like <me<meta>ta>. However, it is good to keep in mind what they are trying to block.
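Here is why the loop matters. A single removal pass can be tricked into assembling a forbidden word, but repeating until nothing changes cannot. The sketch below uses a hypothetical blacklist containing only "<meta>". The real list and matching rules in sanitize.js are not reproduced here.

```javascript
// Remove every occurrence of each word, once.
function stripOnce(input, words) {
  for (const w of words) input = input.split(w).join("");
  return input;
}

// Repeat until a pass changes nothing.
function stripLoop(input, words) {
  let prev;
  do {
    prev = input;
    input = stripOnce(input, words);
  } while (input !== prev);
  return input;
}

const blacklist = ["<meta>"]; // hypothetical entry
```

A single pass turns `<me<meta>ta>` into `<meta>`, while the loop keeps going until the result is empty.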

Finally the function creates an iframe pointed at /sandbox?html=<OURINPUT> and lets it run. As we saw above, after 1 second the page will send its contents to us. Once we get that back, the script writes the contents to the page without any further sanitization. If we can get any javascript into here, we should be able to steal the cookies.

Investigation: Sandbox page


As I said, the script run in the sandbox page sets up a Content Security Policy (CSP) using the meta html tag. This policy consists of default-src 'none'; script-src *. This means that by default all requests and inline content will be blocked, but all script requests will be allowed (but not inline scripts). Seeing this, we can also check the CSP of the main page to find it is script-src 'self', which will block all script requests not going to the same origin.

Background: Content Security Policy

A CSP is another tool the browser uses to protect sites. Like SOP, it dictates what a site is allowed to do. However, these restrictions are enabled by the site itself, to stop it from doing things it would not normally do. For example, if a site never expects to run unsigned script tags, then if one appears, it is probably an attacker trying to perform an XSS attack. By setting script-src in the CSP, the site tells the browser to block that tag. Good CSPs are very effective and can be very difficult to bypass.

To check for a CSP, first check the response headers of the site. If there is not one there, it can still be enabled with a <meta> HTML tag in the page header.

Sandbox

The CSP on the sandbox page also has a special feature. The sanitize script sets up a callback for securitypolicyviolation, which fires any time a request is blocked by the CSP. It calls the remove function, which deletes the element that triggered the violation, removing it from the final output of the sandbox!

The second feature is that any scripts we run will respond with the javascript that removes the script tag. This also tries to stop HTML imports. HTML imports are a way of loading another HTML page into the current page, and are useful for XSS since the browser will run anything we put on the other page (assuming the CSP doesn't stop it). It is done like this: <link rel="import" href="page to load">

Here, importing any page also returns this same response. The javascript will be ignored, but <script src=x></script> will run, and the same script will be loaded. querySelector('link[rel="import"]') looks for the link tag doing the import.

At first glance it seems that every way for us to run javascript is either blocked, or will cause our tags to be removed from the final output!

Sandbox Bypass


I found two ways to bypass the sandbox, and inject script tags into the main page. Both of them use the HTML import feature.

Method 1:

To respond to the parent, we saw that the sandbox uses a one second timer:


When this timer triggers, anything still on the page will be sent back to the script.

I found that by using the async feature of HTML imports, I could cause some imports to remain when time was up. Adding async to the import tag causes the import to be loaded after the page has finished loading. This means that onload will already have fired and the timer will have started counting down. By adding a large number of these tags (around 500), some will remain by the time 1 second is up.

Method 2:

A simpler method (probably the intended solution) comes from a flaw in their code. When the import removing code is run, it uses querySelector('link[rel="import"]') to find the link tag. However, querySelector only returns the first matching element.

If we put <link rel="import"> and also <link rel="import" href="page to load">, then only the first will be deleted when the second is loaded!

Using either method, we can now do an HTML import on the main page. However, there is a new problem! As I mentioned above, the main page has a CSP with script-src 'self'. This means that we can only run scripts and import pages from the sanitizer.web.ctfcompetition.com domain.

Bypassing script-src 'self'


Our goal is still to run javascript, but now we must find a way to load it from somewhere on the challenge domain.

Injecting a Script Tag

Let's start by injecting a script tag using the HTML import we smuggled out of the sandbox. This is relatively easy, thanks to the sandbox page. We can url encode the script tag with javascript and pass it as the html url parameter.
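For example, encodeURIComponent produces exactly the encoding used in the request below. The helper name sandboxUrl is ours, and "target" is the same placeholder script URL the writeup uses.

```javascript
// Build a /sandbox URL whose html parameter carries our tag.
function sandboxUrl(html) {
  return "/sandbox?html=" + encodeURIComponent(html);
}
```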


Requesting /sandbox?html=%3Cscript%20src%3D%22target%22%3E%3C%2Fscript%3E gives us


You may be worried about the sanitize script being run again, but luckily since our code doesn't actually 'activate' it, there is no client yet, so the logic causes it to 404:


Our payload so far:


Putting Javascript on /sandbox

Now we can load a script, but we can still only load from the sanitizer.web.ctfcompetition.com domain. We could try to put our script on /sandbox like we did the script tag, but that gives us problems, since the tags at the start of the page are not valid javascript.

To bypass this, we can use an encoding attack, where we specify a multibyte encoding for the script. If we are lucky, all of the html junk will turn into one large valid identifier, thanks to javascript's unicode support.

If we were to load the page as utf16 big endian (specified as utf-16be), the beginning would turn into

㰡摯捴祰攠䡔䵌㸊㱳捲楰琠獲挽獡湩瑩穥㸊㰯獣物灴㸊㱢潤社
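This can be checked outside the browser. In the sketch below, `prefix` is our reconstruction of the page's opening bytes, read back from the characters above, so treat it as an assumption. Node's `Buffer.swap16` turns the big-endian byte pairs into little-endian ones so they can be decoded as utf16le.

```javascript
// Assumed opening of the /sandbox page, reconstructed from the decoded text.
const prefix = "<!doctype HTML>\n<script src=sanitize>\n</script>\n<body>";

// Reinterpret single-byte text as UTF-16 big endian.
// swap16() throws unless the byte length is even.
function readAsUtf16be(text) {
  const bytes = Buffer.from(text, "latin1");
  return bytes.swap16().toString("utf16le");
}

const decoded = readAsUtf16be(prefix);
// Every resulting character is a CJK ideograph, so this parses as an assignment.
const check = new Function(decoded + "=0\n");
```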

To prevent this from causing an error, we can append =0\n. Now we can also append our own cookie stealing payload, encoded as utf-16be and then urlencoded (in utf-16be, each normal ASCII character is preceded by a null byte):
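Below is a minimal sketch of that encoding step. It emits each UTF-16 code unit as two percent-encoded bytes, high byte first. The helper name is ours, and the writeup's actual cookie-stealing payload is not reproduced here.

```javascript
// Percent-encode a string as UTF-16 big-endian bytes, one code unit at a time.
function utf16bePercent(str) {
  const hex = (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0");
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    out += hex(unit >> 8) + hex(unit & 0xff); // high byte, then low byte
  }
  return out;
}
```

For instance, the `=0\n` suffix becomes `%00%3D%00%30%00%0A`.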


We can load it like this:


The script that is run is this:


Putting it all together


Now we can put that script into the import like we did before, and we should be good to go:


This gives us the final long payload


However, if we try this, we find there is still one problem! The original sandbox removes 'utf-16be' from our input:


This is easy to bypass, as we can just url encode utf-16be to utf-16b%65 with this:
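This works because the blacklist check and the URL decoding see different strings. Below is a sketch assuming the sandbox reads its html parameter with standard URL query decoding. That is modelled here with URL/URLSearchParams, and the host name is a placeholder.

```javascript
// The blacklist check sees the raw input...
const raw = "utf-16b%65";
const blocked = raw.includes("utf-16be");

// ...but the html parameter is percent-decoded when the sandbox reads it.
const sandboxed = new URL("https://sandbox.example/sandbox?html=" + raw)
  .searchParams.get("html");
```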


The final corrected payload is


Waiting for the request back we see


And we have captured the flag! CTF{no-problem-this-can-be-fixed-by-adding-a-single-if}