34C3CTF vim

Lense

Challenge description

Category: rev
Points: 215 (dynamic, calculated from solves)
Solves: 18

This is not compatible with emacs!

Tested with vim 7.4 on amd64 Ubuntu 16.04 (package vim 2:7.4.1689-3ubuntu1.2)

Difficulty: Easy

Preface

There are a lot of words here. If you looked at the challenge during the competition and just missed a few things, you might want to skip ahead to the high-level loop summary section.

Understanding the challenge file

Vim refresher

If you’re a complete Vim beginner, I recommend going through vimtutor. Just run it from a shell.

Here’s a list of every command that this challenge uses (all in normal mode):

  • h
  • j
  • k
  • l
  • ^
  • $
  • 0
  • G
  • f
  • F
  • t
  • Y
  • y
  • @
  • "
  • r (only after you get the flag)
  • d (only after you get the flag)

If you don’t know what some of them do, look them up in Vim’s help like: :help j or :h j. You might also be interested in the general concepts:

  • count (and the rest of notation or the whole intro.txt file)
  • registers
  • quote_quote
  • quote_alpha

and the command :registers for debugging. As an aside, the most common way I know to interact with registers is with q, which puts typed characters into the given register as a way to record macros. qx@x was a pattern I learned when I was a beginner, but I didn’t make the connection to registers for a long time.

Another thing to know is that @ execution will stop when a command fails to execute. For example, qx0hl@x will put 0hl in register "x and try to run it, but because 0h goes to the first character, h can’t move left and it will stop executing before moving right with l. I found this out a while ago experimentally—I’m sure this is documented somewhere, but I haven’t been able to find it. If you know where it is, please let me know.

A note of caution: if you run Vim without a .vimrc or in vi-compatible mode, 'cpoptions' will be set to all flags, which includes >. The help for cpo-> says: “When appending to a register, put a line break before the appended text.” That will break the challenge. Make sure the 'cpoptions' setting doesn’t contain >, and if it does, run :set cpoptions-=>.

Challenge file sections

The file takes in the flag as input and runs a program on it. Like any program, it consists of code and data (with some overlap).

  • Lines 1-3 and 5 are fixed data which aren’t used very much.
  • Line 4 is the flag, used as input data.
  • Lines 6-9 are data. It seems like they were created to look like code, but they’re garbage code used as part of a lookup table.
  • Lines 10-259 are code snippets in a grid interspersed with garbage data.
  • Lines 260-263 are code with garbage data at the ends of the lines.

Debugging Vim

There are lots of ways to debug vimscript, but I couldn’t find any way to print a log of normal commands, so I wrote a patch:

index f9a0124e2..8e54b43c2 100644
--- a/src/normal.c
+++ b/src/normal.c
@@ -1141,6 +1141,22 @@ getcount:
        }
     }

+    FILE *f = fopen("normal_log.txt", "a");
+    char_u *regcc;
+    for (int i=10; i<=15; ++i)
+    {
+       regcc = get_reg_contents(get_register_name(i), GREG_NO_EXPR);
+       if (regcc)
+       {
+           fprintf(f, "    \"%c: <%s>\n", get_register_name(i), regcc);
+           free(regcc);
+       }
+    }
+    if (ca.opcount == 0)
+       fprintf(f, ">>>%c%c at line %ld col %d\n", ca.cmdchar, ca.nchar, curwin->w_cursor.lnum, curwin->w_cursor.col + 1);
+    else
+       fprintf(f, ">>>%ld%c%c at line %ld col %d\n", ca.opcount, ca.cmdchar, ca.nchar, curwin->w_cursor.lnum, curwin->w_cursor.col + 1);
+    fclose(f);
     /*
      * Execute the command!
      * Call the command function found in the commands table.

It gave a trace that looked like:

...
>>>19l at line 9 col 17
    "a: <20>
    "b: <2088j>
    "c: <0lllllll>
    "d: <f">
    "e: <fA>
    "f: <fG>
>>>@b at line 9 col 36
    "a: <20>
    "b: <2088j>
    "c: <0lllllll>
    "d: <f">
    "e: <fA>
    "f: <fG>
>>>2088j at line 9 col 36
    "a: <20>
    "b: <2088j>
    "c: <0lllllll>
    "d: <f">
    "e: <fA>
    "f: <fG>
...

This was useful for making sure what I thought was happening was actually happening, but turned out to not help with actually solving the challenge.

For quick repeatability, run the challenge with:

vim -u DEFAULTS -c 'norm GY@"' vim-5ca46d1e8afdc0b30b25fdf8f69f868b33a16241.txt

As always, you can find out what vim things do with :help, e.g. :h -u.

What happens when you run the file

Entry point

The instructions say to type GY@", which will go to the last line, copy it, and run the line as if you had typed it (remember, "—unnamed register—is automatically the last thing that was copied).

Line 263

6Gf2"ayl^f0"cyl"Ayl^fh"Ayl^fj"Ayl^fG"gyl^fk"GylG$FY"Gy3l$F@"hylFh"Hyl260GY@"F10H

The last line of the file is 263, above. I found it helpful when reading to insert a newline whenever the cursor goes to a new line:

1  6Gf2"ayl^f0"cyl"Ayl^fh"Ayl^fj"Ayl^fG"gyl^fk"Gyl
2  G$FY"Gy3l$F@"hylFh"Hyl
3  260GY@"
4  F10H

I’m going to describe this line in great detail to get you familiar with the kinds of things the file does. After this, I’ll be briefer.

Part 1
  1. Go to line 6, go to next 2 in the line (col 52), put it in "a
  2. Go to beginning of line, go to next 0 (col 4), put it in "c and append it to "a
  3. Go to beginning of line, go to next h (col 43), append it to "a, go to next j (col 53), append it to "a
  4. Go to beginning of line, go to next G (col 49), put it in "g
  5. Go to beginning of line, go to next k (col 3), append it to "g

In summary:

20hj -> "a
0 -> "c
Gk -> "g

(in my notation the first part is literal characters, -> means assigned to, and "x is the register x)

Part 2
  1. Go to last line (263, self-referential), go to end of line, go to previous Y in the line (col 74), append the next 3 characters (YG") to to "g
  2. Go to end of line, go to previous @ (col 75), put it in "h
  3. Go to previous h (col 65), append it to "h

Cumulative summary:

20hj -> "a
0 -> "c
GkY@" -> "g
@h -> "h
Part 3

Go to line 260, copy it, and run it.

Part 4

This looks like garbage, and only runs if line 260 finishes successfully (spoiler: it never will).

Main loop

Lines 260 and 261 constitute the main loop of the program.

Line 260

1Gff"dyl"eyl"fyl
4G@c"Dyl
6G@dhj"Eylj"Fyl
1Gfl"Cyl
6Gf2"bylF0"Byl
261GY@"
3c043Bh4Hf5#

Line 260 mainly sets up registers for use in line 261. In my notation <...> is a non-literal string, fc is “flag character” (how this works is explained below), and the lookup functions are character substitutions defined later.

<"c>l -> "c
f<fc> -> "d
f<lookup1(fc)> -> "e
f<lookup2(fc)> -> "f
20 -> "b

After this it runs line 261. The end of the line 3c043Bh4Hf5# is garbage never executed.

I’ll stop and explain the purposes of some of the registers now. "a and "b are motions, and I’ll cover them when I talk about line 261.

"c

This register is the loop counter that indexes into the flag. In the first loop iteration, it’s 0, which goes to the first character. In the second loop iteration, it’s 0l (from line 260 appending a l), which goes to the second character. In the third, it’s 0ll, and so on. If the flag isn’t reached and it tries to index past the end of the flag, it will error and stop. Running the program and having the cursor end at the end of your flag means that you didn’t break anything but that your flag wasn’t correct.

"d

This is in the format fX, where X is the flag character this loop iteration is processing, and it is used as the first part of the lookup. In each loop it is executed on line 6 and moves to the current flag character. This means that all flag characters have to be in line 6 somewhere.

"e and "f

These are also in the format fX and are used in line 261 to set "a and "b. They are generated from a lookup function which works like this:

  1. Go to the flag character in line 6 (done by executing "d)
  2. Move down once and left once, and append the character to "e
  3. Move down again and append the character to "f

I’ll go into more detail about the substitution mapping in the loop mechanics section.

Line 261

1 9G@e19l@a"Byt.
2 6Gfj"Byl
3 6Gf2"aylF0"Ayl
4 9G@f19l@b"Ayt.
5 6Gfj"Ayl40|@a
6 260GY@"
7 @1fBkya5B^
  1. Go to line 9, run "e, go right 19, run "a, add contents of cell to "b
  2. Add j to "b (making it 20<cell>j)
  3. Put 20 in "a
  4. go to line 9, run "f, go right 19, run "b, add contents of cell "a
  5. Add j to "a (making it 20<cell>j), go to line 6 col 40, run "a
  6. Run line 260
  7. Garbage

Now we’re getting into the conceptually harder stuff. It uses "e and "a to get "b, uses "f and "b to get "a, and then runs "a. I’ll talk more about possible actions in the loop mechanics section.

"a comes from the last iteration, "b is just a temporary variable, and "e and "f come from the flag character, so the inputs are only "a and the flag character.

High-level loop summary

Every loop iteration applies some function f to the current flag character and "a and assigns the result to "a. The whole program can be summarized as:

20hj -> "a
for fc in flag:
    f(fc, "a) -> "a

Loop mechanics

It was hard to fit the whole loop structure into my head at one time. I had to go through a lot of patched Vim traces and write out the assignments for a few loop iterations (which I’d recommend; it helped a lot).

There are two processes: converting the flag character into "e and "f and converting those and "a into a new "a. I’ll now give examples of each with the first character: 3 (from the flag prefix: 34C3_).

  1. Find 3 in line 6. Put the cursor on column 9.
  2. Move down and to the left (line 7, column 8).
  3. Append the character (,) to "e, making f,
  4. Move down (line 8, column 8).
  5. Append the character (G) to "f, making fG

So the first part of the loop maps the flag character into a character in line 7 and the character below it, and gets ready to find those two characters in line 9 in the second part of the loop:

  1. Go to line 9 and run "e (f,), moving to column 22. Then move right 19, to column 40.
  2. Run "a (initialized as 20hj), moving to row 10, column 20.
  3. Append that cell (h33) and also a j to "b (initialized as 20), making it 20h33j.
  4. Go to line 9 and run "f (fG), moving to column 17. Then move right 19, to column 36.
  5. Run "b (20h33j), moving to row 42, column 16.
  6. Append that cell (h103) and also a j to "a (reinitialized as 20), making it 20h103j.
  7. Run "a (20h103j), which doesn’t do anything useful because it’s just a motion and the next command is to run line 260.

One important thing to notice is that there are only certain columns in line 9 that the program can go to. If you calculate the possible "e and "f, all from lines 7 and 8, you notice that there aren’t very many: ,G"@l2_hA. Now look at line 9 and find the first characters of each of those:

~_~~~~@~A~~"~~~~G~~~~,~~~~l~~~~h~~~~2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Except for A (which is never used, idk) these repeat evenly. Since "a and "b always start with either 20h or 20l, and the program moves right 19 from these columns, the possible columns after running "a or "b are the columns with characters in the row above +19±20, or either -1 or +29.

So we know what columns the cells start at, and we know that they end with .s. That means that the data after the first . is garbage. I replaced unreachable places with .s and repeated the header characters, making my file look like this:

~_~~~~@~~~~"~~~~G~~~~,~~~~l~~~~h~~~~2~~~~_~~~~@~~~~"~~~~G~~~~,~~~~l~~~~h~~~~2~~~
l250.l35..l179.l240.h33..h208.l64..h96..h62..l107.l243.l175.h191.l36..h32..l223.
l208.h129.l207.h10..l210.h200.h190.l3...h153.l141.h105.l7...h62..l160.h200.h231.
l216.h227.l20..h145.l156.l232.l187.h220.h227.l208.l93..l123.h79..h210.h21..l105.
l219.l106.h150.h118.l85..h20..h192.h93..h76..h142.h53..h68..l125.h162.h72..h44..
h79..l201.h157.h102.h168.h216.l64..h173.l189.h25..h219.h134.h131.l232.l85..h169.
...

This looks much easier to deal with. It also reveals the format of "a and "b more clearly: 20, then h or l, then a number 1-250, then j.

The data section is 16 cells wide × 250 lines tall = 4000 total cells to deal with. That’s too much to deal with by hand, but it should be fast to handle with a program.

Before we do anything, though, we have to figure out what the goal is.

Winner message

Line 262

This is the last code line we haven’t talked about.

6Grwlrilrnlr!ld$
jdG@h
#!EdGG8E4f$5F15rdh3n839.l7gfiC$#HgBll8h^aHBFE"kBHn71^2gcj.c

It rewrites the first code line to be winner!, deletes the rest of the file, and runs register "h, which turns out to be an infinite loop (CTRL-C to break).

"g

There isn’t an explicit reference to line 262 like there is to other code lines (like 261G), but notice what the entry line 263 sets "g to GkY@". That looks good.

Line 108

There is the only instance of @g, and it’s the goal cell we want to reach: line 108, column 0.

Thinking back to the last part of line 261, it pointlessly runs "a at the end, right before looping. Except, it’s not pointless if "a contains @g.

So overall, the goal is to chain characters of the flag to make "a go to line 108 col 0.

Solving

First, a summary of what’s known so far:

  • The file is a program that moves around based on the entered flag.
  • It loops over each flag character and uses that and the current state of "a to decide the movements and the next "a.
  • The goal is to move to line 108 column 0, which will set "a to @g and display the winner message.
  • The possible values of "a are motion combinations that move the cursor to different cells.
  • There are only 4000 cells and so only 4000 different possible states.

I spent a lot of time trying to work backwards from the goal, but that turned out to be fruitless. I was scared of writing a brute forcer that would require a complicated emulator and take forever, and I didn’t connect the dots that there were only 4000 possible states total, not per flag character. Once I got past that, the solution turned out to not be that bad.

Each possible flag maps to an "a value in a many-to-one relationship. For each "a there can be many flags, but if only the shortest one in the set is considered, it’s almost a one-to-one relationship (the exceptions would be "as that are impossible to reach, which we don’t care about, and "as with multiple equal length flags. Assume that the challenge author is nice and that won’t happen). If we build up that table, which of size 4000 at most, we get the flag.

Think of each cell of the grid as a graph node with characters of the flag a directional edge to other nodes. The graph has 4000 nodes and many edges. The goal is to find the shortest path from a start node (34C3_ maps to an "a of 20l48j) to the goal node (@g). The algorithm that came to mind was a breadth-first search. A BFS will find the shortest flag, and if it remembers visited nodes, it should be fast as well.

I wrote up some Python and got the flag: 34C3_MgOSwZm9WFTRKYvCXFXXxpX9cs4

Code

This will print out the flag in about a second. It features the simplest possible Vim normal mode emulator for the commands hjklfF, a naïve breadth-first search algorithm, and absolutely zero error checking.

#!/usr/bin/python3
from queue import Queue


class Cursor(object):
    def __init__(self, r_=1, c_=1):
        self.r = r_
        self.c = c_

    def __repr__(self):
        return f'(r{self.r} c{self.c})'


# k, ", and . make the program do bad things
FLAG_CHARS = '34C_W0KVmQrFpgcZy8eY7bsBSETUvwiM5LPzuNofhn6Ox1G92jdaXlDRtHqJAI'

a = '20hj'
b = '20'
e = 'f'
f = 'f'

csr = Cursor()
buf = ['']

path_table = dict()
flag_q = Queue()


def csrchar():
    return buf[csr.r][csr.c]


def run(keys):
    i = 0
    count = 0
    while i < len(keys):
        if keys[i].isdigit():
            count = int(str(count) + keys[i])
        else:
            if count == 0:
                count = 1

            if keys[i] == 'h':
                csr.c -= count
            elif keys[i] == 'l':
                csr.c += count
            elif keys[i] == 'j':
                csr.r += count
            elif keys[i] == 'k':
                csr.r -= count
            elif keys[i] in 'fF':
                inc = 1 if keys[i] == 'f' else -1
                i += 1
                while csrchar() != keys[i]:
                    csr.c += inc
            count = 0
        i += 1


def runloop(fc, a):
    # 260
    csr.r, csr.c = 6, 1
    run('f' + fc + 'hj')
    e = 'f' + csrchar()
    csr.r += 1
    f = 'f' + csrchar()
    b = '20'

    # 261
    csr.r, csr.c = 9, 1
    run(e)
    csr.c += 19
    run(a)
    while csrchar() != '.':
        b += csrchar()
        csr.c += 1
    b += 'j'
    a = '20'

    csr.r, csr.c = 9, 1
    run(f)
    csr.c += 19
    run(b)
    while csrchar() != '.':
        a += csrchar()
        csr.c += 1
    a += 'j'

    return a


def breadth_first_search(flag, a):
    ret_a = runloop(flag[-1], a)
    if '@g' in ret_a:
        print(flag)
        exit(0)
    if ret_a not in path_table:
        path_table[ret_a] = flag
        for new_fc in FLAG_CHARS:
            flag_q.put((flag + new_fc, ret_a))


if __name__ == '__main__':
    with open('vim-5ca46d1e8afdc0b30b25fdf8f69f868b33a16241.txt', 'r') as fl:
        buf = ['*' * 80] + ['*' + line for line in fl.readlines()]
    flag_prefix = '34C3_'
    buf[4] = '*' + flag_prefix
    for fc in flag_prefix:
        a = runloop(fc, a)
    for new_fc in FLAG_CHARS:
        flag_q.put((flag_prefix + new_fc, a))
    while not flag_q.empty():
        breadth_first_search(*flag_q.get())