prettify code

2018年4月5日 星期四

[Write-up] 0ctf quals 2018 - pwn1000 Mighty Dragon

Glad to say that we (HITCON) are the only team solved this challenge :D
But in my opinion this challenge is more like a reversing but not a pwnable one.
This is a pwn challenge - but the pwn part is extremely easy, and the hardest part is to understand what is this binary doing. Anyway, let's see the details.

0x01 Basic Info

The binary, ELF interpreter, and are given, you can find them in my repository.

This is an arm 32bit challenge, and as the description, it will be launched by qemu. As I know all modern protections such as NX, ASLR will be disabled in qemu, so it will be extremely easy to exploit once we find the vulnerability(ies).

0x02 Reversing

Okay, let's face the most painful part.

The main function can be simplified into:
int main() {
  char input[256], output[256];
  int inlen = read(0, input, 130), outlen;
  for(int i=0;i<100;i++) {
    process(input, inlen, output, &outlen);
    memcpy(input, output, outlen);
    inlen = outlen;
  last_process(input, inlen);
130 bytes will be read to a stack buffer, then process it 100 times.

First introduce a commonly used structure in this binary before we go on:
struct Vector {
  u8 id;
  u8 buflen;
  u8 buf[buflen];
Notice that struct Vector is not a valid structure in C (not a fixed-size structure), but it's clear to express it in this way.

process uses a global variable context to record information during processing.
BTW, the only different between process and last_process is last_process uses a local variable (on stack) instead of the global one.

Method process does the following:
1. Fetch struct Vector(s) from input
2. Call functions with each vector according to their id
    - The functions called here will record some information (bitwise flags/content of buf, etc.) on context
3. Collect the records in context to generate content of output (will be the input for next process)

There're some checks during processing, once the format or value is not expected, the content of output will be set to all-zeros.

The most important thing we need to know is what the functions in step 2 do.

There are 7 different functions for handling struct Vectors with id equal to 0 to 8.
Though these functions are not really complicated, but a voice kept saying in my brain when I reversing these functions:
What the hell are they trying to do?

Everything seems make no sense, some functions even just copy bits from vector to context and copy it back. And because of its interesting challenge name and a nice banner when launching the binary, I expected it's a game or a VM challenge, which is totally wrong.

Let's get to the point. After some reversing, I found almost all vectors will remain unchanged after processing, except the vector with id equals to 8.
Function sub_10f58 simply copies the buffer buf in vector with id equals to 8 onto a buffer of context. But after passing some checks, function sub_11120 is called and manipulates the buffer of context.

Understanding what's sub_11120 doing is the key part, the source code of it can be simplified as:
void sub_11120(Context* context) {
  /* local variables */
  u8 copied_vec[255] = {};
  u8 record[60] = {};
  u8 i, j;
  int bitmask = 0;
  memcpy(copied_vec, context->id8buf, context->id8buflen);
  /* fetching more vectors in copied_buf */
  while(i < context->id8buflen) {
    u8 id = copied_vec[i];
    if(id == 15) {  /* special check */ }
    if(id <= 19u) {
      bitmask |= 1 << id;
      record[3 * id] = id;
      record[3 * id + 1] = copied_vec[i + 1]; // buflen
      record[3 * id + 2] = i;
    i += copied_vec[i + 1] + 2; // i += buflen  i.e. to next vector
  if(/* special check passed */) {
    /* do a strange counting sort */
    i = 0;
    memcpy(context->id8buf, &copied_vec[record[2]], record[1] + 2);
    i += record[1] + 2;
    memcpy(&context->id8buf[i], &copied_vec[record[47]], record[46] + 2);
    i += record[46] + 2;
    for ( j = 1; j <= 19u; ++j ) {
      if ( j != 15 && (bitmask & (1 << j)) == 1 << j ) {
          &copied_vec[record[3 * j + 2]],
          record[3 * j + 1] + 2);
        i += record[3 * j + 1] + 2;
(sorry, code is still long after simplified)

The content of id 8's buffer are struct Vectors as well, and what this function do is sorting. It sorts the vectors in ascending order according to their id. But an exception is the sorting always puts two vectors with id equals to 0 and 15 first, no matter these two vectors exist or not.

For example, let the content of context->id8buf be:
[{0, 0}, {5, 1, 255}, {4, 2, 255, 255}, {15, 3, 1, 2, 3}]
There're 4 vectors here, with id [0, 5, 415] and buflen [0, 1, 2, 3], respectively.
Now do the strange counting sort, which sorts id in ascending order but always let id 0 and 15 go first, so the sorting result will be:
[{0, 0}, {15, 3, 1, 2, 3}, {4, 2, 255, 255}, {5, 1, 255}]

In short, in one process, output is almost same as input, except the nested vectors in id 8 will be sorted.

There's a buffer overflow when performing counting sort. Notice that indexes ij in sub_11120 is 1-byte only, so if we set buflen(s) properly, buffer overflow on context will occur in later memcpy(s)! However, as we mentioned before, context is a global variable in the first 100 times of processing, only last_process uses a local variable context. So the target is clear now:

1. Construct an input that is valid for the first 100 processing
2. For the 101th process, the input still can trigger the buffer overflow, then we can control PC and jump to shellcode

0x03 Buffer Overflow

First try to create an input that triggers BOF, and next section shows how the payload be generated after 100 times of processing.

An example of overflow is let the content of id 8 be:
[{0, 0}, {15, 1, 0xb0}, {2, 4, 0xef, 0xbe, 0xad, 0xde}, {1, 0xfb}]
(An element 0xb0 in id 15 is for passing the special check in sub_11120.)

The sorting result will be:
[{0, 0}, {15, 1, 0xb0}, {1, 0xfb, 0, 0, 0, ..., 0}, {2, 4, 0xef, 0xbe, 0xad, 0xde}]
           length =  2 + 3 + 2 + 0xfb + 2 = 0x104
The buffer length of context is 256 bytes, so BOF occurs.

However, we cannot use this example as our input payload because it becomes
[{0, 0}, {15, 1, 0xb0}, {1, 0xfb, 0, 0, 0, 0, 0, 0}]
after one processing, but we need it can 'hold on' 100 times.

0x04 Brain Teaser

It took me hours to think how to create such payload that can trigger BOF after 100 times of processing.

I tried to construct a payload that has one less byte after each processing, and trigger BOF at the 101th time. But with the limit of input length (130 bytes), I only came up with a payload that can be processed 68 times.

After hours of trying and failed, suddenly I came up with an idea:
Why not construct a payload that will remain unchanged and cause BOF in every process?

Since context is a global variable for the first 100 times of processing, BOF will not have any side effect.

With this thought I succeeded in a short time. My construction is as following:
[{0, 0}, {15, x, 0xb0, ..., 3, y+4}, {2, y, ...}, {1, 0xfa-y}]
We will decide what the two variables x and y should be later, they are used for setting proper overflow length and making space for writing shellcode.

With this payload, the fetching of nested vectors are with id [0, 15, 2, 1, 3] and buflen [0, x, y, 0xfa-y, y+4], respectively.
Notice that here we uses the integer overflow of the 1-byte i, to treat the end of data of id 15 becomes start of another vector {3, y+4, 2, y, ..., 1, 0xfa-y}.

Let's see what will happen during counting sort:
1. copy 0 and 15: [{0, 0}, {15, x, 0xb0, ..., 3, y+4}]
2. copy 1: [{00}, {15, x, 0xb0, ..., 3, y+4}, {1, 0xfa-y, 0, 0, ..., 0}]
3. copy 2: [{00}, {15, x, 0xb0, ..., 3, y+4}, {10xfa-y00, ...0}, {2, y, ...}]
4. copy 3: [{00}, {15, x, 0xb0, ..., {3, y+4, 2, y, ..., 1, 0xfa-y}]

Overflow occurs at step 3, and step 4 makes the output exactly same as input!

Now we can calculate how to set x and y to control PC after overflow.
In step 3, totally 2 + (2 + x) + (2 + 0xfa - y) + (2 + y) = 0x102 + x are copied, and using GDB we can find the return address is stored at buf+0x108. All we need is let x be bigger than or equal to 0x108 - 0x102 + 4 = 10, and let the y >= x + 2 bytes of vector id 2 be full of shellcode's pointer.
An example is let x = 10, y = 12, and the input looks like:
[{0, 0}, {15, 10, 0xb0, ..., 3, 16}, {2, 12, p32(0xdeadbeef) * 3}, {1, 0xee}]
Then the program will jump to 0xdeadbeef when returning from last_process.

Since I also want to put shellcode in the content of id 15 (the ellipsis part above), I set x equal to 42, and 44 for y. The payload looks like:
[{0, 0}, {15, 42, 0xb0, shellcode, <pad>, 3, 48}, {2, 44, p32(sc_ptr) * 11}, {1, 0xce}]

Then happy shell and happy flag :D

0x05 Conclusion

My exploit script can be found in my repository. The method func8 is for the construction mentioned above, other methods (func0func2func6) are used for bypassing checks in process, it's easy so I didn't describe them in this writeup.

Thanks for reading, any suggestion is appreciated ;)