Binary Ninja Blog

Having Fun with Flare-on Using Time-Travel Debugging (TTD)

In this post, I will explain how I analyzed the Serpentine challenge in this year’s flare-on with the help of time-travel debugging (TTD) integration in the Binary Ninja debugger.

Serpentine is the 9th challenge and is commonly considered the hardest among the ten challenges this year, or even among ALL recent years. It features self-modifying code, x64 exception handling and unwinding, and arithmetic operations masquerading as table lookups. It is a fantastic challenge and a great test of the player’s skills and their tool’s reliability. I’m glad that I was able to solve it and also complete this year’s flare-on challenge!

I used TTD as an initial overview to explore the swamp of obstacles. If you have not heard of it yet, TTD is a technology that allows the recording and replay of the execution of a target. This could be a user process or even a whole system. TTD offers an unprecedented advantage to reverse engineering since it enables traveling back and forth in time and allows the examination of previous states of the target. Among the various TTD solutions, WinDbg TTD is free to use and supports the recording of Windows user-mode processes.

Binary Ninja’s debugger integrates WinDbg TTD support so you can record a TTD trace and replay it using a familiar user interface.

So much for the background, let’s get started!

Initial Analysis

The main function looks deceptively simple:

main function

The input string must be supplied on the command line, and it must be 0x20 chars long. After these checks, a code buffer is executed with the input string as the first argument.

There is also nothing fancy going on with the code buffer. The _TLS_Entry_0 function allocates a RWX buffer and copies some 0x800000 bytes from the .data section into it:

TLS Entry 0 function

However, when I tried to disassemble the code buffer, I was surprised to find that the first instruction is hlt:

hlt instruction

This instruction will cause an exception which needs to be handled. This is quite common in flare-on or other CTFs, and I quickly noticed that there is a call to SetUnhandledExceptionFilter at the beginning of the main function. To my disappointment, the exception handler is dumb and doesn’t do anything:

Dumb exception handler

I was stuck because I couldn’t figure out how the exception works. I ran the program with an input that is 0x20 bytes long (abcdefghijklmnopqrstuv0123456789) and it printed the error message Wrong key, which means the exception is somehow handled and code execution continues, though I wasn’t sure how.

I decided to analyze it with TTD to see if I could make some progress.

TTD Recording

Before recording a TTD trace, you must first get WinDbg/TTD installed. For licensing reasons, we don’t bundle WinDbg as part of Binary Ninja. But the installation process is automated – just click Debugger -> Install WinDbg/TTD in the menu, and a Python script will download the relevant files and install them for you. If, however, you are running Binary Ninja offline or using the free version, you will need to install it manually.

Once the installation is completed, relaunch Binary Ninja, and click Debugger -> Record TTD Trace in the menu. In the TTD Record dialog that pops up, make sure the parameters, including the executable path and command line arguments, are correct:

TTD Record Dialog

After clicking the Record button, a Windows UAC dialog will pop up and ask for Administrator privilege. This is because the TTD recorder command line utility requires it to function properly. Once you click Yes on it, the recording starts. The duration of the recording varies from target to target, but for this binary, it finishes almost instantaneously.

Record Completed Dialog

I got two files in the output directory. serpentine01.run is the actual trace file and is 36.0 MB in size, which is quite impressive for what it’s doing! serpentine01.out is text output from the recorder command line utility that might be useful for troubleshooting.

Record Completed Dialog

Replaying TTD Trace

To replay the recorded TTD trace, I opened the Debug Adapter Settings dialog, and made the following changes:

  1. Select DBGENG_TTD as the Adapter Type
  2. Change the Executable Path to the path of the trace file, i.e., serpentine01.run

Debug Adapter Settings Dialog

I first clicked OK to accept the above changes, then clicked Launch to start debugging as usual.

When I replayed the TTD trace for the first time, the trace was indexed and took longer than future replays. I got the following output:

Time Travel Position: B:0 [Unindexed] Index
Indexed 3/3 keyframes
Successfully created the index in 360ms.
Breakpoint 0 hit
*** WARNING: Unable to verify checksum for serpentine.exe
Time Travel Position: 13:19
serpentine+0x1a14:
00000001`40001a14 4883ec28        sub     rsp,28h

An index file – serpentine01.idx also showed up in the same directory where the trace file is. For larger trace files, there will be more keyframes in it, and the indexing process can take a while. The good news is this only needs to be done once and later accesses to the trace are quite fast.

Now I was ready to see how the exception generated by the hlt instruction was handled. I first put a breakpoint on the instruction that calls into the code buffer and stepped into it. Now I was at the hlt instruction and stepped again. A bit to my surprise, I was actually on the next instruction immediately after the hlt instruction:

Debug Adapter Settings Dialog

But I quickly realized that this doesn’t mean the push rsp instruction was executed. This is just an artifact of how the exception works at the CPU level and how the TTD recorder handles it. Technically, since hlt is a privileged instruction and executing it in user mode triggers a general protection (GP) fault, the instruction pointer shouldn’t have advanced to the next instruction. The recorder causes a bogus instruction to be recorded in the trace.

I stepped once more and got to the start of KiUserExceptionDispatcher, which is an important piece of Windows’s exception handling process. Now I could confirm that the exception is indeed handled somehow, but I still had no clue how.

KiUserExceptionDispatcher

After being stuck here for a while, I realized that I could not only debug the trace, but I could run queries on it and see which instructions had been executed in the code buffer!

Debugger Data Model and Queries

In my opinion, the debugger data model is the real gem of TTD. I do admit that having the ability to travel in both directions of the trace is a powerful addition to regular debugging, but it only gets you so far and doesn’t unleash the full potential of TTD.

To make it more tangible, think of the case where you wish to check the coverage of the code from a TTD trace. Of course, the information of all the executed instructions is in the trace, but it could become a problem if you have to extract them one by one. Specifically, you might try to repeatedly single-step on the trace and record what has been executed. However, that will descend into serious performance issues on a moderately sized trace.

The debugger data model allows us to run SQL-like queries on the trace. The model exposes the following valuable artifacts for query:

  1. Memory read/write/execute
  2. Function calls
  3. Processes and threads
  4. Heap
  5. Modules
  6. Exception and events

For this particular challenge, I wanted to see which instructions are executed in the code buffer. This can be answered by a query on the Memory object:

dx -g @$cursession.TTD.Memory(0xd1e0000, 0xd9e0000, "e")

It doesn’t look too complex, right? The first two parameters specify the start and end address of the memory area to include in the query, and I supplied the start and end address of the code buffer. The next parameter specifies the access type for the query. I used e, which not-so-surprisingly, stands for execution. Putting these together, this query returns all the instructions that have been executed in the range 0xd1e0000-0xd9e0000.

I can execute this query within Binary Ninja’s GUI since the Debugger console connects directly to the WinDbg/DbgEng engine:

TTD Query

The output should be pretty self-explanatory; each row stands for an instruction that has been executed, and the columns are information about it. For example, it includes the address and bytes of each executed instruction. For a detailed description of the information, please refer to the documentation.

From it, I can see the next instruction after the hlt instruction is at 0xd1e0098, with a start time 4E:7C5. This start time is a Position object, and I can time-travel to it directly by issuing the command !tt 4E:7C5 in the Debugger console. The debugger would switch to the state at that particular timestamp. The ability to teleport in the timeline is the main feature that makes TTD magical.
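Positions such as 4E:7C5 are printed as two hexadecimal numbers, a sequence and a step count, separated by a colon. If you export positions as text and want to order or compare them outside the debugger, a minimal Python sketch could look like this. The function name parse_position is only illustrative and is not part of any WinDbg or Binary Ninja API:

```python
from typing import Tuple


def parse_position(text: str) -> Tuple[int, int]:
    """Split a TTD position string like '4E:7C5' into (sequence, steps)."""
    sequence, steps = text.strip().split(":")
    # Both halves are hexadecimal without a 0x prefix.
    return int(sequence, 16), int(steps, 16)


def is_earlier(a: str, b: str) -> bool:
    """Return True when position a comes before position b in the trace."""
    # Tuples compare element by element: sequence first, then steps.
    return parse_position(a) < parse_position(b)
```

For instance, is_earlier("13:19", "4E:7C5") compares the position of the initial breakpoint with the position of the first instruction after the hlt.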

There is a function call at 0xd1e0098 and the code inside it looks like:

Self-Modifying Code 1

To my trained eye, the code is self-modifying: the instruction at 0xd4c4d42 overwrites the byte at 0xd4c4d49. This means the mov edx, 0xddad4c46 instruction currently at 0xd4c4d49 will be changed to something else right before it is executed. I single-stepped a few times and then reached:

Self-Modifying Code 2

Now I can see the actual instruction is mov r11, 0x10add7f49, and another instruction will change the contents of the bytes back to the original value right after it gets executed. This prevents us from easily dumping the contents of the instruction.

I spent some time exploring the trace and realized that this self-modifying pattern is used across the trace. Every instruction is encrypted in this way, and I’m not getting anywhere unless I figure out a way to automate the process of extracting them.

Javascript Automation

Since the instruction we want is written to immediately before it gets executed, I figured I simply need to dump all the memory writes and instruction executions within the code buffer, and write a small script to harvest the ones I’m looking for. This can be seen by running the previous query with ew as the third parameter, which makes it include both memory writes and instruction execution. We can see the write-execute-write pattern on address 0xd4c4d49:

Memory Query

The debugger data model comes with a JavaScript API. I’m not a big fan of JavaScript, so I ended up first using JavaScript to dump the required information into a file and then using a separate Python script to do the actual hunting. You could also do it all in one script if you prefer JavaScript.

I modified 0vercl0k’s codecov.js to get the following:

const logln = p => host.diagnostics.debugLog(p + '\n');
const hex = p => p.toString(16);

function DumpQuery() {
    const CurrentSession = host.currentSession;
    const Utility = host.namespace.Debugger.Utility;

    const FilePath = "C:\\Users\\xusheng\\Downloads\\serpentine_writeup\\output.txt";
    logln('Writing ' + FilePath + '...');
    const FileHandle = Utility.FileSystem.CreateFile(FilePath, 'CreateAlways');
    const Writer = Utility.FileSystem.CreateTextWriter(FileHandle);

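    // Range of the RWX code buffer. These are plain JavaScript numbers,
    // so the end address is computed with + (numbers have no .add()).
    const BaseAddress = 0xd1e0000;
    const Size = 0x800000;

    // 'WE' selects both memory writes and instruction executions.
    const CoverageLines = CurrentSession.TTD.Memory(
        BaseAddress,
        BaseAddress + Size,
        'WE'
    );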

    for (var line of CoverageLines)
    {
        Writer.WriteLine(line.TimeStart + "\t" + line.AccessType + "\t" + hex(line.IP)
            + "\t" + hex(line.Address) + "\t" + hex(line.Size) + "\t" + hex(line.Value));
    }

    FileHandle.Close();
    logln('Done!');
}

function initializeScript() {
    return [
        new host.apiVersionSupport(1, 2),
        new host.functionAlias(
            DumpQuery,
            'dumpquery'
        )
    ];
}

This script registers a command dumpquery which will run the DumpQuery function. It executes the same query we mentioned above with the CurrentSession.TTD.Memory object, and for each object within the result, writes the start time, access type, instruction pointer, address, size, and value to the output file as one tab-separated line. The instruction pointer can be different from the address when the access type is Write – the instruction pointer is the address of the instruction itself, and the address is the address of the byte that the instruction writes to.

To run this script, I first ran the .scriptload command:

.scriptload C:\Users\xusheng\Downloads\serpentine_writeup\dumpquery.js

JavaScript script successfully loaded from 'C:\Users\xusheng\Downloads\serpentine_writeup\dumpquery.js'

Now that it loaded fine, I invoked the command with:

!dumpquery

Writing C:\Users\xusheng\Downloads\serpentine_writeup\output.txt...
Done!
@$dumpquery()

Thanks to the debugger console, I can run these WinDbg commands all within Binary Ninja. You can also open the trace file in WinDbg and do the same thing.
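With output.txt in hand, the hunt for the write-execute-write pattern could be sketched in Python along the following lines. This is a minimal illustration, not the script I actually used. It assumes that the access type column contains the literal strings Write and Execute and that the columns are in the order written by the JavaScript above. The names Access, parse_line, find_decrypted_instructions and load_records are made up for this sketch:

```python
from typing import Iterable, List, NamedTuple, Optional


class Access(NamedTuple):
    time: str      # TTD position, e.g. "4E:7C5"
    kind: str      # assumed to be "Write" or "Execute"
    ip: int        # address of the instruction performing the access
    address: int   # address being written to or executed
    size: int      # number of bytes accessed
    value: int     # bytes written, or bytes of the executed instruction


def parse_line(line: str) -> Access:
    """Parse one tab-separated line; every column but the first two is hex."""
    time, kind, ip, address, size, value = line.rstrip("\n").split("\t")
    return Access(time, kind, int(ip, 16), int(address, 16),
                  int(size, 16), int(value, 16))


def find_decrypted_instructions(records: Iterable[Access]) -> List[Access]:
    """Return Execute records that sit between two writes to the same address."""
    found: List[Access] = []
    last_write: Optional[Access] = None   # most recent Write seen
    pending: Optional[Access] = None      # Execute that covered last_write
    for rec in records:
        if rec.kind == "Write":
            # A second write to the same address closes the pattern.
            if pending is not None and rec.address == last_write.address:
                found.append(pending)
            pending = None
            last_write = rec
        elif rec.kind == "Execute" and last_write is not None and pending is None:
            # Does this instruction cover the byte that was just patched?
            if rec.address <= last_write.address < rec.address + rec.size:
                pending = rec
    return found


def load_records(path: str) -> List[Access]:
    """Read the whole dump file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

Calling find_decrypted_instructions(load_records("output.txt")) would then yield one record per decrypted instruction. An instruction that runs more than once is reported once per execution, so a real script would probably want to deduplicate by address.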

Patching and Deobfuscation

Now that I have all the information, I started to write a script to patch the code and remove the self-modification code. As a brief recap, the code uses the following pattern for each instruction:

  1. A function call into a short snippet of code
  2. Inside the function, the real instruction is decrypted, executed, and re-encrypted
  3. The return address is manipulated to point to the next code blob
  4. The function returns to the designated address
  5. Repeat

My plan is as follows:

  1. Patch the code to replace the call instruction with a jmp. It jumps to the very address of the real instruction
  2. Replace the encrypted bytes with the actual bytes of the instruction
  3. Find the real return address, and insert another jmp to it right after the real instruction

In this way, we chain the real instructions together with jmp instructions. I’m a bit shy to include my script in this blog since it was written in a rush and surely looks ugly. I put it in this gist for those curious. The only interesting part of the script is that it pinpoints the real instructions by scanning for the write-execute-write pattern.
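The jmp patches in the plan above come down to encoding a near jump with a 32-bit relative displacement, which on x64 is the opcode E9 followed by the little-endian offset from the end of the 5-byte instruction to the target. A minimal Python sketch of that encoding, with the illustrative helper name encode_jmp, might be:

```python
import struct

JMP_REL32_LENGTH = 5  # one opcode byte (0xE9) plus a 4-byte displacement


def encode_jmp(source: int, target: int) -> bytes:
    """Encode `jmp target` as it would appear when placed at address `source`."""
    # The displacement is relative to the address of the next instruction.
    displacement = target - (source + JMP_REL32_LENGTH)
    if not -(1 << 31) <= displacement < (1 << 31):
        raise ValueError("target is out of range for a rel32 jump")
    return b"\xe9" + struct.pack("<i", displacement)
```

Since the whole code buffer is only 0x800000 bytes, any jump between two addresses inside it fits comfortably in a rel32 displacement. The resulting bytes can then be written over the original call and placed after each recovered instruction.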

Here are my results. We can already get a clue of what the code looks like roughly:

Cleaned-up Code

That code immediately reminded me of exception handling:

Exception Handling Code

If we check the definition of EXCEPTION_ROUTINE, we can see r9, as the fourth argument, should hold a DispatcherContext:

typedef EXCEPTION_DISPOSITION (*PEXCEPTION_ROUTINE) (
    IN PEXCEPTION_RECORD ExceptionRecord,
    IN ULONG64 EstablisherFrame,
    IN OUT PCONTEXT ContextRecord,
    IN OUT PDISPATCHER_CONTEXT DispatcherContext
);

Inside a DispatcherContext, the field at offset 0x28 is ContextRecord, a pointer to the CONTEXT structure that holds all the registers saved when the exception happened:

DISPATCHER_CONTEXT

Registers in ContextRecord

So, for code

mov rbp, qword [r9+0x28]
mov rdi, qword [rbp+0xe0]

It only does one thing: it loads the value of the r13 register (as saved in the context when the exception happened) into the rdi register. This makes sense, since the exception will clobber many registers, which makes it hard to carry state over to the code that runs next. Using this clever trick, the author can move the registers around and implement the desired code logic.
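To decode other accesses of this kind quickly, it helps to know that in the x64 CONTEXT structure the general-purpose registers are stored as consecutive 8-byte fields starting with Rax at offset 0x78 and ending with R15 at 0xf0, followed by Rip at 0xf8. A small Python lookup based on that layout could look like this. The helper name context_register is only illustrative:

```python
# Order of the integer registers in the x64 CONTEXT structure (winnt.h).
CONTEXT_REGISTERS = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rip",
]
CONTEXT_RAX_OFFSET = 0x78  # offset of the first integer register
REGISTER_SIZE = 8


def context_register(offset: int) -> str:
    """Map a CONTEXT offset such as 0xe0 to the register stored there."""
    index, remainder = divmod(offset - CONTEXT_RAX_OFFSET, REGISTER_SIZE)
    if remainder != 0 or not 0 <= index < len(CONTEXT_REGISTERS):
        raise ValueError(f"offset {offset:#x} is not an integer register slot")
    return CONTEXT_REGISTERS[index]
```

With this, context_register(0xe0) gives r13, which matches the access above.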

To keep this blog post short and focused on TTD, I’ll stop the analysis here since there is still quite a bit of a journey ahead to arrive at the flag. As a brief recap, TTD helped me analyze and understand the code in a way that traditional debugging would be hard-pressed to. If you are interested, you should definitely try it out yourself!