Binary Ninja Blog

Working with Global Pointers in Binary Ninja

Global pointer usage is an important yet often overlooked concept in reverse engineering. This article explains what it is, how it interacts with Binary Ninja’s analysis, and which APIs and settings are relevant. Hopefully, this helps you master it in your day-to-day reverse engineering!

Overview

A primer on global pointers

(Feel free to skip this introduction if you’re already familiar with the concepts discussed.)

Let us start with a Linux x86 example. On the Linux x86 platform, the compiler usually uses the ebx register as the global pointer register and points it to the GOT (Global Offset Table). References to global variables or external functions can then be made relative to the GOT. This way, the code can be position-independent and the instruction encoding can use a smaller relative offset value.

In the following screenshot, you can see that soon after the entry of the main function, it calls a function called __x86.get_pc_thunk.bx.

main function

And the __x86.get_pc_thunk.bx function only has two instructions:

000110f0  __x86.get_pc_thunk.bx:
000110f0  mov     ebx, dword [esp {__return_addr}]
000110f3  retn     {__return_addr}

It simply loads the dword at the stack pointer into ebx and then returns. Remember the return address is on the stack, so this loads the return address into ebx. Since the call is made at 0x11204 and the next instruction is at 0x11209, ebx ends up being 0x11209.

Back in the main function, the next instruction is add ebx, 0x2dcb, and 0x11209 + 0x2dcb = 0x13fd4, so it turns ebx into 0x13fd4, and the GOT is at that address:

GOT
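The arithmetic above can be checked in a few lines of Python. The constants are the ones from this example binary; the variable names are only illustrative.

```python
# Addresses taken from the example above.
CALL_ADDR = 0x11204   # address of the call to __x86.get_pc_thunk.bx
CALL_LEN = 5          # the call occupies 5 bytes (next instruction is at 0x11209)
ADD_IMM = 0x2DCB      # immediate in `add ebx, 0x2dcb`

return_addr = CALL_ADDR + CALL_LEN   # value the thunk loads into ebx
got_addr = return_addr + ADD_IMM     # value of ebx after the add

print(hex(return_addr))  # 0x11209
print(hex(got_addr))     # 0x13fd4
```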

In short, the code uses the above combination to load the address of the GOT into the ebx register. We call ebx the global pointer register, and its value, 0x13fd4, the global pointer value. To examine them, we can use the Python API:

>>> current_function.calling_convention.global_pointer_reg
'ebx'
>>> bv.global_pointer_value
<const ptr 0x13fd4>

Or see it in the Triage View:

Global pointer in triage view

A few instructions down, we can see that it loads a global string using ebx relative addressing:

00011214  lea     eax, [ebx-0x1fcc]  {"Hello, world!"}
0001121a  push    eax
0001121b  call    puts

If you have dealt with MIPS binaries, you probably know that the $gp register is used similarly. However, not every architecture or calling convention uses a global pointer. For example, the x86_64 sysv calling convention (the default calling convention on Linux x86_64) does not need one: the RIP-relative addressing mode lets compilers generate code like lea rdi, [rip + offset] instead of designating a dedicated register.

You might not have noticed this, though, since Binary Ninja does the calculation (next instruction address + offset) for you and shows the actual address being referenced, in the form lea rdi, [0xXXXXX]. In this case, the calling convention doesn’t specify a global pointer register at all, and the global pointer value in the triage view shows N/A:

No global pointer
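The RIP-relative calculation is simple enough to sketch. The helper name and the addresses below are made up for illustration and do not come from any particular binary.

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Return the absolute address referenced by a RIP-relative operand.

    While an instruction executes, RIP holds the address of the *next*
    instruction, so the target is (insn_addr + insn_len) + disp.
    """
    return (insn_addr + insn_len + disp) & 0xFFFFFFFFFFFFFFFF

# Hypothetical 7-byte `lea rdi, [rip + 0xeb5]` located at 0x401144.
target = rip_relative_target(0x401144, 7, 0xEB5)
print(hex(target))  # 0x402000
```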

The Global Pointer in Binary Ninja

A common misconception is that tracking the value of ebx in the example above depends on global pointer analysis. It does not. Our state-of-the-art dataflow engine understands the sequence above and can track the value of ebx properly, as evidenced by the correct resolution of the “Hello, world!” string. From a program analysis point of view, this is intra-procedural dataflow, i.e., dataflow within a single function, and Binary Ninja already handles it quite well. In other words, if every function initialized the global pointer in the same way at its entry, we wouldn’t need to do anything extra to support it.

What can cause issues is inter-procedural usage of the global pointer value. For example, after the main function sets the ebx register to the GOT, the compiler ensures it does not change, so in a callee of the main function, the code will use the value of ebx directly without initializing it again.

In the above example, the call to the puts function is actually performed through a small thunk function:

00011090  puts:
00011090  endbr32
00011094  jmp     dword [ebx+0x10]  {puts}

You can see that it uses the ebx value without first setting it. If the analysis does not know that ebx points to the GOT, it cannot know that this is jumping to the puts function, which significantly degrades the analysis.
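To see why the missing value hurts, here is a toy model of resolving a register-relative operand. It only illustrates the idea and is not how Binary Ninja's dataflow is implemented; `resolve` and the register dictionaries are hypothetical names.

```python
from typing import Dict, Optional

def resolve(regs: Dict[str, int], base: str, disp: int) -> Optional[int]:
    """Return base + disp if the base register's value is known, else None."""
    if base not in regs:
        return None  # value unknown on entry: the operand cannot be resolved
    return (regs[base] + disp) & 0xFFFFFFFF

# Inside the puts thunk, nothing sets ebx before it is used.
unknown_on_entry: Dict[str, int] = {}
print(resolve(unknown_on_entry, "ebx", 0x10))   # None

# With the global pointer value carried into the thunk:
with_gp = {"ebx": 0x13FD4}
print(hex(resolve(with_gp, "ebx", 0x10)))       # 0x13fe4, the slot read by the jmp
print(hex(resolve(with_gp, "ebx", -0x1FCC)))    # 0x12008, the operand of the lea in main
```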

Given that, we modeled the behavior of the global pointer in our core analysis. After all functions have been analyzed individually, a module-level workflow calculates the global pointer value based on information from the individual functions. The calculated value is then used to update functions that use the global pointer register without initializing it, and the process repeats until the analysis converges (no more changes are made). This process determines the value correctly in most cases. That said, binary analysis is hard and there is no guarantee of correctness, so we also provide the necessary APIs and settings to customize the behavior of global pointer value analysis or to specify a custom global pointer value.
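The shape of that loop can be sketched with a toy model. Everything here is an assumption made for illustration (the `Func` record, its fields, `settle_global_pointer`, and the `max_rounds` safety cap); the real workflow lives in Binary Ninja's core and is considerably more involved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Func:
    sets_gp_to: Optional[int] = None   # constant this function assigns to the GP register, if any
    reads_gp_on_entry: bool = False    # uses the GP register before initializing it
    incoming_gp: Optional[int] = None  # value the analysis assumes on entry

def settle_global_pointer(funcs: Dict[str, Func], max_rounds: int = 10) -> Optional[int]:
    """Pick the most suggested value, push it into dependent functions, repeat until stable."""
    elected: Optional[int] = None
    for _ in range(max_rounds):
        votes = Counter(f.sets_gp_to for f in funcs.values() if f.sets_gp_to is not None)
        elected = votes.most_common(1)[0][0] if votes else None
        changed = False
        for f in funcs.values():
            if f.reads_gp_on_entry and f.incoming_gp != elected:
                f.incoming_gp = elected  # stands in for re-analyzing the function
                changed = True
        if not changed:
            break                        # converged: no more updates
    return elected

funcs = {
    "main": Func(sets_gp_to=0x13FD4),
    "helper": Func(sets_gp_to=0x13FD4),
    "puts": Func(reads_gp_on_entry=True),
}
gp = settle_global_pointer(funcs)
print(hex(gp), hex(funcs["puts"].incoming_gp))  # 0x13fd4 0x13fd4
```

In this model a re-analysis never changes what a function suggests, so the loop settles after the second round; in a real binary, updating one function can change what others report, which is why the process has to iterate.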

Global Pointer Value Calculation

To start with, a calling convention that uses a global pointer overrides the GetGlobalPointerRegister() method to specify the register. See, for example, our x86 and MIPS open-source architectures.

If a function’s calling convention specifies a global pointer register, the analysis looks for an assignment of a constant value to that register and records the value, which then becomes a candidate global pointer value. If you wish to see the value suggested by a function, you can open the “Edit Function Properties” dialog and look for something like GP (reg) = value:

Global pointer value suggested by a function

Note that this merely means the function suggested a global pointer value; it does not mean the analysis is using this value. To check the value actually in use, run bv.global_pointer_value in the Python console or view it in the Triage view. We might make the presentation clearer, as it caused some confusion in #6675.

After all the functions are analyzed, there could be many that have suggested a candidate global pointer value. To find the correct one, the analysis holds a majority vote, and the most frequently suggested value is considered the global pointer value.

After a global pointer value is elected, we will use that to update all functions that use the global pointer before initializing it. That’s important to point out – Binary Ninja not only collects candidate global pointer values from every function, but it also collects whether they use that register without initialization. Once these functions are updated (re-analyzed), they may trigger other analysis updates, and the analysis will stop once there are no more such updates.

Settings for Global Pointer Values

While the general process of calculating the global pointer value does not sound too complex, binary analysis is always full of surprises. Here are two primary situations I have run into during development that I do not have a perfect fix for – what works perfectly for certain binaries will sometimes work badly on others, and vice-versa. As such, I added two configurable settings that influence the global pointer value calculation.

The first setting is Global Pointer Value Minimum Majority Votes / analysis.globalPointerValueMinimumMajorityVotes. I mentioned that the global pointer value is determined by a majority vote, which means something like “10 functions suggest a value A, and one single function suggests a value B, so we choose A”. However, I have seen binaries where a single function sets the value of the ebx register but is just using it as a general-purpose register, which has nothing to do with the global pointer. As such, I decided that a value needs at least two votes to be selected, which is the default value of this setting. On the other hand, I have also seen firmware that really only sets the $gp register once at its entry point and never does so again. For that, changing this setting to one is the way to go.
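The effect of the threshold is easiest to see on the two situations just described. The `elect` helper below is a hypothetical stand-in for the vote, written only to show how a minimum vote count changes the outcome; the candidate values other than 0x13fd4 are invented.

```python
from collections import Counter
from typing import Iterable, Optional

def elect(candidates: Iterable[int], min_votes: int = 2) -> Optional[int]:
    """Return the most suggested value if it has at least min_votes votes."""
    votes = Counter(candidates)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count >= min_votes else None

# One stray function that merely uses ebx as a scratch register.
stray = elect([0x1234])
# Firmware that sets $gp exactly once, with the threshold lowered to one.
firmware = elect([0x80108000], min_votes=1)
# The ordinary case: ten functions agree, one disagrees.
ordinary = elect([0x13FD4] * 10 + [0x1234])

print(stray, hex(firmware), hex(ordinary))  # None 0x80108000 0x13fd4
```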

The next one is Max Global Pointer Value Updates / analysis.limits.maxGlobalPointerValueUpdates, which limits the number of analysis updates during the global pointer value analysis. As mentioned earlier, once a global pointer value is selected, we use it to update all the functions that use the global pointer register without initializing it. This can lead to the re-analysis of various other functions and, in some weird cases, could change the selected global pointer value! When that happens, it could easily lead to an infinite analysis loop. As such, I limit the number of analysis updates and give it a default value of 10, which is far more than should be required (normally 2-3 updates should suffice). You may set this value to zero to disable any analysis updates caused by the global pointer value, i.e., the value is then not used to re-analyze any functions.
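If you load binaries from a script, both settings can be supplied up front. The sketch below only builds the dictionary. The two keys and the defaults are the ones given above, while the helper name and the commented-out `binaryninja.load` call (with a placeholder path) are assumptions about how you might wire it in.

```python
from typing import Dict

def global_pointer_options(min_votes: int = 2, max_updates: int = 10) -> Dict[str, int]:
    """Build a settings dictionary for the two global pointer settings."""
    return {
        "analysis.globalPointerValueMinimumMajorityVotes": min_votes,
        "analysis.limits.maxGlobalPointerValueUpdates": max_updates,
    }

# Firmware that sets $gp only once: accept a single vote.
firmware_options = global_pointer_options(min_votes=1)
print(firmware_options)

# Assumed usage inside Binary Ninja's Python environment (the path is a placeholder):
# import binaryninja
# bv = binaryninja.load("/path/to/firmware.bin", options=firmware_options)
```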

Specifying User Global Pointer Values

Even with all the available customizations, there will still be cases where the analysis cannot get it right. But don’t worry – you can still set user global pointer values!

To do so, click Analysis menu -> Global Pointer Value -> Set User Value.... In the dialog that pops up, set the type of the value to ConstantPointerValue, and then type in a constant value:

Set user global pointer value

If the auto-analysis figures out a global pointer value, but you think it is wrong, you can set the user global pointer value to UndeterminedValue.

To clear a previously set user global pointer value, click: Analysis menu -> Global Pointer Value -> Clear User Value. The analysis will take over and then decide on a proper global pointer value.

Next Steps

We are not stopping here – we intend to support multiple global pointer registers, which is tracked in #6096. Also, as part of a broader effort, we intend to support more flexible ways of tracking inter-procedural dataflow. For example, one function may set a register or a stack location to a specific value that a callee then uses, even though the value is not necessarily a global.

Let us know how your experience goes with global pointers. Did you learn something today? Are there features related to global pointer analysis you’d like us to implement? Let us know!