Global pointer usage is an important yet often overlooked concept in reverse engineering. This article explains what a global pointer is, how it interacts with Binary Ninja’s analysis, and which APIs and settings are relevant. Hopefully it helps you master the concept in your day-to-day reverse engineering!
A primer on global pointer
(Feel free to skip this introduction if you’re already familiar with the concepts discussed.)
Let us start with a Linux x86 example. On Linux x86, the compiler usually uses the ebx register as the global pointer register and points it to the GOT (Global Offset Table). References to global variables or external functions can then be made relative to the GOT. This way, the code can be position-independent and the instruction encoding can use a smaller relative offset value.
In the following screenshot, you can see that soon after its entry, the main function calls a function called __x86.get_pc_thunk.bx.
The __x86.get_pc_thunk.bx function only has two instructions:
000110f0 __x86.get_pc_thunk.bx:
000110f0 mov ebx, dword [esp {__return_addr}]
000110f3 retn {__return_addr}
It simply loads the dword at the stack pointer into ebx and then returns. Remember the return address is on the stack, so this loads the return address into ebx. Since the call is made at 0x11204 and the next instruction is at 0x11209, ebx ends up being 0x11209.
Back in the main function, the next instruction is add ebx, 0x2dcb, and 0x11209 + 0x2dcb = 0x13fd4, so it turns ebx into 0x13fd4, and the GOT is at that address:
In short, the code uses the above combination to load the address of the GOT into the ebx register. We call ebx the global pointer register, and its value, 0x13fd4, the global pointer value. To examine them, we can use the Python API:
>>> current_function.calling_convention.global_pointer_reg
'ebx'
>>> bv.global_pointer_value
<const ptr 0x13fd4>
Or see it in the Triage View:
A few instructions down, we can see that it loads a global string using ebx-relative addressing:
00011214 lea eax, [ebx-0x1fcc] {"Hello, world!"}
0001121a push eax
0001121b call puts
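The arithmetic behind this example can be replayed in a few lines of plain Python. This is only a sketch of the calculation, not of how Binary Ninja implements it: the constants are copied from the listings above, the 5-byte call length is inferred from the two addresses given (0x11204 and 0x11209), and the string address is simply derived from the displacement in the lea instruction.

```python
# Constants taken from the disassembly shown above.
CALL_ADDR = 0x11204    # address of the call to __x86.get_pc_thunk.bx
CALL_LEN = 5           # inferred: the next instruction is at 0x11209
ADD_IMM = 0x2dcb       # immediate in `add ebx, 0x2dcb`
STRING_DISP = -0x1fcc  # displacement in `lea eax, [ebx-0x1fcc]`

# The thunk loads its own return address into ebx.
return_addr = CALL_ADDR + CALL_LEN

# After the add, ebx holds the address of the GOT: the global pointer value.
global_pointer = return_addr + ADD_IMM

# The string is addressed relative to the global pointer.
string_addr = global_pointer + STRING_DISP

print(hex(return_addr), hex(global_pointer), hex(string_addr))
```

The first two values match the ones worked out in the text (0x11209 and 0x13fd4); the third is where the lea ends up pointing.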
If you have dealt with MIPS binaries, you probably know that the $gp register is used similarly. However, not every architecture/calling convention uses a global pointer. For example, under the x86_64 sysv calling convention (the default calling convention on Linux x86_64), the RIP-relative addressing mode lets compilers generate code like lea rdi, [rip + offset] instead of designating a dedicated register.
You might not have noticed this though, since Binary Ninja does the calculation (next instruction address + offset) for you and shows the actual address being referenced in the form of lea rdi, [0xXXXXX]. In this case, the calling convention doesn’t specify a global pointer register at all, and the global pointer value in the Triage View shows N/A:
The Global Pointer in Binary Ninja
A common misconception is that tracking the value of ebx in the example above depends on global pointer analysis. It does not: our state-of-the-art dataflow engine understands the above sequence and tracks the value of ebx properly on its own, as evidenced by the correct resolution of the “Hello, world!” string. From a program analysis point of view, this is intra-procedural dataflow, i.e., the dataflow within a function, and Binary Ninja already handles it quite well. In other words, if every function initialized the global pointer in the same way at its entry, then we wouldn’t need to do anything to support it.
What can cause issues is inter-procedural usage of the global pointer value. For example, after the main function sets the ebx register to the GOT, the compiler ensures it does not change, so a callee of the main function will use the value of ebx directly without initializing it again.
In the above example, the call to the puts function is indeed performed through a small thunk function:
00011090 puts:
00011090 endbr32
00011094 jmp dword [ebx+0x10] {puts}
You can see that it uses the ebx value without first setting it. If the analysis does not know that ebx is pointing to the GOT, it cannot know that this is jumping to the puts function, which significantly degrades the analysis.
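To make the dependency concrete, here is a tiny sketch in plain Python. The helper name thunk_got_slot is made up for illustration and is not part of any Binary Ninja API; it only models the address calculation in the jmp above, with None standing in for "the value of ebx is unknown".

```python
from typing import Optional


def thunk_got_slot(global_pointer: Optional[int], disp: int) -> Optional[int]:
    """Address of the GOT slot read by `jmp dword [ebx+disp]`.

    Returns None when the value of ebx is unknown, in which case the
    jump target cannot be resolved either.
    """
    if global_pointer is None:
        return None
    # Wrap to 32 bits, since this models x86 address arithmetic.
    return (global_pointer + disp) & 0xFFFFFFFF


# Without a known global pointer, the thunk's target stays unresolved.
unresolved = thunk_got_slot(None, 0x10)

# With the value from the example, the slot address is known.
resolved = thunk_got_slot(0x13fd4, 0x10)
```

Here resolved is 0x13fe4, i.e., the GOT slot that the thunk reads the address of puts from.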
Given that, we modeled the behavior of the global pointer in our core analysis. After all functions have been analyzed individually, a module-level workflow calculates the global pointer value based on information from the individual functions. The calculated value is then used to update functions that use the global pointer register without initializing it, and the process repeats until the analysis converges (no more changes are made). This process determines the correct value in most cases. That said, binary analysis is hard and there is no guarantee of correctness, so we also provide the necessary API and settings to customize the behavior of global pointer value analysis or to specify a custom global pointer value.
Global Pointer Value Calculation
To start with, when the calling convention specifies a global pointer, it will override the GetGlobalPointerRegister() method to specify it. See, for example, our x86 and MIPS open-source architectures.
If a function’s calling convention specifies a global pointer register, then the analysis looks for an assignment of a constant value to that register and records the value, which then becomes a candidate global pointer value. If you wish to see the value suggested by a function, you can open the “Edit Function Properties” dialog and look for something like GP (reg) = value:
Note that this merely means the function suggested a global pointer value; it does not mean the analysis is using this value. To check which value is actually in use, run bv.global_pointer_value in the Python console or view it in the Triage View. We might make the presentation clearer, as it caused some confusion in #6675.
After all the functions are analyzed, there could be many that have suggested a candidate global pointer value. To find the correct one, a majority vote is done and the most recommended value is considered the global pointer value.
After a global pointer value is elected, we will use that to update all functions that use the global pointer before initializing it. That’s important to point out – Binary Ninja not only collects candidate global pointer values from every function, but it also collects whether they use that register without initialization. Once these functions are updated (re-analyzed), they may trigger other analysis updates, and the analysis will stop once there are no more such updates.
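As a rough mental model of the two pieces of information collected per function, consider the following plain-Python sketch. It is not Binary Ninja’s implementation: FunctionSummary and elect_and_select are hypothetical names, and the example functions other than main and puts are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FunctionSummary:
    name: str
    candidate_gp: Optional[int]  # constant assigned to the GP register, if any
    uses_gp_uninitialized: bool  # reads the GP register before writing it


def elect_and_select(
    summaries: List[FunctionSummary],
) -> Tuple[Optional[int], List[str]]:
    """Pick the most suggested candidate and list the functions to re-analyze."""
    votes = Counter(
        s.candidate_gp for s in summaries if s.candidate_gp is not None
    )
    if not votes:
        return None, []
    winner, _ = votes.most_common(1)[0]
    # Only functions that rely on an inherited GP value need an update.
    to_update = [s.name for s in summaries if s.uses_gp_uninitialized]
    return winner, to_update


summaries = [
    FunctionSummary("main", 0x13fd4, False),
    FunctionSummary("helper", 0x13fd4, False),   # hypothetical function
    FunctionSummary("oddball", 0x1000, False),   # hypothetical outlier
    FunctionSummary("puts", None, True),
]
winner, to_update = elect_and_select(summaries)
```

In this toy input, two functions suggest 0x13fd4 and one suggests 0x1000, so 0x13fd4 wins, and only the puts thunk is queued for re-analysis.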
Settings for Global Pointer Values
While the general process of calculating the global pointer value does not sound too complex, binary analysis is always full of surprises. Here are two primary situations I have run into during development that I do not have a perfect fix for – what works perfectly for certain binaries will sometimes work badly on others, and vice-versa. As such, I added two configurable settings that influence the global pointer value calculation.
The first setting is Global Pointer Value Minimum Majority Votes / analysis.globalPointerValueMinimumMajorityVotes. I mentioned that the global pointer value is determined by a majority vote, which means something like “10 functions suggest a value A, and one single function suggests a value B, so we choose A”. However, I have seen binaries where a single function sets the value of the ebx register while just using it as a general-purpose register, which has nothing to do with the global pointer. As such, I decided that a value needs at least two votes to be selected, which is the default value of this setting. On the other hand, I have also seen firmware that really only sets the $gp register once at its entry point, and never does so again. For that, changing this setting to one is the way to go.
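The effect of the threshold can be illustrated with a small plain-Python sketch. The function elect_global_pointer and its min_votes parameter are hypothetical stand-ins for the behavior described above, not the actual implementation; in particular, how the real analysis breaks ties is not covered here.

```python
from collections import Counter
from typing import Iterable, Optional


def elect_global_pointer(
    candidates: Iterable[int], min_votes: int = 2
) -> Optional[int]:
    """Return the most suggested value if it has at least min_votes votes."""
    tally = Counter(candidates)
    if not tally:
        return None
    value, votes = tally.most_common(1)[0]
    return value if votes >= min_votes else None


# A lone function using ebx as a general-purpose register: rejected by default.
lone_vote = elect_global_pointer([0x13fd4])

# Firmware that sets $gp exactly once: accepted with the setting lowered to 1.
# (0x80010000 is an arbitrary example value.)
firmware = elect_global_pointer([0x80010000], min_votes=1)

# Ten functions suggest A, one suggests B: A wins.
majority = elect_global_pointer([0xA000] * 10 + [0xB000])
```

With the default threshold of two, lone_vote comes back as None, which is exactly the situation where you would lower the setting to one for the firmware case.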
The next one is Max Global Pointer Value Updates / analysis.limits.maxGlobalPointerValueUpdates, which limits the number of analysis updates during the global pointer value analysis. As mentioned earlier, once a global pointer value is selected, we use it to update all the functions that use the global pointer value. This can lead to the analysis of various other functions and, in some weird cases, could change the selected global pointer value! When that happens, it could easily lead to an infinite analysis loop. As such, I limit the number of analysis updates and give it a default value of 10, which should be far more than required (normally 2-3 updates suffice). You may set this value to zero to disable any analysis updates caused by the global pointer value, i.e., not using its value to re-analyze any functions.
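The role of the limit is easiest to see in a bounded fixed-point loop. The sketch below is plain Python under stated assumptions: run_gp_updates, elect, and reanalyze are hypothetical names, and the callbacks merely stand in for the election step and the re-analysis of dependent functions.

```python
import itertools
from typing import Callable, Optional, Tuple


def run_gp_updates(
    elect: Callable[[], Optional[int]],
    reanalyze: Callable[[int], None],
    max_updates: int = 10,
) -> Tuple[Optional[int], int]:
    """Re-analyze until the elected value stops changing or the limit is hit."""
    applied = None  # value most recently pushed into dependent functions
    updates = 0
    while True:
        elected = elect()
        if elected is None or elected == applied:
            return elected, updates  # converged
        if updates >= max_updates:
            return elected, updates  # limit reached: stop re-analyzing
        reanalyze(elected)
        applied = elected
        updates += 1


# Well-behaved case: the elected value is stable, so one update suffices.
stable = run_gp_updates(lambda: 0x13fd4, lambda gp: None)

# Pathological case: every re-analysis flips the elected value.
flip = itertools.cycle([0xA000, 0xB000])
oscillating = run_gp_updates(lambda: next(flip), lambda gp: None)

# A limit of zero: a value is still elected, but nothing is re-analyzed.
disabled = run_gp_updates(lambda: 0x13fd4, lambda gp: None, max_updates=0)
```

Without the cap, the oscillating case would never terminate; with it, the loop gives up after ten updates.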
Specifying User Global Pointer Values
Even with all the available customizations, there will still be cases where the analysis cannot get it right. But don’t worry – you can still set user global pointer values!
To do so, click Analysis menu -> Global Pointer Value -> Set User Value... and, in the dialog that pops up, set the type of the value to ConstantPointerValue, and then type in a constant value:
If the auto-analysis figures out a global pointer value, but you think it is wrong, you can set the user global pointer value to UndeterminedValue.
To clear a previously set user global pointer value, click Analysis menu -> Global Pointer Value -> Clear User Value. The analysis will then take over and decide on a proper global pointer value.
Next Steps
We are not stopping here: we intend to support multiple global pointer registers, which is tracked in #6096. Also, as part of a broader effort, we intend to offer more flexible ways of tracking inter-procedural dataflow. For example, one function may set a register or a stack location to a specific value, and a callee function uses that, but it is not necessarily a global.
Let us know how your experience goes with global pointers. Did you learn something today? Are there features related to global pointer analysis you’d like us to implement? Let us know!