This blog post covers two new features from BinaryNinja version 2.3 (though both have seen some improvements on post-2.3 development branches as well!). Variable and type cross-references (xrefs) are two highly-demanded features for version 2.3. The two issues had accumulated 13 and 23 thumbs-up on GitHub, respectively. I am honored to work on such important features.
When a variable is selected, BinaryNinja now lists all of the xrefs to it:
This works for all variables, including stack variables, register variables, and flag variables.
You may have noticed that the xref item preview shows the same IL as the graph (MLIL in this case). This hopefully makes the xrefs more useful than always showing the disassembly.
Double-clicking an xref item sends you to the corresponding line in the same IL, even if there are multiple IL instructions at the same address, because IL instructions are internally identified by their instruction index.
One thing worth noting about variable xrefs is that references from MLIL and HLIL can be different. That is because in HLIL some variables are combined (or removed), so the result can be slightly different from the MLIL. If the graph is in MLIL, then the xref widget will show MLIL variable xrefs; if the graph is in HLIL, then it will show variable xrefs from HLIL. Of course, both can be queried using the API.
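To see the MLIL/HLIL difference from a script, something like the following sketch could be used. It assumes `func` is a Binary Ninja `Function`, that `var` is one of its variables (for example taken from `func.vars`), and that each returned reference exposes an `address` attribute; `compare_var_refs` is a hypothetical helper name, not part of the API.

```python
def compare_var_refs(func, var):
    """Split the addresses that reference `var` by the IL they show up in.

    Assumptions: `func` behaves like a binaryninja Function, and each
    reference returned by get_mlil_var_refs / get_hlil_var_refs has an
    `address` attribute.
    """
    mlil = {ref.address for ref in func.get_mlil_var_refs(var)}
    hlil = {ref.address for ref in func.get_hlil_var_refs(var)}
    return {
        "mlil_only": sorted(mlil - hlil),  # e.g. references folded away in HLIL
        "both": sorted(mlil & hlil),
        "hlil_only": sorted(hlil - mlil),
    }
```

Inside the Binary Ninja Python console this might be called as `compare_var_refs(current_function, current_function.vars[0])`.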
Also, the filter now allows the user to toggle what type(s) of references they wish to see:
Type cross-references are a bit more complex. Let’s start with the Type view:
Type xrefs can have three forms:
- data references
- code references
- type references
Let’s have a look at what we get when I select the b2 field in
The first category is data references. It is shown when a data variable references the type or type field. In this case, we have a data variable called global_b starting at
Note the xref reports the address as 0x4028, which is the actual address of the field b2, rather than the start of the data variable. This is due to several reasons. One of them is that we can easily get the start of the data variable if we know the address of its field (using bv.get_data_var_at()), but it may not be trivial to do the opposite.
Also, the current implementation works better for arrays.
annonymous_b is an array of struct B, and the xref widget makes it clear: there are three occurrences of field b2, and it shows three xrefs, each with its actual address:
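The per-element addresses reported for an array follow from simple arithmetic: array base, plus element index times element size, plus field offset. The sketch below illustrates this with made-up numbers; the base address, element size, and field offset are hypothetical and are not the real values of annonymous_b or struct B.

```python
def field_addresses(base, element_size, field_offset, count):
    """Address of one field in each element of an array of structures."""
    return [base + i * element_size + field_offset for i in range(count)]


def element_start(field_addr, base, element_size):
    """Start address of the array element that contains `field_addr`."""
    index = (field_addr - base) // element_size
    return base + index * element_size


# Hypothetical layout: 3 elements of 0x20 bytes at 0x5000, field at offset 0x8.
addrs = field_addresses(0x5000, 0x20, 0x8, 3)
print([hex(a) for a in addrs])
```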
However, this is not free from problems. A huge array could cause performance issues, for example. So we put a cap there: for each category of cross-references, we only show the first 1,000 references in the UI. If you wish to retrieve all of them, the API will be happy to help you. The limit is configurable on development releases after build 2714 (or the next stable 3.0 when it’s released!) as
The next form of type xref is code references. They show where in code a type or type field is used.
The references should be easy to understand, but there is one behavior that is worth mentioning. When we query the xrefs to a type, it shows the xrefs to the type itself. This can be different from xrefs to the offset 0x0 of the type, in several cases. In fact, when we take the address of a struct B, that is considered a reference to the structure itself, rather than its first member. However, in other cases, they cannot be distinguished, so both are shown. If you are confused by this, do not worry – most of the time we are interested in the xrefs to type fields.
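The difference between references to a type and references to its offset 0x0 can be inspected with a sketch like this one. It assumes `bv` is a Binary Ninja `BinaryView`, that `get_code_refs_for_type(name)` and `get_code_refs_for_type_field(name, offset)` accept a type name as shown, and that the returned references expose an `address` attribute; the helper name is hypothetical.

```python
def type_vs_first_field_refs(bv, type_name):
    """Compare code refs to a type itself with code refs to its offset 0x0.

    Assumption: both API calls return objects with an `address` attribute.
    """
    whole = {ref.address for ref in bv.get_code_refs_for_type(type_name)}
    first = {ref.address for ref in bv.get_code_refs_for_type_field(type_name, 0)}
    return {
        "type_only": sorted(whole - first),   # e.g. taking the address of the struct
        "both": sorted(whole & first),        # cases that cannot be told apart
        "field_only": sorted(first - whole),
    }
```

A call such as `type_vs_first_field_refs(bv, "B")` would then show which addresses fall into each group, assuming the structure is named `B` in the type view.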
The last category of type xref is the appropriately named type reference. Type references describe how types reference one another. Since struct B is also a member of struct A, we know the field b2 is also part of struct A, which can be seen here:
b2 occurs at offset 0x18 of
struct D (not shown in the screenshot) includes an array of struct B, so it contributes several references as well.
Double-clicking a type reference item will send you to the type view and highlight the corresponding field, which is an intuitive behavior.
Type xrefs can be triggered in graph or linear view as well. As long as we select a type or type field there, the xref widget will show references to it. Also, double-clicking a type or type field (in graph or linear view) now sends you to the type view instead of asking you to rename it, which was the previous behavior.
The last UI feature is that you can select a range of code, and the xref widget will show all of the outgoing variable and type xrefs (along with outgoing code and type references, which haven’t changed). The list can grow very long quickly, which is a good excuse for applying a filter.
Python API Usage
Since the API is such an important part of BinaryNinja, this blog would be incomplete without covering the Python API changes related to the variable and type xrefs.
Many new functions were added; even I don’t remember all of them! One good place to get a list of them is the unit test. It’s open-source and can be viewed here.
The function test_variable_xref covers the basic usage of variable xref, and the function test_type_xref covers the functions for type xref. From there, you can click on the function names to navigate to their implementations (also open-source) and view the comments for them.
For variable xref, there are four functions:
- get_mlil_var_refs returns MLIL variable references
- get_hlil_var_refs returns HLIL variable references
- get_mlil_var_refs_from returns outgoing MLIL variable references from a range
- get_hlil_var_refs_from returns outgoing HLIL variable references from a range
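As a rough illustration of the two range-based functions, the sketch below lists the variables referenced from an address range in both ILs. It assumes `func` is a Binary Ninja `Function`, that both functions take a start address followed by a length, and that each result exposes `var` and `src.address`; `outgoing_var_refs` is a hypothetical helper name.

```python
def outgoing_var_refs(func, start, length):
    """List (il_name, variable, address) for variables used in a range.

    Assumptions: get_mlil_var_refs_from / get_hlil_var_refs_from accept
    (addr, length) and return objects with `var` and `src.address`.
    """
    rows = []
    queries = (
        ("MLIL", func.get_mlil_var_refs_from),
        ("HLIL", func.get_hlil_var_refs_from),
    )
    for il_name, query in queries:
        for ref in query(start, length):
            rows.append((il_name, ref.var, ref.src.address))
    return rows
```

Printing the rows for a selected range should roughly mirror what the xref widget shows for that selection in the matching IL view.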
For type xref, the first six collect incoming references:
- get_code_refs_for_type collects code refs to type
- get_data_refs_for_type collects data refs to type
- get_type_refs_for_type collects type refs to type
- get_code_refs_for_type_field collects code refs to type field
- get_data_refs_for_type_field collects data refs to type field
- get_type_refs_for_type_field collects type refs to type field
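The three field-level functions can be combined to reproduce the three categories shown in the xref widget. This is a minimal sketch, assuming `bv` is a Binary Ninja `BinaryView`, that each function takes a type name and a field offset, that code refs expose `address`, that data refs are plain addresses, and that type refs expose `name` and `offset`. The helper name is hypothetical.

```python
def incoming_field_refs(bv, type_name, offset):
    """Collect code, data, and type references to one field of a type.

    Assumptions about return values: code refs have `.address`, data refs
    are integer addresses, type refs have `.name` and `.offset`.
    """
    code = bv.get_code_refs_for_type_field(type_name, offset)
    data = bv.get_data_refs_for_type_field(type_name, offset)
    types = bv.get_type_refs_for_type_field(type_name, offset)
    return {
        "code": [ref.address for ref in code],
        "data": list(data),
        "type": [(str(ref.name), ref.offset) for ref in types],
    }
```

The offset to pass is the offset of the field inside its own structure, as listed in the type view.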
For outgoing type xrefs, use the following two:
- get_code_refs_for_type_from collects outgoing code refs to type from a range
- get_code_refs_for_type_fields_from collects outgoing code refs to type field from a range
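For the outgoing direction, a sketch along the following lines could summarize which types and fields a range of code touches. It assumes `bv` is a Binary Ninja `BinaryView`, that both functions accept a start address and a `length` keyword argument, and that each result exposes `name` and `offset`; check the API documentation for the exact signatures in your version. `outgoing_type_refs` is a hypothetical helper name.

```python
def outgoing_type_refs(bv, start, length):
    """Return (type names, (type name, field offset) pairs) used in a range.

    Assumptions: both API calls accept `length=` and return objects with
    `name` and `offset` attributes.
    """
    type_refs = bv.get_code_refs_for_type_from(start, length=length)
    field_refs = bv.get_code_refs_for_type_fields_from(start, length=length)
    types = sorted({str(ref.name) for ref in type_refs})
    fields = sorted({(str(ref.name), ref.offset) for ref in field_refs})
    return types, fields
```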
Despite the many APIs, their usage should be self-evident from their names. And of course, you can always use the following wildcard search in the API documentation to find them all again: get_*_refs.
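The same kind of wildcard search can be approximated from a script with the standard library. The sketch below lists the callable attributes of any object whose names match a pattern; the trailing `*` in the default pattern is an addition so that names such as get_code_refs_for_type also match, and `matching_methods` is a hypothetical helper name. It will also pick up other reference getters that fit the pattern.

```python
from fnmatch import fnmatchcase


def matching_methods(obj, pattern="get_*_refs*"):
    """Names of callable attributes of `obj` that match a shell-style pattern."""
    return sorted(
        name
        for name in dir(obj)
        if fnmatchcase(name, pattern) and callable(getattr(obj, name))
    )
```

In the Binary Ninja Python console, `matching_methods(bv)` and `matching_methods(current_function)` would be the natural objects to try it on.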
I hope you like the two features so far. Although we believe they are usable and will benefit our users, we still have future plans for even further improvements. Some of them are:
- Add a type for each xref. For example, read, write, call, take the address of, etc.
- Better IL matching for all ILs, including SSA forms.
- Enum xref as suggested in a recent comment