Binary Ninja Blog

Type and Variable Cross-References

This blog post covers two new features from Binary Ninja version 2.3 (though both have seen some improvements on post-2.3 development branches as well!). Variable and type cross-references (xrefs) were two of the most requested features for version 2.3: the two GitHub issues had accumulated 13 and 23 thumbs-up, respectively. I am honored to work on such important features.

UI Updates

When a variable is selected, Binary Ninja now lists all of the xrefs to it:

This works for all variables, including stack variables, register variables, and flag variables.

You may have noticed that the xref item preview shows the same IL as the graph, in this case MLIL. This hopefully makes the xref more useful than always showing the disassembly.

Double-clicking an xref item takes you to the corresponding line in the same IL. This works even when there are multiple IL instructions at the same address, because xrefs are internally identified by their instruction index.

One thing worth noting about variable xrefs is that references from MLIL and HLIL can be different. That is because in HLIL some variables are combined (or removed), so the result can be slightly different from the MLIL. If the graph is in MLIL, then the xref widget will show MLIL variable xrefs; if the graph is in HLIL, then it will show variable xrefs from HLIL. Of course, both can be queried using the API.
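
To see the MLIL/HLIL difference from a script, the two result sets can be compared by address. The following is a minimal sketch, not the widget's own logic. It assumes `func` is a Binary Ninja `Function`, that `get_mlil_var_refs` and `get_hlil_var_refs` are methods on it taking a variable, and that each returned reference exposes an `address` attribute; the helper name `compare_var_refs` is made up for illustration.

```python
def compare_var_refs(func, var):
    """Group the addresses that reference `var` by the IL that reports them.

    Assumptions (not confirmed by the post): `func` behaves like a Binary
    Ninja Function, and each returned reference has an `address` attribute.
    """
    mlil_addrs = {ref.address for ref in func.get_mlil_var_refs(var)}
    hlil_addrs = {ref.address for ref in func.get_hlil_var_refs(var)}
    return {
        "mlil_only": sorted(mlil_addrs - hlil_addrs),  # e.g. folded away in HLIL
        "hlil_only": sorted(hlil_addrs - mlil_addrs),
        "both": sorted(mlil_addrs & hlil_addrs),
    }
```

In the scripting console this might be called as `compare_var_refs(current_function, current_function.vars[0])`. A non-empty `mlil_only` list would be one sign that HLIL combined or removed a use of the variable.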

Also, the filter now allows the user to toggle what type(s) of references they wish to see:

Type cross-references are a bit more complex. Let’s start with the Type view:

Type xrefs can have three forms:

  • data references
  • code references
  • type references

Let’s have a look at what we get when we select the b2 field in struct B.

The first category is data references. It is shown when a data variable references the type or type field. In this case, we have a data variable called global_b starting at 0x4020.

Note the xref reports the address as 0x4028, which is the actual address of the field b2, rather than the start of the data variable. This is due to several reasons. One of them is that we can easily get the start of the data variable if we know the address of its field (using bv.get_data_var_at()), but it may not be trivial to do the opposite.
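
Going from a reported field address back to its data variable can be sketched as below. This assumes, as the paragraph above describes, that `bv.get_data_var_at()` resolves a field address to the enclosing data variable and that the result has an `address` attribute. If your build only matches a variable's start address, the helper simply returns `None`. The name `locate_field_ref` is hypothetical.

```python
def locate_field_ref(bv, field_addr):
    """Return (variable start, field offset) for a data xref address.

    Assumption: bv.get_data_var_at() returns the data variable that
    contains `field_addr`, or None when there is none.
    """
    var = bv.get_data_var_at(field_addr)
    if var is None:
        return None
    # The offset of the field is the distance from the variable's start.
    return var.address, field_addr - var.address
```

With the numbers from the example, a reference at 0x4028 inside global_b at 0x4020 would yield the pair (0x4020, 0x8), so b2 sits 0x8 bytes into the variable.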

Also, the current implementation works better for arrays. annonymous_b is an array of struct B, and the xref widget makes that clear: there are three occurrences of field b2, so it shows three xrefs, each with its actual address:

However, this is not free from problems. A huge array could cause performance issues, for example. So we put a cap there: for each category of cross-references, we only show the first 1,000 references in the UI. If you wish to retrieve all of them, the API will be happy to help you. The limit is configurable on development releases after build 2714 (or the next stable 3.0 when it’s released!) as ui.maxXrefItems.
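
Since the API returns every reference, a script that wants a similar cap has to apply it itself. The sketch below is only an illustration of that idea and says nothing about how the UI implements its limit; the helper name `capped` is made up, and the default of 1000 simply mirrors the number mentioned above.

```python
from itertools import islice

def capped(refs, limit=1000):
    """Return at most `limit` items and whether anything was left out."""
    it = iter(refs)
    shown = list(islice(it, limit))
    # Peek at one more item to find out if the list was truncated.
    sentinel = object()
    truncated = next(it, sentinel) is not sentinel
    return shown, truncated
```

Because it works on any iterable, the same helper can wrap the result of whichever xref query is being run.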

The next form of type xref is code references. They show where in code a type or type field is used.

The references should be easy to understand, but there is one behavior that is worth mentioning. When we query the xrefs to a type, it shows the xrefs to the type itself. In several cases this can be different from xrefs to offset 0x0 of the type. For example, when we take the address of a struct B, that is considered a reference to the structure itself, rather than its first member. However, in other cases, they cannot be distinguished, so both are shown. If you are confused by this, do not worry – most of the time we are interested in the xrefs to type fields.

The last category of type xref is the appropriately named type reference. These describe how types reference one another. Since struct B is also a member of struct A, we know the field b2 is also part of struct A, which can be seen here:

It means b2 occurs at offset 0x18 of struct A.
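
The arithmetic behind that offset is just the member's offset plus the field's offset inside the member. In the small worked example below, the 0x8 comes from the data reference earlier (0x4028 - 0x4020, assuming global_b is a plain struct B). The 0x10 for the position of the struct B member inside struct A is not stated in the post; it is only what the two other numbers imply.

```python
def nested_field_offset(member_offset, field_offset):
    """Offset of a field when its struct is embedded in an outer struct."""
    return member_offset + field_offset

B2_IN_B = 0x4028 - 0x4020   # offset of b2 within struct B
B_IN_A = 0x18 - B2_IN_B     # implied offset of the struct B member in struct A

print(hex(nested_field_offset(B_IN_A, B2_IN_B)))
```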

Also, struct D (not shown in the screenshot) includes an array of struct B, which accounts for several references as well.

Double-clicking a type reference item will send you to the type view and highlight the corresponding field, which is an intuitive behavior.

Type xrefs can be triggered in graph or linear view as well. As long as we select a type or type field in it, the xref widget will show references to it. Also, double-clicking on a type or type field (in graph or linear view) will send you to the type view as well instead of asking you to rename it, which was the previous behavior.

The last UI feature is that you can select a range of code, and the xref widget will show all of the outgoing variable and type xrefs (along with outgoing code and data references, which hasn’t changed). The list can grow long very quickly, which is a good excuse for applying a filter.

Python API Usage

Since the API is such an important part of Binary Ninja, this blog post would be incomplete without covering the Python API changes related to variable and type xrefs.

Many new functions were added, and even I don’t remember all of them! One good place to get a list of them is the unit test. It’s open-source and can be viewed here.

The function test_variable_xref covers the basic usage of variable xrefs, and the function test_type_xref covers the functions for type xrefs. From there, you can click on the function names to navigate to their implementations (also open-source) and view the comments for them.

For variable xref, there are four functions:

  1. get_mlil_var_refs returns MLIL variable references
  2. get_hlil_var_refs returns HLIL variable references
  3. get_mlil_var_refs_from returns outgoing MLIL variable references from a range
  4. get_hlil_var_refs_from returns outgoing HLIL variable references from a range
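
As a rough sketch of the outgoing queries, the helper below lists the names of the variables referenced from an address range. It assumes these are methods on a `Function` that accept a start address and a `length` keyword, and that each result carries the referenced variable as `.var` with a `.name`. Check the API documentation for your version, since these details are assumptions here; `outgoing_var_names` is a hypothetical name.

```python
def outgoing_var_names(func, start, length, hlil=False):
    """Names of the variables referenced from [start, start + length).

    Assumptions: `func` behaves like a Binary Ninja Function and the
    *_var_refs_from methods accept (addr, length=...).
    """
    if hlil:
        refs = func.get_hlil_var_refs_from(start, length=length)
    else:
        refs = func.get_mlil_var_refs_from(start, length=length)
    # A variable may be referenced several times, so de-duplicate by name.
    return sorted({ref.var.name for ref in refs})
```

Running it once with `hlil=False` and once with `hlil=True` over the same range gives a quick view of which variables only exist at one IL level.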

For type xref, the first six collect incoming references:

  1. get_code_refs_for_type collects code refs to type
  2. get_data_refs_for_type collects data refs to type
  3. get_type_refs_for_type collects type refs to type
  4. get_code_refs_for_type_field collects code refs to type field
  5. get_data_refs_for_type_field collects data refs to type field
  6. get_type_refs_for_type_field collects type refs to type field
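
The three field-level queries can be bundled to mirror the three categories shown in the xref widget. This is a minimal sketch that assumes the functions live on the `BinaryView` and take a type name and a field offset; the helper name `field_xrefs` is made up for illustration.

```python
def field_xrefs(bv, type_name, offset):
    """Collect code, data, and type references to one field of a type.

    Assumption: each query accepts (type name, field offset) and returns
    an iterable of references.
    """
    return {
        "code": list(bv.get_code_refs_for_type_field(type_name, offset)),
        "data": list(bv.get_data_refs_for_type_field(type_name, offset)),
        "type": list(bv.get_type_refs_for_type_field(type_name, offset)),
    }
```

For the running example, `field_xrefs(bv, "B", 0x8)` would be the scripted counterpart of selecting b2 in the type view, assuming b2 sits at offset 0x8 as the data reference addresses suggest.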

For outgoing type xrefs, use the following two:

  1. get_code_refs_for_type_from collects outgoing code refs to type from a range
  2. get_code_refs_for_type_fields_from collects outgoing code refs to type field from a range
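
A possible way to summarize what a block of code touches is sketched below. It assumes both functions are `BinaryView` methods that take a start address plus a `length` keyword, and that each result exposes `name` and `offset` attributes. Those details are assumptions, so verify them against the API documentation. `outgoing_type_refs` is a hypothetical helper.

```python
def outgoing_type_refs(bv, start, length):
    """Summarize the types and type fields referenced from a code range.

    Assumptions: the *_from queries accept (addr, length=...) and return
    objects with `name` and `offset` attributes.
    """
    type_refs = bv.get_code_refs_for_type_from(start, length=length)
    field_refs = bv.get_code_refs_for_type_fields_from(start, length=length)
    types = sorted({str(ref.name) for ref in type_refs})
    fields = sorted({(str(ref.name), ref.offset) for ref in field_refs})
    return types, fields
```

The result is a list of type names and a list of (type name, field offset) pairs, which is roughly the information the xref widget shows for a selected range.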

Despite the many APIs, their usage should be self-evident from their names. And of course, you can always remember the following wildcard search for the API documentation to find them all again: get_*_refs.
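
The same wildcard idea also works from a Python prompt by filtering the attribute names of a class or object. The sketch below uses only the standard library. The pattern is widened to `get_*_refs*` because `fnmatch` anchors the pattern at both ends, so names ending in `_from` or `_for_type` would otherwise be missed. `find_ref_getters` is a made-up helper name.

```python
from fnmatch import fnmatchcase

def find_ref_getters(obj, pattern="get_*_refs*"):
    """List callable attribute names of `obj` that match a wildcard."""
    return sorted(
        name
        for name in dir(obj)
        if fnmatchcase(name, pattern) and callable(getattr(obj, name))
    )
```

Passing a class rather than an instance (for example the `BinaryView` or `Function` class, assuming they are imported) avoids evaluating any properties while scanning.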

Future Work

I hope you like the two features so far. Although we believe they are usable and will benefit our users, we still have future plans for even further improvements. Some of them are:

  1. Add a type for each xref. For example, read, write, call, take the address of, etc.
  2. Better IL matching for all ILs, including SSA forms.
  3. Enum xref, as suggested in a recent comment.

Please give these new features a try, and feel free to offer bug reports and suggestions!