C++’s standard library containers are extremely convenient, as standard library components should be. They’re so convenient that, since your data structures final exam, you’ve probably only ever needed to remember their Big-O complexities for insertion, deletion, and search. However, beyond their cold, clean, bare, templated masks, C++ container templates are quite complex.
Most of the time, when you’re using a container, you don’t care about how it allocates or frees memory. But, because the specifics of allocation can sometimes matter, C++ exposes the allocation backend to you through a default template parameter. The default allocator most of the standard library uses for its containers is std::allocator<T>, where T is the type you’re allocating.
Now, from a programming perspective, this is great. C++ is well-known for the control it allows you to have over memory, alongside other primitives. It provides this control while still trying to be user-friendly by hiding options most programmers will never need to use. Unfortunately, as reverse engineers, these things are no longer hidden from us. And, since even default templates are a form of code reuse and generation, those default parameters can balloon into type signatures of insane sizes at compile time.
This begs the question: What does something as simple as vector<string> really look like? And how bad can it get?
What makes compiled C++ symbols so hard to read?
Let’s start by looking at something simple: an ordered list of strings. A programmer declares vector<string>, and it “just works”. For those who need it, you can also specify what allocator to use for the list: vector<string, myArtisanalAllocator>. Over C++’s long history basically everything in the standard library has become a template. One of the consequences of this is that types you might not expect to be templates often are. Even the unassuming std::string is a template, resulting in GCC expanding vector<string>’s symbol at compile time to:
std::vector<
std::__cxx11::basic_string<
char,
std::char_traits<char>,
std::allocator<char> >,
std::allocator<
std::__cxx11::basic_string<
char,
std::char_traits<char>,
std::allocator<char> > > >
At this point I feel the solemn duty to remind you that C++ scopes member functions behind fully-qualified type names, which means every single member function for a vector of strings is prefaced with, say it with me this time:
std::vector<
std::__cxx11::basic_string<
char,
std::char_traits<char>,
std::allocator<char> >,
std::allocator<
std::__cxx11::basic_string<
char,
std::char_traits<char>,
std::allocator<char> > > >
::pain
Furthermore, a simple vector of strings is rather tame compared to the kinds of data structures that form in even the most mundane programs.
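To get a feel for how quickly those defaults compound, here is a toy sketch in Python. It is not how a compiler or demangler works; the helpers tmpl, vector, and map_ are made up for illustration, and the spelling of the map's pair type is an approximation of what a GCC-style demangler prints. It just fills in the default arguments by hand and measures the result.

```python
# Toy model of default-template-argument expansion (illustrative only).

def tmpl(name: str, *args: str) -> str:
    """Spell name<args...>, adding a space between consecutive '>' characters
    the way older demanglers do."""
    body = ", ".join(args)
    sep = " " if body.endswith(">") else ""
    return f"{name}<{body}{sep}>"

# std::string is basic_string<char, char_traits<char>, allocator<char>>
STRING = tmpl("std::__cxx11::basic_string",
              "char",
              tmpl("std::char_traits", "char"),
              tmpl("std::allocator", "char"))

def vector(elem: str) -> str:
    # std::vector<T, Allocator = std::allocator<T>>
    return tmpl("std::vector", elem, tmpl("std::allocator", elem))

def map_(key: str, value: str) -> str:
    # std::map<Key, T, Compare = std::less<Key>,
    #          Allocator = std::allocator<std::pair<const Key, T>>>
    pair = tmpl("std::pair", f"{key} const", value)  # approximate spelling
    return tmpl("std::map", key, value,
                tmpl("std::less", key),
                tmpl("std::allocator", pair))

vec_of_strings = vector(STRING)          # source code: vector<string>
index = map_(STRING, vec_of_strings)     # source code: map<string, vector<string>>

print(len("vector<string>"), "->", len(vec_of_strings))
print(len("map<string, vector<string>>"), "->", len(index))
```

Every extra level of nesting repeats the full spelling of its element type several times, which is why the map's name dwarfs the vector's even though the source only grew by a few characters.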
What can Binary Ninja do about this?
A lot! In fact, this is something we fixed way back in Binary Ninja 2.2. We call it C++ Template Simplification, and if everything’s been working correctly, you’ve hopefully been experiencing less of that slight burning sensation behind your eyes ever since.
Take this simple program, for example:
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str = "Hello World!";
    // Print the string reversed by constructing a copy from reverse iterators
    cout << string(str.rbegin(), str.rend()) << endl;
    return 0;
}
Here’s the HLIL of main:
Six out of the ten function names called here are so long that we don’t even display the whole thing. Four of those functions are just std::string member calls that are impossible to read. Here’s what the same HLIL looks like with our template simplifier enabled:
So much better, right?
How can you use it?
Well, it’s been enabled by default for a while now, but the toggle is under Settings -> Analysis -> Types -> Simplify Templates. This will change the default behavior globally, however, which might not be what you want. Click the “Project” or “Resource” scope in the top left to change the default for your project or file instead.
To change it for a single file before analysis runs, make sure to open your binary with File -> Open With Options and turn “Simplify Templates” on in the dialog.
If you’d like to set this option through the API while loading a file, try this:
import binaryninja
bv = binaryninja.load("/path/to/binary", options={"analysis.types.templateSimplifier": True})
Or, if you’d like to use the demangler directly on a type name, use:
>>> binaryninja.demangle.simplify_name_to_string("std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >")
'std::string'
Check out the API docs for more ways to use this API.
How do we plan to expand this going forward?
Our template simplifier is smart. We don’t use regular expressions, and we don’t just find-and-replace portions of symbols; we use a parser combinator to generate honest-to-goodness ASTs that we can make informed decisions about. This means we can match subtrees, compare nested arguments, expose traversals to the API, and pretty much anything else you can imagine.
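To make the AST idea concrete, here is a minimal sketch in Python. It is not Binary Ninja's implementation: a small hand-written recursive-descent parser stands in for the parser combinator, the names Node, parse, simplify, and render are made up for this example, and only two rewrite rules are included. It does show why a tree beats find-and-replace: the vector rule only fires when the allocator's argument is the same subtree as the element type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A type name plus its template arguments (empty tuple if none)."""
    name: str
    args: tuple = ()

def parse(text: str) -> Node:
    """Parse a demangled name such as 'a::b<c, d<e> >' into a Node tree."""
    node, pos = _parse_type(text, 0)
    if text[pos:].strip():
        raise ValueError(f"unexpected text at offset {pos}")
    return node

def _parse_type(text: str, pos: int):
    start = pos
    while pos < len(text) and text[pos] not in "<>,":
        pos += 1
    name = text[start:pos].strip()
    args = []
    if pos < len(text) and text[pos] == "<":
        pos += 1  # consume '<'
        while True:
            arg, pos = _parse_type(text, pos)
            args.append(arg)
            while pos < len(text) and text[pos] == " ":
                pos += 1
            if pos >= len(text):
                raise ValueError("unterminated template argument list")
            if text[pos] == ",":
                pos += 1          # another argument follows
            elif text[pos] == ">":
                pos += 1          # end of this argument list
                break
            else:
                raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
    return Node(name, tuple(args))  , pos

CHAR = Node("char")
DEFAULT_STRING = Node("std::__cxx11::basic_string",
                      (CHAR, Node("std::char_traits", (CHAR,)),
                       Node("std::allocator", (CHAR,))))

def simplify(node: Node) -> Node:
    """Rewrite bottom-up so inner types are simplified before outer ones."""
    args = tuple(simplify(a) for a in node.args)
    node = Node(node.name, args)
    if node == DEFAULT_STRING:                      # whole-subtree match
        return Node("std::string")
    if (node.name == "std::vector" and len(args) == 2
            and args[1] == Node("std::allocator", (args[0],))):
        return Node("std::vector", (args[0],))      # allocator matches element type
    return node

def render(node: Node) -> str:
    if not node.args:
        return node.name
    return f"{node.name}<{', '.join(render(a) for a in node.args)}>"

full = ("std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, "
        "std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, "
        "std::char_traits<char>, std::allocator<char> > > >")
print(render(simplify(parse(full))))
```

Because the rewrite runs bottom-up, the inner basic_string collapses to std::string first, and only then does the vector rule see that its allocator argument is std::allocator of that same element type. A vector with a custom allocator fails the comparison and is left alone.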
Right now, we simplify all of the C++17 containers and most of the IO and FS libraries, but we’ll continue expanding to the rest of the C++ standard library as needed.
As of writing, we don’t simplify templates that have any non-default parameters, but we are considering hiding default parameters in templates that might only have a single non-default parameter. For example, map<string, string, alloc=myArtisanalAllocator>, where we’re hiding the default compare template parameter.
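As a purely hypothetical sketch of what that could look like (this is not a shipped feature or Binary Ninja's design), the Python below hides optional arguments that equal their defaults and labels any non-default argument that follows a hidden one. The DEFAULTS table, the display function, and the compare/alloc labels are all assumptions made for illustration, and arguments are plain strings rather than ASTs to keep it short.

```python
# Hypothetical table: template name -> (number of required arguments,
#   [(label, function building the default from the required arguments), ...])
DEFAULTS = {
    "std::map": (2, [
        ("compare", lambda k, v: f"std::less<{k}>"),
        ("alloc", lambda k, v: f"std::allocator<std::pair<{k} const, {v}>>"),
    ]),
}

def display(name: str, args: list) -> str:
    """Render name<args>, omitting optional arguments equal to their default.

    Once an argument has been hidden, later non-default arguments are labelled
    so the reader can still tell which parameter they belong to.
    """
    if name not in DEFAULTS:
        return f"{name}<{', '.join(args)}>"
    required_count, optional = DEFAULTS[name]
    required = args[:required_count]
    shown = list(required)
    hidden_any = False
    for (label, make_default), actual in zip(optional, args[required_count:]):
        if actual == make_default(*required):
            hidden_any = True                    # default: leave it out
        elif hidden_any:
            shown.append(f"{label}={actual}")    # positional meaning was lost
        else:
            shown.append(actual)
    return f"{name}<{', '.join(shown)}>"

print(display("std::map", ["std::string", "std::string",
                           "std::less<std::string>", "myArtisanalAllocator"]))
```

Labelling is the interesting design choice here: once the compare argument is dropped, a bare third argument would read as the comparator, so the sketch tags it with alloc= instead.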
Finally, we don’t need to stop at C++. Other languages use namespaces and code generation in a similar way to C++, and we intend to bring support to them in the future as well.