Reverse engineering often requires quickly finding a proverbial needle in a haystack. Whether hunting for specific byte signatures, cryptographic constants, or matching instruction patterns, the ability to efficiently locate precise byte sequences within binaries is essential. This post introduces the Advanced Binary Search (ABS) search mode in Binary Ninja (BN), designed to streamline the most common reverse engineering search tasks.
Raising the Bar for Binary Search: Simple Meets Powerful
Prior to ABS, search in BN was limited to exact hex bytes, escaped strings, or raw strings—with no regex or wildcard support.
ABS, released in Binary Ninja 5.0, improves this workflow through an intelligent parsing layer that automatically detects your search intent, while still providing explicit modes when you need complete control. The result is a unified search interface that adapts to what you’re looking for, not the other way around.
Note that we’ve left the older explicit search modes in the UI while we continue to flesh out and test the UX of ABS, but long-term we expect to be able to remove them.
The new ABS is available in the “Find” search dialog and in the API using bv.search.
First Principles: Search Pattern Recognition
By default, the search engine interprets your input in the following ways, automatically detecting the most likely pattern type:
- FlexHex: A simplified nibble-based hex pattern language with wildcards (e.g., 53 8b 3?)
- Bytes-Based Regex: A regular expression operating on raw bytes and ASCII text
- Escaped String: Standard escape notation (e.g., \x53\x8b\x3f) is handled natively by the regular expression engine
- Raw String: A literal string search for ASCII text (default fallback when no other interpretation is possible)
The engine analyzes your search pattern to determine intent, eliminating the need to explicitly specify search modes in most cases. Whether you use FlexHex notation, standard regex, or escaped hex sequences, the system intelligently scans your input to find the intended patterns without requiring you to switch between different modes or settings.
This automatic detection works as follows:
┌─────────────────┐
│   User Input    │
└────────┬────────┘
         │
         ▼
┌────────────────┐     Yes     ┌─────────────────┐
│   Looks like   ├────────────►│   Process as    │
│   FlexHex?     │             │   FlexHex       │
└────────┬───────┘             └────────┬────────┘
         │ No                           │
         ▼                              │
┌────────────────┐     Yes     ┌────────▼────────┐
│  Valid Bytes   ├────────────►│   Convert to    │
│  Regex?        │             │  Regex Pattern  │
└────────┬───────┘             └────────┬────────┘
         │ No                           │
         ▼                              │
┌────────────────┐                      │
│   Treat as     │                      │
│ Literal String │                      │
└────────┬───────┘                      │
         │                              │
         ▼                              ▼
┌─────────────────────────────────────────────────┐
│                 Execute Search                  │
└─────────────────────────────────────────────────┘
This intelligent parsing reduces friction while introducing minimal ambiguity. The only real edge cases occur with text that could be interpreted as either valid hexadecimal or as regex patterns. For example:
- bv.search("CAFE") is interpreted as hex and matches the bytes 0xCA and 0xFE.
- If you want to find the ASCII string “CAFE” (bytes 0x43, 0x41, 0x46, 0x45), use bv.search("CAFE", raw=True).
Special characters like ? or + might be interpreted as regex operators rather than literal characters. These ambiguities are easily resolved by using the raw=True parameter when you need to search for literal text rather than its hexadecimal or regex interpretation. In the UI, you can enable the “Raw string” checkbox to do the same thing.
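The detection order described above can be approximated in a few lines of plain Python. This is only a sketch of the decision flow, not Binary Ninja's implementation: the function name detect_mode is hypothetical, the FlexHex check is a guess at what "looks like FlexHex" means, and Python's re module stands in for the real regex engine when testing validity.

```python
import re

HEX_OR_WILDCARD = set("0123456789abcdefABCDEF?")


def detect_mode(pattern: str, raw: bool = False) -> str:
    """Guess how a pattern would be interpreted (illustrative only)."""
    if raw:
        return "raw"  # explicit override: always a literal string
    compact = "".join(pattern.split())  # FlexHex ignores whitespace
    if compact and len(compact) % 2 == 0 and set(compact) <= HEX_OR_WILDCARD:
        return "flexhex"
    try:
        # Assumption: Python's bytes regex approximates the real engine's parser.
        re.compile(pattern.encode("latin-1"))
        return "regex"
    except (re.error, UnicodeEncodeError):
        return "raw"  # fallback: treat as a literal string


print(detect_mode("CAFE"))                 # flexhex
print(detect_mode("CAFE", raw=True))       # raw
print(detect_mode("[\\x20-\\x7E]{10,}"))   # regex
```

The "CAFE" case shows why the raw override exists: the input is valid in more than one interpretation, and the first check in the chain wins.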
Accessing the Search Engine: API and UI Interfaces
The new ABS is available through both an API and the UI using the same underlying engine.
Python API Interface
For programmatic access and scripting, the Python API is available:
# Basic search with FlexHex pattern
results = bv.search("50 ?? 45")
# Literal string with optional parameters
results = bv.search("Main", raw=True, ignore_case=True)
# Regex pattern with alignment and limit (Finds items with 10+ printable ASCII characters)
results = bv.search("[\\x20-\\x7E]{10,}", align=4, limit=5)
The results are a generator object that yields the offset and a DataBuffer for each match. You can also provide optional callback functions for progress tracking and match handling:
def print_match(offset, data):
    hex_data = bytes(data).hex()  # convert the match bytes to a hex string
    print(f"Found match at offset 0x{offset:X}, data={hex_data}")
    return True  # return True to keep searching

list(bv.search("ServiceMain", match_callback=print_match))
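Because the results come back as a generator, nothing is searched until you iterate. A small helper that drains the generator into a list can be handy in scripts. The sketch below assumes only what is described above (each item is an offset plus a bytes-convertible buffer); collect_matches and FakeView are hypothetical names, and FakeView exists only so the example runs without Binary Ninja.

```python
def collect_matches(view, pattern, **options):
    """Drain the search generator into (offset, hex string) pairs."""
    return [
        (offset, bytes(data).hex())
        for offset, data in view.search(pattern, **options)
    ]


class FakeView:
    """Hypothetical stand-in for a BinaryView so the sketch runs anywhere."""

    def __init__(self, data: bytes):
        self.data = data

    def search(self, pattern, **options):
        # Only handles literal text; the real engine does far more.
        needle = pattern.encode()
        pos = self.data.find(needle)
        while pos != -1:
            yield pos, needle
            pos = self.data.find(needle, pos + len(needle))


view = FakeView(b"..ServiceMain..ServiceMain")
for offset, hex_data in collect_matches(view, "ServiceMain"):
    print(f"0x{offset:X}: {hex_data}")
```

Inside Binary Ninja, the same helper would presumably be called as collect_matches(bv, "ServiceMain", limit=5).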
User Interface
The Find dialog provides the "Advanced Binary Search" type, offering real-time feedback of the detected search mode. This dialog is accessible via the Find... action in the Edit menu, or by pressing Ctrl/Cmd+F.
There are three possible search modes:
- FlexHex: A simplified hex pattern language with wildcards
- Regex: A bytes-based regular expression
- Raw String: A literal ASCII string search
The UI automatically interprets your inputs using the same intelligent pattern detection as the API. This immediate feedback helps you understand exactly how the search engine will process your pattern, reducing trial-and-error and making complex searches more intuitive.
FlexHex Mode
FlexHex provides an intuitive syntax for byte pattern matching with wildcards. This is particularly valuable for signatures with variable bytes or when you only care about parts of a pattern.
FlexHex Syntax Reference
Syntax | Description | Example | Matches |
---|---|---|---|
XX | Exact byte | 4F | Byte 0x4F |
?X | Wildcard high nibble | ?F | Any byte ending with 0xF (0x0F, 0x1F, 0x2F, etc.) |
X? | Wildcard low nibble | 4? | Any byte starting with 0x4 (0x40, 0x41, 0x42, etc.) |
?? | Full wildcard byte | ?? | Any byte |
Key points about FlexHex patterns:
- Patterns are space-insensitive, so you can write them with or without spaces.
- Patterns must consist of an even number of characters (hexadecimal digits or ? wildcards), as each pair represents a single byte.
How FlexHex Works
Under the hood, FlexHex patterns are transformed into regex patterns:
- 50 becomes \x50 (exact byte match)
- ?? becomes [\x00-\xff] (match any byte)
- ?F expands to a character class matching all bytes ending with F
- 4? becomes [\x40-\x4f] (match any byte in the range)
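To make the transformation concrete, here is a minimal Python sketch that follows the four rules listed above. It is not the actual Binary Ninja code: flexhex_to_regex is a hypothetical name, and Python's re module is used in place of the real engine to check the result.

```python
import re

HEX_DIGITS = "0123456789abcdef"


def flexhex_to_regex(pattern: str) -> bytes:
    """Translate a FlexHex pattern into a bytes regex (illustrative only)."""
    compact = "".join(pattern.split()).lower()  # spaces are optional
    if len(compact) % 2 != 0:
        raise ValueError("FlexHex needs two characters per byte")
    parts = []
    for i in range(0, len(compact), 2):
        hi, lo = compact[i], compact[i + 1]
        if any(c not in HEX_DIGITS + "?" for c in (hi, lo)):
            raise ValueError(f"invalid FlexHex byte: {hi}{lo}")
        if hi == "?" and lo == "?":
            parts.append("[\\x00-\\xff]")  # any byte
        elif hi == "?":
            # Fixed low nibble: list all 16 candidates explicitly.
            members = "".join(f"\\x{h}{lo}" for h in HEX_DIGITS)
            parts.append(f"[{members}]")
        elif lo == "?":
            parts.append(f"[\\x{hi}0-\\x{hi}f]")  # contiguous range
        else:
            parts.append(f"\\x{hi}{lo}")  # exact byte
    return "".join(parts).encode("ascii")


regex = flexhex_to_regex("53 8b 3?")
match = re.search(regex, b"\x90\x53\x8b\x3a\xc3")
print(regex, match.start())
```

Note the asymmetry between the two nibble wildcards: a wildcard low nibble is a single contiguous range, while a wildcard high nibble needs sixteen separate members in the class.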
This transformation maintains readability while leveraging the power of the regex engine.
Bytes-Based Regex Mode
When you need more complex matching patterns you can use the full power of regular expressions. Unlike text-based regex engines, a bytes-based implementation operates directly on raw binary data, making it ideal for reverse engineering tasks. We leverage the Rust regex::bytes crate to provide a robust and efficient engine for binary data. By default, Binary Ninja configures the engine to operate in ASCII compatible mode with the dot-all mode enabled, allowing the '.' character to match any byte including '\n'. In other words, default flags are (?s-u). For more information on regex syntax, see the Rust regex::bytes crate documentation.
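The practical effect of the dot-all default is easy to see with a small experiment. The sketch below uses Python's re module on bytes as a rough analogue (an assumption for illustration; the Rust engine's syntax and semantics differ in other respects), comparing a pattern compiled with and without DOTALL.

```python
import re

# A newline, a non-ASCII byte, and a NUL sit between the two markers.
data = b"BEGIN\n\xff\x00END"

# Rough Python analogue of the (?s-u) defaults: a bytes pattern plus DOTALL.
dot_all = re.compile(rb"BEGIN.{3}END", re.DOTALL)
# Without DOTALL, "." refuses to match the newline byte.
line_mode = re.compile(rb"BEGIN.{3}END")

print(dot_all.search(data) is not None)    # True
print(line_mode.search(data) is not None)  # False
```

For binary data, where 0x0A is just another byte value, the dot-all behavior is almost always what you want.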
Practical Examples
# Match sequences of 10+ printable ASCII characters
bv.search("[\\x20-\\x7E]{10,}")
# Find potential private IPv4 addresses in the 192.168.0.0/16 range
bv.search("192\\.168\\.\\d{1,3}\\.\\d{1,3}")
# Locate PEM-style certificate headers
bv.search("-----BEGIN.*?-----")
# Mixed hex and ASCII pattern for HTTP protocol headers
bv.search("\\x48\\x54\\x54\\x50/[0-9]\\.[0-9]") # HTTP/1.x
Raw String Mode
When you need to search for literal ASCII text, the Raw String mode is the most straightforward option. This mode is ideal for finding strings, function names, or other text patterns within the binary data.
Basic String Searches
# Special case: ensuring literal interpretation
bv.search("BEEF", raw=True) # Finds the ASCII string "BEEF" (0x42, 0x45, 0x45, 0x46), not the hex bytes 0xBE, 0xEF
# When searching for simple ASCII strings, Regex mode and Raw String mode provide the same result
bv.search("ServiceMain")
bv.search("CreateProcess")
# For strings where special characters are present, Raw String mode is the best choice
bv.search("SELECT * FROM", raw=True)
bv.search("Content-Type:", raw=True)
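The "SELECT * FROM" case is worth a closer look, because the pattern is a valid regex that silently means something else. The following sketch uses Python's re module as a stand-in for the search engine (an assumption for illustration) to show the regex reading failing to find the literal text, while an escaped, literal reading finds it.

```python
import re

data = b"query: SELECT * FROM users"
text = b"SELECT * FROM"

# As a regex, " *" means "zero or more spaces", so the literal "*" never matches.
as_regex = re.search(text, data)
# Escaping every metacharacter gives the literal interpretation.
as_literal = re.search(re.escape(text), data)

print(as_regex)            # None
print(as_literal.start())  # 7
```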
Important Considerations
Case Sensitivity in All Search Modes
All search modes respect the case sensitivity option. By default, searches are case-sensitive. When case-insensitive search is enabled, it applies to ASCII alphabetic characters in text patterns and to hexadecimal bytes that represent ASCII alphabetic characters.
For example, searching for the byte 0x41 (‘A’) will match both 0x41 and 0x61 (‘a’) with a case-insensitive search.
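This behavior can surprise you when the pattern is written as hex rather than text. A quick illustration, again using Python's re on bytes as an approximation of the engine's ASCII case folding (an assumption, not the actual implementation):

```python
import re

data = b"\x00A\x00a\x00"
pattern = rb"\x41"  # the byte 0x41, i.e. ASCII 'A'

sensitive = [m.start() for m in re.finditer(pattern, data)]
insensitive = [m.start() for m in re.finditer(pattern, data, re.IGNORECASE)]

print(sensitive)    # [1]
print(insensitive)  # [1, 3]
```

If you are matching opcodes or other non-text bytes that happen to fall in the ASCII letter ranges, leave case sensitivity on.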
Overlapping Matches
For certain analysis tasks, you may need to find every possible match, even when they overlap:
# Find all potential 32-bit values written as 8 uppercase hex digits in ASCII
bv.search("[A-F0-9]{8}", overlap=True)
# Locate overlapping instruction patterns
bv.search("\\x8B\\x4D..", overlap=True)
Without overlap=True, the search would skip ahead past each match before continuing, potentially missing valuable patterns. This is particularly important when:
- Hunting for instruction patterns that might be part of multiple larger patterns
- Looking for data structures that may be embedded within other structures
- Analyzing encoded or obfuscated data where patterns might overlap
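The difference between the two behaviors comes down to where the scan resumes after a hit. Here is a minimal sketch in plain Python, assuming that an overlapping search resumes one byte after the start of the previous match; find_offsets is a hypothetical helper and Python's re stands in for the real engine.

```python
import re


def find_offsets(pattern: bytes, data: bytes, overlap: bool = False) -> list:
    """Return match start offsets, optionally allowing overlapping matches."""
    regex = re.compile(pattern, re.DOTALL)
    offsets = []
    pos = 0
    while pos <= len(data):
        match = regex.search(data, pos)
        if match is None:
            break
        offsets.append(match.start())
        if overlap:
            pos = match.start() + 1  # resume just past the match start
        else:
            pos = max(match.end(), match.start() + 1)  # skip the whole match
    return offsets


data = b"ABCDEF0123"
print(find_offsets(rb"[A-F0-9]{8}", data))                # [0]
print(find_offsets(rb"[A-F0-9]{8}", data, overlap=True))  # [0, 1, 2]
```

Ten candidate bytes yield a single non-overlapping match but three overlapping ones, which is exactly the kind of result that would otherwise go unnoticed.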
Equivalent Ways to Specify Byte Patterns
Sometimes you’ll see different Python expressions that all yield the same byte pattern. For instance:
list(bv.search(b"\x53\x8b\x3f".hex())) # FlexHex Mode (the argument is the string "538b3f")
list(bv.search(r"\x53\x8b\x3f")) # Regex Mode
list(bv.search("\\x53\\x8b\\x3f")) # Regex Mode
list(bv.search("53 8B 3F")) # FlexHex Mode
All of these expressions are equivalent and will match the same byte pattern (0x53 0x8B 0x3F). The variations simply reflect different ways of escaping backslashes and handling raw strings in Python. Importantly, the pattern parameter for the search function is always a Python str type, not bytes.
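If the escaping rules feel opaque, it helps to look at the strings Python actually produces before they ever reach the search function. This short check needs nothing beyond the standard interpreter:

```python
hex_form = b"\x53\x8b\x3f".hex()   # bytes -> string of hex digits
raw_form = r"\x53\x8b\x3f"         # raw string keeps the backslashes
escaped_form = "\\x53\\x8b\\x3f"   # doubled backslashes, same result

print(hex_form)                    # 538b3f
print(raw_form == escaped_form)    # True
print(len(raw_form))               # 12
print(type(hex_form).__name__)     # str
```

The raw and escaped forms are the same 12-character string, while the hex form is a plain run of hex digits with no backslashes at all.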
Search Options Reference
The search engine supports several options to customize how patterns are matched:
Option | Type | Default | Description |
---|---|---|---|
pattern | string | Required | The pattern to search for |
start | integer | BinaryView.start | The address to start the search from |
end | integer | BinaryView.end-1 | The address to end the search (inclusive) |
raw | boolean | False | When True, treats the pattern as a literal string |
ignore_case | boolean | False | Makes the search case-insensitive for ASCII alphabetic characters and corresponding hex bytes |
overlap | boolean | False | When True, allows matches to begin within previous matches |
align | integer | 1 | Only reports matches that start at offsets divisible by this value (must be a power of 2) |
limit | integer | None | Maximum number of matches to return |
progress_callback | function | None | Optional function to track progress and allow cancellation |
match_callback | function | None | Optional function called for each match |
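The align and limit options are simple filters, and their combined effect can be sketched in a few lines of Python. This is an illustration of the semantics in the table, not the engine's code: apply_align_and_limit is a hypothetical helper, and applying the limit after the alignment filter is an assumption.

```python
from itertools import islice


def apply_align_and_limit(offsets, align=1, limit=None):
    """Keep offsets divisible by align, then cap the number of results."""
    if align < 1 or align & (align - 1):
        raise ValueError("align must be a power of 2")
    aligned = (offset for offset in offsets if offset % align == 0)
    return list(islice(aligned, limit))  # limit=None keeps everything


candidates = [0, 3, 4, 8, 10, 12]
print(apply_align_and_limit(candidates, align=4, limit=2))  # [0, 4]
print(apply_align_and_limit(candidates))                    # [0, 3, 4, 8, 10, 12]
```

The bit trick align & (align - 1) is zero only for powers of two, which is a cheap way to validate the argument before searching.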
Future Directions
We’re excited about the potential for further enhancements to Binary Ninja by leveraging a bytes-based regular expression engine. This feature is evolving as we gather feedback and identify additional use cases. One area of particular interest is structured or typed searching: for example, allowing users to easily search for 32-bit or 64-bit integer constants (in either endianness), specific floating-point patterns, and instruction or string encodings. By letting users specify higher-level “shapes” or “types” rather than raw bytes, we aim to provide more targeted search results, ultimately streamlining reverse engineering workflows.
On the UI side, we have some clean-up to do. There are some existing Find types (Escaped string, Hex string, and Raw string) that could be deprecated in favor of the Advanced Binary Search type.
Conclusion
The Advanced Binary Search capability gives reverse engineers a new, intuitive, and powerful way to interact with binary data. By intelligently detecting search intent and providing a unified interface across both text and binary patterns, it eliminates much of the friction typically associated with search. Whether you’re analyzing malware, reverse engineering proprietary protocols, or hunting for vulnerabilities, this search engine helps you focus on the analysis rather than the mechanics of the search process.
We welcome feedback on these features and ideas for additional capabilities that would improve your reverse engineering workflow.