Binary Ninja Blog

Advanced Binary Search: Finding Needles in Binary Haystacks

Reverse engineering often requires quickly finding a proverbial needle in a haystack. Whether hunting for specific byte signatures, cryptographic constants, or matching instruction patterns, the ability to efficiently locate precise byte sequences within binaries is essential. This post introduces the Advanced Binary Search (ABS) search mode in Binary Ninja (BN), designed to streamline the most common reverse engineering search tasks.

Raising the Bar for Binary Search: Simple Meets Powerful

Prior to ABS, search in BN was limited to exact hex bytes, escaped strings, or raw strings—with no regex or wildcard support.

ABS, released in Binary Ninja 5.0, improves this workflow through an intelligent parsing layer that automatically detects your search intent, while still providing explicit modes when you need complete control. The result is a unified search interface that adapts to what you’re looking for, not the other way around.

Note that we’ve left the older explicit search modes in the UI while we continue to flesh out and test the UX of ABS, but long-term we expect to be able to remove them.

The new ABS is available in the “Find” search dialog and in the API using bv.search.

First Principles: Search Pattern Recognition

By default, the search engine interprets your input in the following ways, automatically detecting the most likely pattern type:

  1. FlexHex: A simplified nibble-based hex pattern language with wildcards (e.g., 53 8b 3?)
  2. Bytes-Based Regex: A regular expression operating on raw bytes and ASCII text
  3. Escaped String: Standard escape notation (e.g., \x53\x8b\x3f) is handled natively by the regular expression engine
  4. Raw String: A literal string search for ASCII text (default fallback when no other interpretation is possible)

The engine analyzes your search pattern to determine intent, eliminating the need to explicitly specify search modes in most cases. Whether you use FlexHex notation, standard regex, or escaped hex sequences, the system intelligently scans your input to find the intended patterns without requiring you to switch between different modes or settings.

This automatic detection works as follows:

┌─────────────────┐
│  User Input     │
└────────┬────────┘
         │
         ▼
┌────────────────┐     Yes     ┌─────────────────┐
│ Looks like     ├────────────►│ Process as      │
│ FlexHex?       │             │ FlexHex         │
└────────┬───────┘             └────────┬────────┘
         │ No                           │
         ▼                              │
┌────────────────┐     Yes     ┌────────▼────────┐
│ Valid Bytes    ├────────────►│ Convert to      │
│ Regex?         │             │ Regex Pattern   │
└────────┬───────┘             └────────┬────────┘
         │ No                           │
         ▼                              │
┌────────────────┐                      │
│ Treat as       │                      │
│ Literal String │                      │
└────────┬───────┘                      │
         │                              │
         ▼                              ▼
┌─────────────────────────────────────────────────┐
│               Execute Search                    │
└─────────────────────────────────────────────────┘

This intelligent parsing reduces friction while introducing minimal ambiguity. The only real edge cases occur with text that could be interpreted as either valid hexadecimal or as regex patterns. For example:

  • bv.search("CAFE") is interpreted as hex and matches the bytes 0xCA and 0xFE.
  • If you want to find the ASCII string “CAFE” (bytes 0x43, 0x41, 0x46, 0x45), use bv.search("CAFE", raw=True).

Special characters like ? or + might be interpreted as regex operators rather than literal characters. These ambiguities are easily resolved by using the raw=True parameter when you need to search for literal text rather than its hexadecimal or regex interpretation. In the UI, you can enable the “Raw string” checkbox to do the same thing.

[Figure: Searching for CAFE]
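The detection order described above can be approximated in a few lines of plain Python. This is only a sketch of the flowchart, not Binary Ninja's actual parser: the helper name detect_mode is hypothetical, and Python's re module stands in for the Rust regex parser, so edge cases may be classified differently by the real engine.

```python
import re

# Characters allowed in a FlexHex pattern once whitespace is removed.
_FLEXHEX_CHARS = re.compile(r"[0-9A-Fa-f?]+")

def detect_mode(pattern: str, raw: bool = False) -> str:
    """Return 'raw', 'flexhex', or 'regex' (hypothetical helper)."""
    if raw:
        return "raw"  # explicit override, like raw=True
    compact = "".join(pattern.split())  # FlexHex ignores whitespace
    if compact and len(compact) % 2 == 0 and _FLEXHEX_CHARS.fullmatch(compact):
        return "flexhex"
    try:
        # Assumption: Python's regex parser approximates the real one.
        re.compile(pattern.encode("latin-1"))
        return "regex"
    except (re.error, UnicodeEncodeError):
        return "raw"  # fallback: treat as a literal string

for text in ("CAFE", "50 ?? 45", "ServiceMain", "SELECT * FROM ("):
    print(f"{text!r}: {detect_mode(text)}")
print(f"'CAFE' with raw=True: {detect_mode('CAFE', raw=True)}")
```

In this sketch the unbalanced parenthesis in the last pattern fails to parse as a regex, so it falls through to the literal-string branch.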

Accessing the Search Engine: API and UI Interfaces

The new ABS is available through both an API and the UI using the same underlying engine.

Python API Interface

For programmatic access and scripting, the Python API is available:

# Basic search with FlexHex pattern
results = bv.search("50 ?? 45")

# Literal string with optional parameters
results = bv.search("Main", raw=True, ignore_case=True)

# Regex pattern with alignment and limit (Finds items with 10+ printable ASCII characters)
results = bv.search("[\\x20-\\x7E]{10,}", align=4, limit=5)

The results are a generator object that yields the offset and a DataBuffer for each match. You can also provide optional callback functions for progress tracking and match handling:

def print_match(offset, data):
    hex_data = bytes(data).hex()  # convert the match bytes to a hex string
    print(f"Found match at offset 0x{offset:X}, data={hex_data}")
    return True  # return True to keep searching

list(bv.search("ServiceMain", match_callback=print_match))

User Interface

The Find dialog provides the "Advanced Binary Search" type, which shows real-time feedback on the detected search mode. The dialog is accessible via the Find... action in the Edit menu, or by pressing Ctrl+F (Cmd+F on macOS).

[Figure: Advanced Binary Search UI]

There are three possible search modes:

  • FlexHex: A simplified hex pattern language with wildcards
  • Regex: A bytes-based regular expression
  • Raw String: A literal ASCII string search

The UI automatically interprets your inputs using the same intelligent pattern detection as the API. This immediate feedback helps you understand exactly how the search engine will process your pattern, reducing trial-and-error and making complex searches more intuitive.

FlexHex Mode

FlexHex provides an intuitive syntax for byte pattern matching with wildcards. This is particularly valuable for signatures with variable bytes or when you only care about parts of a pattern.

FlexHex Syntax Reference

Syntax | Description          | Example | Matches
XX     | Exact byte           | 4F      | Byte 0x4F
?X     | Wildcard high nibble | ?F      | Any byte ending with 0xF (0x0F, 0x1F, 0x2F, etc.)
X?     | Wildcard low nibble  | 4?      | Any byte starting with 0x4 (0x40, 0x41, 0x42, etc.)
??     | Full wildcard byte   | ??      | Any byte

Key points about FlexHex patterns:

  • Patterns are space-insensitive, so you can write them with or without spaces.
  • Patterns must consist of an even number of nibble characters (hexadecimal digits or ?), as each pair represents a single byte.

How FlexHex Works

Under the hood, FlexHex patterns are transformed into regex patterns:

  • 50 becomes \x50 (exact byte match)
  • ?? becomes [\x00-\xff] (match any byte)
  • ?F expands to a character class matching all bytes ending with F
  • 4? becomes [\x40-\x4f] (match any byte in the range)

This transformation maintains readability while leveraging the power of the regex engine.
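To make the transformation concrete, here is a minimal sketch of a FlexHex-to-regex translator. The function name flexhex_to_regex is hypothetical, and Python's re module (operating on bytes) is used as a stand-in for the Rust engine; the real implementation may emit different but equivalent patterns.

```python
import re

def flexhex_to_regex(pattern: str) -> bytes:
    """Translate a FlexHex pattern into a bytes regex (hypothetical helper)."""
    compact = "".join(pattern.split()).lower()  # spaces are not significant
    if len(compact) % 2 != 0:
        raise ValueError("FlexHex needs an even number of nibble characters")
    parts = []
    for i in range(0, len(compact), 2):
        hi, lo = compact[i], compact[i + 1]
        if hi == "?" and lo == "?":
            parts.append(b"[\\x00-\\xff]")  # any byte
        elif lo == "?":
            base = int(hi, 16) << 4  # fixed high nibble: contiguous range
            parts.append(b"[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif hi == "?":
            low = int(lo, 16)  # fixed low nibble: 16 separate byte values
            members = b"".join(b"\\x%02x" % ((h << 4) | low) for h in range(16))
            parts.append(b"[" + members + b"]")
        else:
            parts.append(b"\\x%02x" % int(hi + lo, 16))  # exact byte
    return b"".join(parts)

regex = re.compile(flexhex_to_regex("53 8b 3?"))
data = b"\x00\x53\x8b\x3f\x90"
print(flexhex_to_regex("4?"))
print([match.start() for match in regex.finditer(data)])
```

The high-nibble wildcard is the only case that cannot be written as a single range, which is why it expands to a sixteen-member character class.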

Bytes-Based Regex Mode

When you need more complex matching patterns, you can use the full power of regular expressions. Unlike text-based regex engines, a bytes-based implementation operates directly on raw binary data, making it ideal for reverse engineering tasks. We leverage the Rust regex::bytes crate to provide a robust and efficient engine for binary data. By default, Binary Ninja configures the engine to operate in ASCII-compatible mode with dot-all mode enabled, allowing the '.' character to match any byte, including '\n'. In other words, the default flags are (?s-u): dot-all enabled and Unicode mode disabled. For more information on regex syntax, see the Rust regex::bytes crate documentation.
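The effect of dot-all on binary data can be illustrated with Python's built-in re module applied to bytes. This is an analogy only: Python's engine is not the Rust regex::bytes crate, and its default (dot-all off) is the opposite of the default described above, which is why the flag is written explicitly here.

```python
import re

# A buffer containing a newline and two bytes that are not valid UTF-8.
data = b"key\n\xff\xfevalue"

# Without dot-all, '.' refuses to match the newline byte.
plain = re.search(rb"key.{3}value", data)

# With (?s), '.' matches any byte, including '\n' and non-UTF-8 bytes.
dotall = re.search(rb"(?s)key.{3}value", data)

print("without dot-all:", plain)
print("with dot-all:", (dotall.start(), dotall.end()))
```

Because ABS enables dot-all by default, a pattern such as "key.{3}value" behaves like the second search without needing the explicit flag.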

Practical Examples

# Match sequences of 10+ printable ASCII characters
bv.search("[\\x20-\\x7E]{10,}")

# Find potential private IPv4 addresses in the 192.168.0.0/16 range
bv.search("192\\.168\\.\\d{1,3}\\.\\d{1,3}")

# Locate PEM-style certificate headers
bv.search("-----BEGIN.*?-----")

# Mixed hex and ASCII pattern for HTTP protocol headers
bv.search("\\x48\\x54\\x54\\x50/[0-9]\\.[0-9]")  # HTTP/1.x

Raw String Mode

When you need to search for literal ASCII text, the Raw String mode is the most straightforward option. This mode is ideal for finding strings, function names, or other text patterns within the binary data.

Basic String Searches

# Special case: ensuring literal interpretation
bv.search("BEEF", raw=True)  # Finds the ASCII string "BEEF" (0x42, 0x45, 0x45, 0x46), not the hex bytes 0xBE, 0xEF

# When searching for simple ASCII strings, Regex mode and Raw String mode both provide the same result
bv.search("ServiceMain")
bv.search("CreateProcess")

# For strings where special characters are present, Raw String mode is the best choice
bv.search("SELECT * FROM", raw=True)
bv.search("Content-Type:", raw=True)

Considerations and Important Notes

Case Sensitivity in All Search Modes

All search modes respect the case sensitivity option. By default, searches are case-sensitive. Enabling a case-insensitive search affects ASCII alphabetic characters in text patterns as well as hexadecimal bytes that represent ASCII alphabetic characters.

For example, with a case-insensitive search, a pattern for the byte 0x41 (‘A’) matches both 0x41 and 0x61 (‘a’).
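The same behavior can be reproduced with Python's re module on bytes, used here purely as a stand-in for the real engine. The sketch shows that a byte written in hex notation is still subject to ASCII case folding when case-insensitive matching is enabled.

```python
import re

# Offsets: 0 -> 0x00, 1 -> 'a' (0x61), 2 -> 0x00, 3 -> 'A' (0x41)
data = bytes([0x00, 0x61, 0x00, 0x41])

# Case-sensitive: only the exact byte 0x41 matches.
sensitive = [m.start() for m in re.finditer(rb"\x41", data)]

# Case-insensitive: 0x41 also matches its lowercase counterpart 0x61.
insensitive = [m.start() for m in re.finditer(rb"\x41", data, re.IGNORECASE)]

print("case-sensitive offsets:", sensitive)
print("case-insensitive offsets:", insensitive)
```

This is worth keeping in mind when a case-insensitive search for an opcode or constant returns more hits than expected.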

Overlapping Matches

For certain analysis tasks, you may need to find every possible match, even when they overlap:

# Find every run of 8 uppercase ASCII hex digits (e.g., 32-bit addresses printed as text)
bv.search("[A-F0-9]{8}", overlap=True)

# Locate overlapping instruction patterns
bv.search("\\x8B\\x4D..", overlap=True)

Without overlap=True, the search would skip ahead past each match before continuing, potentially missing valuable patterns. This is particularly important when:

  • Hunting for instruction patterns that might be part of multiple larger patterns
  • Looking for data structures that may be embedded within other structures
  • Analyzing encoded or obfuscated data where patterns might overlap
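The difference between the two behaviors can be sketched in plain Python. The helper find_offsets below is hypothetical and uses Python's re module rather than the real engine; it only illustrates where the scan resumes after each hit.

```python
import re
from typing import Iterator

def find_offsets(regex: bytes, data: bytes, overlap: bool = False) -> Iterator[int]:
    """Yield match offsets, optionally allowing overlapping matches."""
    compiled = re.compile(regex, re.DOTALL)
    pos = 0
    while pos <= len(data):
        match = compiled.search(data, pos)
        if match is None:
            return
        yield match.start()
        if overlap:
            pos = match.start() + 1  # resume one byte after the match start
        else:
            pos = max(match.end(), match.start() + 1)  # skip past the match

# Three back-to-back candidates for the 4-byte pattern 8B 4D ?? ??
data = b"\x8b\x4d\x8b\x4d\x8b\x4d\x08\x00"
print(list(find_offsets(rb"\x8b\x4d..", data)))
print(list(find_offsets(rb"\x8b\x4d..", data, overlap=True)))
```

In this buffer the candidate at offset 2 lies inside the first match, so it is only reported when overlap is enabled.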

Equivalent Ways to Specify Byte Patterns

Sometimes you’ll see different Python expressions that all yield the same byte pattern. For instance:

list(bv.search(b"\x53\x8b\x3f".hex())) # FlexHex Mode: .hex() returns the string "538b3f"
list(bv.search(r"\x53\x8b\x3f")) # Regex Mode
list(bv.search("\\x53\\x8b\\x3f")) # Regex Mode
list(bv.search("53 8B 3F")) # FlexHex Mode

All of these expressions match the same byte pattern (0x53 0x8B 0x3F). The two Regex Mode variants are the same Python string, written once as a raw literal and once with doubled backslashes, while the two FlexHex variants differ only in spacing and letter case. Importantly, the pattern parameter for the search function is always a Python str type, not bytes.
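The Python side of this can be checked without Binary Ninja at all. The short sketch below only inspects the strings that would be passed as the pattern; it also shows a common pitfall, where a non-raw literal with single backslashes is decoded by Python before the search engine ever sees it.

```python
raw_form = r"\x53\x8b\x3f"        # raw literal: backslashes kept as-is
escaped_form = "\\x53\\x8b\\x3f"  # doubled backslashes: same 12 characters
hex_form = b"\x53\x8b\x3f".hex()  # str of hex digits, not bytes

# Pitfall: Python decodes these escapes itself, leaving only 3 characters.
decoded_form = "\x53\x8b\x3f"

print(raw_form == escaped_form)
print(len(raw_form), len(decoded_form))
print(type(hex_form).__name__, hex_form)
```

Using a raw literal (the r prefix) is usually the least error-prone way to write escaped byte patterns in scripts.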

Search Options Reference

The search engine supports several options to customize how patterns are matched:

Option            | Type     | Default          | Description
pattern           | string   | Required         | The pattern to search for
start             | integer  | BinaryView.start | The address to start the search from
end               | integer  | BinaryView.end-1 | The address to end the search (inclusive)
raw               | boolean  | False            | When True, treats the pattern as a literal string
ignore_case       | boolean  | False            | Makes the search case-insensitive for ASCII alphabetic characters and corresponding hex bytes
overlap           | boolean  | False            | When True, allows matches to begin within previous matches
align             | integer  | 1                | Only reports matches that start at offsets divisible by this value (must be a power of 2)
limit             | integer  | None             | Maximum number of matches to return
progress_callback | function | None             | Optional function to track progress and allow cancellation
match_callback    | function | None             | Optional function called for each match
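To illustrate how align and limit narrow a result set, here is a small pure-Python sketch. The helper aligned_offsets is hypothetical and filters the output of Python's re.finditer; the real engine may apply alignment during the scan rather than afterwards, so results can differ for patterns whose matches overlap.

```python
import re
from itertools import islice
from typing import Iterator, Optional

def aligned_offsets(regex: bytes, data: bytes, align: int = 1,
                    limit: Optional[int] = None) -> Iterator[int]:
    """Return match offsets divisible by align, capped at limit."""
    if align < 1 or align & (align - 1):
        raise ValueError("align must be a power of 2")
    hits = (m.start() for m in re.finditer(regex, data, re.DOTALL)
            if m.start() % align == 0)
    return islice(hits, limit)  # limit=None means no cap

# "MZ" appears at offsets 1, 4, and 8.
data = b"\x00MZ\x00MZ\x00\x00MZ"
print(list(aligned_offsets(rb"MZ", data)))
print(list(aligned_offsets(rb"MZ", data, align=4)))
print(list(aligned_offsets(rb"MZ", data, align=4, limit=1)))
```

The power-of-2 check uses the usual bit trick: a power of 2 has exactly one bit set, so align & (align - 1) is zero.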

Future Directions

We’re excited about the potential for further enhancements to Binary Ninja by leveraging a bytes-based regular expression engine. This feature is evolving as we gather feedback and identify additional use cases. One area of particular interest is structured or typed searching: for example, allowing users to easily search for 32-bit or 64-bit integer constants (in either endianness), specific floating-point patterns, and instruction or string encodings. By letting users specify higher-level “shapes” or “types” rather than raw bytes, we aim to provide more targeted search results, ultimately streamlining reverse engineering workflows.
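Until typed searching exists, an integer constant can be turned into a FlexHex pattern by hand. The helper below, int_to_flexhex, is a hypothetical convenience function and not part of the Binary Ninja API; it only builds the pattern string for a given width and byte order.

```python
def int_to_flexhex(value: int, size: int, byteorder: str = "little") -> str:
    """Format an unsigned integer as a space-separated FlexHex pattern."""
    return value.to_bytes(size, byteorder).hex(" ")

little = int_to_flexhex(0xDEADBEEF, 4, "little")
big = int_to_flexhex(0xDEADBEEF, 4, "big")
print(little)
print(big)
```

Each resulting string could then be passed as the pattern to a search, once per byte order of interest.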

On the UI side, we have some clean-up to do. There are some existing Find types (Escaped string, Hex string, and Raw string) that could be deprecated in favor of the Advanced Binary Search type.

Conclusion

The Advanced Binary Search capability gives reverse engineers a new, intuitive, and powerful way to interact with binary data. By intelligently detecting search intent and providing a unified interface across both text and binary patterns, it eliminates much of the friction typically associated with search. Whether you’re analyzing malware, reverse engineering proprietary protocols, or hunting for vulnerabilities, this search engine helps you focus on the analysis rather than the mechanics of the search process.

We welcome feedback on these features and ideas for additional capabilities that would improve your reverse engineering workflow.