I’ve been working on regex.zig, a native regular expression engine for Zig in the RE2 family.
The main goal is to eventually have a production-grade Zig regex package with linear-time matching semantics, rather than a limited-scope engine or a wrapper around another library.
Current status:
- Pike VM backend
- literals, concatenation, alternation
- capturing and non-capturing groups
- repetition operators including lazy forms
- Perl classes, bracket classes, POSIX classes
- assertions and boundaries: `^`, `$`, `\A`, `\z`, `\b`, `\B`
- global flags through compile options: `i`, `m`, `s`, `U`
- leftmost-first search semantics
- ASCII-only for now
The repo is here:
Small example:
```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Three capturing groups of digits separated by slashes.
    var re = try Regex.compile(gpa, "(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d)", .{});
    defer re.deinit();

    // find returns the overall match span, or null when nothing matches.
    if (re.find("date=03/18/2026")) |m| {
        std.debug.print("match at [{}, {})\n", .{ m.start, m.end });
    }
}
```
Public API surface (with comments):
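```zig
// Compile a pattern; release the result with deinit().
pub fn compile(gpa: Allocator, pattern: []const u8, options: Options) !Regex

// Boolean query only: no offsets, no captures.
pub fn match(re: *Regex, haystack: []const u8) bool

// Span of the overall match, if any.
pub fn find(re: *Regex, haystack: []const u8) ?Match

// Captures written into a caller-provided buffer.
pub fn findCaptures(re: *Regex, haystack: []const u8, buffer: []?Match) ?Captures

// Captures stored in memory obtained from the given allocator.
pub fn findCapturesAlloc(re: *Regex, gpa: Allocator, haystack: []const u8) !?Captures
```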
A few things that might be different in this package compared to other Zig regex work:
- It is an RE2-family engine, which means worst-case `O(m * n)` matching time (with m the pattern size and n the haystack length). The downside is that not every PCRE-style feature will be implemented.
- The test setup is fairly serious already. Supported and unsupported behavior is tracked in an explicit capability matrix, which is used during development as a capability gate and will also be used to test other backends later on. This is inspired by the rust-lang/regex setup.
- The current Pike VM backend has the typical optimizations for this kind of engine: a query-cost split between `match`/`find`/`findCaptures`, sparse-set thread dedup, and reused thread lists, plus a literal-prefix fast path for unanchored search.
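For readers who have not seen the sparse-set trick mentioned in the last point, here is a minimal sketch of how such a set can deduplicate Pike VM threads by program counter. It is not taken from regex.zig: `SparseSet`, its comptime capacity, and the method names are all assumptions made for illustration.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity sparse set over ids in 0..capacity.
/// Illustrative only; not the data structure used in regex.zig.
fn SparseSet(comptime capacity: usize) type {
    return struct {
        const Self = @This();

        /// Ids in insertion order; only dense[0..len] is meaningful.
        dense: [capacity]u32 = [_]u32{0} ** capacity,
        /// sparse[id] is the position of id inside dense, if id is present.
        sparse: [capacity]u32 = [_]u32{0} ** capacity,
        len: u32 = 0,

        pub fn contains(self: *const Self, id: u32) bool {
            const i = self.sparse[id];
            // A stale sparse entry fails one of these two checks.
            return i < self.len and self.dense[i] == id;
        }

        /// Returns true if id was newly added, false if already present.
        pub fn insert(self: *Self, id: u32) bool {
            if (self.contains(id)) return false;
            self.dense[self.len] = id;
            self.sparse[id] = self.len;
            self.len += 1;
            return true;
        }

        /// O(1) reset: nothing is zeroed, old entries simply become stale.
        pub fn clear(self: *Self) void {
            self.len = 0;
        }

        pub fn items(self: *const Self) []const u32 {
            return self.dense[0..self.len];
        }
    };
}

pub fn main() void {
    var seen = SparseSet(8){};
    // Pretend epsilon-closure reached these program counters; 3 shows up twice.
    const reached = [_]u32{ 3, 5, 3, 1 };
    for (reached) |pc| {
        if (seen.insert(pc)) {
            std.debug.print("add thread at pc={d}\n", .{pc});
        }
    }
    seen.clear(); // ready for the next input byte
}
```

Insertion order is preserved in `dense`, which matters for leftmost-first priority, and `clear` is constant time, which is why the same set can be reused for every input byte.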
There are many things in the pipeline, including:
- fuller syntax coverage
- API refinement, better docs
- more backends beyond Pike VM
- longer term, Unicode support
So it is far from finished, but I wanted to post it here to get some opinions from the community:
- The current public API set is mostly inspired by rust-regex. I wonder what would feel right for Zig? `findCaptures` and `findCapturesAlloc()` try to follow the Zig-style memory ownership model, but I find them a bit clunky. Is there an obvious improvement here?
- Would a future explicit input struct for bounds/anchoring be preferable to adding more top-level methods?
- Part of the reason for this repo is that I want to push SoA-style layouts where they actually make sense in Zig. In particular, I’ve thought about using something closer to the Zig compiler’s `ExtraData` style for variable node payloads in the parser instead of scattered slices. I held off because regex patterns are often small, and all that machinery might end up hurting performance. Of course, to know for sure I’ll have to measure it, but I’d still like to hear how others think about that tradeoff.
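To make the input-struct question concrete, here is a rough sketch of what such a struct could look like. Everything in it is hypothetical: `Input`, its fields, and `findLiteral` do not exist in regex.zig, and a plain literal search stands in for the regex engine so that the example runs on its own.

```zig
const std = @import("std");

/// Hypothetical search input; none of these names exist in regex.zig.
const Input = struct {
    haystack: []const u8,
    /// Search window inside haystack; end defaults to haystack.len.
    start: usize = 0,
    end: ?usize = null,
    /// When true, a match must begin exactly at start.
    anchored: bool = false,
};

const Match = struct { start: usize, end: usize };

/// Stand-in for a regex search: looks for a literal needle so the sketch
/// does not depend on the real engine.
fn findLiteral(needle: []const u8, input: Input) ?Match {
    const end = input.end orelse input.haystack.len;
    if (input.start > end or end > input.haystack.len) return null;
    const window = input.haystack[input.start..end];

    if (input.anchored) {
        if (!std.mem.startsWith(u8, window, needle)) return null;
        return Match{ .start = input.start, .end = input.start + needle.len };
    }

    const i = std.mem.indexOf(u8, window, needle) orelse return null;
    // Report offsets relative to the full haystack, not the window.
    return Match{ .start = input.start + i, .end = input.start + i + needle.len };
}

pub fn main() void {
    const text = "date=03/18/2026";
    // One entry point covers the plain, anchored, and bounded cases.
    const plain = findLiteral("03", .{ .haystack = text });
    const anchored = findLiteral("03", .{ .haystack = text, .start = 5, .anchored = true });
    const bounded = findLiteral("2026", .{ .haystack = text, .end = 10 });
    std.debug.print("plain: {}, anchored: {}, bounded: {}\n", .{
        plain != null, anchored != null, bounded != null,
    });
}
```

With default field values the common call stays short (`.{ .haystack = text }`), while bounds and anchoring become opt-in fields rather than extra top-level methods.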
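As a reference point for the layout question, this is roughly the shape I have in mind: one array per node field, plus a single shared array that holds variable-length child lists. It is only a sketch under simplifying assumptions (fixed capacities, three node kinds); `Ast`, `Tag`, `addList`, and the `[count, child0, child1, ...]` encoding are illustrative names and choices, not code from the repo or from the Zig compiler.

```zig
const std = @import("std");

const Tag = enum(u8) { literal, concat, alternate };

/// Hypothetical SoA regex AST with fixed capacities; illustrative only.
const Ast = struct {
    // One array per field instead of an array of node structs.
    tags: [64]Tag = undefined,
    /// literal: the byte value; concat/alternate: index into extra.
    data: [64]u32 = undefined,
    node_count: u32 = 0,
    /// Shared payload storage; a child list is [count, child0, child1, ...].
    extra: [256]u32 = undefined,
    extra_len: u32 = 0,

    fn addNode(self: *Ast, tag: Tag, payload: u32) u32 {
        const id = self.node_count;
        self.tags[id] = tag;
        self.data[id] = payload;
        self.node_count += 1;
        return id;
    }

    fn addLiteral(self: *Ast, byte: u8) u32 {
        return self.addNode(.literal, byte);
    }

    /// Appends the child list to extra and stores only its offset in the node.
    fn addList(self: *Ast, tag: Tag, kids: []const u32) u32 {
        const at = self.extra_len;
        const n: u32 = @intCast(kids.len);
        self.extra[at] = n;
        for (kids, 0..) |kid, i| self.extra[at + 1 + i] = kid;
        self.extra_len = at + 1 + n;
        return self.addNode(tag, at);
    }

    fn children(self: *const Ast, node: u32) []const u32 {
        std.debug.assert(self.tags[node] != .literal);
        const at = self.data[node];
        const count = self.extra[at];
        return self.extra[at + 1 ..][0..count];
    }
};

pub fn main() void {
    // Build the tree for a(b|c) by hand.
    var ast = Ast{};
    const a = ast.addLiteral('a');
    const b = ast.addLiteral('b');
    const c = ast.addLiteral('c');
    const alt = ast.addList(.alternate, &[_]u32{ b, c });
    const root = ast.addList(.concat, &[_]u32{ a, alt });
    std.debug.print("root has {d} children, extra holds {d} words\n", .{
        ast.children(root).len, ast.extra_len,
    });
}
```

Every node is a fixed 5 bytes across the two arrays regardless of how many children it has, but each child access goes through an extra indirection, which is exactly the cost I am unsure about for small patterns.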
If anyone wants to look at the repo and comment on API shape, package ergonomics, or internal representation choices, that would be very welcome!