Abelha: A parser combinator library inspired by nom (Rust)

I’d like to share Abelha, a parser combinator library inspired by Rust’s nom.
It provides a structured way to build parsers in Zig, focusing on composability and ease of use.

Features

  • Nom-style combinators – Intuitive API for constructing parsers.
  • Pre-built parsers – Common parsing functions are already available.
  • API reference – Hosted on GitHub Pages.
  • Used in real projects – Check out markdown-zig, a Markdown parser built with Abelha.

Example: parsing a CSS hex color such as #1A2B3C into its three bytes.

const std = @import("std");
const ab = @import("abelha");

const ParseResult = ab.ParseResult;
const tag = ab.bytes.tag;
const take = ab.bytes.take;
const separated_list1 = ab.multi.separated_list1;

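// Parse exactly two characters and interpret them as one hexadecimal byte.
fn parseHex(input: []const u8) !ParseResult(u8) {
    const res = try take(2)(input);
    const hex = try std.fmt.parseInt(u8, res.result, 16);
    return ParseResult(u8){ .rest = res.rest, .result = hex };
}

// Parse a leading "#" followed by one or more hex bytes.
fn hexColor(input: []const u8) !ParseResult([]const u8) {
    const result = try tag("#")(input);
    // The separator is the empty tag, so the bytes are read back to back.
    const res = try separated_list1(
        u8,
        tag(""),
        parseHex,
    )(result.rest);
    return res;
}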

test {
    const text = "#1A2B3C";
    const result = try hexColor(text);
    const answer = [_]u8{ 0x1a, 0x2b, 0x3c };
    try std.testing.expectEqualSlices(u8, &answer, result.result);
}

If you’re looking for a parser combinator in Zig, feel free to take a look:

Feedback and contributions are always welcome!


Excellent! I’ve been looking forward to actual parsing libraries appearing in Zig, since Rust had spoiled me a ton when it comes to quality choices in this area :)

nom isn’t exactly my favorite (I prefer the Pest way, personally), but your version looks quite nice and clean. Since this is an early version, I’ll take the liberty of raising a crucial issue that the current API doesn’t seem to handle all that well.

Parsing errors.

Because Zig doesn’t have error payloads like Rust does, one needs to put more effort into the error API, and in parsers this is especially important. Right now you have ParseFunc, which returns anyerror, and that’s not gonna cut it for anything more complex than your CSS color example.

You’ll probably need to follow the “diagnostics pattern” that’s common in Zig and have the parsing functions take some kind of context object to which they can push errors. At the very least, the user’s parsing code needs to be able to provide a string message on errors, and ideally the logic that drives the parsing would automatically attach line/column numbers plus the exact rule (parsing function) that failed.
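
To make the diagnostics pattern concrete, here is a minimal sketch, independent of Abelha. Everything in it is hypothetical and chosen for illustration only: the Diagnostics struct, the parseHexByte function, and error.ParseFailed are not part of the library. The parser returns a plain Zig error, and the details (byte offset and message) travel through an optional out-parameter, from which a line/column pair can be derived afterwards.

```zig
const std = @import("std");

/// Hypothetical diagnostics object; not part of Abelha's API.
const Diagnostics = struct {
    /// Byte offset into the original input where the failure occurred.
    offset: usize = 0,
    /// Static message supplied by the failing parser.
    message: []const u8 = "",

    const Position = struct { line: usize, column: usize };

    /// Convert the stored byte offset into a 1-based line/column pair.
    fn position(self: Diagnostics, source: []const u8) Position {
        var line: usize = 1;
        var column: usize = 1;
        for (source[0..@min(self.offset, source.len)]) |c| {
            if (c == '\n') {
                line += 1;
                column = 1;
            } else {
                column += 1;
            }
        }
        return .{ .line = line, .column = column };
    }
};

/// Parse two hex digits starting at `offset`.
/// On failure, record the details in `diag` (if one was provided).
fn parseHexByte(source: []const u8, offset: usize, diag: ?*Diagnostics) !u8 {
    if (offset + 2 > source.len) {
        if (diag) |d| {
            d.* = .{ .offset = offset, .message = "expected two hex digits" };
        }
        return error.ParseFailed;
    }
    return std.fmt.parseInt(u8, source[offset .. offset + 2], 16) catch {
        if (diag) |d| {
            d.* = .{ .offset = offset, .message = "invalid hex digits" };
        }
        return error.ParseFailed;
    };
}

test "diagnostics carry position and message" {
    const source = "#1A\n#ZZ";
    var diag = Diagnostics{};
    try std.testing.expectError(error.ParseFailed, parseHexByte(source, 5, &diag));
    const pos = diag.position(source);
    try std.testing.expectEqual(@as(usize, 2), pos.line);
    try std.testing.expectEqual(@as(usize, 2), pos.column);
    try std.testing.expectEqualStrings("invalid hex digits", diag.message);
}
```

Callers that don't care about details can pass null for diag, which keeps the happy path as cheap as a bare error union. A real library would probably also record which rule failed, as suggested above.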

This likely means you’d need to introduce an explicit Parser struct rather than having the user invoke the top-level ParseFunc directly. I believe it might actually simplify the API, since the Parser/context could now hold the ParseResult, which means one wouldn’t have to spell “result” five different ways inside a parsing function and risk mixing up input with result.rest, res.rest, outcome.rest, etc.

Here’s a quick idea of what this API could look like.

fn parseHex(ctx: *ParseContext, input: []const u8) !ParseResult(u8) {
    const hex_str = try ctx.parse(take(2));
    const hex = std.fmt.parseInt(u8, hex_str, 16) catch |err| {
        // ctx remembers it last executed take(2) on its remaining input
        ctx.err("invalid hex string: {!}", .{err});
        return err; // or perhaps a predefined error, like error.Postprocess
    };
    return ParseResult(u8).init(hex);
}

fn hexColor(ctx: *ParseContext, input: []const u8) !ParseResult([]const u8) {
    // Zig requires non-void results to be discarded explicitly
    _ = try ctx.parse(tag("#"));
    return ctx.parse(separated_list1(u8, tag(""), parseHex));
}

test {
    const text = "#1A2B3C";
    const parser = ab.Parser.init(hexColor); // init with top-level ParseFunc
    const result = try parser.parse(text);
    // on failure, `parser` should have the error(s)
    const answer = [_]u8{ 0x1a, 0x2b, 0x3c };
    try std.testing.expectEqualSlices(u8, &answer, result.result);
}

Thanks for the improvement suggestions!

I too feel that the error API in Abelha needs to be improved.

I am considering several implementations, such as using a tagged union (union(enum) { Ok, Err }) to mimic Rust’s Result type, or using a library-defined (or user-defined) Parser struct, exactly as you suggested.
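
For comparison, the tagged-union option could look roughly like the sketch below. This is only an illustration under assumed names: Result, ParseError, and hexByte are hypothetical and not Abelha's current API. The error case carries a payload (offset and message), which a plain Zig error cannot do.

```zig
const std = @import("std");

/// Hypothetical error payload; not part of Abelha's API.
const ParseError = struct {
    offset: usize,
    message: []const u8,
};

/// Hypothetical Rust-style result: either a parsed value plus the
/// remaining input, or an error payload.
fn Result(comptime T: type) type {
    return union(enum) {
        ok: struct { rest: []const u8, value: T },
        err: ParseError,
    };
}

/// Parse the first two characters of `input` as one hexadecimal byte.
fn hexByte(input: []const u8) Result(u8) {
    if (input.len < 2) {
        return .{ .err = .{ .offset = 0, .message = "expected two hex digits" } };
    }
    const value = std.fmt.parseInt(u8, input[0..2], 16) catch {
        return .{ .err = .{ .offset = 0, .message = "invalid hex digits" } };
    };
    return .{ .ok = .{ .rest = input[2..], .value = value } };
}

test "tagged-union result" {
    switch (hexByte("1A2B")) {
        .ok => |ok| {
            try std.testing.expectEqual(@as(u8, 0x1a), ok.value);
            try std.testing.expectEqualStrings("2B", ok.rest);
        },
        .err => return error.TestUnexpectedResult,
    }
    switch (hexByte("ZZ")) {
        .ok => return error.TestUnexpectedResult,
        .err => |e| try std.testing.expectEqualStrings("invalid hex digits", e.message),
    }
}
```

The trade-off is that try only works with error unions, so every call site has to switch on the result by hand, whereas a context or Parser struct keeps try usable and stores the details on the side.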

I will go with whichever approach best fits Abelha’s needs, taking your great suggestions into account!