Binary parsing with Kaitai

Luke · September 24, 2023, 4:18pm

Say I have a Kaitai definition (as a YAML file) and want to parse a binary file with it. How would I go about it with Zig?

Is there a way to parse the YAML file at compile time and generate a data structure and a parser that match the Kaitai definition?

Does something like this already exist?
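Whether it is produced at comptime or by an external tool, the generated code would roughly be a struct plus a function that reads each `seq` entry in order. The following is a minimal hand-written sketch of that output, not an existing generator. It assumes a hypothetical three-field definition (shown in the comment), made-up names (`Header`, `parse`), and Zig 0.12 or later, where `std.mem.readInt` takes `.little` (older versions spell it `.Little`).

```zig
const std = @import("std");

// Hypothetical Kaitai definition this struct mirrors (illustrative only):
//   seq:
//     - id: magic
//       type: u4le
//     - id: version
//       type: u2le
//     - id: count
//       type: u1
const Header = struct {
    magic: u32,
    version: u16,
    count: u8,

    /// Serialized size in bytes: 4 + 2 + 1.
    pub const size = 7;

    /// Reads the fields in `seq` order from `bytes`, little-endian.
    pub fn parse(bytes: []const u8) error{EndOfStream}!Header {
        if (bytes.len < size) return error.EndOfStream;
        return .{
            // Slicing with comptime-known bounds yields *const [N]u8,
            // which is what std.mem.readInt expects.
            .magic = std.mem.readInt(u32, bytes[0..4], .little),
            .version = std.mem.readInt(u16, bytes[4..6], .little),
            .count = bytes[6],
        };
    }
};
```

Reading each field with explicit endianness, rather than casting the buffer to an `extern struct`, keeps the parser independent of host byte order and struct padding, which matches how Kaitai types like `u4le` are specified.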

kubkon · September 24, 2023, 6:33pm

For a YAML parser, you might want to try kubkon/zig-yaml, a YAML parser for Zig on GitHub.

Luke · September 24, 2023, 7:09pm

I’ve seen it, but with my limited knowledge of Zig I gathered that it could not parse at compile time because it uses an allocator. But I could be completely wrong.
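On the allocator concern: comptime code can often avoid an allocator entirely when the size of the result can be computed from the input first, so that a fixed-size array can be used. The sketch below illustrates that idea only; it is not zig-yaml and does not parse YAML. It assumes a made-up `name:size` line format, hypothetical names (`Field`, `countLines`, `parseFields`), and Zig 0.12 or later.

```zig
const std = @import("std");

// Hypothetical mini-format, NOT YAML: one "name:size" pair per line.
const Field = struct { name: []const u8, size: usize };

/// Counts non-empty lines; evaluated at comptime to size the result array.
fn countLines(comptime src: []const u8) usize {
    var n: usize = 0;
    var it = std.mem.tokenizeScalar(u8, src, '\n');
    while (it.next()) |_| n += 1;
    return n;
}

/// Parses `src` at compile time into a fixed-size array. No allocator is
/// needed because the array length is computed by `countLines` first.
fn parseFields(comptime src: []const u8) [countLines(src)]Field {
    // Larger inputs need more comptime backward branches than the default.
    @setEvalBranchQuota(10_000);
    var fields: [countLines(src)]Field = undefined;
    var it = std.mem.tokenizeScalar(u8, src, '\n');
    var i: usize = 0;
    while (it.next()) |line| : (i += 1) {
        const colon = std.mem.indexOfScalar(u8, line, ':') orelse
            @compileError("missing ':' in field definition");
        fields[i] = .{
            .name = line[0..colon],
            .size = std.fmt.parseInt(usize, line[colon + 1 ..], 10) catch
                @compileError("invalid field size"),
        };
    }
    return fields;
}

// Container-level initializers are already evaluated at comptime.
const fields = parseFields("magic:4\ncount:2\n");
```

In a real build the definition string could come from `@embedFile`. The trade-off is that anything needing dynamic growth (nested YAML mappings, for example) has to be restructured into such counting passes, which is why a general-purpose parser tends to reach for an allocator.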

IntegratedQuantum · September 24, 2023, 8:29pm

You can use the FixedBufferAllocator at compile time:

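```zig
const std = @import("std");
const FixedBufferAllocator = std.heap.FixedBufferAllocator;

comptime {
    var buf: [12345]u8 = undefined;
    var fba = FixedBufferAllocator.init(&buf);
    // parseYamlStuff stands in for whatever YAML-parsing entry point you call.
    // `try` is not allowed outside a function, so turn errors into compile errors.
    parseYamlStuff(fba.allocator()) catch |err| @compileError(@errorName(err));
}
```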

Luke · September 25, 2023, 11:06am

I tried, but I got some failures.
First, Tokenizer.zig uses some log calls that fail at comptime; it’s easy to comment those out.
However, it still fails later when allocating:

/snap/zig/8870/lib/std/mem.zig:3809:18: error: unable to evaluate comptime expression
    const addr = @intFromPtr(ptr);
                 ^~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem.zig:3809:30: note: operation is runtime due to this operand
    const addr = @intFromPtr(ptr);
                             ^~~
/snap/zig/8870/lib/std/heap.zig:426:50: note: called from here
        const adjust_off = mem.alignPointerOffset(self.buffer.ptr + self.end_index, ptr_align) orelse return null;
                           ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem/Allocator.zig:86:29: note: called from here
    return self.vtable.alloc(self.ptr, len, ptr_align, ret_addr);
           ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem/Allocator.zig:225:35: note: called from here
    const byte_ptr = self.rawAlloc(byte_count, log2a(alignment), return_address) orelse return Error.OutOfMemory;
                     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem/Allocator.zig:211:40: note: called from here
    return self.allocBytesWithAlignment(alignment, byte_count, return_address);
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem/Allocator.zig:205:75: note: called from here
    const ptr: [*]align(a) T = @ptrCast(try self.allocWithSizeAndAlignment(@sizeOf(T), a, n, return_address));
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/mem/Allocator.zig:193:41: note: called from here
    return self.allocAdvancedWithRetAddr(T, alignment, n, @returnAddress());
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/array_list.zig:403:67: note: called from here
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/array_list.zig:379:51: note: called from here
            return self.ensureTotalCapacityPrecise(better_capacity);
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/array_list.zig:426:41: note: called from here
            try self.ensureTotalCapacity(self.items.len + 1);
                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/snap/zig/8870/lib/std/array_list.zig:207:49: note: called from here
            const new_item_ptr = try self.addOne();
                                     ~~~~~~~~~~~^~
src/parse.zig:285:30: note: called from here
            try tokens.append(token);
                ~~~~~~~~~~~~~^~~~~~~
src/yaml.zig:309:23: note: called from here
        try tree.parse(source);
            ~~~~~~~~~~^~~~~~~~
src/katai.zig:22:23: note: called from here
    try yaml.Yaml.load(allocator, definition);
        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~

I also tried replacing the module’s arena allocators with the FixedBufferAllocator, but to no avail.

IntegratedQuantum · September 25, 2023, 11:30am

Oh, actually it seems that the FixedBufferAllocator doesn’t work at comptime yet (the trace fails inside `mem.alignPointerOffset`, whose `@intFromPtr` call can’t be evaluated at comptime), although it is supposed to work eventually: see “Making (FixedBuffer)Allocator available at comptime”, Issue #14931 · ziglang/zig on GitHub.

I would suggest that you just parse your files at runtime until that issue is resolved. Or is there a particular reason for doing it at comptime?
It also wouldn’t hurt to comment on that GitHub issue so the team knows that this is important to people.
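For the runtime route, `std.heap.FixedBufferAllocator` does work outside comptime, so any parser that takes an `std.mem.Allocator` can be handed one. A minimal sketch under stated assumptions: the `name:size` line format and the `Field`/`parseFields` names are hypothetical stand-ins for a real `.ksy` definition and are not zig-yaml’s API.

```zig
const std = @import("std");

// Hypothetical "name:size" line format, standing in for a real definition.
const Field = struct { name: []const u8, size: usize };

/// Parses `src` into a newly allocated slice; the caller frees it.
/// Field names are slices into `src`, so `src` must outlive the result.
fn parseFields(allocator: std.mem.Allocator, src: []const u8) ![]Field {
    // First pass: count lines so we allocate exactly once.
    var count: usize = 0;
    var counter = std.mem.tokenizeScalar(u8, src, '\n');
    while (counter.next()) |_| count += 1;

    const fields = try allocator.alloc(Field, count);
    errdefer allocator.free(fields);

    // Second pass: split each line at ':' and parse the size.
    var it = std.mem.tokenizeScalar(u8, src, '\n');
    var i: usize = 0;
    while (it.next()) |line| : (i += 1) {
        const colon = std.mem.indexOfScalar(u8, line, ':') orelse return error.MissingColon;
        fields[i] = .{
            .name = line[0..colon],
            .size = try std.fmt.parseInt(usize, line[colon + 1 ..], 10),
        };
    }
    return fields;
}

pub fn main() !void {
    // At runtime, all allocations are served from `buf`.
    var buf: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const allocator = fba.allocator();

    const fields = try parseFields(allocator, "magic:4\ncount:2\n");
    defer allocator.free(fields);
    for (fields) |f| std.debug.print("{s}: {d} bytes\n", .{ f.name, f.size });
}
```

Because the parser only depends on the `Allocator` interface, switching to an arena or to comptime evaluation later (once issue #14931 is resolved) should not require changing its signature.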