Design for nested ArrayLists of ArrayLists

koalakools · June 16, 2024, 3:06am

Hello friends, recently I made a WASM file parser and have enough of it complete to handle simple programs compiled by zig to webassembly targets.

I mostly focused on learning the spec so not a lot of time on the initial design of my data structures and so a lot of it became a struct containing ArrayLists of structs that potentially had their own ArrayLists. This is modeled pretty close to how WASM binary files are layed as well.

I’m wondering if anyone has any interesting alternative designs? Its a bit clunky to work with and then also deallocate the structure. I was thinking instead of ArrayLists kept in each struct have a start and end index instead which point to a global ArrayList of unions instead for the actual entry.

Thoughts?

ktz_alias · June 16, 2024, 6:00am

For implementation of sparse matrix, I’ve often used a wrapped std.Treap.
This wrapper type is taken a coordinate (column and row) to return the value.
Because std.Treap is only required a comparator, that implementation is concise.

If this tree implementation is disadvantage to the performance, it may be useed std.AutoHashMap.

mnemnion · June 17, 2024, 6:56pm

Taking a brief look at the code, I’d use a single container type for everything in wasm-structure.zig.

It looks like that’s what the Header type is, but I question the terminology there: something called Header shouldn’t contain all the Sections, so you could do something like this:

pub const Wasm = struct {
    pub const len: u16 = 8;

    version: u32,
    alloc: Allocator,
    sections: std.ArrayListUnmanaged(Section),
 
    pub fn deinit(self: Wasm) void {
        // ...
    }
};

Which is a lead-in to my main suggestion: the Wasm struct should manage memory for everything in it, so that code which uses it can return it, and deallocate it whenever it’s done with it with defer wasm.deinit();.

That way the “destructor” code is kept with the structure it’s freeing, and doesn’t get mixed up with the parser code. Also, the ArrayLists will take up less memory, because in your current design, they all carry a pointer to the Allocator, which isn’t necessary.

koalakools · June 17, 2024, 11:18pm

Thanks for the feedback @mnemnion ! I wasn’t too sure about ArrayListUnmanaged before your comment but that makes sense since I’m passing in the same allocator each time an ArrayList is being created.

And thats a good point having deinit be in the struct instead, looking back no idea why I threw that into parser.zig

In terms of the nested array’s any comment on that? Mostly in if there’s a better way to represent this:

pub const Header = struct {
    sections: std.ArrayList(Section),
};

pub const Section = struct {
    fns: std.ArrayList(Function),
}

pub const Function = struct {
    params: std.ArrayList(ValType),
};

My first thought was to have a union of all the types (Section, Function, etc) in 1 array and have indexes instead since in parsing I would expect blocks of the same type to be together and then each struct can have a start / stop index to access them.

But of course that would be harder to use and loose a bit of type safety

Cloudef · June 18, 2024, 1:23am

I’d keep it the current way, as having bunch of different types in same array is going to increase the size of the element and thus more unfriendly to CPU cache. You may want to use std.heap.ArenaAllocator for the allocations that eventually are returned to the caller, so you can deinit the memory on single .deinit();. This is what std.json.parseFromSlice also does. Similarily it’s good idea to also provide “leaky” version where you return the structure without the arena, expecting the caller to provide arena as a allocator instead as std.json.parseFromSliceLeaky does.

There’s some things like the Data which probably does not need to copy the bytes, but instead could point to the input slice itself.