Can I allocate only the required amount of memory for tagged unions?

timfayz · January 11, 2024, 9:22am

This question is about allocating less memory if possible. The snipped below is a contrived example of an actual code.

The snippet defines a tagged union with two types of Nodes: Type1 of 1 byte size, and Type2 of 2 (in real example, there are about 9 types and sizes range from 24 to 72 bytes):

const std = @import("std");

var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
const allocator = arena.allocator();

const Node = union(enum) {
    Type1: u8,
    Type2: struct { f1: u8, f2: u8 },

    pub fn increment(self: *Node) void {
        switch (self.*) {
            .Type1 => self.Type1 += 1,    // simulation of
            .Type2 => self.Type2.f1 += 1, // something useful
        }
    }
};

fn NodeFieldType(comptime field: anytype) type {
    return std.meta.fieldInfo(Node, field).type;
}

pub fn main() !void {
    std.debug.print("Size of {s}:       {d}\n", .{ "Node", @sizeOf(Node) });
    std.debug.print("Size of {s}: {d}\n", .{ "Node.Type1", @sizeOf(NodeFieldType(.Type1)) });
    std.debug.print("Size of {s}: {d}\n", .{ "Node.Type2", @sizeOf(NodeFieldType(.Type2)) });

    const node: *Node = try allocator.create(NodeFieldType(.Type1));
    node.* = .{ .Type2 = .{ .f1 = 1, .f2 = 2 } };
    node.increment();
}

This code is not supposed to run correctly, so here is the expected error:

howtos/allocateUnionField.zig:26:25: error: expected type '*allocateUnionField.Node', found '*u8'
    const node: *Node = try allocator.create(NodeFieldType(.Type1));
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
howtos/allocateUnionField.zig:26:25: note: pointer type child 'u8' cannot cast into pointer type child 'allocateUnionField.Node'

The story is that in my actual code I reached the part where I need to allocate nodes and build a tree, interlinking them. I realized that I can allocate only the necessary amount of memory for every node type in the union. However, I quickly understood that the type changes from *Node to smth like *Node.Union__struct_1234 and I loose the generic Node interface that provide utils for all node types, like the one increment() I provided above.

So my question is that can I (or even should I) get around this type matching thing? It feels like it is possible and not so much changes are needed.

Cloudef · January 11, 2024, 9:56am

If you want to save memory you have to have a union that has pointers and not types.

const Node = union (enum) {
   .type1: *NodeV1,
   .type2: *NodeV2,
};

Then you store allocated data behind those pointers. This makes it so every member is of the same size. Pointer is 64 bit on x86-64 for example. If you have some smaller node, it might still make sense to not use pointer for it even in this scenario.

IntegratedQuantum · January 11, 2024, 9:58am

First of all a union in zig has no defined memory layout. The compiler can for example add hidden fields in the background (this is for example done for untagged unions to detect a wrong field access).
In order to get defined memory layout you’d need to use a packed or extern union.
Secondly there is no library functionality for this. You’d need to manually allocate the right amount of memory with the right alignment and then do a pointer cast. And of course this is rather unsafe, you cannot change the union field without re-allocating it.
Instead I’d do what @Cloudef suggested. This is much safer and easier to use.

LucasSantos91 · January 11, 2024, 10:51am

When you access a union, the compiler is free to assume that the entire union is there. That would be a problem if you allocated a smaller amount of memory than the union takes. Also, the tag is usually stored at the end, because it’s usually a small number, in order to prevent padding from alignment. So, if you use the standard union, you would have to allocate the whole thing just to put the tag at the end.
You could make your own variable-length tagged union, something like this:

const Tagged = extern struct{
  tag: u8,
  payload: u8,
};

The address of the payload corresponds to where the data starts, but you would have to be mindful of alignment. The tag at the beggining will mess up your alignment, and you would have to manually allocate extra memory and align the data inside. Every time you read one of these, you would need to do the opposite calculation to know where the data actually is.
The pointer union people mentioned is one option, but it reminds us a lot of object oriented virtual polymorphism, with all it’s problems.
You might be better off with a data oriented approach:

const Entry = struct{
  tag: Tag
  index: usize
};

const Nodes1 = std.ArrayList(Node1)
const Nodes2 = std.ArralList(Node2)

Now you store the actual payload in a dedicated container for that type, and you glue all the types together with a simple and homogenous container that just stores what object is there and where to find it.

timfayz · January 11, 2024, 11:42am

Great answers. Feels like @Cloudef is the way to go but @LucasSantos91’s sounds more efficient with a bit of management and organization. I need to think about them and try.

Cloudef · January 11, 2024, 12:01pm

For the second zig provides std.MultiArrayList

LucasSantos91 · January 11, 2024, 12:30pm

That wouldn’t work. Multiarraylist assumes that there will be the same number of elements in each row. It simply takes a struct and “de-structure” it into different arrays. Here we would need lists with different numbers of elements.

Cloudef · January 11, 2024, 12:39pm

Ah yes, oversight on that from my part. I think you could do generic type for this use case as well though.

timfayz · January 11, 2024, 3:42pm

@Cloudef, is this similar to what you suggested? Feels like I’m going the wrong path.

const std = @import("std");

var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
const allocator = arena.allocator();
const Tag = std.meta.Tag;

const Node = union(enum) {
    Foo: *struct { shared: u8, text: []const u8 },
    Bar: *struct { shared: u8, num: f64 },

    fn Type(comptime field: anytype) type {
        return @typeInfo(std.meta.fieldInfo(Node, field).type).Pointer.child;
    }

    pub fn init(alloc: std.mem.Allocator, node_type: Tag(Node)) !Node {
        switch (node_type) {
            .Foo => {
                const foo = try alloc.create(Node.Type(.Foo));
                foo.* = .{ .shared = 1, .text = "hello" };
                return Node{ .Foo = foo };
            },
            .Bar => {
                const bar = try alloc.create(Node.Type(.Bar));
                bar.* = .{ .shared = 1, .num = 4.2 };
                return Node{ .Bar = bar };
            },
        }
    }

    pub fn selfPrint(self: *Node) void {
        switch (self.*) {
            .Foo => std.debug.print("{any}\n", .{self.Foo}),
            .Bar => std.debug.print("{any}\n", .{self.Bar}),
        }
    }
};

pub fn main() !void {
    var node = try Node.init(allocator, .Bar);
    node.selfPrint();
}

Cloudef · January 11, 2024, 4:36pm

That looks good to me.

Sze · January 11, 2024, 8:25pm

Here are some suggestions:

using @unionInit, std.meta.TagPayload and std.meta.Child
using switch with inline else
avoiding the arena (often better to first make sure your program works without an arena too, if your alloc always needs to be an arena, then call it arena or arena_alloc?)

const std = @import("std");

const Node = union(enum) {
    const Tag = std.meta.Tag(Node);

    Foo: *struct { shared: u8, text: []const u8 },
    Bar: *struct { shared: u8, num: f64 },

    pub fn Value(comptime tag: Tag) type {
        return std.meta.Child(std.meta.TagPayload(Node, tag));
    }

    pub fn initPtr(comptime tag: Tag, payload_ptr: std.meta.TagPayload(Node, tag)) !Node {
        return @unionInit(Node, @tagName(tag), payload_ptr);
    }

    pub fn init(alloc: std.mem.Allocator, comptime tag: Tag, payload_value: Value(tag)) !Node {
        const ptr = try alloc.create(@TypeOf(payload_value));
        ptr.* = payload_value;
        return initPtr(tag, ptr);
    }

    pub fn deinit(self: *Node, alloc: std.mem.Allocator) void {
        switch (self.*) {
            inline else => |n| alloc.destroy(n),
        }
    }

    pub fn selfPrint(self: *Node) void {
        switch (self.*) {
            inline else => |n| std.debug.print("{any}\n", .{n}),
        }
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const Foo = Node.Value(.Foo);
    const foo = try allocator.create(Foo);
    foo.* = .{ .shared = 1, .text = "hello" };

    var foo_node = try Node.initPtr(.Foo, foo);
    defer foo_node.deinit(allocator);
    foo_node.selfPrint();

    var bar_node = try Node.init(allocator, .Bar, .{ .shared = 1, .num = 4.2 });
    defer bar_node.deinit(allocator);
    bar_node.selfPrint();
}

timfayz · January 12, 2024, 11:26am

Yes, inline else is already in service. The rest is new. For example, I didn’t know meta provides these utils. Thank you! That’s quite advanced stuff overall :). I was having a hard time with @unionInit, though. The documentation makes it quite difficult to understand. Am I right that this builtin was introduced because you can’t initialize a tagged union with a parameterized tag/field name? Like you can’t pass const tag_name = .Foo into const val = TaggedUnion { .$tag_name = payload } ?

Good suggestion. I just didn’t think about it. I just assumed my program will use these trees only for compilation purposes (program will not live after it’s done transforming source code into a tree and the tree into another source).

It is not the case but it’s good to know that you can hint a user of a library that the parameter should be an arena by calling it so (arena_alloc). However, I think you can simply set the type of the parameter to hardfix it, can’t you?

Finally, this part:

pub fn initPtr(comptime tag: Tag, payload_ptr: std.meta.TagPayload(Node, tag)) !Node {
    return @unionInit(Node, @tagName(tag), payload_ptr);
}

pub fn init(alloc: std.mem.Allocator, comptime tag: Tag, payload_value: Value(tag)) !Node {
    const ptr = try alloc.create(@TypeOf(payload_value));
    ptr.* = payload_value;
    return initPtr(tag, ptr);
}

The problem is that I need to be able to initialize a particular node type at runtime. I can’t provide the tag literal in advance. The tag of the node I want to create will be available upon reading the user input (source code).

timfayz · January 12, 2024, 11:38am

Maybe this way?

pub fn init(alloc: std.mem.Allocator, arch_tag: ArchTag) !Node {
    switch (arch_tag) {
        inline else => |tag| {
            const node = try alloc.create(Node.ArchStruct(tag));
            // Right now, not sure how to pass payload at runtime
            // so I simply rely on defaults.
            node.* = .{}; 
            return @unionInit(Node, @tagName(tag), node);
        },
    }
}

Sze · January 13, 2024, 3:39am

I think it might be possible by constructing complicated struct literals at comptime instead, but I haven’t tried because @unionInit is the easier (and maybe only) way to do it.

If you only want to allow ArenaAllocator you could accept a pointer to that directly, but there are cases where you want the allocator to behave in some way, that isn’t directly enforced by the std.mem.Allocator interface, but you still don’t want to restrict it more then necessary so that the user can choose.

So basically I meant, you could communicate to the user in some way, that your data structure is going to leak its allocations. The user then has the choice to pass an ArenaAllocator, or a big enough FixedBufferAllocator, or something similar, that can be reset in some way. That way the user is still able to use the datastructure without leaking memory globally until the program exits.
When I see an allocator called arena I just assume that it is going to leak memory,
but maybe we should have some community chosen names somewhere for different kinds of allocators, maybe there is already something somewhere.

I don’t understand how this is a problem, the moment you have the tag you can switch on it and within the different prongs of that switch the tag is compile time known. Take a look at random_node or default_node below.

I think you have a misconception thinking it can’t be comptime, but the switch can choose between different code paths at run time, where these pathes can have different compile time known tag values, either by using inline else => |comptime_tag| or by you knowing that in this branch for example .Foo the tag is .Foo.

If you have the tag but not the data for the value you can construct the default value or even just leave it undefined and then set the field values later (undefined_node), for example when you need the pointer to the node before you are done parsing the full value for it.

All the inline else switches can be put into their own helper functions like defaultNode if it is something that is needed multiple times. However I don’t think it makes sense to put these switches that are used to construct the right value inside the data structure. Maybe there are cases where it makes sense to put the helper function defaultNode into the Node namespace, but I still think of it as a helper function that is just located there for some reason, while I think of the other init functions as the “real constructor functions” because they are more basic and defaultNode just branches from runtime tag to comptime tag.

If I wrote some of my own code and I knew what I needed then I may end up writing functions like defaultNode directly in Node, if I only need default nodes. The way I write it below is more for the case, where the data structure may be used in different ways and you want to keep the options open. I think it makes sense to think about the general cases, put the different helper things in separate functions that just boil down to the same basic functions. Writing specific things is good, but only if it doesn’t unnecessarily prescribe/restrict how it can be used on the call/use-site.

Here are a bunch more variations, the program uses random to switch between different tags, run it multiple times:

const std = @import("std");

const Node = union(enum) {
    const Tag = std.meta.Tag(Node);

    Foo: *struct { shared: u8 = 0, text: []const u8 = "foo" },
    Bar: *struct { shared: u8 = 1, num: f64 = 42.45656 },

    pub fn Value(comptime tag: Tag) type {
        return std.meta.Child(std.meta.TagPayload(Node, tag));
    }

    pub fn initPtr(comptime tag: Tag, payload_ptr: std.meta.TagPayload(Node, tag)) Node {
        return @unionInit(Node, @tagName(tag), payload_ptr);
    }

    pub fn init(alloc: std.mem.Allocator, comptime tag: Tag, payload_value: Value(tag)) !Node {
        const ptr = try alloc.create(@TypeOf(payload_value));
        ptr.* = payload_value;
        return initPtr(tag, ptr);
    }

    pub fn initPtrRuntime(tag: Tag, type_erased_payload_ptr: *anyopaque) Node {
        return switch (tag) {
            inline else => |comptime_tag| initPtr(comptime_tag, @ptrCast(@alignCast(type_erased_payload_ptr))),
        };
    }

    pub fn deinit(self: Node, alloc: std.mem.Allocator) void {
        switch (self) {
            inline else => |n| alloc.destroy(n),
        }
    }

    pub fn copy(self: Node, alloc: std.mem.Allocator) !Node {
        return switch (self) {
            inline else => |value_ptr, tag| try init(alloc, tag, value_ptr.*),
        };
    }

    pub fn print(self: *const Node, heading: []const u8) void {
        std.debug.print("{s}\n    ", .{heading});
        switch (self.*) {
            .Foo => |f| std.debug.print("Foo shared: {d}   text: {s}\n", .{ f.shared, f.text }),
            .Bar => |f| std.debug.print("Bar shared: {d}   num:  {e}\n", .{ f.shared, f.num }),
        }
        std.debug.print("\n", .{});
    }
};

pub fn defaultNode(alloc: std.mem.Allocator, tag: Node.Tag) !Node {
    return switch (tag) {
        inline else => |comptime_tag| Node.init(alloc, comptime_tag, .{}),
    };
}

const RndGen = std.rand.DefaultPrng;

pub fn pickRandomEnum(comptime T: type, random: std.rand.Random) T {
    const enum_fields = std.meta.fields(T);
    return @enumFromInt(random.uintLessThan(u8, enum_fields.len));
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const seed: u64 = @intCast(std.time.nanoTimestamp());
    var rng = RndGen.init(seed);
    const random = rng.random();

    const tag = pickRandomEnum(Node.Tag, random);
    // use a switch at the call site where you know the runtime value of tag
    // to switch over the different possibilities within the prongs of the switch
    // you know the tag type at comptime because you know that that prong will only
    // be selected for that type
    // so there is no need for the payload_value to be of an unknown type, we already know its type,
    // just by using a switch
    const random_node: Node = switch (tag) {
        .Foo => try Node.init(allocator, .Foo, .{ .shared = 4, .text = "fooooo" }),
        .Bar => try Node.init(allocator, .Bar, .{ .shared = 9, .num = 25.030303 }),
    };
    defer random_node.deinit(allocator);
    random_node.print("random");

    // if multiple prongs can be constructed in a similar way we can use an inline else switch
    // to generate the different cases
    const shared_random_node: Node = switch (pickRandomEnum(Node.Tag, random)) {
        inline else => |comptime_tag| try Node.init(allocator, comptime_tag, .{ .shared = @intFromEnum(comptime_tag) }),
    };
    defer shared_random_node.deinit(allocator);
    shared_random_node.print("shared_random");

    const default_node: Node = try defaultNode(allocator, pickRandomEnum(Node.Tag, random));
    defer default_node.deinit(allocator);
    default_node.print("default_node");

    const Foo = Node.Value(.Foo);
    const foo = try allocator.create(Foo);
    foo.* = .{ .shared = 1, .text = "hello" };
    var foo_node = Node.initPtr(.Foo, foo);
    defer foo_node.deinit(allocator);
    foo_node.print("foo_node");

    const Bar = Node.Value(.Bar);
    const api_internal_value = try allocator.create(Bar);
    api_internal_value.* = .{ .num = 11111.22222 };

    // lets pretend some api has given us a runtime tag value and an opaque (type erased) pointer
    const api_tag: Node.Tag = .Bar;
    const api_value: *anyopaque = @ptrCast(api_internal_value);

    const api_node: Node = Node.initPtrRuntime(api_tag, api_value);
    defer api_node.deinit(allocator);
    api_node.print("api_node");

    const undefined_node: Node = switch (pickRandomEnum(Node.Tag, random)) {
        inline else => |comptime_tag| try Node.init(allocator, comptime_tag, undefined),
    };
    defer undefined_node.deinit(allocator);
    // disabled because this could print a lot of garbage if we pick the Foo node and we are unlucky
    // (or segfault if we try reading memory we aren't allowed to access)
    // undefined_node.print("undefined_node garbage");
    switch (undefined_node) {
        inline else => |ptr, comptime_tag| {
            ptr.* = .{};
            const shared: u8 = @intFromEnum(comptime_tag);
            const special = 42;
            const special_shared = special + shared;
            ptr.shared = special_shared;
        },
    }
    undefined_node.print("undefined_node fixed");

    var bar_node = try Node.init(allocator, .Bar, .{ .shared = 1, .num = 4.2 });
    defer bar_node.deinit(allocator);
    bar_node.print("bar_node");

    var copy_node = try bar_node.copy(allocator);
    defer copy_node.deinit(allocator);
    copy_node.Bar.num = -260.3333;
    copy_node.print("copy_node");
}

I can’t think of another variation at the moment, hopefully what you need is one of them, if it isn’t I am really curious about what it could be, maybe you could try adding one that is more like what you are doing.

Sze · January 13, 2024, 3:52am

This is similar to my defaultNode function just collapsed down into one function.
The part I don’t understand is this:

Maybe you can show some of your call site code, so I can understand why?
Can’t you just switch on the call site and construct the node with tag and value, like one of the variations in my other post?

Sze · January 13, 2024, 7:48am

LucasSantos91:

You might be better off with a data oriented approach:
const Entry = struct{
  tag: Tag
  index: usize
};

const Nodes1 = std.ArrayList(Node1)
const Nodes2 = std.ArralList(Node2)
Now you store the actual payload in a dedicated container for that type, and you glue all the types together with a simple and homogenous container that just stores what object is there and where to find it.

I felt this topic needed another answer that actually showed a sketch of the data oriented design style that you mentioned. Because most of the other answers are about other styles. But I think I often prefer the data oriented one.

const std = @import("std");

fn ArrayLists(comptime FieldsUnion: type) type {
    const info = @typeInfo(FieldsUnion);
    const len = info.Union.fields.len;
    var fields: [len]std.builtin.Type.StructField = undefined;
    for (info.Union.fields, 0..) |field, i| {
        const T = std.meta.Child(field.type);
        const ListType = std.ArrayListUnmanaged(T);
        fields[i] = .{
            .name = field.name,
            .type = ListType,
            .default_value = &ListType{},
            .is_comptime = false,
            .alignment = @alignOf(ListType),
        };
    }
    return @Type(.{
        .Struct = .{
            .layout = .Auto,
            .fields = &fields,
            .decls = &.{},
            .is_tuple = false,
        },
    });
}

fn Nodes(comptime FieldsUnion: type, comptime Index: type) type {
    return struct {
        const Self = @This();
        pub const Tag = std.meta.Tag(FieldsUnion);
        pub const Lists = ArrayLists(FieldsUnion);
        pub const Infos = std.ArrayListUnmanaged(Info);
        pub const Info = packed struct {
            tag: Tag,
            index: Index,
        };

        allocator: std.mem.Allocator,
        infos: Infos = .{},
        lists: Lists = .{},

        pub fn Value(comptime tag: Tag) type {
            return std.meta.Child(std.meta.TagPayload(FieldsUnion, tag));
        }

        pub fn init(allocator: std.mem.Allocator) Self {
            return .{ .allocator = allocator };
        }
        pub fn deinit(self: *Self) void {
            inline for (std.meta.fields(Lists)) |field| {
                @field(self.lists, field.name).deinit(self.allocator);
            }
            self.infos.deinit(self.allocator);
        }

        pub fn list(self: *Self, comptime tag: Tag) *@TypeOf(@field(self.lists, @tagName(tag))) {
            return &@field(self.lists, @tagName(tag));
        }

        pub fn addNode(self: *Self, comptime tag: Tag, payload_value: Value(tag)) !Index {
            const lst = self.list(tag);
            const index: Index = @intCast(lst.items.len);
            try lst.append(self.allocator, payload_value);
            errdefer _ = lst.pop();
            const infos_index: Index = @intCast(self.infos.items.len);
            try self.infos.append(self.allocator, .{ .tag = tag, .index = index });
            return infos_index;
        }

        pub fn lookup(self: *Self, comptime tag: Tag, index: Index) !*Value(tag) {
            if (self.infos.items.len <= index) return error.InvalidIndex;
            const info = self.infos.items[index];
            return self.lookupInfo(tag, info);
        }

        fn lookupInfo(self: *Self, comptime tag: Tag, info: Info) *Value(tag) {
            const lst = self.list(tag);
            return &lst.items[info.index];
        }

        pub fn get(self: *Self, index: Index) !FieldsUnion {
            if (self.infos.items.len <= index) return error.InvalidIndex;
            const info = self.infos.items[index];
            return switch (info.tag) {
                inline else => |comptime_tag| @unionInit(FieldsUnion, @tagName(comptime_tag), self.lookupInfo(comptime_tag, info)),
            };
        }
    };
}

const RndGen = std.rand.DefaultPrng;

pub fn pickRandomEnum(comptime T: type, random: std.rand.Random) T {
    const enum_fields = std.meta.fields(T);
    return @enumFromInt(random.uintLessThan(u8, enum_fields.len));
}

const Node = union(enum) {
    const Foo = struct {
        shared: u8,
        text: []const u8,

        pub fn format(self: Foo, comptime fmt: []const u8, options: std.fmt.FormatOptions, writer: anytype) !void {
            _ = fmt;
            _ = options;
            try writer.print("Foo shared: {d} text: {s}", .{ self.shared, self.text });
        }
    };
    const Bar = struct {
        shared: u8,
        num: f64,

        pub fn format(self: Bar, comptime fmt: []const u8, options: std.fmt.FormatOptions, writer: anytype) !void {
            _ = fmt;
            _ = options;
            try writer.print("Bar shared: {d} num: {d}", .{ self.shared, self.num });
        }
    };
    Foo: *Foo,
    Bar: *Bar,

    pub fn format(self: @This(), comptime fmt: []const u8, options: std.fmt.FormatOptions, writer: anytype) !void {
        switch (self) {
            inline else => |e| try e.format(fmt, options, writer),
        }
    }
};
const AstNodes = Nodes(Node, u31);

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const seed: u64 = @intCast(std.time.nanoTimestamp());
    var rng = RndGen.init(seed);
    const random = rng.random();

    var nodes = AstNodes.init(allocator);
    defer nodes.deinit();

    const Tag = AstNodes.Tag;
    for (0..10) |_| {
        _ = switch (pickRandomEnum(Tag, random)) {
            .Foo => try nodes.addNode(.Foo, .{ .shared = 4, .text = "fooooo" }),
            .Bar => try nodes.addNode(.Bar, .{ .shared = 9, .num = 25.030303 }),
        };
    }

    _ = try nodes.addNode(.Bar, .{ .shared = 23, .num = 0 });
    const foo_index = try nodes.addNode(.Foo, .{ .shared = 42, .text = "some other text" });

    const ptr = try nodes.lookup(.Foo, foo_index);
    std.debug.print("shared {}  text {s}\n", .{ ptr.shared, ptr.text });

    const val = try nodes.get(8);
    std.debug.print("get {}\n", .{val});

    // iterate heterogenous list
    std.debug.print("\n------------------\n", .{});
    std.debug.print("heterogenous list:\n", .{});
    for (nodes.infos.items) |info| {
        switch (info.tag) {
            inline else => |tag| {
                const value = nodes.list(tag).items[info.index];
                std.debug.print("value {}\n", .{value});
            },
        }
    }

    std.debug.print("\n\n", .{});

    // iterate over all homogenous lists
    inline for (std.meta.fields(@TypeOf(nodes.lists))) |field| {
        std.debug.print("\n------------------\n", .{});
        std.debug.print("all {s}s:\n", .{field.name});
        for (@field(nodes.lists, field.name).items) |value| {
            std.debug.print("{}\n", .{value});
        }
    }
}

There are probably a bunch of things that could be improved about this implementation, like adding checking that the fields of Node are pointers, adding convenience functions, etc.

I think a way to improve beyond this (if it is needed by the application) is going towards ECS style, I think that would make sense if you have a vast number of different possible types and especially if they are composed from more basic “fragments” called Components in Entity Component Systems.

timfayz · January 13, 2024, 10:01am

That’s a lot to process!

Sze:

The part I don’t understand is this:
timfayz:
// Right now, not sure how to pass payload at runtime
// so I simply rely on defaults.
Can’t you just switch on the call site and construct the node with tag and value, like one of the variations in my other post?

You’re right. Before going to sleep, I realized the same thing you’ve already shown in defaultNode() – we can “go around” comptime tags using a switch where necessary.

It’s interesting to mention that I didn’t know a switch over tagged unions can capture a payload:

Also, the @Type builtin, which I’ve seen several times but never seen how I might be using it:

There are many other things I didn’t know and need to learn from. Thank you for the heavy lifting. I think the data-oriented tree organization is a bit beyond me right now, even though I see what’s happening in there. I have much more mundane concerns to think right now (see below).

Hopefully, this is the last thing in this topic. What was bothering me last night was the use of *Node vs Node for interlinking nodes and writing generic methods. In case of interlinking, it feels like if I add, for example, a shared field children for every node type, it should be of type []Node rather than []*Node. The pointer to a tagged union feels redundant as it already contains one to a concrete node struct + the pointer itself is residing on the stack (so to make *Node stable, we will need to do another allocation for it). Given that, I think I need write it this way:

const std = @import("std");

pub fn main() !void {
    const alloc = std.heap.page_allocator;

    const foo1 = try Node.init(alloc, .Foo, .{});
    foo1.fooSetText("hello");
    const bar1 = try Node.init(alloc, .Bar, .{});
    bar1.barSetNum(42);

    try foo1.addNext(bar1);
    foo1.selfPrint();
}

const Node = union(enum) {
    const Tag = std.meta.Tag(Node);
    pub fn Struct(comptime tag: Tag) type {
        return std.meta.Child(std.meta.TagPayload(Node, tag));
    }

    Foo: *struct { text: []const u8 = "foo", next: std.ArrayList(Node) = undefined },
    Bar: *struct { num: u8 = 255, next: std.ArrayList(Node) = undefined },

    pub fn init(alloc: std.mem.Allocator, comptime tag: Tag, payload: Struct(tag)) !Node {
        const node = try alloc.create(Struct(tag));
        node.* = payload;
        node.next = std.ArrayList(Node).init(alloc); // next should be ignored, if we don't wanna bothering users with manual list init?
        return @unionInit(Node, @tagName(tag), node);
    }

    // Note: Maybe, for reliability purposes, I should've put those fooSetText()
    // and barSetNum() inside their respective structs to eliminate switches
    // and strange panic prongs. What do you think?

    /// Example of a Foo setter.
    pub fn fooSetText(self: Node, text: []const u8) void {
        switch (self) {
            .Foo => self.Foo.text = text,
            else => @panic("This method should not be used on the wrong node type."),
        }
    }

    /// Example of a Bar setter.
    pub fn barSetNum(self: Node, num: u8) void {
        switch (self) {
            .Bar => self.Bar.num = num,
            else => @panic("This method should not be used on the wrong node type."),
        }
    }

    /// Example of a shared setter.
    pub fn setNext(self: Node, next: Node) void {
        switch (self) {
            inline else => |node| node.next = next,
        }
    }

    /// In some cases, it seems we may use a const pointer to ensure this method
    /// does not accidentally modify the data.
    pub fn selfPrint(self: *const Node) void {
        switch (self.*) {
            inline else => |node| std.debug.print("{any}\n", .{node}),
        }
    }

    /// Otherwise, I don't see any issues with using just Node, which implicitly
    /// hides a stable pointer, whereas the tag is literal and always stable.
    pub fn addNext(self: Node, child: Node) !void {
        switch (self) {
            inline else => |node| try node.next.append(child),
        }
    }
};

Does it look ok?

Sze · January 13, 2024, 11:09am

I am not a big fan of setters instead I would just alloc a Foo instance set some values on it and then pass it to initPtr to wrap it in a node. Or if you already have a node you can just use node.Foo.text = "some new text" instead of calling a fooSetText function. I guess the shared setter makes sense, in this example.

yes

The one thing that makes me wonder a bit is all the stateful manipulation of the tree.
If what you are building just parses something and generates some other output from it, it seems there is too much “object noodle soup” going on here.
It seems like you are doing something more dynamic, like parsing something building up the tree, interacting with it (or executing it like a tree walking interpreter) and then generating something from it.

From reading between the lines I also wonder whether you are worrying about design too much, if you only need a few Nodes then it doesn’t matter that much. If you have a lot of nodes it should be relatively simple to sum up the memory needs between different designs and use those insights to decide whether it makes sense to optimize it more.

timfayz · January 13, 2024, 4:18pm

Exactly. That will be the case when I get to that. I’m just in the very early stages.

I think I am. I was very curious about the options because I’ve never done something complex in system languages like Zig. And in Zig, it feels good. The comptime thing is indeed very powerful in the right hands.

When I first saw initPtr, it was a bit confusing, but now I think I understand why you created it separately.

Those were indeed insights. Thank you again!