Newtype Index Pattern In Zig

22 Likes

I definitely remember you writing a similar article for Rust! Thanks for sharing, awesome reading!

1 Like

I’ve used a similar library. I can post it here.

It creates a unique enum given a type:

const Foo = struct {
    const Ind = Index.Unique(Foo);
};

Index also provides a Range type that is a start + end. I would like to use a len instead of end to allow working with ring buffers.

The Range type also provides a get function that returns an index of the correct type, using start as an offset.
It’s very useful in loops:

for (0..range.len()) |i| {
    const handle = range.get(i); // type-safe handle
}

Index.zig (4.1 KB)

Edit.
I’ve reworked the library and its usage in my programs to use len instead of end.
The result is much cleaner code in my opinion.
I’m going to attach the second version also.
Index.zig (4.1 KB)
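Roughly, the len-based Range might look something like this (a sketch of the idea from memory; the attached Index.zig may differ in the details):

const std = @import("std");

fn Range(comptime Idx: type) type {
    return struct {
        start: u32,
        count: u32,

        pub fn len(self: @This()) u32 {
            return self.count;
        }

        // Returns a type-safe handle, using start as an offset.
        pub fn get(self: @This(), i: usize) Idx {
            std.debug.assert(i < self.count);
            return @enumFromInt(self.start + i);
        }
    };
}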

1 Like

I didn’t remember that! It’s just that when I was returning to the draft and opened the newtype-index-pattern.dj file, I got very confused about why all the examples were suddenly in Rust, lol :rofl:

3 Likes

minor nit: I was expecting FontWeight to be an example of a non-exhaustive enum, but the _ prong is missing.
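For context, a non-exhaustive version would look something like this (the specific weights here are made up, not the article’s actual definition):

const FontWeight = enum(u16) {
    normal = 400,
    bold = 700,
    // The `_` prong makes the enum non-exhaustive:
    // any other u16 value is also a valid FontWeight.
    _,
};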

1 Like

Doesn’t the OS already give us indexes? Don’t we already have a virtual per-process address space, and can we thus serialize the entire process address space to disk? I wonder whether, with a few choice extensions to the virtual address space API from the OS, new efficiencies or language features could be enabled…

1 Like

OS/400-style OSes are single-address-space operating systems where you can persist process state.

For more common OSes, I think the benefit of direct indices is compactness (e.g. 32-bit) and not thwarting ASLR.

3 Likes

By this:

If you need to store a list of items, you can often avoid materializing the list of indexes by storing a range “pointing” into the shared storage.

…do you mean (by “materializing”) merely that, rather than storing (necessarily contiguous?) individual(?) indices, you can store a {.index, .count} (“range”)? Or am I losing a little in translation here? If so, this, and the n-ary tree example, seem suited to a situation where the data structure, in the end, is likely to remain “static”, true? Is there any commentary on how suitable this is for a situation that requires lots of balancing, sort-adding nodes, etc.? Sorry if this is obvious to all but me….

And a tangent - does:

return tree.nodes[range.index..][0..range.count];

…have special advantages over:

return tree.nodes[range.index..range.index+range.count];

…(besides readability, perhaps)?

If you need to modify it (for example grow it), you can put the old range in a free-list of unused ranges and get or create a new one of the needed size. If you don’t know the needed size upfront you can use a reasonable guess (like std.ArrayList does) and then shrinkAndFree it once you do. Possibly keeping a capacity in a parallel array for as long as you still need it. Once you are done resizing, that additional metadata can be thrown away.

Freed pieces can either be reused by other lists that need that size, or alternatively you can rebuild the entire thing without all the gaps, which makes it behave somewhat similarly to a data-structure-specific garbage collector. In code, the free-list idea might look something like the sketch below.
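(A sketch with made-up names, not taken from the article:)

const std = @import("std");

const Node = struct { payload: u32 };
const Range = struct { index: u32, count: u32 };

const Storage = struct {
    nodes: std.ArrayListUnmanaged(Node) = .{},
    free: std.ArrayListUnmanaged(Range) = .{},

    // Recycle `old` and hand back a range of `new_count` nodes,
    // reusing a freed range of exactly that size when one exists.
    fn realloc(s: *Storage, gpa: std.mem.Allocator, old: Range, new_count: u32) !Range {
        try s.free.append(gpa, old);
        for (s.free.items, 0..) |r, i| {
            if (r.count == new_count) return s.free.swapRemove(i);
        }
        // No exact fit: grow the shared storage at the end.
        const index: u32 = @intCast(s.nodes.items.len);
        try s.nodes.resize(gpa, s.nodes.items.len + new_count);
        return .{ .index = index, .count = new_count };
    }
};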

I think in the case where the count is only known at runtime they should be equivalent, but I still prefer the former because I find it more readable and it doesn’t require mentioning .index twice.
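For instance (a sketch, assuming range.count is only known at runtime):

const a = tree.nodes[range.index..][0..range.count];
const b = tree.nodes[range.index .. range.index + range.count];
// Both are plain slices over the same memory.
std.debug.assert(a.ptr == b.ptr and a.len == b.len);

(With a comptime-known count, the first form gives you a pointer-to-array instead of a slice, which can be an additional advantage.)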

1 Like

Got it, so there’s no magic I’m missing. Certainly one could also implement something like a red-black tree, e.g., with indices to contiguous storage, but where the contiguity has no special advantage other than being an unfragmented (though disordered) store that could indeed be memcpy’d or efficiently block-written to file, (etc?) The other advantages of this pattern, especially the enum index, could still be useful, potentially using symbolic labels beyond “root” and “invalid”.

you haven’t quite reached peak Newtype Index Pattern power level yet… :grinning_face:

check out some of these patterns (from wasm linker code):

The “Unpack Pattern”

Packed in-memory layout vs convenient union(enum) layout

    /// Represents a synthetic function, a function from an object, or a
    /// function from the Zcu.
    pub const Resolution = enum(u32) {
        unresolved,
        __wasm_apply_global_tls_relocs,
        __wasm_call_ctors,
        __wasm_init_memory,
        __wasm_init_tls,
        // Next, index into `object_functions`.
        // Next, index into `zcu_funcs`.
        _,

        const first_object_function = @intFromEnum(Resolution.__wasm_init_tls) + 1;

        pub const Unpacked = union(enum) {
            unresolved,
            __wasm_apply_global_tls_relocs,
            __wasm_call_ctors,
            __wasm_init_memory,
            __wasm_init_tls,
            object_function: ObjectFunctionIndex,
            zcu_func: ZcuFunc.Index,
        };

        pub fn unpack(r: Resolution, wasm: *const Wasm) Unpacked {
            return switch (r) {
                .unresolved => .unresolved,
                .__wasm_apply_global_tls_relocs => .__wasm_apply_global_tls_relocs,
                .__wasm_call_ctors => .__wasm_call_ctors,
                .__wasm_init_memory => .__wasm_init_memory,
                .__wasm_init_tls => .__wasm_init_tls,
                _ => {
                    const object_function_index = @intFromEnum(r) - first_object_function;

                    const zcu_func_index = if (object_function_index < wasm.object_functions.items.len)
                        return .{ .object_function = @enumFromInt(object_function_index) }
                    else
                        object_function_index - wasm.object_functions.items.len;

                    return .{ .zcu_func = @enumFromInt(zcu_func_index) };
                },
            };
        }

        pub fn pack(wasm: *const Wasm, unpacked: Unpacked) Resolution {
            return switch (unpacked) {
                .unresolved => .unresolved,
                .__wasm_apply_global_tls_relocs => .__wasm_apply_global_tls_relocs,
                .__wasm_call_ctors => .__wasm_call_ctors,
                .__wasm_init_memory => .__wasm_init_memory,
                .__wasm_init_tls => .__wasm_init_tls,
                .object_function => |i| @enumFromInt(first_object_function + @intFromEnum(i)),
                .zcu_func => |i| @enumFromInt(first_object_function + wasm.object_functions.items.len + @intFromEnum(i)),
            };
        }
    };

The “Watermark Pattern”

Reverting an array hash map to a previous length to efficiently restore to an old state

    const functions_end_zcu: u32 = @intCast(wasm.functions.entries.len);
    defer wasm.functions.shrinkRetainingCapacity(functions_end_zcu);

    const globals_end_zcu: u32 = @intCast(wasm.globals.entries.len);
    defer wasm.globals.shrinkRetainingCapacity(globals_end_zcu);

    const function_exports_end_zcu: u32 = @intCast(wasm.function_exports.entries.len);
    defer wasm.function_exports.shrinkRetainingCapacity(function_exports_end_zcu);

    const hidden_function_exports_end_zcu: u32 = @intCast(wasm.hidden_function_exports.entries.len);
    defer wasm.hidden_function_exports.shrinkRetainingCapacity(hidden_function_exports_end_zcu);

    wasm.flush_buffer.clear();
    try wasm.flush_buffer.missing_exports.reinit(gpa, wasm.missing_exports.keys(), &.{});
    try wasm.flush_buffer.function_imports.reinit(gpa, wasm.function_imports.keys(), wasm.function_imports.values());
    try wasm.flush_buffer.global_imports.reinit(gpa, wasm.global_imports.keys(), wasm.global_imports.values());
    try wasm.flush_buffer.data_imports.reinit(gpa, wasm.data_imports.keys(), wasm.data_imports.values());

    return wasm.flush_buffer.finish(wasm) catch |err| switch (err) {
        error.OutOfMemory => return error.OutOfMemory,
        error.LinkFailure => return error.LinkFailure,
        else => |e| return diags.fail("failed to flush wasm: {s}", .{@errorName(e)}),
    };
8 Likes

It’s more like I had a self-imposed challenge to explain “weird” enum(u32) { _ } syntax without dragging in SoA and the rest of data oriented design (at which I failed, given that half of the post speaks about caches and what not) :stuck_out_tongue:

To add to the pile of more advanced tricks, this one comes from Sorbet (a type checker for Ruby; design talk):

One pain point with indexes is concurrency. If you want to populate your “GlobalState with arrays” from multiple threads, you need to make sure two threads do not race when reserving an index for an item, and that’s tricky to get correct, and tricky to get fast (it’s very easy to introduce contention on some global counter).

The way Sorbet solves this is that each worker thread gets its own private copy of the “GlobalState arrays” and populates it using indexes local to that thread, without any mutexes. Then there’s a merge phase that takes the actual GlobalState and the thread-local version of GlobalState and merges the two together, shifting the indexes.
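A minimal sketch of the merge step (made-up names; not Sorbet’s actual code):

const std = @import("std");

const Item = struct { payload: u64 };

// Each worker fills its own `local` list with no locks. The single-threaded
// merge phase appends it to the global list and returns the offset that
// turns the worker's local indexes into global ones.
fn mergeInto(
    gpa: std.mem.Allocator,
    global: *std.ArrayListUnmanaged(Item),
    local: []const Item,
) !u32 {
    const base: u32 = @intCast(global.items.len);
    try global.appendSlice(gpa, local);
    return base; // global index = base + local index
}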

4 Likes

Yes. The index transformation is a manual way of doing pointer compression, basically. There are some cognitive advantages as well: thinking of the data structure as an array of memory encourages taking advantage of that fact, laying data structures out densely to take advantage of array traversal where possible. But it’s an array of memory whether that array is used well or poorly. The difference between indexing an array and doing pointer arithmetic is a mental one; in C the syntax one chooses is a matter of taste, convention if you prefer. Zig improves this situation (thankfully!) but slice[3] and (slice.ptr + 3)[0] will still give you the same value.

Memory dumps, ‘images’, used to be a common way to persist a program between invocations. They’re still in use: Emacs will, upon request, compile everything one wants loaded and then dump itself in that state, so it doesn’t have to do it all over again next time. Smalltalk used this extensively, combined with excellent tools for editing a program while it’s running. Problem is some of those programs would get to the point where it’s not actually possible to build the program starting from the source, so it’s edit the image or do without. There are a few reasons why programmers have adopted serialization which consists of more ceremony than merely dumping the image, good reasons, but it isn’t because we can’t do it. We can and sometimes we do.

It’s absolutely possible with the right operating system, and appropriate language support, to just have smaller pointers which actually act like (are) pointers. Memory is all virtual, no particular reason[1] why every process wouldn’t get one high-word which memory starts from. If we kept everything word-aligned (mostly reasonable) this could handle 32GiB with small pointers. That would be nicer to work with, there’s a cognitive cost to using array indices when what you really want is pointers. Pointers are no barrier to laying out memory intelligently, the main advantage of ‘indicizing’ this kind of data structure really is smaller integers.

But an array transform we can just do, without waiting around for a great deal of work which may or may not ever get done (or setting aside a few professional lifetimes to get it done ourselves). It’s a hack, sure. So what, it works. What else is new.

Similar thing with the enum newtype: it provides some of the benefits of a hypothetical feature of greater elegance but considerable complexity, and we already have non-exhaustive enums. If all we would do with a unique numeric type is type check it, we don’t need two ways to do that; a little bit of @enumFromInt and vice versa isn’t going to bust the byte budget.
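That is, something like this (a minimal sketch with hypothetical names):

const Node = struct { left: NodeIndex, right: NodeIndex };

const NodeIndex = enum(u32) {
    _,

    // The only ceremony the newtype costs: converting at the boundaries.
    fn get(i: NodeIndex, nodes: []const Node) Node {
        return nodes[@intFromEnum(i)];
    }

    fn from(raw: u32) NodeIndex {
        return @enumFromInt(raw);
    }
};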

The indexing bugs which have eaten hours of my life very seldom stem from using something which was meant to be data as an index. Generally it’s using an index meant for one array on another by accident, and since those are the same type, newtyping the indices wouldn’t have helped. Some kind of general-purpose mechanism to annotate that kind of static code intention, and check it in a validation step, would be most welcome.


  1. Security considerations, sure. Less pressing when it comes to the heap. ↩︎

3 Likes

In WebAssembly, multiple memories are supported. For each load/store/memory-related instruction, you can also specify which memory to operate on.

But in reality, nothing supports generating those instructions yet AFAIK :slight_smile: So if you want to use them, normal pointer/slice dereferencing syntax will never work; you have to wrap it in inline assembly or the like, and access memories via functions.

Would be cool if there were a language feature for “this index/pointer is intended for use in this address space” that could generate the above WebAssembly multiple-memories instructions! (Something like addrspace, but with a runtime-known addrspace index/identifier? I don’t know!)

3 Likes

Hi, how does the built-in std pool allocator fit in here? My understanding is that indices provide configurable index size and better deserialisation, while both approaches have similar cache benefits. If so, when would I ever use a pool allocator (e.g. the one in the stdlib) vs a general-purpose allocator + ArrayList or other contiguous-buffer strategy?

1 Like

Nice article!

I have been contemplating using non-exhaustive enums as indexes in different cases, but never got to it. Something stops me. I like simple direct raw ints too.

Always love reading your articles. I think they are the perfect size to explain the topics you approach; I always leave with a better understanding of the topic at hand.

I think this could have been a great addition to the article, as most of the article only has examples of how to “get” items, but there are no insertion examples. Since my knowledge of Zig and programming theory isn’t as extensive as yours, I struggled to apply this pattern to an interval tree (Interval tree - Wikipedia). I feel like the caveats of the “Newtype Index” pattern are more prevalent when inserting new items than when getting them.

1 Like

Welcome @DoomishGuy!

Pools are great when you need a lot of something lightweight, which isn’t durably connected, and especially when there’s a lot of recycling going on.

Trees, the kind @matklad’s article illustrates, are highly connected and tend to stick around. So paying attention to data layout pays off, because the workload does a lot of traversing the structure: locality, cache-friendliness, predictable access patterns, you get to take advantage of all of that if you make the effort to optimize those things.

Sometimes you just need somewhere to aggregate data, maybe transfer ownership, possibly sort, and the amount you need is variable; the need for it comes and goes. Say you have a message system, and the messages are a little too big to pass by copy, 256 bytes for the sake of argument. So there are a lot of them flying around, some last longer than others, maybe they end up linked into a list, that kind of thing.

It’s hard to squeeze much efficiency out of array-ifying something like that. Memory pools give you some efficiency right away: they only involve the allocator when a new ‘high water mark’ is reached; your code can reuse a Message’s memory as soon as it’s freed; the pool only needs fresh memory when it happens to be out.

But an array is going to end up fragmented to pieces very quickly, because of the use pattern. So you’re not able to take advantage of the density, because you can’t keep it dense. Plus you do want to reuse those holes, so now you make a freelist to find them: and you have reinvented the MemoryPool from the standard library, that’s what they are.
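For a concrete picture, std.heap.MemoryPool usage looks roughly like this (Message here is a stand-in type):

const std = @import("std");

const Message = struct { bytes: [256]u8 };

fn demo(gpa: std.mem.Allocator) !void {
    var pool = std.heap.MemoryPool(Message).init(gpa);
    defer pool.deinit();

    const msg = try pool.create(); // may hit the allocator: new high-water mark
    pool.destroy(msg); // memory goes onto the pool's internal free list
    const reused = try pool.create(); // recycled from the free list, no allocator call
    pool.destroy(reused);
}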

1 Like