I definitely remember you writing a similar article for Rust! Thanks for sharing, awesome reading!
I've used a similar library. I can post it here.
It creates a unique enum given a type:
const Foo = struct {
    const Ind = Index.Unique(Foo);
};
Index also provides a Range type that is a start + end. I would like to use a len instead of end to allow working with ring buffers.
The Range type also provides a get function that returns an index of the correct type, using start as an offset.
It's very useful in loops:
for (0..range.len()) |i| {
    const handle = range.get(i); // type-safe handle
}
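A simplified sketch of the idea, not the exact code from the attached file:

const std = @import("std");

// Hypothetical sketch: Handle stands in for the unique enum index type
// produced by Index.Unique.
fn Range(comptime Handle: type) type {
    return struct {
        start: u32,
        end: u32,

        const Self = @This();

        pub fn len(self: Self) u32 {
            return self.end - self.start;
        }

        // Returns a typed handle, using start as the offset.
        pub fn get(self: Self, i: u32) Handle {
            std.debug.assert(i < self.len());
            return @enumFromInt(self.start + i);
        }
    };
}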
Index.zig (4.1 KB)
Edit.
I've reworked the library and its usage in my programs to use len instead of end.
The result is much cleaner code in my opinion.
I'm going to attach the second version also.
Index.zig (4.1 KB)
I didn't remember that! It's just that when I was returning to the draft and opened the newtype-index-pattern.dj file, I got very confused why all the examples are suddenly in Rust, lol
minor nit: I was expecting FontWeight to be an example of a non-exhaustive enum, but the _ prong is missing.
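That is, I expected something like this (the weight values are illustrative, not from the article):

pub const FontWeight = enum(u16) {
    normal = 400,
    bold = 700,
    _, // the non-exhaustive prong: other weights remain representable
};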
Doesn't the OS already give us indexes? Don't we already have virtual per-process address space, and can thus serialize the entire process address space to disk? I wonder: if there were a few choice extensions of the virtual address space API from the OS, could new efficiencies or language features be enabled…
OS/400-style OSes are single-address-space operating systems where you can persist process state.
For more common OSs, I think the benefit of direct indices is compactness (e.g. 32-bit) and not thwarting ASLR
By this:
If you need to store a list of items, you can often avoid materializing the list of indexes by storing a range "pointing" into the shared storage.
…do you mean (by "materializing") merely that, rather than storing (necessarily contiguous?) individual(?) indices, you can store a {.index, .count} ("range")? Or am I losing a little in translation here? If so, this, and the n-ary tree example, seem to be suited for a situation where the data structure, in the end, is likely to remain "static", true? Is there any commentary on how suitable this is for a situation that requires lots of balancing, sort-adding nodes, etc.? Sorry if this is obvious to all but me….
And a tangent - does:
return tree.nodes[range.index..][0..range.count];
…have special advantages over:
return tree.nodes[range.index..range.index+range.count];
…(besides readability, perhaps)?
If you need to modify it (for example, grow it), you can put the old range in a free-list of unused ranges and get or create a new one of the needed size. If you don't know the needed size upfront, you can use a reasonable guess (like std.ArrayList does) and then shrinkAndFree it once you do, possibly keeping a capacity in a parallel array while you still need it. Once you are done resizing, that additional metadata can be thrown away.
Freed pieces can either be reused by other lists that need that size, or alternatively you can rebuild the entire thing without all the gaps, which makes it behave somewhat like a data-structure-specific garbage collector.
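A minimal sketch of that scheme (Pool and Range are illustrative names, assuming a managed std.ArrayList):

const std = @import("std");

const Range = struct { index: u32, count: u32 };

// Shared storage plus a free-list of recycled ranges; exact-size reuse
// keeps the example short, a real version might split larger ranges.
fn Pool(comptime Item: type) type {
    return struct {
        items: std.ArrayList(Item),
        free: std.ArrayList(Range),

        const Self = @This();

        pub fn alloc(self: *Self, count: u32) !Range {
            // Reuse a freed range of the right size if one exists.
            for (self.free.items, 0..) |r, i| {
                if (r.count == count) return self.free.swapRemove(i);
            }
            // Otherwise grow the shared storage.
            const index: u32 = @intCast(self.items.items.len);
            try self.items.resize(self.items.items.len + count);
            return .{ .index = index, .count = count };
        }

        pub fn release(self: *Self, r: Range) !void {
            try self.free.append(r);
        }
    };
}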
I think in the case where the count is only runtime known they should be equivalent, but I still prefer the former because I find it more readable and it doesn't require mentioning .index twice.
Got it, so there's no magic I'm missing. Certainly one could also implement something like a red-black tree, e.g., with indices to contiguous storage, but where the contiguity has no special advantage other than being an unfragmented (though disordered) store that could indeed be memcpy'd or efficiently block-written to file (etc?). The other advantages of this pattern, especially the enum index, could still be useful, potentially using symbolic labels beyond "root" and "invalid".
you haven't quite reached peak Newtype Index Pattern power level yet…
check out some of these patterns (from wasm linker code):
The "Unpack Pattern"
Packed in-memory layout vs convenient union(enum) layout
/// Represents a synthetic function, a function from an object, or a
/// function from the Zcu.
pub const Resolution = enum(u32) {
    unresolved,
    __wasm_apply_global_tls_relocs,
    __wasm_call_ctors,
    __wasm_init_memory,
    __wasm_init_tls,
    // Next, index into `object_functions`.
    // Next, index into `zcu_funcs`.
    _,

    const first_object_function = @intFromEnum(Resolution.__wasm_init_tls) + 1;

    pub const Unpacked = union(enum) {
        unresolved,
        __wasm_apply_global_tls_relocs,
        __wasm_call_ctors,
        __wasm_init_memory,
        __wasm_init_tls,
        object_function: ObjectFunctionIndex,
        zcu_func: ZcuFunc.Index,
    };

    pub fn unpack(r: Resolution, wasm: *const Wasm) Unpacked {
        return switch (r) {
            .unresolved => .unresolved,
            .__wasm_apply_global_tls_relocs => .__wasm_apply_global_tls_relocs,
            .__wasm_call_ctors => .__wasm_call_ctors,
            .__wasm_init_memory => .__wasm_init_memory,
            .__wasm_init_tls => .__wasm_init_tls,
            _ => {
                const object_function_index = @intFromEnum(r) - first_object_function;
                const zcu_func_index = if (object_function_index < wasm.object_functions.items.len)
                    return .{ .object_function = @enumFromInt(object_function_index) }
                else
                    object_function_index - wasm.object_functions.items.len;
                return .{ .zcu_func = @enumFromInt(zcu_func_index) };
            },
        };
    }

    pub fn pack(wasm: *const Wasm, unpacked: Unpacked) Resolution {
        return switch (unpacked) {
            .unresolved => .unresolved,
            .__wasm_apply_global_tls_relocs => .__wasm_apply_global_tls_relocs,
            .__wasm_call_ctors => .__wasm_call_ctors,
            .__wasm_init_memory => .__wasm_init_memory,
            .__wasm_init_tls => .__wasm_init_tls,
            .object_function => |i| @enumFromInt(first_object_function + @intFromEnum(i)),
            .zcu_func => |i| @enumFromInt(first_object_function + wasm.object_functions.items.len + @intFromEnum(i)),
        };
    }
};
The "Watermark Pattern"
Reverting an array hash map to a previous length to efficiently restore to an old state
const functions_end_zcu: u32 = @intCast(wasm.functions.entries.len);
defer wasm.functions.shrinkRetainingCapacity(functions_end_zcu);
const globals_end_zcu: u32 = @intCast(wasm.globals.entries.len);
defer wasm.globals.shrinkRetainingCapacity(globals_end_zcu);
const function_exports_end_zcu: u32 = @intCast(wasm.function_exports.entries.len);
defer wasm.function_exports.shrinkRetainingCapacity(function_exports_end_zcu);
const hidden_function_exports_end_zcu: u32 = @intCast(wasm.hidden_function_exports.entries.len);
defer wasm.hidden_function_exports.shrinkRetainingCapacity(hidden_function_exports_end_zcu);
wasm.flush_buffer.clear();
try wasm.flush_buffer.missing_exports.reinit(gpa, wasm.missing_exports.keys(), &.{});
try wasm.flush_buffer.function_imports.reinit(gpa, wasm.function_imports.keys(), wasm.function_imports.values());
try wasm.flush_buffer.global_imports.reinit(gpa, wasm.global_imports.keys(), wasm.global_imports.values());
try wasm.flush_buffer.data_imports.reinit(gpa, wasm.data_imports.keys(), wasm.data_imports.values());
return wasm.flush_buffer.finish(wasm) catch |err| switch (err) {
    error.OutOfMemory => return error.OutOfMemory,
    error.LinkFailure => return error.LinkFailure,
    else => |e| return diags.fail("failed to flush wasm: {s}", .{@errorName(e)}),
};
It's more like I had a self-imposed challenge to explain the "weird" enum(u32) { _ } syntax without dragging in SoA and the rest of data-oriented design (at which I failed, given that half of the post speaks about caches and whatnot)
To add to the pile of more advanced tricks, this one comes from Sorbet (a type checker for Ruby; design talk):
One pain point with indexes is concurrency. If you want to populate your "GlobalState with arrays" from multiple threads, you need to make sure two threads do not race when reserving an index for an item, and that's tricky to get correct, and tricky to get fast (it's very easy to introduce contention on some global counter).
The way Sorbet solves this is that each worker thread gets its own private copy of the "GlobalState arrays" and populates it using indexes local to that thread, without any mutexes. Then there's a merge phase that takes the actual GlobalState and the thread-local version of GlobalState and merges the two together, shifting the indexes.
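A minimal sketch of the merge step (illustrative names, assuming a managed std.ArrayList; Sorbet itself is C++):

const std = @import("std");

const Item = struct { data: u64 };

// Append a thread-local array to the global one. Every index the
// thread recorded must afterwards be shifted by the returned offset.
fn merge(global: *std.ArrayList(Item), local: []const Item) !u32 {
    const shift: u32 = @intCast(global.items.len);
    try global.appendSlice(local);
    return shift; // local index i now lives at global index shift + i
}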
Yes. The index transformation is a manual way of doing pointer compression, basically. There are some cognitive advantages as well: thinking of the data structure as an array of memory encourages taking advantage of that fact, laying data structures out densely to take advantage of array traversal where possible. But it's an array of memory whether that array is used well or poorly. The difference between indexing an array and doing pointer arithmetic is a mental one; in C the syntax one chooses is a matter of taste, convention if you prefer. Zig improves this situation (thankfully!) but slice[3] and (slice.ptr + 3)[0] will still give you the same value.
Memory dumps, "images", used to be a common way to persist a program between invocations. They're still in use: Emacs will, upon request, compile everything one wants loaded and then dump itself in that state, so it doesn't have to do it all over again next time. Smalltalk used this extensively, combined with excellent tools for editing a program while it's running. Problem is, some of those programs would get to the point where it's not actually possible to build the program starting from the source, so it's edit the image or do without. There are a few reasons why programmers have adopted serialization which consists of more ceremony than merely dumping the image, good reasons, but it isn't because we can't do it. We can and sometimes we do.
It's absolutely possible with the right operating system, and appropriate language support, to just have smaller pointers which actually act like (are) pointers. Memory is all virtual, no particular reason[1] why every process wouldn't get one high word which memory starts from. If we kept everything word-aligned (mostly reasonable) this could handle 32GiB with small pointers. That would be nicer to work with, there's a cognitive cost to using array indices when what you really want is pointers. Pointers are no barrier to laying out memory intelligently, the main advantage of "indicizing" this kind of data structure really is smaller integers.
But an array transform we can just do, without waiting around for a great deal of work which may or may not ever get done (or setting aside a few professional lifetimes to get it done ourselves). It's a hack, sure. So what, it works. What else is new.
Similar thing with the enum newtype: it provides some of the benefits of a hypothetical feature of greater elegance, but considerable complexity, and we already have non-exhaustive enums. If all we would do with a unique numeric type is type check it, we don't need two ways to do that, a little bit of @enumFromInt and vice versa isn't going to bust the byte budget.
The indexing bugs which have eaten hours of my life very seldom stem from using something which was meant to be data as an index. Generally it's using an index meant for one array on another by accident, and since those are the same type, newtyping the indices wouldn't have helped. Some kind of general-purpose mechanism to annotate that kind of static code intention, and check it in a validation step, would be most welcome.
Security considerations, sure. Less pressing when it comes to the heap. ↩︎
In WebAssembly, multiple memories are supported. For each load/store/memory-related instruction, you can also specify which memory to operate on.
But in reality, nothing supports generating those instructions yet, AFAIK.
So if you want to use it, normal pointer/slice dereferencing syntax will never work; you have to wrap it in inline assembly or the like, and access memories via functions.
It would be cool if there were a language feature for "this index/pointer is intended for use in this address space" that could generate the WebAssembly multi-memory instructions above! (Something like addrspace, but with a runtime-known address-space index/identifier? I don't know!)
Hi, how does the built-in std pool allocator fit in here? My understanding is that indices provide a configurable index size and better deserialisation, while both approaches have similar cache benefits. If so, when would I ever use a pool allocator (e.g. the one in the stdlib) vs a general-purpose allocator + ArrayList or another contiguous-buffer strategy?
Nice article!
I have been contemplating using non-exhaustive enums as indexes in different cases, but never got to it. Something stops me. I like simple direct raw ints too.
Always love reading your articles. I think they are the perfect size to explain the topics you approach; I always leave with a better understanding of the topic at hand.
I think this could have been a great addition to the article, as most of the article only has examples of how to "get" items, but there are no insert examples. Since my knowledge of Zig and programming theory isn't as extensive as yours, I struggled to apply this pattern to an interval tree. I feel like the caveats of the "Newtype Index" pattern are more prevalent when inserting new items than when getting them.
Welcome @DoomishGuy!
Pools are great when you need a lot of something lightweight, which isn't durably connected, and especially when there's a lot of recycling going on.
Trees, the kind @matklad's article illustrates, are highly connected and tend to stick around. So paying attention to data layout pays off, because the workload does a lot of traversing the structure: locality, cache-friendliness, predictable access patterns, you get to take advantage of all of that if you make the effort to optimize those things.
Sometimes you just need somewhere to aggregate data, maybe transfer ownership, possibly sort, and the amount you need is variable, the need for it comes and goes. Say you have a message system, and the messages are a little too big to pass by copy, 256 bytes for the sake of argument. So there are a lot of them flying around, some last longer than others, maybe they end up linked into a list, that kind of thing.
It's hard to squeeze much efficiency out of array-ifying something like that. Memory pools give you some efficiency right away: they only involve the allocator when a new "high-water mark" is reached, your code can reuse a Message as soon as it's freed, and the pool only needs fresh memory when it happens to be out.
But an array is going to end up fragmented to pieces very quickly, because of the use pattern. So you're not able to take advantage of the density, because you can't keep it dense. Plus you do want to reuse those holes, so now you make a freelist to find them: and you have reinvented the MemoryPool from the standard library, that's what they are.
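A small usage sketch of std.heap.MemoryPool, with Message as an illustrative 256-byte payload:

const std = @import("std");

const Message = struct { payload: [256]u8 };

test "pool recycles freed messages" {
    var pool = std.heap.MemoryPool(Message).init(std.testing.allocator);
    defer pool.deinit();

    const a = try pool.create(); // fresh memory: high-water mark grows
    pool.destroy(a); // a's slot goes on the pool's internal free-list
    const b = try pool.create(); // reuses that slot, no allocator call
    pool.destroy(b);
}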