Video from Andrew Kelley on Hytradboi 2025:
This video came at the perfect time for me! Just a few days before I saw it, I was trying to apply the same techniques to the data model of a music player I'm working on. I decided to finally give Richard Fabian's Data-Oriented Design a read, which made me realize that the "Programming Without Pointers" methodology is very reminiscent of first normal form.
I'm a little confused on one point. Take the disease locations example from the video: how would one go about dynamically adding and removing locations from an already-existing disease? My instinct after watching the video was to supplement Location.Slice with a capacity field, turning the locations array into an ad-hoc allocator. This means you end up with a bunch of "dead" subarrays within locations (which you could clean up with a periodic compaction step), unless you add infrastructure to track free regions, at which point it's really starting to look like an allocator!
Alternatively, following 1NF, I think you'd replace the Location.Slice approach with a top-level std.ArrayList(struct { diseaseId: u32, location: Location }) field. Unfortunately, finding all the locations associated with a given disease then requires a search through the entire array.
How do you solve this?
Would anyone have any further resources on programming in the style of the relational model? The DOD book is a little light on practical examples, at least so far (I just finished chapter 7).
I think sometimes remove just isn't needed; in those cases, this organization makes things super easy.
If you need remove, I think there are a bunch of different options that aren't necessarily much harder.
One is to rewrite all the data, without holes, to a new instance. That is basically a copy: rebuild the data structure without the elements that should be removed, then discard the old one. (This is essentially data-structure-specific garbage collection, without having to track things in a complicated, general way.)
For the diseases example, you can also swap-remove a disease (if order/index stability isn't needed), and when adding a new one, you could use the ArrayList in a way where the unused capacity slice contains valid off+len slices, so the unused capacity slice effectively becomes your free-list. (Entries that don't correspond to removed locations would have len 0.)
Because the diseases array indexes into the locations array, and the former is likely indexed externally while the latter probably is not, you have the full information available and the freedom to reorganize internally: you could write an in-place compaction method that moves locations forward to fill gaps created by removed diseases and their locations.
If you need index stability for the diseases, you could mark removed diseases with a zero length and add a new free-list entry, basically leaving the disease value there but marked as a tombstone; you can then reuse the offset field to index the next tombstone. That way you just need a sentinel value that means end of list (for example maxInt(Index)) and a first_tombstone index to get a list of indices that can be reused for newly added disease entries.
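A minimal sketch of that tombstone free-list in Zig, assuming a managed std.ArrayList; the names (Disease, first_tombstone, sentinel) are illustrative, not from the talk:

```zig
const std = @import("std");

const Index = u32;
const sentinel = std.math.maxInt(Index); // "end of free-list" marker

const Disease = struct {
    off: Index, // offset into locations; for a tombstone, index of the next tombstone
    len: Index, // 0 marks a tombstone
};

const Diseases = struct {
    entries: std.ArrayList(Disease),
    first_tombstone: Index = sentinel,

    /// Mark a disease as removed, keeping its slot so external indices stay valid.
    fn remove(self: *Diseases, i: Index) void {
        self.entries.items[i] = .{ .off = self.first_tombstone, .len = 0 };
        self.first_tombstone = i;
    }

    /// Reuse a tombstone slot if one exists, otherwise append a new entry.
    fn add(self: *Diseases, d: Disease) !Index {
        if (self.first_tombstone != sentinel) {
            const i = self.first_tombstone;
            self.first_tombstone = self.entries.items[i].off;
            self.entries.items[i] = d;
            return i;
        }
        try self.entries.append(d);
        return @intCast(self.entries.items.len - 1);
    }
};
```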
For removing, you can remove the locations and compact them in place, change the len, and create a new disease entry that describes the unused memory, adding it to the unused capacity slice that serves as the free-list.
For growing the locations, you can go through the free-list (which you could sort periodically), find an entry that is big enough, copy the data over, add the new data, and swap the indices (of the disease and the disease in the free-list), potentially doing the removal step for the remainder that isn't used (creating a free-list entry for it).
I agree that it has some similarity to allocators. More specifically, it is like a memory pool for variable-sized slices that allows you to relocate the slices, because you know where the metadata is stored and you can distinguish between data that is identified by an id externally and data that is "internally" managed (data that can be moved around, because the data structure holds all the valid indices into the locations array).
I think another approach that can be useful (especially if there is some limited number of expected slice sizes) is to instead store every size of slice as its own group, which turns the index into a handle that can be used to find the right size group and then the index within that group.
I used something similar to that in my HAMT and StencilArrays implementations. For data structures with a nested, tree-like structure, it can make it easier to replace and reuse nodes, because all nodes that share a size are part of the same memory block, which makes it easier to reuse memory (without copying other memory around for compaction purposes).
I think it could cut down on the amount of wasted memory and needed reorganization if you have a lot of adding and removing. But I haven't investigated that in detail, and it would vary based on how the data structure is used.
Also, any idea why he uses a non-exhaustive enum(u32) to represent String instead of a u32? Is it just to define associated constants (Table, TableContext, ...)? It's around 5 min into the video.
To help prevent user error.
Because in Zig, every declaration of an enum type is a distinct type, while every u32 is the same u32.
Using an enum(u32) means that you can rely on the type system to stop you from passing a handle into the wrong function: you'll get a compile error saying the function doesn't accept the type you gave it.
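For example (DiseaseId and LocationId are hypothetical names, not from the talk):

```zig
const DiseaseId = enum(u32) { _ };
const LocationId = enum(u32) { _ };

fn locationsOf(id: DiseaseId) void {
    _ = id; // look up locations for this disease...
}

test "distinct handle types" {
    const d: DiseaseId = @enumFromInt(3);
    locationsOf(d); // ok
    // locationsOf(@as(LocationId, @enumFromInt(3))); // compile error:
    // expected type 'DiseaseId', found 'LocationId'
}
```

Both types wrap the same 32 bits, so this safety is free at runtime.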
Ok makes sense, thanks!
I was nodding along in agreement throughout the video. A couple of years ago, after recovering from a decades-long "Clean Code" mind-virus infestation, I was tinkering with putting all shared application state into a single nested struct, eliminating most dynamic allocation, and "mutable shared globals considered harmful" be damned.
It works surprisingly well despite the obvious downsides (all code has write-access to all data).
…one example is here in my 8-bit emulators: an entire emulator state is a single struct without pointers, and this struct can be "snapshotted" (via menu => System => Save Snapshot: Tiny Emulators)
Of course, since then I keep thinking about a language that would assign filesystem-like access rights to such a globally shared, nested structure (because that struct is pretty much like a file system, e.g. you have "filesystem locations" like app.gfx.buffers vs app.physics.bodies).
…e.g. there needs to be a way to grant specific code locations (code being likewise organized in a hierarchy of modules with functions as leaf nodes) specific access rights into the data tree (e.g. the module gfx has full read/write access to app.gfx.*, a handful of other modules have read-only access, but all other code has neither read nor write access)…
…and so on…
Revenge of Fortran
I thought about it again when I saw this PR. It seems to be a recurrent pattern, but I'm not sure I fully understand the benefits.
I get that instead of a raw u32 you get type checking, because each enum is unique. But doesn't that mean you have to use @enumFromInt and @intFromEnum every time you want to use it to access an array?
I've seen multiple people who seem to think avoiding built-ins makes code better, but that is highly dependent on which built-in is being used. @constCast is highly suspect code, but it is sometimes needed; that's why it exists, just try a better way first :3
However casting between an enum and an integer is very, very normal.
Perhaps it's because it's verbose, but well, that's just Zig. Perhaps one day Zig will let you index with enums directly, or perhaps it won't.
In fact, I wasn't saying this because I want to avoid built-ins; I use them quite often. It's more about the potential overhead of adding a builtin call. I don't know what code is generated when built-ins are compiled, but I guess each adds at least one instruction?
I was more wondering: is preventing user mistakes by using an enum worth the extra operations (builtin calls) over conveying the purpose of a parameter with an alias like const NodeIndex = u32?
Edit: I checked this simple example and I'm more confused than before, haha. I'm not the most familiar with Godbolt and assembly, but the enum version seems to generate fewer instructions: godbolt. I really don't know how builtins are handled under the hood.
@enumFromInt and @intFromEnum are no-ops. const Index = enum(usize) { _ }; and const Index = usize; compile to the same code.
I don't really know assembly, but in your example the produced code isn't exactly the same; maybe the two versions are equivalent, though.
What I do is this:
pub const Handle = enum(u32) {
    _,

    fn toIndex(handle: Handle) usize {
        return @as(usize, @intFromEnum(handle));
    }

    fn fromIndex(idx: usize) Handle {
        return @enumFromInt(idx);
    }
};
Mostly to make the intent clear. The builtins shouldn't really do anything complicated aside from reinterpreting the number as a different type.
Edit: might it be a good idea to @truncate the index in the second method?
I don't think @truncate is a good idea: if some code creates an index bigger than u32, truncate would silently wrap it back to some valid but wrong index. It is better to get a safety-check error in safe build modes instead.
I would keep the index u32 instead of usize; that way, other code can convert to a u32 index and back without needing to worry. If the index gets coerced/widened to usize somewhere, then that code needs to check that the index is still valid before going back to a handle anyway.
They are the same. One of the functions is tail-calling the other one.
It's particularly funny that the raw-integer function calls the enum function.
You can do that right now with an EnumArray; they're nice, I use them a lot. It doesn't have to be a monotonic, dense enum either; the type will comptime-construct a mapping for sparse values.
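For instance, with a hypothetical sparse enum:

```zig
const std = @import("std");

// Sparse values: EnumArray builds a dense mapping at comptime.
const Color = enum(u8) { red = 1, green = 40, blue = 200 };

test "index an EnumArray directly with enum values" {
    var counts = std.EnumArray(Color, u32).initFill(0);
    counts.set(.green, 7); // no @intFromEnum at the call site
    try std.testing.expectEqual(@as(u32, 7), counts.get(.green));
}
```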
I am aware; I was talking about the use of the builtins, which EnumArray does use. I probably should have mentioned that, though.