Why are allocator interfaces and implementations split across `std.mem` / `std.heap`, while IO is colocated under `std.Io`?

The allocator interface lives at std.mem.Allocator, while allocator implementations live under std.heap, for example std.heap.ArenaAllocator. IO, on the other hand, appears more colocated under std.Io, for example std.Io.Threaded.

This means that using allocators requires hopping between std.mem and std.heap, while IO looks like it has a single entry point. From a discoverability and memorability perspective, this asymmetry stands out to me.

Current Zig shape:

const std = @import("std");
const Io = std.Io;
const Allocator = std.mem.Allocator;

pub fn main() void {
    var alloc_arena: std.heap.ArenaAllocator = .init(std.heap.page_allocator);
    defer alloc_arena.deinit();
    const alloc = alloc_arena.allocator();

    var io_threaded: std.Io.Threaded = .init(alloc);
    defer io_threaded.deinit();
    const io = io_threaded.io();

    _ = io;
}

Hypothetical shape:

const std = @import("std");
const Io = std.Io;
const Allocator = std.Allocator;

pub fn main() void {
    var alloc_arena: std.Allocator.ArenaAllocator = .init(std.heap.page_allocator);
    defer alloc_arena.deinit();
    const alloc = alloc_arena.allocator();

    var io_threaded: std.Io.Threaded = .init(alloc);
    defer io_threaded.deinit();
    const io = io_threaded.io();

    _ = io;
}

My question is mainly about the design rationale here. Is the difference with std.Io intentional, for example due to layering or taxonomy, or is it largely historical? Were alternative layouts considered or rejected for specific reasons such as naming, API stability, or module size?

I personally find std.heap.page_allocator under std.heap reasonable, since it is a low-level primitive. My question is mainly about the interface/implementation grouping and discoverability.
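For reference, here is a small sketch of the status quo the question describes. The helper takes the allocator interface from std.mem, while the concrete allocators handed to it come from std.heap. The helper name shout is hypothetical. The snippet assumes a recent Zig (0.14 or later, for the .init decl-literal syntax also used above), and since std APIs shift between releases, treat it as illustrative only.

```zig
const std = @import("std");

// The allocator *interface* is imported from std.mem...
const Allocator = std.mem.Allocator;

/// Hypothetical helper: returns an upper-cased copy of `text`, allocated
/// with whichever allocator implementation the caller passes in.
fn shout(gpa: Allocator, text: []const u8) ![]u8 {
    const out = try gpa.dupe(u8, text);
    for (out) |*c| c.* = std.ascii.toUpper(c.*);
    return out;
}

pub fn main() !void {
    // ...while the *implementations* come from std.heap.
    var arena: std.heap.ArenaAllocator = .init(std.heap.page_allocator);
    defer arena.deinit();

    const loud = try shout(arena.allocator(), "hello");
    std.debug.print("{s}\n", .{loud});
}
```

The function signature only mentions std.mem, and the call sites only mention std.heap. That split is the two-namespace hop the question is about.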

Related: “juicy main” was proposed as a way to reduce setup friction for newcomers, which is partly why I am asking about the structure here:
https://github.com/ziglang/zig/issues/24510

My guess is that it was historical. Moving std.mem.Allocator or std.heap would be a pretty big change. There is an issue open about needing to audit the std lib before reaching 1.0, and I imagine that questions like this could be considered then.

I also have found std.mem.Allocator and std.heap.* to be an odd distinction, and I think std.Allocator and std.Allocator.* is a great move.

Agree. Allocators are usually given as a killer feature of Zig. They deserve prime API real estate.

I’d be a bit more in favor of std.alloc. It’s an abbreviation, but so is std.mem.

Although the standard library is moving more toward “fat types” (Io being the prime example), by which I mean containers which can be instantiated but also hold a whole namespace of other goodies, I’m not convinced this is the best way to do it.

When you look at the docs for std, there’s a bunch of types at the top, and a bunch of namespaces below it. Some of the types are just types: if you click on ArrayList, that’s all you’re going to find, a function called ArrayList which does what you think.

Click on Thread, though, that being the longstanding example, and there’s lots of stuff in it: Mutex, Condition, and so on. Granted that they’re all topical, there’s not much point in having Mutex if your code doesn’t use Thread.

But the distinction I’m referring to is not visible. Which types are fat and which are skinny is guesswork. That makes std less discoverable; I wrestled with this a lot when I was learning the language.

I would go so far as to have std expose no capital-letter types at all, in fact. Pure lowercase namespace, so std.thread.Thread, std.thread.Mutex, std.alloc.Allocator, std.io.Io, and so on.

Instantiable types within instantiable types are fine; those should be return values or otherwise components of the type which they inhabit. So std.io.Io.VTable, not std.io.VTable, for instance. But File would go back to std.io.File.
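A minimal sketch of the layout proposed here follows. It uses a hypothetical lowercase io namespace, not the real std.Io, whose interface has many more members. The instantiable Io keeps its dependent VTable nested, while an independent type such as File sits beside it in the namespace. The single now entry and the fakeNow backend are invented stand-ins for real vtable functions.

```zig
const std = @import("std");

/// Hypothetical lowercase namespace in the proposed style (NOT real std).
const io = struct {
    /// The instantiable interface type.
    pub const Io = struct {
        userdata: ?*anyopaque,
        vtable: *const VTable,

        /// Nested because a VTable is only meaningful as part of an `Io`.
        pub const VTable = struct {
            /// Invented example entry; real vtables have many functions.
            now: *const fn (userdata: ?*anyopaque) i64,
        };

        pub fn now(self: Io) i64 {
            return self.vtable.now(self.userdata);
        }
    };

    /// Independent of any particular `Io` value, so it lives beside `Io`
    /// in the namespace rather than inside it (placeholder field only).
    pub const File = struct {
        handle: i32,
    };
};

/// A fake backend used only to exercise the interface.
fn fakeNow(_: ?*anyopaque) i64 {
    return 42;
}

const fake_vtable: io.Io.VTable = .{ .now = &fakeNow };
```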

ā€œNone at allā€ is going too far, actually. std.option.Option would be a waste of a level. But so many things come in two flavors due to alignment, or four due to auto: I count fourteen types which could all go in std.map, and I think that would be nicer, especially for beginners. Could be fifteen, I’d put EnumMap in std.enum though. Definitely think everything should have one home in the standard library, it could be put in two places but… let’s not do that.

Now, this is a big change, and I have quite a bit of code which would need to be extensively modified in bouts of pure busywork. Many of us do. Nonetheless, I believe it would make std much easier to learn, and that would be worthwhile.

If any redistribution is going to happen prior to 1.0, it makes sense that it gets full consideration, so that everybody can look back years down the road and say, “well, it may not be perfect, but we treated it with respect and got it as close as we could with what we had prior to 1.0”. I’m a fan of the effort, and my vote is very similar:

  1. Thoughtful case-by-case determination of “fatness”. I’m generally not a fan of foo.bar.Bar, but if the container “bar” really meaningfully has many members, and one is quite meaningfully Bar, then fine. Otherwise, I honestly prefer std.Io over std.io.Io, except that std.io.Condition or even std.io.Dir seems more natural than the same names under std.Io, because the (capitalized) “Io” then feels conceptually overextended… unless that were simply the tradition, and consistent. But std.alloc.Allocator seems more OK, even though the Allocator interface is so central there, conceptually. So “thoughtful case-by-case” feels necessary, and maybe that can distill into a uniform approach that pleases most people. I find myself gently leaning toward fat typing, even though std.Io.Condition feels a little weird; std.Allocator.Check or std.Allocator.*Config don’t feel any weirder, honestly… if I detach myself from the current heritage.
  2. NO duplicates/aliasing (EnumMap belongs either in std.enum or std.map, and too bad if it’s not quite where you expected - documentation could, of course, be responsible for redirects).
  3. Agreed: if std.io, then definitely (I’d vote for) std.io.Io.VTable, not std.io.VTable, or, possibly preferably, std.Io.VTable, if fat types win.

Why does my opinion matter? I honestly don’t carry much weight, except perhaps as a newcomer, who therefore brings an anticipation of what the organization might look like if heritage and implementation details were less visible. I’ve also been known to refactor my own organizational structures, often years down the road, for no great reason other than aesthetics. There is something very inviting about thoughtful organization, as I’m guessing most would agree. And I don’t think zig std is especially disorganized, even if it’s naturally mature enough to want to find a more satisfying organization.

I think abbreviations are fine.

The main thing to consider with abbreviations is first-time discoverability. In Zig, first contact usually happens through the LSP, and for some users through the documentation. Those are the two primary places where people learn what something is.

If an abbreviation is used, it is important that the doc comment spells out the full name clearly. Abbreviations are excellent for long-term users because they compact code and reduce visual noise, but they must be consistent across std and remain within a high standard of readability.

Using common and correct English abbreviations also helps readability. Made-up or inconsistent abbreviations increase friction rather than reduce it.

A rough LSP-style example of what I mean:

std.alloc

/// Allocator (alloc)
/// Collection of allocator implementations and related utilities.
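In Zig source, that hover text would simply be the doc comment on the declaration. A minimal sketch follows, assuming a hypothetical alloc namespace that only re-exports today’s std.mem and std.heap declarations. It is not part of std, and the member names Arena, FixedBuffer and page are invented for illustration.

```zig
const std = @import("std");

/// Allocator (alloc)
/// Collection of allocator implementations and related utilities.
/// Hypothetical namespace for illustration; it only re-exports existing std decls.
pub const alloc = struct {
    /// Allocator interface (today: std.mem.Allocator).
    pub const Allocator = std.mem.Allocator;
    /// Arena allocator (today: std.heap.ArenaAllocator).
    pub const Arena = std.heap.ArenaAllocator;
    /// Fixed-buffer allocator (today: std.heap.FixedBufferAllocator).
    pub const FixedBuffer = std.heap.FixedBufferAllocator;
    /// Page allocator (today: std.heap.page_allocator).
    pub const page = std.heap.page_allocator;
};
```

Both an LSP hover and a grep for the declaration land on the same doc comment, so the spelled-out name is available either way.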

This thread is also relevant, since single-letter naming is effectively a form of abbreviation and raises similar discoverability concerns:
https://ziggit.dev/t/zig-api-naming-single-letter-generics-with-no-explanation/13539

I agree, I also think it’s a pretty objective concept.

std.Thread is fat, because we commonly create a Mutex without going through a Thread. Thread instances don’t even have a function which returns Mutex, they’re conceptually related, that’s all.

Build, on the other hand, not fat, even though there are dozens of nested types involved. The types in that collection are either return values of b: *std.Build, or arguments to methods on b. There’s no reason to use, IDK, a Build.ExecutableOptions without a build instance around.

A case could be made for flattening it anyway, but I wouldn’t. Someone trying to figure out how build.zig works is almost certainly starting from a template, so it says b: *std.Build right there in the function signature, b: *std.build.Build wouldn’t get us much.

The idea would be to have only a few capitalized types hanging off of std, so the nature of the ones which remain is less important, since they’re all bespoke in one way or another.

Strikes me as a good precept.

Also, with LSPs in play, someone trying to find allocator might try to type std.allocator, but somewhere between std.al| and std.all| they’d see alloc as a completion. I agree that favoring some experience this way is usually a win, we have fs, os, fmt, and, for that matter, it’s std not standard.

I’m glad it’s not standard.InputOutput, and standard.input_output.InputOutput? Please no!

I agree and am fully in favor of this. It’s how I’ve written all of my libraries as well. However, the take I’ve seen from Andrew is that he prefers to use namespaces for disambiguation rather than purely for categorization.

It’s a topic that I’m sure will be revisited as the core team looks to solidify the std lib.

Any organization and naming scheme must not presuppose the presence of an LSP. Many veteran Zig users, including some Zig core members, don’t use an LSP. Instead of short names, I think a consistent naming scheme is vastly more important. I frequently grep and regexp search, and can tell the authors of Zig stdlib/compiler have had just that in mind. I alias to short-names where I think it makes sense.

I also dislike adding an extra container level in std, such as std.alloc.Allocator. I would much rather have something like std.Alloc directly.

  1. Extra container layers have real costs. They increase digging depth and line length: you have to write more to reach the same data structure, and long lines are often long simply because container names are repeated. This makes code harder to skim.

  2. This breaks a core Zig style rule: things that are types, or that return types, should be written in CapitalCamelCase [1]. Lowercase containers that primarily act as type holders blur the visual distinction between namespaces and types, which Zig normally keeps very sharp.

  3. This harms skimming and source-code reading by adding cognitive load for little gain. More structure has to be remembered in order to write or read the same concept that a closer-to-the-surface container would already achieve.

I am not in favor of massive fat structures either. Up to this point I have preferred some of the newer, fatter types, such as std.Io, but as I noted in my original post, moving everything into something like std.Alloc does not seem like a good idea [2]. My preference would be that std.Alloc contains user-facing allocation types and utilities, while std.Heap continues to contain the raw, platform-level heap allocators.

I agree with Andrew Kelley’s point that namespaces should exist primarily for disambiguation rather than purely for categorization [3]. In that light, types like HashMap and ArrayList are already distinct enough to stand on their own, and keeping std flatter improves readability and discoverability.

[1]: https://ziglang.org/documentation/master/#Style-Guide
[2]: https://ziggit.dev/t/why-are-allocator-interfaces-and-implementations-split-across-std-mem-std-heap-while-io-is-colocated-under-std-io/13528?u=nyx-xyn
[3]: https://ziggit.dev/t/why-are-allocator-interfaces-and-implementations-split-across-std-mem-std-heap-while-io-is-colocated-under-std-io/13528/9?u=nyx-xyn

Yes. This is in line with what I meant. An LSP hover ultimately surfaces information that already exists in the source; a grep/jump-to-definition workflow reaches the same comments and declarations directly, just in a different order.

In a grep-based workflow you typically end up at the definition site in std, where the doc comment is already present:

/// Allocator (alloc)
/// Collection of allocator implementations and related utilities.
const alloc = struct {
    ...
};

The LSP may show this without opening the file, but the underlying data is the same: source-level documentation. So abbreviations are not “LSP-dependent”; they are source-quality dependent.

On consistency, I agree: consistent naming and abbreviations across std matter a lot. For LSP and docs, consistency of the public-facing API surface (names you type, signatures you see) is critical. For grep/jump workflows, consistency inside the source also matters, because the source is what you are actually scanning and navigating.

I think an important point here is that searches/greps are frequently of the form fn <regexp>, which means the doc comment isn’t included at all. That is often exactly because you don’t want to hit doc comments, which may mention other functions (this matters not only for finding things, but also for mass operations in a good editor). This is the reason I like really descriptive, logical and consistent function naming, even if long. Otherwise, it seems we’re in agreement.

Agreed, and perhaps it’s not mere “importance” (it’s all pretty important stuff!), but some more elusive term/metric. It makes complete sense that some capitalized types hang off of std; Io seems a pretty good candidate to me, and, by mental association, so do Alloc (or Allocator), Thread, and so on. It’s interesting that so many data structures are currently top-level (Array*, etc.)… but since they’re used so ubiquitously, and since the motivation has been disambiguation rather than categorization, there would be understandable resistance to namespacing them.

Both seem valuable, and related… but perhaps the OP’s title itself indicates that what we’re looking at is exactly that: disambiguating between std.mem, std.heap, etc.? There was clearly a move from std.fs (etc.) to std.Io. I think that’s in the same vein, and arguments like this motivate the best results, even if they’re not perfect results.

+1. This does feel like a valuable distinction. I could see std.heap (namespace) making sense, whereas std.Alloc (fat type) seems to make more sense, esp. IF that was actually the Alloc(ator interface) type. But perhaps that “works” more for Io with the vtable design, and would be a force for Alloc <cringing at the notion of voting for big breaking changes, even if they’re pretty grep-fixable>. I appreciate that Alloc is NOT a candidate for the same vtable design that Io has; as mentioned earlier, if a fat, capitalized Foo is really just a namespace masquerading in a type’s clothing, well then… (especially if it has no disambiguating purpose, but is merely organizational/categorizational ornamentation). Hmm.

Oh crime! Never. This helps illustrate the notion that things are rather good, broadly speaking, already, both in terms of names/abbreviations and organization. It may be that a couple of little ideas will go a long way in helping all to find the trees in the forest and keep zig fun and frustration-free.

From the source you quote:

File names fall into two categories: types and namespaces. If the file (implicitly a struct) has top level fields, it should be named like any other struct with fields using TitleCase. Otherwise, it should use snake_case.

All files are types. There’s a distinction between instantiable[1] types, and namespace types. The latter get snek_cased.

What I’m getting at is that instantiable types which do double-duty as namespaces, but only some of them, with no indication which, is not a good way to organize the standard library. It’s fine to disagree with that of course, the argument I’m making is that subtypes which live inside an instantiable namespace should be dependent on the parent type, which is not the case with std.Thread, and is in fact inverted with the new std.Io.

Inverted is better, rather than worse, in my opinion. But I’d rather have the occasional duplicated ‘stutter’ like std.io.Io than remember a list of fat types I need to check when I’m trying to find things.

No one wants Std.Heap (they would both be capitalized the way you’re interpreting it, since they’re both types), because there’s no reason to do this:

const useless: Std.Heap = .{};

Basically there would be no distinction left, and we may as well capitalize every single Zig file. They’re all types.
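The distinction being drawn can be sketched in a single file, since each file is itself a struct. All three names below are hypothetical. geometry has no fields and acts only as a namespace, so it is snake_case per the style guide. Rect has fields and is meant to be instantiated, so it is TitleCase. Canvas is a “fat” type: it is instantiable, but it also carries a nested Color that is useful without any Canvas value.

```zig
const std = @import("std");

/// Namespace type: no fields, never meant to be instantiated -> snake_case.
const geometry = struct {
    pub fn area(r: Rect) u32 {
        return r.w * r.h;
    }
};

/// Instantiable type: has fields -> TitleCase.
const Rect = struct {
    w: u32,
    h: u32,
};

/// "Fat" type: instantiable, but also doubles as a namespace for `Color`,
/// which can be used without ever creating a `Canvas`.
const Canvas = struct {
    width: u32,
    height: u32,

    pub const Color = enum { red, green, blue };

    pub fn bounds(c: Canvas) Rect {
        return .{ .w = c.width, .h = c.height };
    }
};
```

Nothing at a use site like Canvas.Color tells the reader that Color is independent of Canvas. That invisibility is the discoverability complaint above.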

Should we, then, dump all of crypto straight into std? What about posix? math? These aren’t disambiguations, they’re categories. Useful ones, the way I see it.


  1. One may in fact instantiate any type, but if it has no fields there is little point in doing so, and without member functions, none.

That seems like a really important point.

Likewise, std.math, etc.: one could say that math.pow() disambiguates from thanos.pow(), but it’s pretty “plain” to interpret math as a category, and it makes sense. Perhaps this could devolve into semantics, but leaning convention toward disambiguation, away from “mere” endless (or contrived) categorization frenzy, does seem valuable as well. That seems clearer to me now.

I misremembered the style guide. My point was about consistency, not about advocating any particular naming or casing rule.

This is not how I read the current std layout, nor what I’m advocating.

What I read std’s current form as, and what I think Andrew is advocating, is a mixed design, where commonly used, free-standing data types such as Io, Alloc, and Random have their types at the std root, while bigger bundled types and functions are placed inside broader named types. This is where things like math, crypto, and so on fall in.

The other design, as I understand it, that is proposed in this thread is that everything is sorted into broader named types. So you end up with std.io.Io and std.alloc.Alloc, while things stay the same for std.math, std.crypto, etc. In this model, no common types exist at the root. This way of doing it also has pros:

  1. It makes std more consistent, as there exists only one way things are organized.
  2. It is easier if you want to add more types inside a category, even after std stabilization (Zig 1.0). You could add a totally new Io, for example, as std.io.IoNew. This would be a real advantage for the ArrayList and HashMap families you have now.
  3. The std root becomes more clean and less noisy, as bundled categories are now the only thing in it.

I am not opposed to the second one as a std lib design. I think both models are fine. But as the std lib stands today, by what seems to be Andrew’s choice, I lean to the first variant because it reduces churn in std design.

In terms of the original question, these higher-level design choices don’t change the core friction: Alloc does not behave the same as Io (or Random) in terms of layout and discoverability. Because their layouts differ, assumptions of symmetry break, and users end up having to consult documentation, examples, or source code to bridge that gap.

Very well put. #2 is a truth with some real weight to it, I’m reluctant to admit, but it’s easier to see that in hindsight. The same may be true about subsequent levels of namespacing, but perhaps discoverability doesn’t mind taking a hit when you get further away from the ā€œessentialsā€.

I can say that I’m frequently happy to throw std.Random straight into the middle of code, but I’d scowl if I had to make it std.random.Random, and then I’d sure just go up and make a const Random = std.random.Random to keep from having to scowl. But I’d have no trouble regularly heading files with const Esoteric = std.foo.bar.esoteric.Esoteric because of course I’d rarely want to inline that in the middle of a function, especially if there were multiple references.
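The aliasing habit described above looks like this in practice. std.Random and std.Random.DefaultPrng are real. The deeper path in the comment is invented purely to show the pattern, and rollDie is a hypothetical helper.

```zig
const std = @import("std");

// Root-level types are short enough to use inline, e.g. `std.Random`.
// A deeply nested type would instead get a one-time alias at the top of the
// file, e.g. (hypothetical path): const Esoteric = std.foo.bar.esoteric.Esoteric;
const Random = std.Random;

/// Hypothetical helper: returns a value between 1 and 6 inclusive.
fn rollDie(rng: Random) u8 {
    return rng.intRangeAtMost(u8, 1, 6);
}

pub fn main() void {
    var prng = std.Random.DefaultPrng.init(12345);
    std.debug.print("rolled {d}\n", .{rollDie(prng.random())});
}
```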

I also agree that the original post about alloc shouldn’t get lost in the restructuring conversation. I certainly hope for “alloc reallocation”; it constantly bugs me that FixedBufferAllocator is in std.heap. 🙂
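To make that complaint concrete: std.heap.FixedBufferAllocator hands out memory from a caller-provided buffer. That buffer can just as well live on the stack, so nothing about the allocator involves the heap. Here is a minimal sketch, with countChunks as a hypothetical helper:

```zig
const std = @import("std");

/// Hypothetical helper: counts how many `chunk_len`-byte allocations fit in
/// `buffer` before the FixedBufferAllocator reports error.OutOfMemory.
fn countChunks(buffer: []u8, chunk_len: usize) usize {
    std.debug.assert(chunk_len > 0); // zero-length allocations would never fail
    var fba = std.heap.FixedBufferAllocator.init(buffer);
    const a = fba.allocator();
    var count: usize = 0;
    while (true) {
        _ = a.alloc(u8, chunk_len) catch break;
        count += 1;
    }
    return count;
}

pub fn main() void {
    var stack_buffer: [64]u8 = undefined; // stack memory, not heap memory
    std.debug.print("{d} chunks fit\n", .{countChunks(&stack_buffer, 16)});
}
```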

The rules for Zig module organization are the following, with respect to the fully qualified namespace:

  1. Eliminate ambiguity.
  2. Eliminate redundancy.
  3. Avoid English grammar.
  4. Avoid categories.

Examples of avoiding English grammar:

  • “NotSupported” vs “Unsupported”
  • “UsingThreads” vs “Threaded”
  • “StaticStringMapWithEql” vs “StaticStringMapEql”
  • “SinglyLinkedList”/“DoublyLinkedList” vs “LinkedListSingle”/“LinkedListDouble”
    • Most idiomatic: linked_list.Single and linked_list.Double

Examples of overcategorization:

  • std.enums - 100% of the things in there also have ā€œEnumā€ in the name.
  • std.heap - the name doesn’t even make sense for what it contains
  • std.fs - redundant with File or Dir

Counter example of good namespaces:

  • std.zig - communicates everything in this namespace is unstable / not protected by Zig semver
  • std.testing - communicates that stuff in here should only be used for unit testing
  • std.mem - not actually a category - it disambiguates e.g. mem.copy vs File.copy

When in doubt, don’t categorize.
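To illustrate the linked_list.Single / linked_list.Double naming suggested above, here is a sketch of what such a namespace might look like. It is hypothetical: today’s names are SinglyLinkedList and DoublyLinkedList. The intrusive node layout and the prepend/append/len methods were chosen only to keep the example short, and are not the std API.

```zig
const std = @import("std");

/// Hypothetical namespace: `linked_list` disambiguates, and the type names
/// avoid English grammar ("Singly"/"Doubly").
const linked_list = struct {
    pub const Single = struct {
        first: ?*Node = null,

        pub const Node = struct {
            next: ?*Node = null,
        };

        pub fn prepend(list: *Single, node: *Node) void {
            node.next = list.first;
            list.first = node;
        }

        pub fn len(list: Single) usize {
            var n: usize = 0;
            var it = list.first;
            while (it) |node| : (it = node.next) {
                n += 1;
            }
            return n;
        }
    };

    pub const Double = struct {
        first: ?*Node = null,
        last: ?*Node = null,

        pub const Node = struct {
            prev: ?*Node = null,
            next: ?*Node = null,
        };

        pub fn append(list: *Double, node: *Node) void {
            node.prev = list.last;
            node.next = null;
            if (list.last) |tail| {
                tail.next = node;
            } else {
                list.first = node;
            }
            list.last = node;
        }

        pub fn len(list: Double) usize {
            var n: usize = 0;
            var it = list.first;
            while (it) |node| : (it = node.next) {
                n += 1;
            }
            return n;
        }
    };
};
```

Each Node type is nested under the list type it belongs to, because a node is only meaningful as part of that list.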

This is very helpful. Where does it live?

The bullet examples suggest something more specific than “grammar”, like “avoid verbs, conjunctions, and indefinite adjectives” (“not”/“no”/“some” would be avoided, and modifying the noun, as in “Unsupported” or “Indirect”, would be preferred… right?). Implicit here seems to be a preference for nouns. Huge +1, but also pretty natural.

This and some other comments suggest that you’re in agreement with some of the vibe of this thread, and that some reorganization along these lines is to be expected? For me, that question has perhaps remained “unasked”, but implied… and your input is very helpful and encouraging.
