0.16.0 bikeshed: io before alloc, alloc before io, not an issue?

See topic :wink:

From the upgrade notes on I/O implementations, it definitely seems that the long-term vision is to position them similarly to allocators, in that library authors should ensure that downstream users can choose their implementation and pass it in.

As such, it would seem to follow that a pattern will likely emerge regarding where to place it in the argument list. Has there been any consensus on that yet, however premature this may be? I was almost assuming after allocators, but now I’m not too sure.

4 Likes

I like alloc first. It’s alphabetical, temporally primary, and in many ways more fundamental. That’s arguable, I guess, but I mean by it: many things (algorithms, manipulations, communications, …) require memory management, but fewer things (communications, …) require i/o.

3 Likes

I just did alloc first and io after for the simple reason that alloc came first lol. It just so happens that the places I’ve seen io used in the stdlib also place it after the alloc argument. Although ultimately it doesn’t matter.

I’ve already noticed some inconsistencies regarding this in the standard library, e.g.

  • std.Io.Dir.rename takes io as the last argument (after all other parameters)
  • std.Io.Dir.readFileAlloc takes io first, then the path, then the allocator, and then options
  • std.process.run takes the allocator first, then the io

I think it would be nice to have some fixed guidelines for this.

9 Likes
pub const ZigCtx = struct {
    io: std.Io,
    allocator: std.mem.Allocator,
};

???

It’s four words between em, that’s when I start thinking about passing by pointer. ¯\_(ツ)_/¯

3 Likes

I wondered if we’d end up seeing a context wrapper like this. I know your post is partially tongue-in-cheek, but it popped up in Go’s ecosystem very early, even before they stuffed it into the stdlib (which was a massive mistake.) The big issue was that everyone wrote their own with arbitrary functionality. It became an awkward and convenient namespace for utility functions and state that was only technically not global. Since you were passing everything around in it anyway, adding to it didn’t really feel any worse than the baseline bad it felt when you originally added it.

IMO wrapping it in a struct defeats the most interesting part of Zig’s design, which is being able to tell at a glance if something is doing IO or allocating. That gross feeling you get passing IO and the allocator around everywhere is a correct instinct. I’m hoping it leads to embracing the “functional core, imperative shell” ideology where most of your libraries are relatively stateless and do little IO deep in the call stack. If it pollutes your entire program and you aren’t writing a web proxy or something, then there probably is something wrong. The Haskell people get a lot of things right :stuck_out_tongue:

11 Likes

That’s what I’m hoping and also noticing in my own code. I don’t want to pass too much, and having “managed” structures, i.e. those with an allocator inside, is also kinda bad.

The problem here is that now, whenever you need even some basic synchronization like a mutex or a condition variable, you need to pass IO, which pollutes a lot of things. I wonder if we’ll then see the “antipattern” of just setting an atomic to 0 or 1 for the most basic lock. Functionally, for locks which aren’t held for a long time, a short spin loop on that atomic is more performant anyway.
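For reference, here is a minimal sketch of the kind of atomic-flag lock I mean. `SpinLock` is a made-up name, not a std type, and the sketch assumes `std.atomic.Value` and `std.atomic.spinLoopHint` are available in your Zig version. It needs no io parameter, but it burns CPU while waiting and has no fairness, so it is only plausible for very short critical sections.

```zig
const std = @import("std");

/// Hypothetical minimal spin lock built on a single atomic flag (not part of std).
const SpinLock = struct {
    locked: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

    fn lock(self: *SpinLock) void {
        // swap returns the previous value, so keep spinning until we are
        // the ones who flipped the flag from false to true.
        while (self.locked.swap(true, .acquire)) {
            std.atomic.spinLoopHint();
        }
    }

    fn unlock(self: *SpinLock) void {
        // Release ordering publishes writes made inside the critical section.
        self.locked.store(false, .release);
    }
};

test "lock and unlock around a critical section" {
    var guard: SpinLock = .{};
    var counter: usize = 0;

    guard.lock();
    counter += 1;
    guard.unlock();

    try std.testing.expectEqual(@as(usize, 1), counter);
}
```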

2 Likes

I heard somewhere that the semi-official convention is to order arguments from least variant to most variant.
By that logic, it would make sense to put std.Io before std.mem.Allocator, since you generally only have one io for your entire program, but might have more than one allocator.
The reason why I call this “semi-official” is because it’s not really written down anywhere and even the standard library follows it inconsistently, so ultimately there’s no one dogmatic source to point to.

10 Likes

Yeah this is super interesting. I don’t have a great answer, but it doesn’t particularly worry me. It feels like the problem Rust had w/ graphs or Go had with non-trivial error values very early on. It was a pain that caused a lot of worry, but over a couple years the community wrote enough code that everyone just found the patterns that worked for the common case and it wasn’t an issue anymore.

To be fair, as someone who primarily writes servers and applications, my first instincts were “write more pipeline-y code” and “well yeah, knowing which functions could block and cause a deadlock is actually very much something I wanna know at the call site. That’s worth the parameter.”

But those are very much from an application developer’s perspective. For a library developer you don’t know the context your code is called in. You may add locking to make something thread safe so it works for more use cases, but this forces everyone using it single threaded to pass IO down the call stack to accommodate the feature. The ivory tower “real answer” is write libraries to cover a specific need and direct other use cases elsewhere…but there are practical limits to that and tradeoffs on both sides to the extent that the tension is reasonable.

5 Likes

By Haskell convention, arguments get sorted from least changing to most changing so it would unequivocally be fn(*Io,*Allocator) since you have 1 Io in the whole application. Of course it’s because of currying in Haskell, but I feel like it is helpful when you write a bunch of stuff like:

work(io, alloc, batch1);
work(io, alloc, batch2);

Since the part that changes is easier to see
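A rough sketch of what a function following that ordering might look like. Everything here is hypothetical: `Io` is a small local stand-in so the example does not depend on the real `std.Io` interface, and `work` is just an illustrative name. The only point is the parameter order: the thing that rarely changes first, the per-call data last.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical stand-in for an io handle (NOT the real std.Io interface).
/// It only counts how often it was used.
const Io = struct {
    uses: *usize,

    fn note(io: Io) void {
        io.uses.* += 1;
    }
};

/// Hypothetical function: io first, allocator second, varying data last.
/// Returns an upper-cased copy of `batch` that the caller must free.
fn work(io: Io, gpa: Allocator, batch: []const u8) Allocator.Error![]u8 {
    io.note();
    const out = try gpa.alloc(u8, batch.len);
    for (batch, out) |src, *dst| dst.* = std.ascii.toUpper(src);
    return out;
}

test "only the last argument changes between calls" {
    var uses: usize = 0;
    const io: Io = .{ .uses = &uses };
    const gpa = std.testing.allocator;

    const a = try work(io, gpa, "batch1");
    defer gpa.free(a);
    const b = try work(io, gpa, "batch2");
    defer gpa.free(b);

    try std.testing.expectEqualStrings("BATCH1", a);
    try std.testing.expectEqualStrings("BATCH2", b);
}
```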

7 Likes

I could not tell you how much, honestly. Somewhat.

This would be my inclination as well, although I hadn’t heard that heuristic.

type arguments before Io though, right? doThing(io, u8, allocator, ...) feels wrong.

2 Likes

Most likely it would still be “dependency injected dependencies” before other variables. So memory allocator / io (or vice versa) and then things like “age” (u8), “name” ([]const u8), “type” (some enum).

Er. I think an arbitrary convention is easier to reason about. In fact, you don’t have to reason!

And “type arguments before Io?” Oh, please yes! They’re the most invariant of all, after all, right?

3 Likes

I’ve settled on I/O implementations before allocators based on the discussion here (namely the invariant rationale) for z2d for the time being, barring any better arguments to the contrary. :wink: Thanks everyone for the insights!

3 Likes

We already have, in a way! std.process.Init :slight_smile:

3 Likes

core team consensus is io before allocator

18 Likes

I doubt that in real-world programs we will end up with passing io and alloc (or a context) to each and every function.

Wouldn’t it be more pragmatic to have init functions for libraries (or other forms of code modules) and supply these things there?

Then the module can store these in a private struct and the individual functions can get it from there.

You can then see if the module uses io or allocates at the module level. You lose the explicitness of seeing which of the functions use Io or alloc, but otoh this can be made explicit by their declared error results.
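To make the idea concrete, a minimal sketch of such an init pattern. All names are hypothetical: `Io` is a local stand-in rather than the real `std.Io` interface, and `Client` and `fetch` are only for illustration. Note that the `fetch` signature no longer shows that it touches io or allocates; only `init` does.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical stand-in for an io handle (NOT the real std.Io interface).
const Io = struct {
    uses: *usize,

    fn note(io: Io) void {
        io.uses.* += 1;
    }
};

/// Hypothetical module state: io and allocator are supplied once and stored.
const Client = struct {
    io: Io,
    gpa: Allocator,

    fn init(io: Io, gpa: Allocator) Client {
        return .{ .io = io, .gpa = gpa };
    }

    /// Uses the stored io and allocator. The only hint in the signature
    /// is the Allocator.Error in the return type.
    fn fetch(self: Client, key: []const u8) Allocator.Error![]u8 {
        self.io.note();
        return self.gpa.dupe(u8, key);
    }
};

test "dependencies are visible only at init" {
    var uses: usize = 0;
    const client = Client.init(.{ .uses = &uses }, std.testing.allocator);

    const value = try client.fetch("key");
    defer client.gpa.free(value);

    try std.testing.expectEqualStrings("key", value);
}
```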

I know this is probably tongue in cheek.

But std.process.Init (or std.process.Init.Minimal) should not be passed down, half of what it contains will only be used in main, and even contains some duplicate data due to having its Minimal variant as a field.

If you have a bunch of common parameters make your own context type.


Generally no, the user of the library probably won’t pass the same allocator to each function. And the library is decently likely to have some functions that expect a GPA-like allocator and some that expect an arena-like allocator.

So it doesn’t make much sense on either side.

Even with Io, it is not unexpected that some will use multiple implementations in their codebase.
Such uses will likely just be std.Io.Threaded.global_single_threaded for niche cases instead of their main Io.

eh no, you would only see that at 1 (one) point in the code, so it is almost a complete reduction of knowledge about the api.

And as an aside, not all errors in the Allocator and Io error sets are unique to them; OutOfMemory is often used when code runs out of buffer space, in addition to allocation failure.

1 Like

We’ll wait and see.

I was thinking this too. Even if it’s niche, why restrict someone to just one implementation? One of the great parts of explicit allocator selection is that it gets you thinking about how your memory is laid out, where (and why) you’re doing dynamic allocation, and possibly how you could avoid doing it or do it in a way that’s better for performance. Maybe the same thing happens with I/O. @badtuple similarly mentioned this earlier as well:

Having the state front and center promotes a “use it or lose it” mindset. I could say the same for explicit error sets as well, which (at least for me) helps promote thought surrounding error handling and working towards infallibility.

3 Likes