Why are allocator interfaces and implementations split across `std.mem` / `std.heap`, while IO is colocated under `std.Io`?

  • Avoid overcategorization

That’s a good rule. It’s not tautological, any more than “avoid overeating” is. Categorization for the sake of categorization is not helpful: we don’t want std.data.dense.ArrayList, and that sort of ontological exuberance has been seen in the wild before.

Is categorization. The only thing linked_list is doing is putting two disparate data structures into the same category. Which is good! (LinkedList(Single|Double) is not so completion-friendly, but the ‘most idiomatic’ ones are.)

I think a std.map is worth considering on that basis. Probably not a std.list.array.ArrayList and a std.list.linked.Single though. My sense is that what you mean by “avoid categorization” is that doing taxonomy for its own sake ends up with a deep tree, and we should prefer shallow ones.

What I’m gesturing at with respect to std.Thread is that it conflates a category (things which pertain to threads) with the type-case of that category, Thread. I think the redundancy of having a std.thread.Thread is worth the clarity of not having a std.Thread.Mutex, when a Thread neither takes a Mutex as an argument, nor returns one.

It could also be std.Thread, and std.thread.Mutex. That might end up confusing, but then again maybe not.

I don’t think the recent trend of putting everything which makes use of an interface, within the type of that interface, is bad at all, organizationally speaking. It retains the problem that when looking at the home page of std, it’s not clear which capital letters have “goodies” in them, and which are just types you can do stuff with.

One way around that is to have fewer capital letters straight off of std.


Maybe Mutex should just sit in std.Mutex. Maybe all core functionality should be at the root level. One thing we know from the database world is that hierarchical databases are a crappy way of organizing data; we use relational databases to store pretty much everything these days. Tying categorization to location is very limiting.

Thread.Mutex always blocks the thread. Io.Mutex integrates with the Io implementation, potentially doing task-switching rather than thread parking.


I think some categorization is good. In C# we had System.Collections with ArrayList, Hashtable, etc. Very easy to find, group, and document. Of course, they then immediately fucked it up by adding generics and creating System.Collections.Generic with all the new generic variants.

All that to say: I think it’s not overcategorization but proper categorization to add std.collection or some such and then put every collection type in there.


To speak in favour of categories: they help make things discoverable. If there were a std.data that was the home for all data structures (ArrayList, HashMap, etc.), I could read that part of the standard library and understand which data structures are available to me. Without it, I have to read at least all top-level members of std to identify all the data structures before I can learn about them. I’ve done this (multiple times, for different categories)! It was a pain in the butt.

I wouldn’t write a book without chapters, especially a technical book. The standard library becomes a technical reference manual for how to use itself. It should have chapters.

I also wouldn’t write a book with 200 chapters when 10 would suffice. That is detrimental.

Following the conversation here, I think there’s a missing factor from this list. Type vs non-type. It seems very easy to abuse a Type style library. If a Type style library contains other types which the user is meant to use, haven’t we just re-invented Java-style class based libraries where the class is little more than a name-space? That seems wrong.

I think in this case it’s useful to see what happened in the evolution of the Python standard library. It started with a “category”-style module, os, that contained equivalents of the C file interface. There was also os.path for manipulating path strings. Basically, places to put lots of related functions.

This lasted for decades (up until 3.4), until some bright spark created pathlib, which has a type Path that represents a file-system path; its methods are the operations you want to perform on the file-system object at that path. I’ve personally found this a very nice abstraction to use, and it almost totally removes the need for the old libraries.

Why I think this works:

  • The abstraction isn’t based on the object you want to manipulate (a file, directory, pipe or device). It’s based on how you access those objects - through a path.
  • It successfully abstracts away the main cross-platform issue. Paths are represented differently on different platforms.
  • It has escape hatches for when you need to use the C API for something less generic.

This is true, of course. But container types make nested namespaces; there’s no way around that. File systems are very popular: I’ll note that we programmers could certainly keep code in a relational database format if we wanted to, and we still use repos.

DNS seems to scale well also. DNS has the same constraint as writing out a type: text is linear, and it’s better if there’s one way to write something instead of several.

That isn’t conclusive, but it’s at least suggestive.

Is there a suggestion to go with the observation? It would be possible to juice up Zig containers with some kind of metadata, even just by reusing declarations; Clojure gets a lot of mileage out of a metadata system. I’m not opposed to this, nor am I personally advocating for it.

Everything still needs a canonical location within a namespace tree, though. That’s just a consequence of deep choices in the Zig implementation, which are also utterly pervasive in the field. When I start thinking about relationally-organized source code I get a bit dizzy. Could be pretty nice, I’m just saying.

Another pair of observations: Io.Mutex takes an *Io as an argument, while Mutex and Thread have no such relationship. Mutex relates to “thread”, but the way it relates to Thread is not so direct.

The difference is that this latter pair of observations helps me find Io.Mutex, and explains why it’s there and not somewhere else. I think it would be pretty helpful if every std.Foo.Bar in the collection had the Io/Io.Mutex kind of relationship, rather than the Thread/Thread.Mutex kind.

My picture of the good here is a top-level namespace which has a lot of fan-out into an otherwise fairly flat collection of functions and data structures, and more than that, any Capital types should be like Build or Io, where the goodies in the bag have a direct “returns or takes” relationship to the containing type. It’s not too far off from what we already have.

std.Allocator.Arena would be great as long as Allocator isn’t sharing capital-A with:

Putting all of those into array and map (and map would have eight or nine more) would help make the pattern clearer.

Because none of these have any goodies, they’re plain-old-data structures. There would be no way to know “oh ok, Allocator, obviously that contains all the allocation stuff”. It’s a big collection of lowercase bags, and a big collection of uppercase data structures which have a few bags in there. Which ones? Well, open them and find out.

Alternatively, they could be marked up in some distinctive way in the documentation, or sorted separately somehow. I don’t know how that happens with autodoc, but if it was wanted it could clearly be had.

I just want the front page of std to be as easy to use as possible, because that makes the language easier to learn, and to use even after learning it. I certainly look at the documentation quite frequently and don’t expect I’ll ever stop.


The most obvious solution is using hashtags. So something like this:

/// [description]
///
/// #synchronization #thread #io #async 
pub const Mutex = struct {
    // ...
};

/// [description]
///
/// #thread #io #async
pub const Thread = struct {
    // ...
};

For mapping to the file system, the first tag would denote the sub-directory where we have the hardlink. The other sub-directories will each get a symlink meanwhile.

To trigger code suggestions in an IDE, you would type std.#thread. and the editor would respond with a list. Once a selection is made, the text gets collapsed to std.Mutex.


Quite frankly, I like the idea of splitting the allocators into two namespaces according to the following rule: does it need operating system support?

E.g. std.heap.smp_allocator and std.heap.FixedBufferAllocator should go into different namespaces under that rule.

But the current system doesn’t follow that rule. std.mem has the Allocator interface, and the allocator std.mem.ValidationAllocator. Every other allocator lives under std.heap.
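A minimal sketch of that rule in code, assuming the std.heap.FixedBufferAllocator and std.heap.page_allocator names as they exist in recent Zig releases (0.11 through 0.15; std is a moving target). fillSquares is a hypothetical helper. Both allocators are consumed through the same std.mem.Allocator interface, but only one of them needs the operating system:

```zig
const std = @import("std");

/// Hypothetical helper: allocates `n` u32s through whatever allocator it is given.
/// It only ever sees the interface type, which lives in `std.mem`.
pub fn fillSquares(allocator: std.mem.Allocator, n: usize) ![]u32 {
    const xs = try allocator.alloc(u32, n);
    for (xs, 0..) |*x, i| x.* = @intCast(i * i);
    return xs;
}

test "one interface, two kinds of backing" {
    // No OS support needed: allocations are carved out of a stack buffer.
    var buf: [256]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const a = try fillSquares(fba.allocator(), 8);
    defer fba.allocator().free(a);

    // OS support needed: page_allocator maps pages from the operating system.
    const b = try fillSquares(std.heap.page_allocator, 8);
    defer std.heap.page_allocator.free(b);

    try std.testing.expectEqualSlices(u32, a, b);
}
```

Under the proposed rule, the first allocator would sit in one namespace and the second in another, even though callers like fillSquares cannot tell them apart.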


From a discoverability standpoint, I think fat types are specifically helpful in the context of interface implementations, and for types with defaults and variants. For example, std.Io.Threaded and std.Io.Evented make a lot of sense to find under the Io namespace, as they are implementations of the Io type. If I navigate to the std.Io documentation, I’m probably looking to see how I can use std.Io, and instantiating concrete implementations is a critical part of that. However, closely related types like Queue and Mutex end up polluting the Types section, making it hard to find implementations of Io. Right now, it’s basically impossible to figure out how to declare a variable of type std.Io from the standard library docs alone, and I attribute this in part to the noise.

Consider a structure like this:

  • std.io.Io
  • std.io.Mutex
  • std.io.Queue
  • std.io.Io.Threaded
  • std.io.Io.Evented

This puts the related surface level interfaces in the same namespace, making it easy to discover the types that work together by looking at the docs for std.io. Then it puts concrete types under the interface, so if I navigate to the docs for std.io.Io with the intent of figuring out how to instantiate a value, I can discover how just by seeing Threaded and Evented under the Types section.

On the matter of variants, std.ArrayList is the ArrayList that (I assume) people should be defaulting to use, and only explore other options if they have a special reason to. There is already a std.array_list namespace, but the only path to it from std.ArrayList is through clicking the hyperlinks in the source code.

I think this would make a lot of sense:

  • std.ArrayList
  • std.ArrayList.Managed
  • std.ArrayList.Aligned

Same with HashMap and the like. (I realized after writing this that the ArrayList variants I listed are deprecated, but I think my point here still stands).


All of this seems logical, but io.Io conflicts with the Guide principle “Avoid redundant names in fully-qualified namespaces” … I am one who dislikes repeats if they’re possible to avoid, too, and I think there may be other ways to organize Mutex and Queue. std.Io, std.Io.Threaded, and std.Io.Evented (as is) seem perfectly sound. I don’t think std.Io.Mutex is ugly, and it does serve to differentiate it from std.Thread.Mutex, as mentioned.

In that case, how about: std.Io => std.io.Runtime, with std.io.Runtime.Threaded, std.io.Runtime.Evented, std.io.Queue, and std.io.Mutex? A value of the (current) type std.Io represents an async I/O runtime environment, so calling it Runtime instead seems appropriate to avoid redundant naming.

As it is, it’s significantly more than that. It’s more like an operating system abstraction. It essentially defines what people use POSIX APIs for.


My assumption is that this is a migration: in 0.16 the preferred way to make an ArrayListAligned is going to be std.array_list.Aligned, and in the next release, that will be the only way. Or something like that; it’s quite possible that following these links a week from now won’t make much sense in context. Moving target!

Of course then there’s the question of ArrayList itself: either we have std.ArrayList and std.array_list.Aligned, and that’s confusing, or we have std.array_list.ArrayList, and that’s redundant. ArrayList is a function, so std.ArrayList.Aligned isn’t an option here.

I suspect this comes down to whether or not one is in the habit of localizing types. I do so almost always, I have an import list (which, these days, is kept at the bottom), so if I type std.io.Io, it will be const Io = std.io.Io; and that’s just once per file.

For whatever reason std.mem is an exception to this, or at least the functions are, I always localize Allocator, or almost always. Anyway.

std.io.Io doesn’t look very nice in a parameter definition, I think we’ll all agree.
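For illustration, a sketch of the “localize once, keep the import list at the bottom” habit. It relies on Zig top-level declarations being order-independent; sumAll is a hypothetical function, and std.mem.Allocator is assumed to be where the interface lives (as in current releases):

```zig
// Top-level declarations are order-independent, so the imports can live at
// the bottom of the file. The type is localized once, and signatures stay
// short no matter how deeply it is nested inside std.
pub fn sumAll(allocator: Allocator, n: usize) !u64 {
    const xs = try allocator.alloc(u64, n);
    defer allocator.free(xs);
    for (xs, 0..) |*x, i| x.* = i; // fill with 0, 1, ..., n - 1
    var total: u64 = 0;
    for (xs) |x| total += x;
    return total;
}

const std = @import("std");
const Allocator = std.mem.Allocator; // localized once per file
```

With this habit, a path like std.io.Io would be spelled out once in the import list, and parameters would just say Io.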

Personally I want std to be a directory. It should have just enough names to a) peruse them in reasonable time and b) make it relatively easy to guess where the affordance I’m hunting for is located.

Capitalized types are no barrier to this at all, so long as the ones which are directly off of std are also directory entries. This doesn’t have to be a strict rule, just generally, but it isn’t at all true right now. I don’t want to count to make this exact, but easily 90% of the instantiable types off std (the Capitalized ones) are just, types.

I think it would be better if ~all of those went into containing namespaces, not in the interest of categorization, but in the interest of search.

Even Thread, which I’ve picked on a bit here, it would be quite alright as-is, so long as ~all the Capitals are directory entries. A newbie might think “well I’m sure there’s a semaphore type somewhere, hmm, tar, …, tz, ok no thread, oh but there’s Thread, ok, there it is!”

But if our newb has opened three or four Capitals already, and they’re just types, looking up there is not obvious. Obvious is good.


To define an aligned ArrayList, we should use std.ArrayList(T, .{ .alignment = alignment }). Why the heck are we specifying an option through namespacing?
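One way the options-struct idea could look, as a hypothetical sketch: ListOptions and List are invented names, not std API, and alignment is modeled as a plain ?u29 rather than whatever type std currently uses. Defaulted struct fields stand in for the optional parameters Zig lacks, so the common case costs only .{}:

```zig
const std = @import("std");

/// Hypothetical options struct; the real std API does not take one.
pub const ListOptions = struct {
    /// null means "natural alignment of the element type".
    alignment: ?u29 = null,
};

/// Hypothetical generic list: one constructor with defaulted options,
/// instead of a separate name per variant.
pub fn List(comptime T: type, comptime opts: ListOptions) type {
    return struct {
        pub const alignment: u29 = opts.alignment orelse @alignOf(T);
        items: []align(alignment) T,
    };
}

// The common case passes `.{}`; variants are just different option values.
const Plain = List(u32, .{});
const Wide = List(u32, .{ .alignment = 64 });

comptime {
    std.debug.assert(Plain.alignment == @alignOf(u32));
    std.debug.assert(Wide.alignment == 64);
}
```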

Yes, this is what I’d always do if there were name redundancies in fully-qualified namespaces; I don’t think it’s that onerous, but it would perhaps reflect a soft read of the “avoid redundant names in fully-qualified namespaces” bullet.

It feels like it could be difficult to achieve both that AND unified/consistent root-level treatment (caps vs. no caps)… at least without some creative or thoughtful names. Seems above my paygrade anyway, but I like ideas churned up in this thread.

It’s actually eliding an option through namespacing, and it’s because Zig doesn’t have optional parameters. ArrayList is literally just this:

pub fn ArrayList(comptime T: type) type {
    return array_list.Aligned(T, null);
}

This is defensible on its own: writing ArrayList(T, null), when the vast majority of ArrayLists just take natural alignment, is noisier than adding another name to std. Names themselves are cheap.
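To make the pattern concrete, here is a self-contained sketch mirroring that shape with hypothetical names (list_ns stands in for std.array_list, List for std.ArrayList). It also shows why std.ArrayList.Aligned is not an option: declarations can hang off a struct, but not off a function:

```zig
const std = @import("std");

/// Hypothetical stand-in for a `std.array_list`-style namespace.
pub const list_ns = struct {
    /// Explicit-alignment variant; `null` means natural alignment.
    pub fn Aligned(comptime T: type, comptime alignment: ?u29) type {
        return struct {
            pub const item_alignment: u29 = alignment orelse @alignOf(T);
            items: []align(item_alignment) T,
        };
    }
};

/// Thin wrapper with the same shape as `std.ArrayList`: it elides the
/// alignment argument by adding a name instead of a parameter.
pub fn List(comptime T: type) type {
    return list_ns.Aligned(T, null);
}

// `list_ns.Aligned` resolves because `list_ns` is a struct (a namespace).
// `List.Aligned` would be a compile error: `List` is a function, and
// functions carry no declarations.
comptime {
    // Same comptime arguments, same instantiated type.
    std.debug.assert(List(u8) == list_ns.Aligned(u8, null));
}
```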

But to beat out another measure on the same drum, it’s profligate to have the “front page” of the std directory contain a thin wrapper function. That’s not a good use of the prime branch factor in my view. ArrayList might well be the most common data structure in the stdlib, but it could still live in std.array_list, and I think it should.

A maximalist commitment to no redundancy might end up with std.list.Array, std.list.array.Aligned, std.list.linked.Single, and so on. Is this 🦋 categorization?

I dunno, it’s deeper nesting than I would like to see, just my taste. I think std.array_list.ArrayList, std.thread.Thread, etc., are good, for the same reason we have Gorilla gorilla and names like that in biological taxonomy. One is the genus, one is the species; it’s not actually redundant.


No, it’s not. It’s stupid as hell to create a sub-namespace just to avoid typing , .{}. It’s a complete misuse of the concept of a namespace. The very existence of namespaces in programming is an acknowledgement that different people perceive the world differently. Alice might think that abc means one thing while Bob might think that abc means something else. The possibility of conflicts between different mental pictures is why namespaces exist as a concept in the first place.

sure it does! it’s a namespace containing a whole heap of random stuff

(sorry for the low-signal reply, lol)


I think this philosophy ultimately leads to a question: should a file really be a namespace?
I believe the basic idea of ‘a file is a namespace’ may actually hinder this kind of namespace philosophy. If we think about namespaces from this perspective, we would realize that we should put the vast majority of symbols in the same file, because from a namespace point of view, they are flat under that namespace. However, we are also accustomed to not putting too much into a single file, which naturally leads us to start using namespaces for categorization.

Well, that’s the reason why Odin for example uses directories instead of files as namespaces, which also means that you can just use everything from the other files of the same directory without needing to do anything.
