So I’m working on a library which uses bitsets extensively. It allocates slices of memory, which need to be zeroed.
In C, of course, we use `calloc` for this, and for good reason: in hosted systems, there tends to be zeroed memory available, and if that happens to be the case, we get our zeroes for free.
I was a bit surprised not to find an `Allocator.calloc`, so I went digging.
Looking around, I was able to find an issue for `Allocator.allocZeroed`, which was closed as a duplicate of #2090, where I found the reasoning (quotes are @andrewrk):
> I want to note that the Allocator API intentionally does not have a allocate-and-zero-initialize function, and likewise Zig itself does not have the concept of zero initialization. It was removed in 6a5e61a.

Because:

> This goes against the core language design that “zero initialization” isn’t meaningful. This is one area where Go and Zig differ strongly.
The reasoning here makes sense to me. There’s even a note in the documentation of `std.mem.zeroes` asking us to consider very carefully whether it’s necessary, and warning that its use might be a code smell.
I’m also strongly opposed to the idea that memory should have a zero default. It causes inefficiency when not needed, it encourages buggy and lazy code, and it subverts the use of optionals: all of these things are Bad.
This leaves me with a problem, however: I want optimal code. I’m following the salutary Ziggish practice of taking an allocator parameter for the functions which allocate memory, but this means that the compiler can’t optimize away `@memset(mySlice, 0)`. Since the allocator is provided at runtime, and there is no guarantee that any given allocator will provide zeroed memory, the compiler can’t know whether it’s safe to eliminate the `@memset` while compiling. Meanwhile, I know that release mode can mostly give me my zeroes for free, but the above means that `@memset` will run anyway.
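Concretely, the pattern looks something like this sketch (`allocBitsetWords` is a stand-in name, not my library’s actual API): the `@memset` can’t be elided, because nothing about the `Allocator` value tells the compiler the memory is already zero.

```zig
const std = @import("std");

/// Hypothetical helper, roughly the shape of what the library does today:
/// allocate backing words for a bitset and zero them. The allocator is a
/// runtime value, so the compiler can't prove the memory is already zero,
/// and the @memset survives into the generated code.
fn allocBitsetWords(allocator: std.mem.Allocator, word_count: usize) ![]u64 {
    const words = try allocator.alloc(u64, word_count);
    @memset(words, 0); // runs even if the allocator handed back freshly zeroed pages
    return words;
}
```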
If I were to reach for `calloc` for internal allocations, and copy the final values over to the user-provided allocator, that would take control away from the user and make the library useful in fewer circumstances. And I have no other need for libc, so it would be a shame to link it just for this one thing.
I use a lot of Julia as well, and there are dozens of posts where people are trying to find the fastest way to get zeroed memory for their algorithm. I fully support rejecting “make zero a meaningful value”, and discouraging lazy programming which sets everything to zero out of habits formed using different languages.
But there’s still one way in which zero is a special value: an allocator is very likely to have a bunch of zeroes sitting around, waiting to hand over. At the same time, I follow the reasoning that allocators shouldn’t provide a special zeroing `alloc`. The most common value to initialize memory to is always “what I put in it once I get it”, but zero is unquestionably the most common value after uninitialized, and it happens to be very cheap to provide in many cases of interest.
So the proposal is: add `allocWithScalar(self: Allocator, comptime T: type, n: usize, comptime val: T)`, which returns a slice `[]T` where all elements are `val`. This will only compile if the user can provide a comptime-known value for `val`.
This way zero isn’t special, but if `val` is `0`, allocators which have zeroed memory around can provide it optimally. Otherwise it will fall back to `@memset(newAllocation, val)` before returning the memory.
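To make the shape of that concrete, here is a rough sketch of the proposal written as a free-standing helper. Since nothing like this exists in `std`, the only thing a plain helper can express is the fallback path; the names and the error-union return type are my assumptions, not settled details.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Sketch of the proposed function as a free helper. A plain helper can only
/// do the fallback path; the point of the proposal is that an allocator which
/// keeps prefilled memory around could satisfy the request without the write.
fn allocWithScalar(
    allocator: Allocator,
    comptime T: type,
    n: usize,
    comptime val: T,
) Allocator.Error![]T {
    const slice = try allocator.alloc(T, n);
    // Fallback: identical to alloc followed by @memset.
    @memset(slice, val);
    return slice;
}

test "allocWithScalar fallback path" {
    const words = try allocWithScalar(std.testing.allocator, u64, 8, 0);
    defer std.testing.allocator.free(words);
    for (words) |word| try std.testing.expectEqual(@as(u64, 0), word);
}
```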
This is a function which would pull its weight. Zero isn’t a special value anymore, but it respects the reality that many allocators can provide zero cheaply. It expresses intention explicitly, favors correct code, and doesn’t present the same moral hazard of tempting people to use it “just in case”.
Remember, the compiler can’t optimize away `alloc` + `@memset`; only the allocator can do that.
One can even imagine creating allocators which pre-fill with a non-zero value, and which could provide that value, instead of zero, cheaply. This isn’t so far-fetched that I can’t come up with an example: consider a garbage-collected allocator which pre-fills some pages with the gray bit already set. That fill can be SIMD-optimized, and then allocation can save a few cycles while the mutator is running.
Or a specialized `MemPool` which page-allocates a known default and resets it on `free`. You can implement that now by writing functions which only take the special allocator, but then you can no longer swap in a testing allocator, a logging allocator, or anything but your specialized implementation.
Or, even simpler, let’s say that a program uses a lot of bitsets where most bits are expected to be one. Same basic argument: a specialized allocator can request pages from the OS and SIMD-fill them with `0xff`, then provide them faster at the point of use.
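For that mostly-ones case, usage would look something like the following hypothetical helper, reusing the `allocWithScalar` sketch from the earlier snippet.

```zig
const std = @import("std");

/// Hypothetical: back a "mostly ones" bitset with words whose bytes are all
/// 0xff, via the allocWithScalar sketch above. A cooperating allocator could
/// serve this from pages it pre-filled with 0xff; otherwise it's exactly
/// alloc followed by @memset.
fn allocSetWords(allocator: std.mem.Allocator, word_count: usize) ![]u64 {
    return allocWithScalar(allocator, u64, word_count, std.math.maxInt(u64));
}
```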
It’s true that zeroed memory is the most common kind to want, and the cheapest to provide, in the general case. I’m making the case here that this isn’t an attempt to sneak a zero allocator in through the back door; it’s a powerful and general mechanism, providing control and optimal code.
I think this wouldn’t lead to code bloat. Let’s say a given allocator can provide one value cheaply (in most cases this is zero), so `allocWithScalar` has a comptime `if` statement: one branch is the cheap value, the other branch uses `@memset`. So this shouldn’t monomorphize into more functions than would be useful to provide, right? That’s one function for an allocator which doesn’t have preinitialized memory available, and two functions for the ones which do.
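Here’s a minimal sketch of the branching I have in mind, with a hypothetical comptime `cheap_fill` parameter standing in for whatever value a given allocator can provide without writing. Both branches still call `@memset` so the sketch stays correct on its own; the comment marks where a cooperating allocator would skip the write.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Sketch of the comptime branch. For each instantiated (T, val) pair exactly
/// one branch is compiled, so an allocator with one cheap value monomorphizes
/// into at most two variants per element type.
fn allocWithScalarBranching(
    allocator: Allocator,
    comptime T: type,
    comptime cheap_fill: T,
    n: usize,
    comptime val: T,
) Allocator.Error![]T {
    const slice = try allocator.alloc(T, n);
    if (comptime std.meta.eql(val, cheap_fill)) {
        // Cheap branch: a cooperating allocator would hand out memory it
        // already knows is filled with `cheap_fill` and skip this write.
        @memset(slice, val);
    } else {
        // General branch: plain alloc + @memset.
        @memset(slice, val);
    }
    return slice;
}
```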
If I’m right about that (unclear), then `allocWithScalar` would become the one obvious way to allocate memory preinitialized to a default value. Either you get a performance boost or you don’t, but it clearly expresses the intention “give me memory preinitialized to this value”, and the not-optimal case is identical to `alloc` followed by `@memset`.
I didn’t want to add this as a proposal to the issue board, because the “proposal” button says “no thank you, at this time”, and one must be respectful of that sort of choice.
By beginning the discussion here, I hope it can be examined on its merits, without imposing a maintenance burden on core.