Allocators and thread safety (and not only)

Long story short: std.mem.Allocator should support

    /// Returns true if implementation is thread-safe
    isThreadSafe: *const fn (*anyopaque) bool,

Allocators:

Zig doesn’t have a default allocator…All of Zig’s standard library, and most third party libraries, require the caller to provide an allocator if they intend to allocate memory

Several allocators from std are thread-safe:

Several are not:

ThreadSafeAllocator:

Wraps a non-thread-safe allocator and makes it thread-safe.
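
A minimal sketch of that wrapping, assuming a Zig release (roughly 0.14) in which std.heap.ThreadSafeAllocator exposes a child_allocator field and an allocator() method; the buffer size is arbitrary:

```zig
const std = @import("std");

pub fn main() !void {
    // FixedBufferAllocator.allocator() does no locking of its own.
    var buffer: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    // The wrapper takes a mutex around every call into the child allocator.
    var tsa = std.heap.ThreadSafeAllocator{ .child_allocator = fba.allocator() };
    const allocator = tsa.allocator();

    const bytes = try allocator.alloc(u8, 16);
    defer allocator.free(bytes);
    std.debug.print("allocated {d} bytes\n", .{bytes.len});
}
```

Only calls that go through tsa.allocator() are serialized; anyone still holding fba.allocator() bypasses the mutex.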

But that does not help me: I can wrap the caller’s allocator, but I can’t prevent the caller from using the unwrapped one.

BTW, to find out which allocators in std are thread-safe I have to read the source, and I am not sure that what I wrote above is right.

Asking the caller in a README, help text, or comment “to provide a thread-safe allocator” will not help either, especially if the caller received the allocator from code further up.

So as a user of the Allocator “interface”, I need to know whether it’s thread-safe.

What do you think about adding this to std.mem.Allocator:

    /// Returns true if implementation is thread-safe
    isThreadSafe: *const fn (*anyopaque) bool,

Well, I see the problem this solves, and I don’t have a better solution.

But this would require the vtable to have an additional function pointer, to a function only used once, and that in an effectively comptime-only manner, since the reason to use that function is to refuse to run with a thread-unsafe allocator. That seems far too heavyweight.

I’m adding this to my collection of reasons why giving no compiler support at all to the type-erased interface pattern is suboptimal, btw. If there were a distinct primitive type involved, it could be possible to comptime inspect the parent type of any actual calls, and give a compile error that way. But I don’t want to go on a whole side-quest right now about how all that might be accomplished.

One thing I dislike about Zig allocators is how the term “allocator” is used for different things. That creates confusion, which in turn leads to suboptimal design. It’d be helpful, I believe, to restrict “allocator” to mean only one thing, std.mem.Allocator: an interface from which you can allocate memory. Things that provide this interface should be either a “memory source” or an “allocation strategy”, with the latter dependent on one or more instances of the former to obtain actual memory. The heap is a source of memory, for instance, while debug or arena is a strategy.

I see what you mean, but the same is done with Readers and Writers: you still have to call writer() on a BufferedWriter in order to get an AnyWriter.

Not sure if that’s an argument for or against, though. I don’t find the conventions in play especially confusing, and didn’t when I was getting started, but more clarity is generally a good thing.

Practical languages are limited in how far constraints can be expressed in the type system. Zig may be toward the lower end of that scale.

Where something is inexpressible in the type system, reach for tools (e.g. static code analysis, or some extra-language metadata system, which the Zig ecosystem currently lacks because of the volatility of a pre-1.0 language) and documentation.

It would be possible to implement allocators that verify that they are only used on a given thread by caching the thread id of the first allocation and asserting that subsequent allocations and deallocations are on the same thread. This could be arranged to exist only for safe builds. This is arguably better than checking the result of a runtime method.
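
A rough sketch of such a checking wrapper, assuming the four-function Allocator vtable (alloc, resize, remap, free) and std.mem.Alignment from Zig 0.14; other versions use different signatures. SingleThreadAllocator is a made-up name, not something in std:

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;
const Alignment = std.mem.Alignment;

/// Hypothetical wrapper: remembers the first thread that uses it and
/// asserts that every later call arrives on that same thread.
const SingleThreadAllocator = struct {
    child: Allocator,
    owner: ?std.Thread.Id = null,

    pub fn allocator(self: *SingleThreadAllocator) Allocator {
        return .{
            .ptr = self,
            .vtable = &.{ .alloc = alloc, .resize = resize, .remap = remap, .free = free },
        };
    }

    fn checkThread(self: *SingleThreadAllocator) void {
        // Skipped entirely in ReleaseFast and ReleaseSmall builds.
        if (!std.debug.runtime_safety) return;
        const current = std.Thread.getCurrentId();
        if (self.owner) |owner| {
            std.debug.assert(owner == current);
        } else {
            // The first use claims ownership. This store is itself unsynchronized,
            // so the check is a best-effort debugging aid, not a guarantee.
            self.owner = current;
        }
    }

    fn alloc(ctx: *anyopaque, len: usize, alignment: Alignment, ret_addr: usize) ?[*]u8 {
        const self: *SingleThreadAllocator = @ptrCast(@alignCast(ctx));
        self.checkThread();
        return self.child.rawAlloc(len, alignment, ret_addr);
    }

    fn resize(ctx: *anyopaque, memory: []u8, alignment: Alignment, new_len: usize, ret_addr: usize) bool {
        const self: *SingleThreadAllocator = @ptrCast(@alignCast(ctx));
        self.checkThread();
        return self.child.rawResize(memory, alignment, new_len, ret_addr);
    }

    fn remap(ctx: *anyopaque, memory: []u8, alignment: Alignment, new_len: usize, ret_addr: usize) ?[*]u8 {
        const self: *SingleThreadAllocator = @ptrCast(@alignCast(ctx));
        self.checkThread();
        return self.child.rawRemap(memory, alignment, new_len, ret_addr);
    }

    fn free(ctx: *anyopaque, memory: []u8, alignment: Alignment, ret_addr: usize) void {
        const self: *SingleThreadAllocator = @ptrCast(@alignCast(ctx));
        self.checkThread();
        self.child.rawFree(memory, alignment, ret_addr);
    }
};

pub fn main() !void {
    var checked = SingleThreadAllocator{ .child = std.heap.page_allocator };
    const a = checked.allocator();
    const bytes = try a.alloc(u8, 32);
    defer a.free(bytes);
}
```

This catches misuse only when a second thread actually touches the allocator during a safe-mode run, so it complements documentation rather than replacing it.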

Agreed, but what I find interesting about this case in particular is that Zig has very good type reflection, which can be used to comptime-verify a lot of properties, giving a sort of ad-hoc (or general, if you prefer) ability to add type-level constraints.

But the type erasure in the Allocator interface means that it’s a place where we can’t take advantage of that ability. A convention which simply establishes that an allocator should have a declaration pub const is_thread_safe: bool = false; or true, would work fine, with just the occasional nudge to someone writing a new allocator to declare it if it wasn’t.

But we can’t check it: by the time we see the Allocator, the backing type is long gone.
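
To make the gap concrete, here is a small sketch of the convention. The is_thread_safe declaration, the two backing types, and startWorkers are all made up for illustration; nothing in std defines them:

```zig
const std = @import("std");

/// Hypothetical convention: a backing allocator type declares
/// `pub const is_thread_safe: bool`. A missing declaration counts as false.
fn isDeclaredThreadSafe(comptime T: type) bool {
    if (!@hasDecl(T, "is_thread_safe")) return false;
    return T.is_thread_safe;
}

// Two made-up backing types, used only for illustration.
const LockedPool = struct {
    pub const is_thread_safe: bool = true;
};
const BumpArena = struct {
    pub const is_thread_safe: bool = false;
};

/// Library entry point that insists on a thread-safe backing type.
/// It needs the concrete type, so it takes a pointer via `anytype`.
fn startWorkers(backing: anytype) void {
    const Backing = @TypeOf(backing.*);
    if (!comptime isDeclaredThreadSafe(Backing)) {
        @compileError(@typeName(Backing) ++ " does not declare is_thread_safe = true");
    }
    // ... spawn threads that share `backing` ...
}

pub fn main() void {
    const pool = LockedPool{};
    startWorkers(&pool); // compiles

    // const arena = BumpArena{};
    // startWorkers(&arena); // would be rejected at compile time

    // The erased interface carries no such declaration, so the check has
    // nothing to inspect once only a std.mem.Allocator is in hand.
    comptime std.debug.assert(!isDeclaredThreadSafe(std.mem.Allocator));
}
```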

Giving things distinct identities allows us to attach attributes to them. We can say that if something is a data source, then it’ll have a method called isThreadSafe(). Right now, we can’t define a comptime interface for these things because we don’t have names to refer to them. “Allocator” is already used by the vtable interface.

The deeper problem is that code which can’t see the backing type might need to know things about that backing type to function correctly.

Anything with Allocator in the name, other than Allocator itself, is a data type which can produce an Allocator, whether as a singleton, wrapping another type with Allocator in the name, or by calling allocator() to get it. So we have enough of a convention to say that these types need to provide this or that bit of information, if we want.

But the code that needs it still can’t get it.

The “default allocators” in C/C++/C# are thread-safe.
I think that is why this implicit property of an allocator was not taken into consideration in the std.mem.Allocator approach.

Also, multithreading is not widely used in current Zig projects (my impression),
and usually GPA (DebugAllocator) and std.testing.allocator are used.
Both are thread-safe.

But in production code we need to know whether an allocator is thread-safe or not.

The cost of adding isThreadSafe is much less than the damage it prevents.

I view this as primarily a problem for libraries.

An application, yes, you need to know if an allocator is thread-safe, if you’re going to use it in a multi-threaded context. However, consulting the documentation is a complete solution to that need.

Library code might not function correctly without a thread-safe allocator. It should be able to fail with a compile error if the Allocator of a non-thread-safe allocator reaches it. But that information is not available at the point of use.

What’s wrong with run-time checking via isThreadSafe?

I understand the desire to solve the problem with Zig features like comptime, reflection, etc.
But sometimes the plain old approach is good enough.

This might be controversial, but I think that when it comes to multithreading (specifically allocating in one thread and freeing in another), having a global allocator is the right solution.
Otherwise you just end up dragging these allocators through your code, and even if the allocator is thread-safe, a whole lifetime mess comes along with it: thread safety doesn’t help if the allocator is a local arena that’s destroyed before the second thread accesses it.

And with a global allocator you can just ask the user to have a @import("root").threadSafeAllocatorForLibraryXyz declaration, making it obvious that this is required to be thread-safe.
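
A sketch of how that lookup could work, collapsed into one file so it runs as-is. In a real project the pub const would live in the application’s root source file and libraryAllocator (a hypothetical helper) in the library. page_allocator stands in for whatever thread-safe allocator the application picks:

```zig
const std = @import("std");
const root = @import("root");

// Application side: lives in the root source file. The declaration name is
// the one suggested above; the choice of page_allocator is an assumption.
pub const threadSafeAllocatorForLibraryXyz: std.mem.Allocator = std.heap.page_allocator;

// Library side: refuse to compile unless the application made the choice.
fn libraryAllocator() std.mem.Allocator {
    if (!@hasDecl(root, "threadSafeAllocatorForLibraryXyz")) {
        @compileError("the root file must declare threadSafeAllocatorForLibraryXyz");
    }
    return root.threadSafeAllocatorForLibraryXyz;
}

pub fn main() !void {
    const a = libraryAllocator();
    const bytes = try a.alloc(u8, 8);
    defer a.free(bytes);
    std.debug.print("got {d} bytes from the root-declared allocator\n", .{bytes.len});
}
```

The compiler still cannot prove that the declared allocator is thread-safe; the name only makes the requirement hard to miss.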

I do not find your take that controversial; it addresses both the lifetime issues and the problem of callers ignoring documented thread-safety requirements.

Coming from languages that universally provide thread-safe global allocators, Zig’s choice to make allocator use explicit was one of my main reasons to play more with it. Expanding on this, I envision similar strategies for threading: library code should not use threads behind the scenes, but should instead require parameters like thread pools to make its potential use of multithreading obvious.

The only thing wrong with it is that it makes a compile-time error into a run-time error. That’s automatically bad.

“Thread-unsafe allocator passed to function which requires thread safety” is a type mismatch, basically, just one which doesn’t happen to be captured by the type system.

If there were a way to determine this condition from an Allocator instance, but only at runtime, well, OK. Not ideal, not a big deal. But it’s worse than that: there’s just no way to do it†, not without some clairvoyant knowledge of how the backing type works.

† I’m not counting adding a function and function pointer to Allocator which gets this across at runtime, because it’s vastly less important than that solution would need it to be.

Thread safety is just one instance of a kind of information that is useful to associate with a type. It’s partly why languages like C# have an attribute system: the properties that are interesting to imbue a type with are open-ended, and orthogonal to the signatures of its methods, which is the domain that interfaces are mostly about.

I think that the “give me an allocator” idiom is suitable for std or similar
“utility” libraries, where the caller can anticipate what is done under the hood.

Now I am working on a new async application protocol.
The allocator received during init()… will be used:

  • in different threads
  • for a lot of “alloc”/“free” calls

As you wrote above, thread safety is not the only issue; I also don’t want to get an ArenaAllocator.

Also, I cannot use DebugAllocator, because as far as I understand
it’s a singleton and may have been initialized elsewhere in the process with thread safety disabled.

OK, now we have two issues with allocators:

  • thread safety
  • lifetime mess (ArenaAllocator)

ArenaAllocator is also unsuitable for workloads with many alloc and real free operations, which rely on reusing previously allocated memory.

We need a default process-wide allocator in the alloc/free style:

  • multithreaded
  • process lifetime

This allocator could also be provided as a std.mem.Allocator to std and other libraries that support the “give me an allocator” idiom.

Special allocators like ArenaAllocator would be used for very specific purposes and not as the common solution.

I’m still a newbie, so forgive me if I perhaps misunderstand a few concepts.

As far as I understand, if the distinction between thread-safe and non-thread-safe allocators were made at compile time, detecting whether an allocator fulfills certain properties could work through duck typing: all allocators are accepted by taking an alloc: anytype argument, and then we can check at compile time whether a particular allocator has the required properties (by checking that a const is true, that a method exists, etc.).

For some reason, Zig moved away from this approach and uses dynamic dispatch instead, so that there is one concrete type for all allocators. That reduces binary size and may come with some other advantages that I don’t fully understand yet (maybe also syntactical advantages).

But now, we want or need to distinguish between (at least) two “classes” of allocators. Those which are thread-safe, and those which are not.

Now my naive take (remember, I’m still learning) would be to have two different types for that, let’s say Alloc and ThreadSafeAlloc. But we don’t have inheritance, yet we might want to treat a ThreadSafeAlloc like an Alloc in some cases. For this, couldn’t ThreadSafeAlloc simply provide an allocator() method which returns a (pointer to a?) value of type Alloc? Or maybe just a field?

Then we would have something like:

const std = @import("std");
const Allocator = std.mem.Allocator;

const ThreadSafeAlloc = struct {
    allocator: Allocator,
};

fn foo(alloc: ThreadSafeAlloc) error{OutOfMemory}![]u8 {
    // Let's assume we need a thread-safe allocator here.
    return try alloc.allocator.dupe(u8, "Please free me!");
}

pub fn main() !void {
    const alloc = ThreadSafeAlloc{ .allocator = std.heap.smp_allocator };
    const message = try foo(alloc);
    std.debug.print("{s}\n", .{message});
    alloc.allocator.free(message);
}

But perhaps that’s syntax hell?

perhaps :joy:

You can’t really store an anytype in a struct without making the struct generic. I think that’s the most important reason, but there might be others.
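
For illustration, this is roughly what “making the struct generic” looks like. Service is a hypothetical name, and FixedBufferAllocator is just a convenient concrete backing type:

```zig
const std = @import("std");

/// Hypothetical generic holder: the concrete backing type stays visible to
/// comptime code, at the cost of one instantiation per backing type.
fn Service(comptime Backing: type) type {
    return struct {
        backing: *Backing,

        const Self = @This();

        fn allocator(self: Self) std.mem.Allocator {
            return self.backing.allocator();
        }
    };
}

pub fn main() !void {
    var buffer: [256]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    // The type parameter has to be spelled out (or inferred by a helper),
    // and every struct that embeds a Service becomes generic in turn.
    const service = Service(std.heap.FixedBufferAllocator){ .backing = &fba };
    const bytes = try service.allocator().alloc(u8, 8);
    defer service.allocator().free(bytes);
}
```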
