Generic Managed Implementation

Currently the managed variants of ArrayList are marked as deprecated, and I think that’s the right choice, but I think it’s still valuable to provide a way to opt in to higher-level ways of managing memory.

The dream would be:

// AFAIK this can't be done
test "struct managed memory" {
    // we opt in to the choice of having the struct keep track of its allocator
    var arr: Managed(std.array_list.Aligned(u8, null)) = .initCapacity(std.testing.allocator, 10);
    defer arr.deinit();

    try arr.append(10);
}

I’ve been able to create a working proof of concept that gets as far as the following:

// sc == Static call
// ic == Instance call
// see full implementation below for details
test "struct managed memory" {
    var arr: Managed(std.array_list.Aligned(u8, null)) = .sc("initCapacity", .{std.testing.allocator, 10});
    defer arr.ic("deinit", .{});

    try arr.ic("append", .{10});
    try arr.ic("append", .{20});
    try arr.ic("append", .{30});
}
Full implementation
// zig 0.15.2

const std = @import("std");

pub fn Managed(comptime T: type) type {
    const ti = @typeInfo(T);
    if (ti != .@"struct") @compileError("Can only manage structs.");

    const M = struct {
        allocator: std.mem.Allocator,
        managed: T,

        /// `init` methods will need to return the type `T`, wrapped in the managed struct `M`
        fn willMakeManagedInstance(comptime name: []const u8) type {
            if (errorlessType(fnReturnType(T, name)) == T) return @This();
            return void;
        }

        /// Static call
        pub fn sc(comptime name: []const u8, args: anytype) willMakeManagedInstance(name) {
            const method = @field(T, name);
            const method_ti = fnTypeInfo(T, name);
            if (method_ti.params.len > 0 and method_ti.params[0].type == T) @compileError("Use ic instead.");
            if (errorlessType(fnReturnType(T, name)) == T) {
                return @This(){
                    // WARN: just assumes the first arg is the allocator, should actually check
                    .allocator = args[0],
                    // WARN: just a hacky fix to not address errors for now
                    .managed = @call(.auto, method, args) catch @panic("oom"),
                };
            }
        }

        /// Instance call
        pub fn ic(self: *@This(), comptime name: []const u8, args: anytype) fnReturnType(T, name) {
            const method = @field(T, name);

            // TODO: should probably check if the param of the method being called actually needs an allocator

            return @call(.auto, method, .{&self.managed, self.allocator} ++ args);
        }
    };

    return M;
}

// Example

// test "struct managed memory" {
//     var arr: Managed(std.array_list.Aligned(u8, null)) = .initCapacity(std.testing.allocator, 10);
//     defer arr.deinit();
// 
//     try arr.append(10);
// }

test "struct managed memory" {
    const ManagedArrayList = Managed(std.array_list.Aligned(u8, null));
    var arr: ManagedArrayList = .sc("initCapacity", .{std.testing.allocator, 10});
    defer arr.ic("deinit", .{});

    // ideally it would be nice to create an actual function in some way so that you could just
    // do `try arr.append(10)`.
    try arr.ic("append", .{10});
    try arr.ic("append", .{20});
    try arr.ic("append", .{30});
}

// Type Helper functions

/// Get function type info of named function.
fn fnTypeInfo(comptime T: type, comptime name: []const u8) std.builtin.Type.Fn {
    const field = @field(T, name);
    const field_ti = @typeInfo(@TypeOf(field));
    if (field_ti != .@"fn") @compileError("Named field is not a function.");

    return field_ti.@"fn";
}

/// Get return type of named function.
fn fnReturnType(comptime T: type, comptime name: []const u8) type {
    const method = fnTypeInfo(T, name);
    const method_rt = method.return_type.?;

    return method_rt;
}

fn errorlessType(comptime T: type) type {
    const ti = @typeInfo(T);
    if (ti == .error_union) {
        // We don't care about the error part in this example.
        return ti.error_union.payload;
    }
    return T;
}

This was just an afternoon experiment, so I’m sure it could be improved a bit further, but I think the current comptime limitations would prevent my ideal case. I’m using array lists as the example, but the extended goal is that this is generic over anything that does allocations in its functions.

For some context, I’m coming from a background where I’ve done a lot of tutoring and teaching programming across many languages and skill levels, so accessibility and ease of use are things I’m always thinking about, even when working on my own projects.
I like zig a lot, and it would be cool if I could have it among my options when someone asks me about learning to make something.

1 Like

I had this idea already; I decided against it for the following reasons.

The only reason to have managed containers is to associate an allocator with a collection at runtime.
All other reasons are not important to zig or even go against the zig zen.

Most of your Managed implementation is overcomplicated by trying to pass the allocator implicitly. One of the main reasons managed containers have been deprecated is a preference for the allocation clarity you get from unmanaged containers.

There is also the complication of which argument is the allocator: with the upcoming Io interface, you will find that you can’t just assume the allocator is the first argument. There are also APIs that may deem other parameters more important than the allocator and put them before it.

You can simplify your current and future implementation of Managed by removing that “feature” and just accessing the unmanaged collection directly. That also retains the allocation clarity zig prefers.

At that point, Managed is just a struct with 2 fields and no functions; such a type is trivial to add to any project that needs it, so why should it be in std?

Another big reason against a std.Managed is that when you do need it, you likely have multiple containers you want to associate with an allocator, and you would not want to store the same allocator multiple times. Such a managed collection is best implemented on a case-by-case basis.
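For what it’s worth, the two-field struct described above really is tiny. Here’s a sketch of what it might look like; the name `ManagedList` and the field names are invented for illustration, and it assumes the Zig 0.15-style unmanaged `std.ArrayList`:

```zig
const std = @import("std");

/// Hypothetical two-field "managed" pairing: an allocator stored next to
/// the unmanaged container, with no wrapper methods. Callers reach
/// through `.list`, so every allocating call still names the allocator
/// explicitly at the call site.
pub fn ManagedList(comptime Elem: type) type {
    return struct {
        gpa: std.mem.Allocator,
        list: std.ArrayList(Elem) = .empty,
    };
}

test "trivial managed pairing" {
    var m: ManagedList(u8) = .{ .gpa = std.testing.allocator };
    defer m.list.deinit(m.gpa);

    try m.list.append(m.gpa, 10);
    try std.testing.expectEqual(@as(usize, 1), m.list.items.len);
}
```

Because callers go through `.list` directly, this keeps the allocation clarity of the unmanaged API while still keeping the allocator and container together.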

4 Likes

Thanks for the reply! Some things I’m still trying to wrap my head around:

All other reasons are not important to zig or even go against the zig zen.

I feel like I’m not understanding this, both in why zig would find certain reasons unimportant and in which part of the zen it ties to. Is the intention that the user should always be the one to decide on and implement management?

One of the main reasons managed containers have been deprecated is a preference for the allocation clarity you get from unmanaged containers.

I think the allocation clarity of unmanaged containers has some trip-ups that got me looking for a managed implementation. When passing an allocator to a function like append, there’s no guarantee an allocation will occur, just that one might be needed. Originally (when unmanaged containers were released) I thought that requiring an allocator was meant to signify that an allocation will happen.

Long-lived unmanaged containers feel like they need a managed wrapper anyway, unless you’re expected to drill the allocator through a parameter at every function layer. And if you are passing it through every layer, doesn’t the chance of passing the wrong allocator create a risk of memory leaks?


In higher-level projects, things like small games, simple websites/servers, and random CLI tools, the convenience of sacrificing some management and optimisation for something 90% of the way there feels nice to work with. It also makes for good examples of how to manage things, so when I do need that last 10%, I can use the existing management as a basis.

Perhaps I’m also being a bit pedantic in putting the language through the lens of someone brand new to programming. A lot of the people I make things with in my spare time aren’t as technically inclined, so I don’t really get to reach for zig as much, since I know there are a lot of questions I’d have to answer with “just because”, knowing a full explanation isn’t going to fly for someone like that.

This is getting maybe slightly off-topic, but it is interesting to me that std.Io implementations sometimes need an allocator (though not all do) and wind up being “managed” structs.

3 Likes

To be fair, I’ve rambled a fair bit so it does encapsulate a lot of areas for discussion :sweat_smile:

IIRC it’s been said that the std library is being developed for the sake of the compiler first. So when the focus becomes “what do developers want out of the std library?”, we might see some significant API tweaks and changes, but the abstract ideas (like std.Io) will stay the same.

1 Like

The easier thing to do, if they do go ahead and remove all the managed structures from the standard library, is to just vendor them in a repo.

Interpret doctrine as damage, which it is, and route around it.

2 Likes

My question would be:
do you actually use lots of operations on that managed instance?

If you only ever call append on it and then let some other code do something with the data, I think it might make sense to use something like a ListAccumulator that has only an append function and contains the allocator and the array list.

Concrete (static) interface over Generic Managed collections

That way, the code that collects data has a simple interface to add its data to a list; on the processing side, you can just dig into the internal unmanaged ArrayList and use all its operations directly, without creating generic code for all the operations that you won’t use anyway.
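A minimal sketch of that idea follows. The `ListAccumulator` name comes from the post above; the element type and method set are assumptions, and it assumes the Zig 0.15-style unmanaged `std.ArrayList`:

```zig
const std = @import("std");

/// Sketch: producers only ever see `append`; the processing side digs
/// into `.list` directly and uses the unmanaged API with explicit
/// allocators.
const ListAccumulator = struct {
    gpa: std.mem.Allocator,
    list: std.ArrayList(u8) = .empty,

    pub fn append(self: *ListAccumulator, byte: u8) !void {
        try self.list.append(self.gpa, byte);
    }

    pub fn deinit(self: *ListAccumulator) void {
        self.list.deinit(self.gpa);
    }
};

test "ListAccumulator" {
    var acc: ListAccumulator = .{ .gpa = std.testing.allocator };
    defer acc.deinit();

    // producer side: no allocator at the call site
    try acc.append(1);
    try acc.append(2);

    // processing side: use the unmanaged list directly
    try std.testing.expectEqualSlices(u8, &.{ 1, 2 }, acc.list.items);
}
```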

I think I am most likely to want something more managed when I want code with a simple interface, but I think in those situations it is better to just write some kind of struct that presents that simple static interface to that code explicitly, instead of relying on a generic way to have a sort-of matching overly generic static interface. I think the more important part for that kind of code is that it calls a handful of functions on some given type to do its job, so I think it is better to do that directly instead of trying to avoid writing that type.

benefits

The benefits of writing that type are that you can explicitly document what the API expects, and you can add verification logic, for example if the data has some kind of ordering constraint. If people need to understand or adapt the code, it is also easier to understand and change towards different needs. For example, if in the future you decide to store the data in some other ordered data structure, you only have to change the implementation of your accumulator type and the processing side, not every user of the static interface defined by the accumulator type.

There is also the added benefit that the static interface is much more likely to be minimal and limited to what is actually needed and expected, which makes the code easier to understand and adapt.
If you constantly work with loose, generic, managed collections of stuff, then I think it takes a lot more work to see which parts are actually grouped together as part of a similar pattern of usage, and which parts are just random one-shot usages: in-place hacks that should have been part of the intended design of the interface and implementation, but leaked into the usage code because the interface is overly generic.

Basically, I don’t think such a generic approach to creating managed data structures is advisable. Ideally, grouping unmanaged data structures would group several of them together and, at the same time, give the user a good, easy-to-understand static interface that documents how it is supposed to be used, without overloading the user with many functions they aren’t supposed to call anyway.

If you create such structs that group things together, they can also explain domain-specific things, like that some ArrayList will always contain a head node which contains a length (or allows computing it), followed by length element nodes, or similar.

zig zen

Going overly generic seems like a way to avoid writing code, but the zig zen tells us to optimize for reading and understanding code. If you use too much generic code, then eventually you need to fill in more and more blanks in your head instead of being able to see it right there in the code, and eventually everyone hits the limits of their working memory.

At that point I usually want to throw away the code and rewrite it in a less generic way. Ideally we would want to be able to incrementally refine it, but if understanding the code requires more working memory than you have, rewriting can be easier, which is one of the reasons I try to avoid generic code that doesn’t seem really needed (except when it gives a huge benefit without costing a lot).

wrong approach (in my opinion)

I also think that starting with the generic is the wrong way around: start with the specific, and only after you have seen multiple repeating uses of your specific code do you start pulling out the repeating things in such a way that they disappear or become simpler.

When you start by creating generic things, you end up abstracting over things that don’t matter and didn’t need to be abstracted away (or at the wrong place or level). I also think Casey Muratori’s post Semantic Compression is relevant here, because when you write code bottom-up you basically do what Casey calls semantic compression; when you over-engineer generic solutions top-down, you don’t see the concrete, specific scenarios before you come up with your abstractions.

So come up from the bottom to the high level with your abstractions, instead of trying to descend from the top, imposing your ideal onto the low level. You should spend 90% of your time dealing with the low-level and concrete, and then find better and better ways to combine and build those up into more high-level compositions. If you spend 90% on the high level, you are in JavaScript land, where other people need to write clever JIT compilers to untangle those high-level fantasies and restructure them into something that can run halfway decently.

I think both forms of thinking are needed, but high-level thinking seems to be overhyped and over-published in the form of blog posts etc., and I think people should spend more time working with the low-level legos and actually stacking them together (instead of, sometimes, demanding that languages become more mud-like so that they can avoid learning how to stack legos).

You can then think about the top/high-level patterns once you are done creating the lego city and can spot the patterns. But ideally, once you write your blog post about it, you also make it clear that you built it up brick by brick from the bottom, instead of only dreaming it up top-down without ever placing a brick. My sentiment here is: the bricks are important, so don’t spend so much thinking time floating in the clouds.

6 Likes

Really good explanation! And a good call-out to the post by Casey.

I agree that for the language, my generic approach is a “muddy” way versus being more explicit. Your wording has made it click for me that, yes, I am trying to work top-down rather than bottom-up.

I think both forms of thinking are needed, but high-level thinking seems to be overhyped and over-published in the form of blog posts etc., and I think people should spend more time working with the low-level legos and actually stacking them together.

I’d like to try to angle this from another perspective (though I agree online discourse can certainly be annoyingly overhyped).

I would say that the vast majority of people, if not everyone, start learning to code through some high-level thinking. The resources for getting a lower-level understanding can sometimes be brutal to get through without the right guidance. To tie in my own experience: when my artist friend asks me to help make a small game as a birthday present, I want them to think about the fun creative bits and not so much about the lower-level parts (which is where my intention for broader access to managed containers came from).

I actually do mostly think top-down but then do the implementation in some kind of mixed, mostly bottom-up way.

The main thing I always have in the back of mind is: What do I actually want/need to do? And then I drill down by asking what I need for that.

When I then arrive at a point where I have some more concrete mostly self-contained thing I need (and I haven’t already written) I write that first. That could be something more complex like a key-value store, parts like a concurrent hashmap, or a constraint solver, or even just a single function to calculate something. The key for me is this fractal/tree-like thinking where I traverse top-down, to figure out what I need, and then bottom-up to build the parts.

Crucially, this allows me to think about and tinker with the API for the thing before actually implementing it, which helps reduce scope and, in my experience, often leads to easier-to-use code. Because I am coming from the top down, this naturally results in quite minimal, specific interfaces.

But this relationship is not only top-down. When I arrive at some part and see that I can’t do it the way I thought, or that it’s inefficient, I need to reconsider the levels above and maybe adjust them.

1 Like

I’d never use this, but I’ve been nerd-sniped into quickly implementing a version of this now, with IMO some nicer compile errors, and that handles the allocator being in any position, meaning it should work with any container:

test "managed example" {
    const gpa = std.testing.allocator;

    // Just pass the unmanaged container directly as an argument, even if it means
    // duplicating the gpa argument.
    // Most of the time, you'll probably want to use a decl-literal anyway
    // For static calls, users can just go through the Inner type directly
    var managed: Managed(std.ArrayList(u8)) = .init(gpa, try .initCapacity(gpa, 10));
    defer managed.call(.deinit, .{});

    for (0..10) |i| {
        // calls that don't require an allocator work
        managed.call(.appendAssumeCapacity, .{@intCast(i)});
    }

    // it can work with any type
    // initialization with .decl literal:
    var hash_map: Managed(std.StringArrayHashMapUnmanaged(u8)) = .init(gpa, .empty);
    defer hash_map.call(.deinit, .{});

    try hash_map.call(.ensureUnusedCapacity, .{ 2 });

    // because we use DeclEnum and wrap ArgsTuple, compiler errors point closer to the source location and are shorter:

    // Uncommenting the line below produces:
    // error: expected type 'u8', found '*const [9:0]u8'
    // hash_map.call(.putAssumeCapacity, .{ "hello", "forty-two" });

    // Uncommenting the line below produces:
    // error: enum 'meta.DeclEnum(array_hash_map.ArrayHashMapUnmanaged([]const u8,u8,array_hash_map.StringContext,true))' has no member named 'putAssumeCapacit'
    // hash_map.call(.putAssumeCapacit, .{ "world", 42 });
}

// Implementation:

const std = @import("std");
const Allocator = std.mem.Allocator;

pub fn ManagedMemberArgsTuple(comptime Function: type) type {
    const fields = std.meta.fields(std.meta.ArgsTuple(Function));

    var i = 0;
    var field_types: [fields.len - 1]type = undefined;
    for (fields[1..]) |f| {
        if (f.type != Allocator) {
            field_types[i] = f.type;
            i += 1;
        }
    }
    return std.meta.Tuple(field_types[0..i]);
}

pub fn Managed(comptime T: type) type {
    return struct {
        gpa: Allocator,
        inner: T,

        pub const Inner = T;
        const Self = @This();

        pub fn init(gpa: Allocator, inner: T) Self {
            return .{ .gpa = gpa, .inner = inner };
        }

        pub fn call(
            self: *@This(),
            comptime decl: std.meta.DeclEnum(T),
            args: ManagedMemberArgsTuple(@TypeOf(@field(T, @tagName(decl)))),
        ) @typeInfo(@TypeOf(@field(T, @tagName(decl)))).@"fn".return_type.? {
            var args_tuple: std.meta.ArgsTuple(@TypeOf(@field(T, @tagName(decl)))) = undefined;
            comptime var i = 0;

            inline for (0.., &args_tuple) |j, *arg| {
                if (j == 0) {
                    arg.* = switch (@TypeOf(arg.*)) {
                        *const T, *T => &self.inner,
                        T => self.inner,
                        else => @compileError("not a member function"),
                    };
                    continue;
                }

                if (@TypeOf(arg.*) == Allocator) {
                    arg.* = self.gpa;
                } else {
                    arg.* = args[i];
                    i += 1;
                }
            }

            if (i + 1 <= args.len)
                @compileError("more than one allocator in argument list, use .inner field to call function explicitly");

            return @call(.auto, @field(T, @tagName(decl)), args_tuple);
        }
    };
}

But please don’t use it.

4 Likes

Welcome to ziggit, @dotmrjosh.

What I like is the converse - knowing that a function that doesn’t take an allocator definitely won’t be doing any allocation.

I’ve thought: “well, it makes perfect sense for all the bottom-level stuff (including std) to be unmanaged, but, for my lib or app, I’m perfectly free to pair the allocator with some appropriate struct.” However, I’ve come to feel that if I make a function call foo(), then since it doesn’t take an allocator, even 10 levels down, where it does stuff I don’t know much about, I at least know it’s not doing allocations. For many programming endeavors this is not useful at all, but I’m glad there’s a language like zig where, if that pattern is followed, it can be.

There are other ways to signal what will happen “down below”, but I think this way is perfectly fine, and I appreciate that it is at least enforced up to the surface of std. From there, I can choose to keep that visibility, or I can couple my allocator into a meaningful struct myself and “simplify” that struct’s functions by not burdening higher callers with the allocator arg. It’s really quite easy to do, as demonstrated; and as has been mentioned elsewhere, it hardly seems like value added to put such a simpleton in std when you’d probably just stash your own allocator in a meaningful struct anyway. And it would be one more thing to maintain.

(I’m assuming the argument is of the form “this would be a good thing for all to benefit from”; if, on the other hand, you’re just looking to cleverly stow an allocator in your lib or app, I think the advice is “more power to ya”. You just might want to comment, e.g. in the struct’s init(), that this is what you’re doing, so that users of your code realize they won’t have this granularity visible to them once they hand off the allocator in init().)
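As a concrete illustration of that last point, a struct that stows its allocator can flag the hand-off right in `init`. Everything here, the `Scoreboard` name included, is an invented example, assuming the Zig 0.15-style unmanaged `std.ArrayList`:

```zig
const std = @import("std");

/// Invented example of stashing an allocator in a meaningful struct.
const Scoreboard = struct {
    gpa: std.mem.Allocator,
    scores: std.ArrayList(u32) = .empty,

    /// NOTE: this struct keeps `gpa` and allocates internally; callers
    /// lose per-call visibility into allocations from this point on.
    pub fn init(gpa: std.mem.Allocator) Scoreboard {
        return .{ .gpa = gpa };
    }

    // Callers are not burdened with an allocator argument here, at the
    // cost of the allocation no longer being visible at the call site.
    pub fn record(self: *Scoreboard, score: u32) !void {
        try self.scores.append(self.gpa, score);
    }

    pub fn deinit(self: *Scoreboard) void {
        self.scores.deinit(self.gpa);
    }
};
```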

You’re an educator, so I think zig can be a great language to teach with all allocator guns out - learning about this quintessential part of software engineering might be considered a bit of a lost art, thanks to GCs and the modern trend to “focus on what matters”. For those who aren’t up for that, teach them Python.

4 Likes

The converse idea I think has finally made it click! Thanks for the really clear explanation.

Finding the right place to put the boundary between unmanaged and managed doesn’t feel as uncertain now, and I’m sure that as the language gets closer to 1.x, docs and guides will help support this.

I think zig can be a great language to teach with all allocator guns out - learning about this quintessential part of software engineering might be considered a bit of a lost art

I certainly agree, and I really enjoy using zig to explain fundamental concepts. I think now more than ever we need more people understanding these core ideas, versus just, as you say, “focusing on what matters”.

1 Like