Accept allocator or slices in my API?

Currently, my library performs zero allocations, and it has an API that looks like this:

pub fn init(
    port: *Port,
    settings: Settings,
    subdevices: []SubDevice,
    process_image: []u8,
    frames: []telegram.EtherCATFrame,
) !MainDevice {
    if (frames.len < frameCount(@intCast(process_image.len))) return error.NotEnoughFrames;

    assert(frameCount(@intCast(process_image.len)) <= frames.len);
    return MainDevice{
        .port = port,
        .settings = settings,
        .subdevices = subdevices,
        .process_image = process_image,
        .frames = frames,
    };
}

This is the most important API of the library, and it will be called by all (currently non-existent) users. It provides the main “object” of the library, and it is where all the variable-length mutable memory the library will use is provided (the slices).

This means a typical use case looks like this:

const std = @import("std");
const assert = std.debug.assert;
const gcat = @import("gatorcat");

// large const network configuration struct
const eni = @import("network_config.zig").eni;

pub fn main() !void {
    var raw_socket = try gcat.nic.RawSocket.init("enx00e04c68191a");
    defer raw_socket.deinit();
    var port = gcat.Port.init(raw_socket.networkAdapter(), .{});
    try port.ping(10000);

    // Since the ENI is known at comptime for this example,
    // we can size everything exactly on the stack.
    var subdevices: [eni.subdevices.len]gcat.SubDevice = undefined;
    var process_image = std.mem.zeroes([eni.processImageSize()]u8);
    const used_subdevices = try gcat.initSubdevicesFromENI(eni, &subdevices, &process_image);
    assert(used_subdevices.len == subdevices.len);
    var frames: [gcat.MainDevice.frameCount(@intCast(process_image.len))]gcat.telegram.EtherCATFrame = @splat(gcat.telegram.EtherCATFrame.empty);

    var main_device = try gcat.MainDevice.init(
        &port,
        .{ .recv_timeout_us = 4000, .eeprom_timeout_us = 10_000 },
        used_subdevices,
        &process_image,
        &frames,
    );

    try main_device.busINIT(5_000_000);
    try main_device.busPREOP(10_000_000);
    try main_device.busSAFEOP(10_000_000);
    try main_device.busOP(10_000_000);

}

Notice that all the slices are initialized from a comptime-known eni (“EtherCAT Network Information”), which is a large const struct containing a variable number of configurations. This means the slices are highly related and need to match up with each other. The true “source of truth” here is the eni.

An alternative approach would be to accept the eni and an allocator for my API:

pub fn init(
    port: *Port,
    settings: Settings,
    eni: ENI,
    allocator: std.mem.Allocator,
) !MainDevice {
    // ... do all the initialization and allocation of the
    // slices in here so my users don't have to care about
    // them or see them
}

What would you do? The slices or the allocator?

I see the following factors:

  1. Slices have the pro that users can more easily see and reason about how much memory they are using, which is important for embedded folks. Most applications will only use the stack and not heap-allocate, for maximum real-time performance.
  2. For the allocator option, users who really care about memory usage (embedded) will have to use a fixed buffer allocator or similar and “guess” how much memory their implementation will require. I could perhaps provide an API that calculates the memory usage from the eni to assist with this?
  3. The allocator option could reduce my public api surface dramatically and reduce the potential for bugs from users initializing these slices incorrectly.
  4. The library has added benefits for a comptime-known eni (the example shows minimal stack memory usage), but I do not want to require a comptime-known eni in general, even though most users will have one, because I want to retain the ability to dynamically scan networks and construct an eni at runtime for edge-case advanced users.

In the standard library, it’s fairly common to provide both. One example:
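For instance, std.fmt pairs a caller-supplied-buffer variant with an allocating variant of the same operation. A minimal sketch, assuming some allocator is in scope:

// Buffer variant: the caller provides the destination memory.
var buf: [32]u8 = undefined;
const formatted = try std.fmt.bufPrint(&buf, "{d}", .{42});

// Allocating variant: the caller provides an allocator instead,
// and owns (and must free) the result.
const owned = try std.fmt.allocPrint(allocator, "{d}", .{42});
defer allocator.free(owned);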

Many more examples in std.unicode.

EDIT: Sorry, somehow missed that you were asking about an init function. Making a data structure optionally allocate is not as straightforward as the examples above, so feel free to ignore this post. I’m not sure there’s a ‘correct’ way to approach the dilemma in the OP.

5 Likes

I think I can provide both ways, since I only need to allocate once (during my init), but I would need to document it well and provide two different deinit functions: one that calls free and one that does not.
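A minimal sketch of that pair, assuming the slice-taking init stays available (deinitNoFree is a hypothetical name):

// Allocating path: deinit frees everything init allocated.
pub fn deinit(self: *MainDevice, allocator: std.mem.Allocator) void {
    allocator.free(self.process_image);
    allocator.free(self.frames);
    allocator.free(self.subdevices);
    self.* = undefined;
}

// Slice-taking path: the caller owns the memory, so there is
// nothing to free; just invalidate the struct.
pub fn deinitNoFree(self: *MainDevice) void {
    self.* = undefined;
}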

I think I will try implementing the allocator way and see if that bites me.

It looks like you’re making arrays/slices using comptime-known values from the ENI file, right?

That suggests to me that you might want a function which takes an ENI, specializes the fields based on that data, and then returns a type. A value of that type can be allocated on the heap with allocator.create, or just put on the stack.

If the struct doesn’t need to allocate and free while using it, there’s no reason to include an allocator as a field on it. If the sizes are only runtime-known then that’s a different kettle of fish.
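A sketch of that usage, assuming a type-returning MainDevice(eni) whose init fills in the arrays (the init signature here is hypothetical):

const Device = MainDevice(eni);

// On the stack: the full size is part of the type itself.
var device: Device = try Device.init(&port, settings);

// Or on the heap, with any allocator:
const heap_device = try allocator.create(Device);
defer allocator.destroy(heap_device);
heap_device.* = try Device.init(&port, settings);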

2 Likes

I came up with the following implementation for the allocator case:

Library code:

pub fn init(
    allocator: std.mem.Allocator,
    port: *Port,
    settings: Settings,
    eni: ENI,
) !MainDevice {
    const process_image = try allocator.alloc(u8, eni.processImageSize());
    errdefer allocator.free(process_image);
    @memset(process_image, 0);

    const frames = try allocator.alloc(telegram.EtherCATFrame, frameCount(eni.processImageSize()));
    errdefer allocator.free(frames);
    assert(frameCount(eni.processImageSize()) <= frames.len);
    @memset(frames, telegram.EtherCATFrame.empty);

    const subdevices = try allocator.alloc(SubDevice, eni.subdevices.len);
    errdefer allocator.free(subdevices);
    const initialized_subdevices = gcat.initSubdevicesFromENI(eni, subdevices, process_image) catch |err| switch (err) {
        error.NotEnoughSubdevices => unreachable,
        error.ProcessImageTooSmall => unreachable,
    };
    assert(subdevices.len == initialized_subdevices.len);

    return MainDevice{
        .port = port,
        .settings = settings,
        .subdevices = initialized_subdevices,
        .process_image = process_image,
        .frames = frames,
    };
}

pub fn deinit(self: *MainDevice, allocator: std.mem.Allocator) void {
    allocator.free(self.process_image);
    allocator.free(self.frames);
    allocator.free(self.subdevices);
}

/// Returns the minimum required size of allocated memory for a given ENI.
/// Note: this does not include allocator bookkeeping overhead or alignment
/// padding between the allocations.
pub fn estimateAllocSize(eni: ENI) usize {
    return @sizeOf(u8) * eni.processImageSize() +
        @sizeOf(telegram.EtherCATFrame) * frameCount(eni.processImageSize()) +
        @sizeOf(SubDevice) * eni.subdevices.len;
}

User code:


const std = @import("std");
const assert = std.debug.assert;

const gcat = @import("gatorcat");

const eni = @import("network_config.zig").eni;

pub const std_options: std.Options = .{
    .log_level = .info,
};

pub fn main() !void {
    var raw_socket = try gcat.nic.RawSocket.init("enx00e04c68191a");
    defer raw_socket.deinit();
    var port = gcat.Port.init(raw_socket.networkAdapter(), .{});
    try port.ping(10000);

    // The +20000 is headroom for allocator bookkeeping overhead and
    // alignment padding, which estimateAllocSize does not count.
    const estimated_stack_usage = comptime gcat.MainDevice.estimateAllocSize(eni) + 20000;
    var stack_memory: [estimated_stack_usage]u8 = undefined;
    var stack_fba = std.heap.FixedBufferAllocator.init(&stack_memory);
    var stack_gpa = std.heap.GeneralPurposeAllocator(.{}){ .backing_allocator = stack_fba.allocator() };
    defer _ = stack_gpa.deinit();

    var md = try gcat.MainDevice.init(
        stack_gpa.allocator(),
        &port,
        .{ .recv_timeout_us = 4000, .eeprom_timeout_us = 10_000 },
        eni,
    );
    defer md.deinit(stack_gpa.allocator());

    try md.busINIT(5_000_000);
    try md.busPREOP(10_000_000);
    try md.busSAFEOP(10_000_000);
    try md.busOP(10_000_000);
}
The general purpose allocator seems to add about 15 kB of overhead to 2.5 kB of allocation, but the user can choose a different allocator if they care, I guess.

1 Like

Just using the fixed buffer allocator without the GPA seems to add only 8 B of overhead, which is cool. I wonder where that comes from. Probably alignment.
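Roughly, the GPA then drops out of the user code like this (a sketch; the @alignOf headroom is my guess at where those 8 B of padding go):

const size = comptime gcat.MainDevice.estimateAllocSize(eni) +
    @alignOf(gcat.telegram.EtherCATFrame) + @alignOf(gcat.SubDevice);
var stack_memory: [size]u8 = undefined;
var fba = std.heap.FixedBufferAllocator.init(&stack_memory);

var md = try gcat.MainDevice.init(
    fba.allocator(),
    &port,
    .{ .recv_timeout_us = 4000, .eeprom_timeout_us = 10_000 },
    eni,
);
defer md.deinit(fba.allocator());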

It’s still not clear to me whether or not the ENI data represents comptime-known values, including values derived from comptime-known values.

If the answer is yes, an approach like this is most suitable:

fn MainDevice(eni: ENI) type {
    return struct {
        process_image: [eni.processImageSize()]u8,
        subdevices: [eni.subdevices.len]SubDevice,
        frames: [frameCount(eni.processImageSize())]telegram.EtherCATFrame,
        port: *Port,

        // Member functions etc. go here
    };
}

User code can then simply use @sizeOf to determine how much memory the struct takes, and allocate it wherever it pleases.

My hunch is that the ENI numbers are configurable at comptime, since they appear to represent aspects of a specific piece of hardware which won’t change at runtime. This won’t give you value-agnostic code which can read a configuration at runtime out of a data file, however. If you wanted to get really fancy, you could pass a comptime flag which would construct the various fields as having slice types instead of array types, and get the best of both worlds.

I’m fairly confident embedded use is going to have a firm compile-time grasp on the necessary values, so the ability to specialize on the actual device would be pretty useful there.

You are right that it is almost always known at comptime, but:

It would be nice if I could provide both implementations (your type-returning version and the allocator version), but I’m not sure how I could do it easily.

Maybe this is a real use case for @inComptime?

1 Like

I don’t think @inComptime will be useful here, because a function returning a type is always in comptime. A function with a type parameter doesn’t have to be, as long as the parameter itself is comptime-known (which it must be, being a type and all).
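A quick illustration of that distinction, with a throwaway Foo (not from the library):

fn Foo(n: usize) type {
    return struct { data: [n]u8 };
}

// Fine: the argument is comptime-known.
const Eight = Foo(8);

// Compile error: a function returning `type` is always evaluated
// at comptime, so a runtime `n` is rejected.
// var runtime_n: usize = 8;
// const Bad = Foo(runtime_n);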

You’d want something like this:

pub const EniReadKind = enum(u1) {
    static,
    dynamic,
};

// Note that for .dynamic a dummy eni must be provided.
// There are other ways to handle that, but this is the
// simplest.

pub fn MainDevice(comptime kind: EniReadKind, eni: ENI) type {
    const static = kind == .static;

    return struct {
        process_image: if (static) [eni.processImageSize()]u8 else []u8,
        subdevices: if (static) [eni.subdevices.len]SubDevice else []SubDevice,
        frames: if (static) [frameCount(eni.processImageSize())]telegram.EtherCATFrame else []telegram.EtherCATFrame,
        port: *Port,
        // This is an option if you need an Allocator,
        // although my sense is that all allocation is
        // initialization here, so maybe not
        allocator: if (static) void else std.mem.Allocator,
        // This would catch any use of the allocator field in
        // the static context, and it's a zero-size field so
        // free at the point of use.

        // Member functions etc. go here
    };
}

There are some unsolved problems here, like how to provide type-specific initialization in an ergonomic way, but the general outline is fairly practical. Arrays and slices both have a .len which may be accessed, and they index the same way, so the actual in-use logic of the type should Just Work for both cases. If I’m correct that memory only needs to be allocated when the device is created and destroyed, then those are the only member functions which have to take the value of the kind parameter into account.
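For example, a member function written once can serve both variants (a minimal sketch; resetFrames is hypothetical):

// Compiles whether `frames` is a fixed-size array or a slice,
// since both support .len, indexing, and pointer capture.
pub fn resetFrames(self: *@This()) void {
    for (self.frames) |*frame| frame.* = telegram.EtherCATFrame.empty;
}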

1 Like