Allocator swapping and type consistency

Hello,

When writing Zig I like to use the following pattern, as it lets me swap out my allocator based on the optimize mode or even build script flags (which I've never done before, but I think there are probably plenty of reasons to do so) in a nice way:

pub fn main() !void {
    var alloc_impl = switch (builtin.mode) {
        .Debug => std.heap.DebugAllocator(.{}).init,
        else => std.heap.ArenaAllocator.init(std.heap.page_allocator),
    };
    defer _ = alloc_impl.deinit();
    const allocator = alloc_impl.allocator();

    _ = allocator;
    // ...
}

const std = @import("std");
const builtin = @import("builtin");

The problem is that this stops working once I want to use an allocator like smp_allocator or c_allocator, since those are plain std.mem.Allocator values with no init, deinit, or allocator functions. I wonder if it would be a good idea to make every allocator a struct with init, deinit, and allocator functions, or alternatively to provide a wrapper for the ones that don't have them.

What is your opinion on this? Does anyone else even do this?

The Zig 0.14.0 release notes suggest using something like:

var debug_allocator: std.heap.DebugAllocator(.{}) = .init;

pub fn main() !void {
    const gpa, const is_debug = gpa: {
        if (native_os == .wasi) break :gpa .{ std.heap.wasm_allocator, false };
        break :gpa switch (builtin.mode) {
            .Debug, .ReleaseSafe => .{ debug_allocator.allocator(), true },
            .ReleaseFast, .ReleaseSmall => .{ std.heap.smp_allocator, false },
        };
    };
    defer if (is_debug) {
        _ = debug_allocator.deinit();
    };
}

While I get that this works, it does feel quite messy. Something I have tried is writing a thin wrapper around smp_allocator myself, which works nicely. It's really just a struct with no-op methods and an allocator() that returns smp_allocator.
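A minimal sketch of such a wrapper, assuming Zig 0.14. The name SmpWrapper is hypothetical and not part of std; having deinit return std.heap.Check is my own choice so that it matches what DebugAllocator.deinit returns.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical wrapper (not part of std): gives smp_allocator the same
// init / allocator() / deinit() shape as DebugAllocator.
const SmpWrapper = struct {
    pub const init: SmpWrapper = .{};

    pub fn allocator(_: *SmpWrapper) std.mem.Allocator {
        return std.heap.smp_allocator;
    }

    // No-op: smp_allocator has no state to tear down. Returning .ok
    // mirrors the return type of DebugAllocator.deinit.
    pub fn deinit(_: *SmpWrapper) std.heap.Check {
        return .ok;
    }
};

pub fn main() !void {
    // builtin.mode is comptime-known, so only the taken branch is analyzed
    // and alloc_impl gets that branch's type.
    var alloc_impl = switch (builtin.mode) {
        .Debug => std.heap.DebugAllocator(.{}).init,
        else => SmpWrapper.init,
    };
    defer _ = alloc_impl.deinit();
    const allocator = alloc_impl.allocator();

    const buf = try allocator.alloc(u8, 16);
    defer allocator.free(buf);
}
```

With this, the switch from the first post keeps working because every branch exposes the same three names, even though the branch types differ.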

Yeah, I usually wrap the main allocator in a struct like this:

const AppAllocator = struct {
    pub const is_debug = blk: {
        if (builtin.os.tag == .wasi) break :blk false;
        break :blk switch (builtin.mode) {
            .Debug, .ReleaseSafe => true,
            .ReleaseFast, .ReleaseSmall => false,
        };
    };

    debug_allocator: if (is_debug) std.heap.DebugAllocator(.{}) else void,

    pub const init: @This() = switch (is_debug) {
        true => .{ .debug_allocator = .init },
        false => .{ .debug_allocator = {} },
    };

    pub fn allocator(self: *@This()) std.mem.Allocator {
        if (builtin.os.tag == .wasi) return std.heap.wasm_allocator;
        return switch (is_debug) {
            true => self.debug_allocator.allocator(),
            false => std.heap.smp_allocator,
        };
    }

    pub fn deinit(self: *@This()) void {
        if (is_debug) _ = self.debug_allocator.deinit();
        self.* = undefined;
    }
};

It’s messy because it condenses the logic into one expression; spelled out, it should look better:

pub fn main() !void {
    const is_debug = if (native_os == .wasi) false
        else switch (builtin.mode) {
            .Debug, .ReleaseSafe => true,
            .ReleaseFast, .ReleaseSmall => false,
        };

    var debug_allocator: std.heap.DebugAllocator(.{}) = .init;
    defer if (is_debug) {
        _ = debug_allocator.deinit();
    };

    const gpa =
        if (native_os == .wasi) std.heap.wasm_allocator
        else if (is_debug) debug_allocator.allocator()
        else std.heap.smp_allocator;
}

There is an attempt to hide this logic: "add std.heap.AutoAllocator" by gooncreeper (Pull Request #23432, ziglang/zig on GitHub).


It’s worth noting that this will always compile the DebugAllocator into your program, even if it’s not used. Only top-level declarations are lazily compiled. Putting something like this at the top level of your main file may be better:

const is_debug = builtin.mode == .Debug or builtin.mode == .ReleaseSafe;
var dba: std.heap.DebugAllocator(.{}) =
    if (is_debug)
        .init
    else
        @compileError("Should not use debug allocator in release mode");

pub fn main() !void {
    defer if (is_debug) {
        _ = dba.deinit();
    };
    
    const gpa = 
        if (builtin.os.tag == .wasi) std.heap.wasm_allocator
        else if (is_debug) dba.allocator()
        else std.heap.smp_allocator;
    // ...
}

Yes, that is how Zig's lazy evaluation works, but unused variables do get optimised out, at least in release modes, which is all we care about.


Unused variables inside functions are not optimized out in ReleaseFast or ReleaseSafe. Check the difference in binary size when defining the unused dba inside main vs. outside it with ReleaseFast.

That would completely break comptime conditionals in Zig.
Both versions produce the same binary:

“cached (207260B) ~12789 lines filtered” vs “cached (195250B) ~12495 lines filtered”. Where does that difference come from?

Seeing a lot of DebugAllocator in the diff between the two exact same binaries :stuck_out_tongue:

 .Linfo_string315:
-        .asciz  "hash_map.HashMapUnmanaged(usize,heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 0, .enable_memory_limit = false, .safety = false, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).LargeAlloc,hash_map.AutoContext(usize),80).Metadata"
+        .asciz  "x86_64_sysv"
 .Linfo_string316:
-        .asciz  "*hash_map.HashMapUnmanaged(usize,heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 0, .enable_memory_limit = false, .safety = false, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).LargeAlloc,hash_map.AutoContext(usize),80).Metadata"
+        .asciz  "incoming_stack_alignment"
 .Linfo_string317:
-        .asciz  "size"
+        .asciz  "?u64"
 .Linfo_string318:
-        .asciz  "available"
+        .asciz  "builtin.CallingConvention.CommonOptions"
 .Linfo_string319:
-        .asciz  "hash_map.HashMapUnmanaged(usize,heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 0, .enable_memory_limit = false, .safety = false, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).LargeAlloc,hash_map.AutoContext(usize),80)"
+        .asciz  "x86_64_win"
 .Linfo_string320:
-        .asciz  "hash_map.HashMapUnmanaged(usize,heap.debug_allocator.DebugAllocator(.{ .stack_trace_frames = 0, .enable_memory_limit = false, .safety = false, .thread_safe = true, .MutexType = null, .never_unmap = false, .retain_metadata = false, .verbose_log = false, .backing_allocator_zeroes = true, .resize_stack_traces = false, .canary = 10534666094765928719, .page_size = 131072 }).LargeAlloc,hash_map.AutoContext(usize),80).empty"
+        .asciz  "x86_64_regcall_v3_sysv"

Weird, there are no DebugAllocator references in my binaries. What’s your target?

~/d/p/tmp ❱ nix run github:Cloudef/zig2nix#latest -- build-obj -OReleaseFast a.zig
~/d/p/tmp 5.7s ❱ nix run github:Cloudef/zig2nix#latest -- build-obj -OReleaseFast b.zig

~/d/p/tmp 5.9s ❱ objdump -d a.o.o | grep DebugAllocator
~/d/p/tmp | 0 1 ❱ objdump -d b.o.o | grep DebugAllocator
~/d/p/tmp | 0 1 ❱ du -h a.o b.o
2.1M	a.o
2.1M	b.o

The diff I posted was from the godbolt link you sent. Can you show the output of du -b a.o b.o?

2184832	a.o
2176872	b.o

It seems there’s a slight difference, which doesn’t really make sense to me.
The code produced is the same, but the scaffolding seems to be affected?

A simpler comparison without all the std.io junk:
https://godbolt.org/z/vf5P6K6nf

It seems a.zig puts this in the assembly:

0000000000000000 <lel>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   be 00 10 00 00          mov    $0x1000,%esi
   9:   31 d2                   xor    %edx,%edx
   b:   5d                      pop    %rbp
   c:   e9 4f 01 00 00          jmp    160 <heap.SmpAllocator.alloc>
  11:   66 66 66 66 66 66 2e    data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
  18:   0f 1f 84 00 00 00 00
  1f:   00

compared to b.zig

0000000000000000 <lel>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   be 00 10 00 00          mov    $0x1000,%esi
   9:   31 d2                   xor    %edx,%edx
   b:   5d                      pop    %rbp
   c:   eb 02                   jmp    10 <heap.SmpAllocator.alloc>
   e:   66 90                   xchg   %ax,%ax

It might be a bug, but I think it’s actually expected behavior. I was trying to solve this problem a while back (specifically I was trying to make a variable that would be a compile error to reference in a release build), and I only realized this was the solution when I heard Andrew specifically clarify that top level declarations are lazily compiled.

By declaring the debug allocator inside the function, it is eagerly compiled, which includes the DebugAllocator code, but since it is never used in the function, it doesn’t change the output of that function’s code. ReleaseSmall goes through the effort of cleaning up the dead DebugAllocator code, but the other build modes do not.
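A minimal sketch of the "compile error on reference" idea described above, assuming Zig 0.14. The variable name debug_only_counter is hypothetical and only there for illustration. It relies on the initializer of a top-level declaration being analyzed only when something that is itself analyzed refers to it.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical debug-only global. In non-Debug builds the @compileError is
// hit only if analyzed code actually references this declaration.
var debug_only_counter: usize = if (builtin.mode == .Debug)
    0
else
    @compileError("debug_only_counter referenced in a non-Debug build");

pub fn main() void {
    // The condition is comptime-known, so in non-Debug builds this body is
    // not analyzed and the declaration above is never referenced.
    if (builtin.mode == .Debug) {
        debug_only_counter += 1;
        std.debug.print("count = {d}\n", .{debug_only_counter});
    }
}
```

Moving the same variable into main would presumably trigger the @compileError in every non-Debug build, because a local's initializer is analyzed whether or not the variable is used afterwards.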

Top-level declarations being lazily analyzed is one thing, but branches on comptime-known values also trigger dead code elimination, since the branches that remain won’t refer to any of the declarations used only in the eliminated ones. It might be that Zig’s dead code elimination is not that great yet.

const std = @import("std");
const builtin = @import("builtin");

export fn lel() [*]const u8 {
    const is_debug = if (builtin.target.os.tag == .wasi) false
        else switch (builtin.mode) {
            .Debug, .ReleaseSafe => true,
            .ReleaseFast, .ReleaseSmall => false,
    };

    var debug_allocator = if (is_debug) std.heap.DebugAllocator(.{}).init
        else {};
    defer if (is_debug) {
        _ = debug_allocator.deinit();
    };

    const gpa =
        if (builtin.target.os.tag == .wasi) std.heap.wasm_allocator
        else if (is_debug) debug_allocator.allocator()
        else std.heap.smp_allocator;

    return (gpa.alloc(u8, 4096) catch unreachable).ptr;
}

Here is a version that produces a smaller binary, for example :person_shrugging:

~/d/p/tmp ❱ du -b a.o b.o
73720	a.o
73752	b.o

In release builds, the type of debug_allocator is void rather than DebugAllocator, which is why the DebugAllocator code is not included. If you did this instead:

    var debug_allocator: std.heap.DebugAllocator(.{}) = if (is_debug) .init
        else undefined;
    defer if (is_debug) {
        _ = debug_allocator.deinit();
    };

It would include the DebugAllocator code, I bet.
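A small sketch for checking the type claim directly, assuming Zig 0.14. The alias name Inferred is hypothetical. It only shows which type the un-annotated variable ends up with; it says nothing about what ends up in the binary.

```zig
const std = @import("std");
const builtin = @import("builtin");

const is_debug = switch (builtin.mode) {
    .Debug, .ReleaseSafe => true,
    .ReleaseFast, .ReleaseSmall => false,
};

// Hypothetical alias for the type that
//     var debug_allocator = if (is_debug) std.heap.DebugAllocator(.{}).init else {};
// would be given. Only the taken branch is analyzed, so this is void in
// ReleaseFast and ReleaseSmall.
const Inferred = @TypeOf(if (is_debug) std.heap.DebugAllocator(.{}).init else {});

pub fn main() void {
    // Prints the type name and its size for the current build mode.
    std.debug.print("{s}: {d} bytes\n", .{ @typeName(Inferred), @sizeOf(Inferred) });
}
```

Building this once with -ODebug and once with -OReleaseFast should make the difference visible; with the explicit type annotation from the snippet above, the variable stays a DebugAllocator in every mode.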

Yeah, but it was more to showcase my bafflement at the bad dead code elimination, I guess. Also, I’d expect the void version to produce the same size as your code, as the top-level declarations would not be analyzed. But I don't know, maybe some compiler dev can chip in with more insight.