Stack frames in stdlib getting too deep?

pasta · June 19, 2026, 11:12pm

Started to hit some instances where the 10-frame leak detection in std.testing.allocator isn’t deep enough for some parts of the standard library. In this case the 11th frame would have told me where the put call in question was from.

[DebugAllocator] (warn): memory address 0x77e540720000 leaked: 
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:142:26: 0x105e3d2 in rawAlloc (std.zig)
    return a.vtable.alloc(a.ptr, len, alignment, ret_addr);
                        ^
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:286:40: 0x108f7d5 in allocWithSizeAndAlignment__anon_9891 (std.zig)
    return self.allocBytesWithAlignment(alignment, byte_count, return_address);
                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:274:89: 0x108f49b in allocAdvancedWithRetAddr (std.zig)
    const ptr: [*]align(a.toByteUnits()) T = @ptrCast(try self.allocWithSizeAndAlignment(@sizeOf(T), a, n, return_address));
                                                                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1477:53: 0x12441c6 in allocate (std.zig)
            const slice = try allocator.alignedAlloc(u8, max_align, total_size);
                                                    ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1434:29: 0x1242794 in grow (std.zig)
            try map.allocate(allocator, new_cap);
                            ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1295:30: 0x124258d in growIfNeeded (std.zig)
                try self.grow(allocator, capacityForSize(self.load() + new_count), ctx);
                            ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1115:34: 0x124046b in getOrPutContextAdapted__anon_44507 (std.zig)
                self.growIfNeeded(allocator, 1, ctx) catch |err| {
                                ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1100:56: 0x12400f5 in getOrPutContext (std.zig)
            const gop = try self.getOrPutContextAdapted(allocator, key, ctx, ctx);
                                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1026:52: 0x123ff3a in putContext (std.zig)
            const result = try self.getOrPutContext(allocator, key, ctx);
                                                    ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1023:35: 0x123fdaf in put (std.zig)
            return self.putContext(allocator, key, value, undefined);
                                    ^
(additional stack frames may have been skipped...)

This seems like it’s getting a little more common these days, though that may be subjective. Is there a mechanism to temporarily up the stack limit?

VoilaNeighbor · June 20, 2026, 3:49am

It’s less a problem than a fact of everyday life for a programmer: error traces are never pretty…

But I do think std lib can mark pure-forwarding functions as inline. It’s good for reading the intention – this is just a forwarding function.

pzittlau · June 20, 2026, 4:00am

Though it makes other things harder. inline isn’t purely about semantics. It has a real effect on the performance and sometimes even the correctness of a program.
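One semantic effect of `inline` in Zig is that comptime-known arguments stay comptime-known inside an `inline fn`, so the result of the call can itself be comptime-known. A minimal sketch, modelled on the kind of example the language reference uses (the function name `add` is only illustrative):

```zig
const std = @import("std");

// Because this is `inline`, a call with comptime-known arguments
// yields a comptime-known result.
inline fn add(a: i32, b: i32) i32 {
    return a + b;
}

test "inline call with comptime-known arguments" {
    // The condition is resolved at compile time, so the @compileError
    // branch is never analyzed. Removing `inline` from `add` turns this
    // into a compile error, which is the kind of behaviour change meant above.
    if (add(1200, 34) != 1234) {
        @compileError("not reached while `add` is inline");
    }
    try std.testing.expectEqual(@as(i32, 1234), add(1200, 34));
}
```

So adding or removing `inline` on a library function can change whether calling code compiles, not just how fast it runs.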

vulpesx · June 20, 2026, 4:26am

They are referring to functions that are similar to:

fn foo(a: u32) u32 {
    return bar(a, 5);
}

Inlining such a function will almost certainly be faster, and will likely have little effect on binary size.

std already inlines some such functions specifically to reduce stack/error traces!

fn wasm_freestanding_start() callconv(.c) void {
    // This is marked inline because for some reason LLVM in
    // release mode fails to inline it, and we want fewer call frames in stack traces.
    _ = @call(.always_inline, callMain, .{ {}, std.process.Environ.Block.global });
}

Though that example inlines at the call site, not function definition.
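To make the difference concrete, here is a small sketch of both forms. The names `bar`, `foo`, and `fooInline` are hypothetical; only `inline fn` and `@call(.always_inline, ...)` are language features.

```zig
const std = @import("std");

fn bar(a: u32, b: u32) u32 {
    return a + b;
}

// Inlining at the definition: every call to `fooInline` is inlined,
// so it never appears as its own frame in a stack trace.
inline fn fooInline(a: u32) u32 {
    return bar(a, 5);
}

// An ordinary forwarding function; callers decide per call site.
fn foo(a: u32) u32 {
    return bar(a, 5);
}

test "definition-site and call-site inlining agree" {
    // Inlining at the call site: only this particular call is forced inline.
    const at_call_site = @call(.always_inline, foo, .{@as(u32, 1)});
    try std.testing.expectEqual(@as(u32, 6), at_call_site);
    try std.testing.expectEqual(@as(u32, 6), fooInline(1));
}
```

The call-site form leaves the function itself untouched, so other callers are unaffected.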

pzittlau · June 20, 2026, 4:46am

For a single function call, definitely. On a whole-project basis I’m not convinced. There, many code paths converge at different points in this “pipeline” on the way to the thing that actually does something. If you force-inline all of them, the compiler has to comply, which will likely make the call tree wider than before, increasing both cache pressure and binary size.

Doing this just to save some screen and head space on call stacks that are, admittedly, often unnecessary seems wrong to me. A better solution for me would be the ability to set up filters, or “just” to use a tool afterwards that filters, similar to how one often tells a debugger to skip frames from libraries (especially std).
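A rough sketch of what such an after-the-fact filter could look like, assuming the trace layout shown earlier in the thread (an unindented header line per frame, followed by indented source and caret lines) and assuming std frames can be recognized by `/lib/std/` in the path. The function names and the paths in the test are hypothetical.

```zig
const std = @import("std");

// Plain substring search, written out to keep the sketch dependency-light.
fn containsSubstring(haystack: []const u8, needle: []const u8) bool {
    var i: usize = 0;
    while (i + needle.len <= haystack.len) : (i += 1) {
        if (std.mem.eql(u8, haystack[i..][0..needle.len], needle)) return true;
    }
    return false;
}

// Copies `trace` into `buf`, dropping every frame whose header line points
// into the standard library. Output is truncated if `buf` is too small.
fn filterStdFrames(trace: []const u8, buf: []u8) []const u8 {
    var len: usize = 0;
    var skipping = false;
    var lines = std.mem.splitScalar(u8, trace, '\n');
    while (lines.next()) |line| {
        // Header lines are not indented; source and caret lines are.
        const is_header = line.len > 0 and line[0] != ' ';
        if (is_header) skipping = containsSubstring(line, "/lib/std/");
        if (skipping) continue;
        if (len + line.len + 1 > buf.len) break;
        @memcpy(buf[len..][0..line.len], line);
        buf[len + line.len] = '\n';
        len += line.len + 1;
    }
    return buf[0..len];
}

test "std frames are dropped, user frames are kept" {
    const trace =
        \\/zig/lib/std/hash_map.zig:1023:35: 0x1 in put (std.zig)
        \\            return self.putContext(allocator, key, value, undefined);
        \\                                    ^
        \\/home/me/app/src/main.zig:10:5: 0x2 in run (main.zig)
        \\    try map.put(gpa, 1, 2);
        \\    ^
    ;
    var buf: [512]u8 = undefined;
    const out = filterStdFrames(trace, &buf);
    try std.testing.expect(std.mem.startsWith(u8, out, "/home/me/app/src/main.zig"));
    try std.testing.expect(!containsSubstring(out, "hash_map.zig"));
}
```

A filter like this can only hide frames that were captured. In the trace at the top of the thread all ten captured frames are inside std, so the interesting eleventh frame would still be missing.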

floooh · June 20, 2026, 7:12am

I’ve had a feeling for a while that the stdlib is starting to suffer from a serious case of C++-ism, e.g. a bit too much DRY. Look at the call chain putContext => getOrPutContext => getOrPutContextAdapted, followed by a similar chain of calls just for the grow allocation; that looks way too granular to me. IME code like this is also the main reason debug performance ends up a lot worse than optimized performance.

floooh · June 20, 2026, 7:20am

Those deep call chains of very simple functions will almost certainly be inlined and ‘collapsed’ into a single resulting call anyway in optimized mode, while in debug mode they just hurt performance (it’s fine when debug mode is 2x slower than optimized, but beyond that it can be problematic).

Personally I wouldn’t use inline either, but instead try to structure the code in a way that such deep call chains of slightly-more-specialized functions don’t happen. IME this often happens when there’s no strict separation of ‘public interface functions’ and ‘internal implementation functions’. E.g. the implementation of public functions should almost never call other public functions.
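As a sketch of that separation (not how std’s hash map is written; `SmallSet` and all of its functions are hypothetical), each public function below calls one private worker directly, so a trace through the public API is never more than two frames deep in this type:

```zig
const std = @import("std");

// Hypothetical fixed-capacity set, for illustration only.
const SmallSet = struct {
    items: [8]u32 = undefined,
    len: usize = 0,

    pub const Slot = struct { index: usize, found_existing: bool };

    // Internal implementation: the single place that does the work.
    fn findOrAppend(self: *SmallSet, key: u32) error{Full}!Slot {
        for (self.items[0..self.len], 0..) |item, i| {
            if (item == key) return .{ .index = i, .found_existing = true };
        }
        if (self.len == self.items.len) return error.Full;
        self.items[self.len] = key;
        self.len += 1;
        return .{ .index = self.len - 1, .found_existing = false };
    }

    // Public interface: each function calls the worker directly
    // instead of going through another public function.
    pub fn put(self: *SmallSet, key: u32) error{Full}!void {
        _ = try self.findOrAppend(key);
    }

    pub fn getOrPut(self: *SmallSet, key: u32) error{Full}!Slot {
        return self.findOrAppend(key);
    }
};

test "public functions share one internal worker" {
    var set: SmallSet = .{};
    try set.put(7);
    const slot = try set.getOrPut(7);
    try std.testing.expect(slot.found_existing);
    try std.testing.expectEqual(@as(usize, 0), slot.index);
}
```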

Sze · June 20, 2026, 11:29am

I wonder whether there should be a parameter similar to -freference-trace[=num] but for the number of default debug allocator stack frames.

VoilaNeighbor · June 21, 2026, 8:57am

That is a problem, but maybe at the other end of the spectrum: too little C++.

Without implicit metaprogramming, we have to make things explicit including the dull forwarding code.

pasta · June 21, 2026, 3:54pm

FWIW one thing that confused me here was the return_address - I had thought that address was the one being used for the “bottom” of the allocation stack. Is that not what’s intended? e.g. shouldn’t it be saying that the allocation at hash_map.zig:1477 leaked, not the call down at Allocator.zig:142?

gooncreeper · June 21, 2026, 8:30pm

You can create your own DebugAllocator instance and specify the stack trace length with DebugAllocatorConfig.stack_trace_frames:

test {
    var debug_gpa: std.heap.DebugAllocator(.{ .stack_trace_frames = 16 }) = .init;
    defer _ = debug_gpa.deinit();
    const gpa = debug_gpa.allocator();

    ...
}
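A fuller sketch of how that might look around a hash map, assuming `std.AutoHashMapUnmanaged` and its `.empty` initializer as found in recent Zig releases. The helper `insertSquares` is hypothetical and only there to trigger a few `grow` calls.

```zig
const std = @import("std");

const Map = std.AutoHashMapUnmanaged(u32, u32);

// Hypothetical helper: maps 0..count to their squares.
fn insertSquares(gpa: std.mem.Allocator, map: *Map, count: u32) !void {
    var i: u32 = 0;
    while (i < count) : (i += 1) {
        try map.put(gpa, i, i * i);
    }
}

test "hash map under a DebugAllocator with deeper traces" {
    var debug_gpa: std.heap.DebugAllocator(.{ .stack_trace_frames = 16 }) = .init;
    defer _ = debug_gpa.deinit();
    const gpa = debug_gpa.allocator();

    var map: Map = .empty;
    // Deleting this line leaks the map; the resulting report should then
    // carry up to 16 frames instead of the 10 mentioned at the top.
    defer map.deinit(gpa);

    try insertSquares(gpa, &map, 100);
    try std.testing.expectEqual(@as(?u32, 81), map.get(9));
}
```

Since `defer`s run in reverse order, the map is freed before `debug_gpa.deinit()` checks for leaks.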