Stack frames in stdlib getting too deep?

I’ve started to hit cases where the 10-frame limit on leak-detection stack traces in std.testing.allocator isn’t deep enough for some parts of the standard library. In this case, the 11th frame would have told me where the put call in question came from.

[DebugAllocator] (warn): memory address 0x77e540720000 leaked: 
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:142:26: 0x105e3d2 in rawAlloc (std.zig)
    return a.vtable.alloc(a.ptr, len, alignment, ret_addr);
                        ^
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:286:40: 0x108f7d5 in allocWithSizeAndAlignment__anon_9891 (std.zig)
    return self.allocBytesWithAlignment(alignment, byte_count, return_address);
                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/mem/Allocator.zig:274:89: 0x108f49b in allocAdvancedWithRetAddr (std.zig)
    const ptr: [*]align(a.toByteUnits()) T = @ptrCast(try self.allocWithSizeAndAlignment(@sizeOf(T), a, n, return_address));
                                                                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1477:53: 0x12441c6 in allocate (std.zig)
            const slice = try allocator.alignedAlloc(u8, max_align, total_size);
                                                    ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1434:29: 0x1242794 in grow (std.zig)
            try map.allocate(allocator, new_cap);
                            ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1295:30: 0x124258d in growIfNeeded (std.zig)
                try self.grow(allocator, capacityForSize(self.load() + new_count), ctx);
                            ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1115:34: 0x124046b in getOrPutContextAdapted__anon_44507 (std.zig)
                self.growIfNeeded(allocator, 1, ctx) catch |err| {
                                ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1100:56: 0x12400f5 in getOrPutContext (std.zig)
            const gop = try self.getOrPutContextAdapted(allocator, key, ctx, ctx);
                                                        ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1026:52: 0x123ff3a in putContext (std.zig)
            const result = try self.getOrPutContext(allocator, key, ctx);
                                                    ^
.../zig/x86_64-linux-0.16.0/lib/std/hash_map.zig:1023:35: 0x123fdaf in put (std.zig)
            return self.putContext(allocator, key, value, undefined);
                                    ^
(additional stack frames may have been skipped...)

This seems to be getting a little more common these days, though that may be subjective. Is there a mechanism to temporarily raise the stack frame limit?
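
One possible route, sketched under assumptions rather than confirmed anywhere in this thread: instead of std.testing.allocator, a test can instantiate its own std.heap.DebugAllocator with a larger `stack_trace_frames` value in its config. The name `DeepAllocator` and the depth of 16 are made up for illustration, and the config field and `.init` declaration should be checked against the std version in use.

```zig
const std = @import("std");

// Assumption: DebugAllocator's Config exposes `stack_trace_frames`.
// 16 is an arbitrary example depth, not a recommended value.
const DeepAllocator = std.heap.DebugAllocator(.{ .stack_trace_frames = 16 });

test "leak check with deeper stack traces" {
    var debug_allocator: DeepAllocator = .init;
    // deinit() logs any leaks with their captured traces and returns .leak if it found one.
    defer std.testing.expect(debug_allocator.deinit() == .ok) catch @panic("leak detected");
    const allocator = debug_allocator.allocator();

    var map: std.AutoHashMapUnmanaged(u32, u32) = .empty;
    // This deferred deinit runs before the allocator's deinit above.
    defer map.deinit(allocator);
    try map.put(allocator, 1, 2);
    try std.testing.expectEqual(@as(?u32, 2), map.get(1));
}
```

Deeper traces cost more memory per allocation, so this is probably best kept to the tests where the default depth falls short.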

It’s less a problem than a fact of everyday life for a programmer: error traces are never pretty…

But I do think the std lib could mark pure forwarding functions as inline. It also documents the intent: this is just a forwarding function.

Though it makes other things harder. inline isn’t purely about semantics. It has a real effect on the performance and sometimes even the correctness of a program.

They are referring to functions like:

fn foo(a: u32) u32 {
    return bar(a, 5);
}

Inlining such a function will almost certainly be faster, and will likely have little effect on binary size.

std already inlines some such functions specifically to reduce stack/error traces!

fn wasm_freestanding_start() callconv(.c) void {
    // This is marked inline because for some reason LLVM in
    // release mode fails to inline it, and we want fewer call frames in stack traces.
    _ = @call(.always_inline, callMain, .{ {}, std.process.Environ.Block.global });
}

Though that example inlines at the call site, not at the function definition.
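
To make the distinction concrete, here is a minimal sketch of both forms. The functions `bar`, `foo`, and `fooInline` are hypothetical stand-ins for a forwarding function, not code from std.

```zig
const std = @import("std");

// Hypothetical callee that does the actual work.
fn bar(a: u32, b: u32) u32 {
    return a + b;
}

// Inlined at the definition: every call to fooInline is expanded into its caller.
inline fn fooInline(a: u32) u32 {
    return bar(a, 5);
}

// Ordinary forwarding function: each call site decides how to call it.
fn foo(a: u32) u32 {
    return bar(a, 5);
}

pub fn main() void {
    const x = fooInline(1);
    // Inlined at this call site only; other callers of foo are unaffected.
    const y = @call(.always_inline, foo, .{2});
    std.debug.print("{d} {d}\n", .{ x, y });
}
```

With the definition-site form, every caller pays the cost of inlining. With the call-site form, the choice is made one call at a time.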

At the scope of a single function call, definitely. On a whole-project basis I’m not convinced. There, many code paths converge on different points in this “pipeline” on the way to the thing that actually does something. If you force-inline all of them, the compiler has to comply, which will likely make the tree wider than before, increasing both cache pressure and binary size.

Doing this just to save some screen and head space on call stacks that are, admittedly, often unnecessary seems wrong to me. A better solution, for me, would be the ability to set up filters, or to “just” run a tool afterwards that does the filtering, similar to how one often tells a debugger to skip frames from libraries (especially std).
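
As a rough sketch of the post-processing idea, assuming frames are printed in the format shown in the leak report at the top of the thread: the function below copies a trace into a caller-supplied buffer and drops every frame whose path contains "/lib/std/". The names `isFrameHeader` and `filterStdFrames` are hypothetical, and the header detection is a simple heuristic.

```zig
const std = @import("std");

/// Heuristic (an assumption about the trace format): a frame header looks like
/// "path:line:col: 0xADDR in function (module)".
fn isFrameHeader(line: []const u8) bool {
    return std.mem.indexOf(u8, line, ": 0x") != null and
        std.mem.indexOf(u8, line, " in ") != null;
}

/// Copies `trace` into `out`, dropping frames whose path contains "/lib/std/"
/// along with the lines that follow them up to the next frame header.
/// Every kept line is written with a trailing newline.
fn filterStdFrames(trace: []const u8, out: []u8) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    var skipping = false;
    var lines = std.mem.splitScalar(u8, trace, '\n');
    while (lines.next()) |line| {
        if (isFrameHeader(line)) {
            // Re-evaluate on every header: skip std frames, keep everything else.
            skipping = std.mem.indexOf(u8, line, "/lib/std/") != null;
        }
        if (skipping) continue;
        if (len + line.len + 1 > out.len) return error.NoSpaceLeft;
        @memcpy(out[len..][0..line.len], line);
        out[len + line.len] = '\n';
        len += line.len + 1;
    }
    return out[0..len];
}
```

A filter like this only hides frames that were already captured. It cannot recover callers that fell beyond the frame limit.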