The Clang C++ compiler now has an extension that adds lifetime safety analysis and lifetime annotations to the language. It can automatically detect temporal safety bugs, e.g. dangling pointers, at compile time.
By compiling C++ code with Clang with the flag -Werror=lifetime-safety-all, all these temporal safety bugs can be made into compile-time errors and lifetime annotations enforced.
I wonder if Zig should have something similar to the Clang C++ compiler in order to prevent temporal safety bugs.
there are already some movements in this direction! the below pull request was merged before the move to Codeberg, so is present in either 0.15.2 (likely but idk) or certainly in 0.16.0
i don't currently have enough interest to read the Clang documentation, so you'll have to forgive me for not knowing what bugs it can catch, but since you're interested, i'd encourage you to have some clicks through the PR above and the linked issues to get a sense of the community and the core team's thoughts
One thing that Zig beginners struggle with - particularly those unfamiliar with manual memory management - is returning pointers to local variables from functions.
This is challenging to address, because it is legal to return an invalid pointer:
fn foo() *i32 {
    return undefined;
}
This is a perfectly valid function - the illegal operation only occurs if the returned pointer is dereferenced. Even then, it's legal to have a function that unconditionally invokes illegal behavior:
fn bar() noreturn {
    unreachable; // equivalent to foo().*
}
Given this function, the expression bar() is equivalent to the expression unreachable.
So how then, can we make it a compile error to return an invalid pointer from a function? Syntactic pedantry. We forbid all expressions that trivially (i.e. without type checking) lower to return undefined with the justification that the expression should instead be written canonically as return undefined.
Thus the following compile error was born:
fn foo() *i32 {
    var x: i32 = 1234;
    return &x;
}
test.zig:3:13: error: returning address of expired local variable 'x'
    return &x;
            ^
test.zig:2:9: note: declared runtime-known here
    var x: i32 = 1234;
        ^
It's not just function returns in Clang. It covers any scoped variable, as well as memory freed on the heap. As far as I recall, Zig currently doesn't have any similar compile error messages for those cases.
I've been exploring this space over the last few weekends.
You can do a similar analysis directly on Zig AIR, and it catches some trivial bugs, like detecting when the returned value contains a pointer to a stack variable or to an argument.
Use after free is harder in Zig because free is just a regular function unlike C++ where delete is actually special. But maybe detecting x.* = undefined would already help a lot.
You'll notice this Clang extension leverages custom annotations where the user explains to the compiler who is borrowing what.
Good point about free being a function in Zig instead of a keyword as in C++.
C++ actually has a similar problem; if you use the malloc and free functions from the cstdlib library to allocate and deallocate memory instead of the new and delete keywords, the Clang C++ compiler will not be able to detect use after free.
I wonder if lifetime / temporal safety in Clang C++, Rust, and Mojo is possible only because the languages have RAII and/or destructors. Neither Zig nor C has RAII or destructors.
Edit: The Cake C compiler has lifetime safety for its dialect of C with extensions, but it also has destructors through the _Dtor annotation:
Also both Clang and GCC had a "trivial" check for returning a pointer to stack data for a long time, independent from the lifetime-safety-all stuff (and it's even in the default warning set):
No: destructors are a tool to prevent memory leaks, not a tool to ensure all accessed memory is live. And leaks are safe: can cause DoS, but not RCE or arbitrary memory disclosure.
Dtors make memory safety "harder"; they are the opposite of garbage collection. In Ada, freeing memory is unsafe for that reason.
...also Clang's ARC feature (Objective-C Automatic Reference Counting (ARC) - Clang 23.0.0git documentation) has surprisingly complete lifetime and ownership tracking in Objective-C for proper ARC support in mixed-language projects (ObjC and plain C). For instance you can place an Objective-C object reference ("id") in a C struct, and ARC will "just work" (in older Clang versions this only worked in ObjC++ because it required support from RAII).
AFAIK this compile-time lifetime and ownership tracking is mainly used to remove redundant retain/release calls, but this is also quite conservative. To really compete with manual lifetime management, ARC also needs a lot of manually placed hints.
One thing I also sometimes footgun myself with is that in Zig it's not clear what happens when you pass a []const u8 to a function. Is the function allowed to copy the ptr and read from it later? Or is it done with the ptr as soon as it exits?
I hacked together a prototype where Zig would only allow copying pointers with addrspace(.global), so that a function declaration would need to declare whether it allows itself to store the pointer or not.
It was quite disruptive because it meant that the caller would need to opt in to having a ptr argument be copyable.
I then modified the allocator API to always return addrspace(.global) pointers (because it's generally ok to keep a reference to a heap allocated struct),
but I realized it basically just implemented some basic stack capture detection, and it didn't detect bugs related to arenas.
Anyway I think there is some graph coloring algorithm that could statically detect which objects belong to which arena/allocator, and enforce there are no pointers in the wrong direction. But I need to do more work.
The big challenge here is that people aren't restricted to the std library allocator interface; they can write their own custom allocator interface and allocators. How does one determine which structure in the program is an allocator interface or an allocator?
My personal rule of thumb is that a "reference parameter" (e.g. a slice or pointer) is always only "borrowed" for the duration of the function call. If the called system needs to hold on to the referenced data beyond the function "lifetime" it must make a copy of the referenced data instead of storing the reference. That simple rule alone basically eliminates 99% of potential memory corruption issues.
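A minimal sketch of that borrow-then-copy rule, written against recent Zig syntax and not checked against one specific release. `Registry` and its fields are hypothetical names made up for illustration; the only std calls used are `Allocator.dupe` and `Allocator.free`.

```zig
const std = @import("std");

/// Hypothetical type that needs a name to outlive the call that passed it in.
const Registry = struct {
    allocator: std.mem.Allocator,
    name: []u8,

    /// `name` is treated as borrowed for the duration of this call only,
    /// so we keep a private copy instead of storing the caller's slice.
    fn init(allocator: std.mem.Allocator, name: []const u8) !Registry {
        return Registry{
            .allocator = allocator,
            .name = try allocator.dupe(u8, name),
        };
    }

    fn deinit(self: *Registry) void {
        self.allocator.free(self.name);
        self.* = undefined;
    }
};

test "stored copy is unaffected when the caller reuses its buffer" {
    var buf = [_]u8{ 'z', 'i', 'g' };
    var reg = try Registry.init(std.testing.allocator, &buf);
    defer reg.deinit();

    buf[0] = 'b'; // the caller is free to reuse or drop its buffer
    try std.testing.expectEqualStrings("zig", reg.name);
}
```

The cost is one extra allocation per stored reference, which is the trade the rule of thumb accepts in exchange for never holding a pointer whose lifetime the callee does not control.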
For the std library allocators, I think one thing that one can do is to redo the API for the allocators: create a new struct in std called "pointer to allocated object" or something like that, consisting of a pointer to the memory as well as a nullable pointer to the allocator. When an object gets allocated via an allocator, the allocator returns this struct rather than a raw pointer. When the object gets freed, the allocator field in the pointer struct gets set to null, and then one can check that the allocator is not null before dereferencing the pointer.
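A rough sketch of what such a pointer-plus-allocator struct could look like, assuming recent Zig syntax. `Owned`, `create`, `get`, and `destroy` are hypothetical names for illustration and are not part of std; only `Allocator.create` and `Allocator.destroy` are real std calls.

```zig
const std = @import("std");

/// Hypothetical wrapper: a pointer plus the allocator that owns it.
/// `allocator` is set to null once the object has been freed.
fn Owned(comptime T: type) type {
    return struct {
        ptr: *T,
        allocator: ?std.mem.Allocator,

        const Self = @This();

        fn create(allocator: std.mem.Allocator, value: T) !Self {
            const ptr = try allocator.create(T);
            ptr.* = value;
            return Self{ .ptr = ptr, .allocator = allocator };
        }

        /// Hands out the pointer only while the object is still live.
        fn get(self: Self) ?*T {
            return if (self.allocator != null) self.ptr else null;
        }

        /// Frees the object once; later calls are no-ops.
        fn destroy(self: *Self) void {
            if (self.allocator) |a| {
                a.destroy(self.ptr);
                self.allocator = null;
            }
        }
    };
}

test "get returns null after destroy" {
    var p = try Owned(u32).create(std.testing.allocator, 42);
    try std.testing.expectEqual(@as(u32, 42), p.get().?.*);
    p.destroy();
    try std.testing.expect(p.get() == null);
}
```

Note the limits of this sketch: a copy of the struct made before `destroy` still carries a non-null allocator, and nothing stops code from reading `ptr` directly, so the check is only as strong as the discipline around it.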
If you're worried about "use-after-free", I think that generation-counted index handles are really the best solution for memory-unsafe languages (Handles are the better pointers).
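For readers who haven't seen the pattern, here is a small sketch of generation-counted handles, assuming recent Zig syntax. `Handle`, `Pool`, and all method names are hypothetical and chosen for illustration; the fixed capacity and linear slot search are simplifications.

```zig
const std = @import("std");

/// Hypothetical handle: a slot index plus the generation it was issued for.
const Handle = struct { index: u32, generation: u32 };

/// Fixed-capacity pool that hands out handles instead of pointers.
fn Pool(comptime T: type, comptime capacity: usize) type {
    return struct {
        items: [capacity]T = undefined,
        generations: [capacity]u32 = [_]u32{0} ** capacity,
        live: [capacity]bool = [_]bool{false} ** capacity,

        const Self = @This();

        /// Claims the first free slot, or returns null if the pool is full.
        fn alloc(self: *Self, value: T) ?Handle {
            for (&self.live, 0..) |*in_use, i| {
                if (!in_use.*) {
                    in_use.* = true;
                    self.items[i] = value;
                    return Handle{ .index = @intCast(i), .generation = self.generations[i] };
                }
            }
            return null;
        }

        /// Returns null for freed or stale handles instead of a dangling pointer.
        fn get(self: *Self, h: Handle) ?*T {
            if (h.index >= capacity) return null;
            if (!self.live[h.index] or self.generations[h.index] != h.generation) return null;
            return &self.items[h.index];
        }

        fn free(self: *Self, h: Handle) void {
            if (self.get(h) == null) return; // stale handle or double free
            self.live[h.index] = false;
            self.generations[h.index] +%= 1; // invalidates outstanding handles
        }
    };
}

test "stale handle is rejected after the slot is reused" {
    var pool: Pool(u32, 4) = .{};
    const h = pool.alloc(7).?;
    pool.free(h);
    const h2 = pool.alloc(9).?; // reuses the slot with a new generation
    try std.testing.expect(pool.get(h) == null);
    try std.testing.expectEqual(@as(u32, 9), pool.get(h2).?.*);
}
```

The pointer returned by `get` should be used immediately and not stored; the handle is the thing to keep around, since it is re-validated on every lookup.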
I have a deep aversion to randomly allocating unique objects on the heap, because this is the root reason for all memory-safety- and memory-management-related performance problems.
This idea probably requires encapsulation / private fields in structs in the Zig language to actually be safe, as otherwise one can just access the raw pointer field in the struct and still do use-after-free.