Type resolution redesign, with language changes to taste
Author: Matthew Lugg
(as always, happy to answer any questions anyone has about this, whether it’s about the language changes or the compiler internals!)
ooh! I have some questions:
How could MultiArrayList.bytes’ alignment result in a dependency loop that wouldn’t already be a loop in another aspect?
How does accessing a field (with a runtime allowed type) of a comptime only type at runtime semantically work?
Will we be able to switch on packed structs soon?
How could MultiArrayList.bytes’ alignment result in a dependency loop that wouldn’t already be a loop in another aspect?
Zig’s comptime semantics make type resolution harder to do. Consider the definition const A = struct { ptr: *align(@sizeOf(A)) u32 };. Many programming languages would be able to look at (their equivalent of) this code and resolve that (assuming pointers are 8 bytes) A has a size of 8 bytes and ptr therefore has type *align(8) u32. This is because these languages can determine that ptr is a pointer without evaluating that @sizeOf(A) expression. Unfortunately, this doesn’t work in Zig, because types don’t have a fixed grammar: comptime evaluation means that they are specified as arbitrary expressions which can do basically anything. As a result, the only reasonable choice Zig can make is to evaluate the type expression in full. So in this instance, to determine the type of the fields of A, the compiler needs to know how big A is, and to know how big A is, it needs the types of its fields, which is a dependency loop. This is pretty much exactly the case that MultiArrayList finds itself in (sometimes).
The question then becomes why this used to work. In short, this worked because the compiler had accumulated various hacks and special cases to make this example (and many others like it) work. Unfortunately, these special cases came with unacceptable downsides. For instance, they made the compiler behave differently depending on the order in which code was semantically analyzed, which is something Zig cannot allow since it completely breaks incremental compilation (and some other stuff). These special cases could also lead to really confusing compile errors, were a frequent source of compiler crashes, and all in all were kind of fundamentally broken. Sacrificing the abilities they gave us is obviously unfortunate, but in this case the loss is small enough that it definitely looks to be worth it for all of the positive effects.
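To make the loop concrete, here is a minimal sketch. The Node type is a hypothetical illustration, not something from the post. A self-reference through a plain pointer is fine, because the field type ?*Node can be evaluated without knowing how big Node is. The commented-out declaration is the example from above, where the field type cannot even be named until @sizeOf(A) is known.

```zig
const std = @import("std");

// Fine: evaluating the type expression `?*Node` does not need Node's layout.
const Node = struct {
    next: ?*Node,
    value: u32,
};

// The example from the post. Evaluating the field type needs @sizeOf(A),
// which in turn needs the field types of A. Per the post this is reported
// as a dependency loop, so it is left commented out here.
// const A = struct { ptr: *align(@sizeOf(A)) u32 };

test "self-reference through a plain pointer resolves" {
    var tail: Node = .{ .next = null, .value = 2 };
    const head: Node = .{ .next = &tail, .value = 1 };
    try std.testing.expectEqual(@as(u32, 2), head.next.?.value);
}
```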
How does accessing a field (with a runtime allowed type) of a comptime only type at runtime semantically work?
Let’s say you have the type const Foo = struct { a: comptime_int, b: u32 };, and a global constant const val: Foo = .{ .a = 123, .b = 456 };. Even though Foo is a comptime-only type, the Zig compiler will actually emit the value val into the final binary anyway! It does so by pretending that “primitive” comptime-only types (in this case comptime_int) are zero-width types (like void), and just lowering everything else. So, the compiler will lower val into constant memory as just a single u32 with value 456. Then, when you do &ptr.b on a *const Foo to get a *const u32, the compiler continues to pretend that comptime_int is zero-width, and gives back the address of the field b under that assumption (today that’d be the same address as ptr itself, although of course Zig doesn’t guarantee that since Foo is a normal struct). At that point, you have a valid pointer to a u32, and can just load from it as usual!
This “pretend comptime-only primitives are void” trick is all there really is to it: it lets you nest comptime-only types, have slices of them, etc, all with just one rule. We actually already had all of this logic in the compiler (it’s necessary even without the changes I made here, because if you take &val.b at comptime, that needs to be a valid pointer at runtime!), so this wasn’t so much an intentional choice as just something which naturally popped out when I changed the compiler so that it considered *const Foo a runtime type.
Oh, and the final piece of the puzzle is that x.y means (&x.y).*; that is, it first takes a pointer, and then dereferences it. So if you have ptr: *const Foo as above, then you can just do ptr.b at runtime (no &) and get a u32 out.
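As a rough sketch of the case described above, the following takes &val.b at comptime and then loads through that pointer at runtime, which the post says already had to work. The names b_ptr and load are hypothetical helpers added for illustration. Accessing ptr.b through a runtime *const Foo depends on the compiler change discussed in this thread, so it is only mentioned in a comment.

```zig
const std = @import("std");

const Foo = struct { a: comptime_int, b: u32 };
const val: Foo = .{ .a = 123, .b = 456 };

// Field pointer taken at comptime; it must still be a valid address at runtime.
const b_ptr: *const u32 = &val.b;

// Hypothetical helper: an ordinary runtime load through the pointer.
fn load(p: *const u32) u32 {
    return p.*;
}

// With the change described in the post, a runtime `ptr: *const Foo` would
// also allow `ptr.b`, which means `(&ptr.b).*`. Not exercised here.

test "load a runtime field of a comptime-only struct" {
    var p: *const u32 = b_ptr;
    _ = &p; // keep `p` runtime-known
    try std.testing.expectEqual(@as(u32, 456), load(p));
}
```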
Just saw this on HN:
Same as above, change .{} to .empty in a few places, due to removal of deprecated defaults
I didn’t see this mentioned in the devlog post, but this just gave me a little heart attack. Please let this deprecation only be about stdlib types, but not the language feature to define default values in the struct declaration (this would be really bad for sokol-zig, since the entire API design of the sokol libs depends on that feature).
Is there a list somewhere about upcoming and recent deprecations? It’s a bit hard to keep up.
(PS: ok, the sokol samples are still building with 0.16.0-dev.2821+3edaef9e0, so I guess all good - it took a little while until the index.json file on the download page was updated and zvm could pick up the latest version, so I couldn’t check earlier).
Please let this deprecation only be about stdlib types
Oh, yes, to be clear, I’m just talking about the default values for std.ArrayList. There are not any plans to remove struct field default values from Zig, don’t worry!—but we are removing a lot of uses of them from the standard library because they have been heavily overused in the past (see this part of the langref for our stance on when default field values are appropriate). The default field values on all of the containers in the standard library (ArrayList, HashMap, ArrayHashMap; in theory others too, but they might not actually have alternatives yet!) are considered deprecated and will be removed at some point.
Aside from doc comments (the doc comment on ArrayList has mentioned for the past year or so that default-initialization of it is deprecated), you can usually learn about deprecations from the release notes. This particular instance was mentioned in the 0.14.0 release notes here. To be fair though, the word “deprecated” only appears at the end of that section which is really discussing a language feature, so it wasn’t as obvious as it should have been—I ought to have put it in the “List of Deprecations” instead. Sorry about that, I’ll take more care in the future to make deprecations obvious!
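For reference, the .{} to .empty change looks roughly like the sketch below. It assumes Zig 0.15 or later, where std.ArrayList takes the allocator on each call; on older versions the equivalent type is std.ArrayListUnmanaged.

```zig
const std = @import("std");

test "initialize an ArrayList with .empty" {
    const gpa = std.testing.allocator;

    // Previously: `var list: std.ArrayList(u32) = .{};` (deprecated default field values).
    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(gpa);

    try list.append(gpa, 42);
    try std.testing.expectEqual(@as(usize, 1), list.items.len);
    try std.testing.expectEqual(@as(u32, 42), list.items[0]);
}
```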
Not to derail the thread too much, but I would love to see some syntax in the future which allows combining the = .init convention with overriding specific field values in a single assignment. The whole area of struct initialization could use some love tbh. Init-blocks as a workaround are sometimes fine, but sometimes also a bit of a hassle.
syntax like functions?
Also you don’t have to set default values for every field
syntax like functions?
…shitty workaround
(to elaborate: this approach smells too much like C++ to me)
Also you don’t have to set default values for every field
In the sokol libs that’s needed because most functions have a single ‘option-bag-struct’ parameter with ‘sane defaults’, like this (the args passed into sg.makePipeline has many more options, but those are initialized to ‘sane defaults’ and only the values that differ from the default are provided):
state.dbg.pip = sg.makePipeline(.{
    .shader = sg.makeShader(shd.dbgShaderDesc(sg.queryBackend())),
    .layout = init: {
        var l = sg.VertexLayoutState{};
        l.attrs[shd.ATTR_dbg_pos].format = .FLOAT2;
        break :init l;
    },
    .primitive_type = .TRIANGLE_STRIP,
});
…in reality the whole thing is even more ‘interesting’ because the sokol libs are implemented in C and are designed for C99 designated init (where “zero means default”), e.g. the above looks like this as C code:
state.dbg.pip = sg_make_pipeline(&(sg_pipeline_desc){
    .shader = sg_make_shader(dbg_shader_desc(sg_query_backend())),
    .layout = {
        .attrs[ATTR_dbg_pos].format = SG_VERTEXFORMAT_FLOAT2
    },
    .primitive_type = SG_PRIMITIVETYPE_TRIANGLE_STRIP,
});
…some default values actually depend on other values, so patching in the default values actually happens inside the sg_make_pipeline call (by assuming that all zero-initialized fields are defaults). This is carried over into the Zig bindings by declaring zero as default for all struct fields (the idiomatic way would be that all struct fields are optionals and initialized to null, but apart from the size overhead this would incur, those structs need to be C layout-compatible with the C structs anyway)…
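A minimal sketch of the “zero means default” idea described above. All names here (PrimitiveType, PipelineDesc, resolveDefaults) and the chosen default values are hypothetical stand-ins, not the real sokol bindings; in sokol the patching step happens on the C side.

```zig
const std = @import("std");

// Hypothetical C-compatible enum where the zero value means "use the default".
const PrimitiveType = enum(c_int) { default = 0, triangles = 1, triangle_strip = 2 };

// Hypothetical option-bag struct: extern layout, every field defaults to zero.
const PipelineDesc = extern struct {
    primitive_type: PrimitiveType = .default,
    sample_count: c_int = 0,
};

// Patch in the actual defaults for every field that was left at zero.
fn resolveDefaults(desc: PipelineDesc) PipelineDesc {
    var out = desc;
    if (out.primitive_type == .default) out.primitive_type = .triangles;
    if (out.sample_count == 0) out.sample_count = 1;
    return out;
}

test "only zero fields are replaced by defaults" {
    const resolved = resolveDefaults(.{ .primitive_type = .triangle_strip });
    try std.testing.expectEqual(PrimitiveType.triangle_strip, resolved.primitive_type);
    try std.testing.expectEqual(@as(c_int, 1), resolved.sample_count);
}
```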
PS: the above code snippets also demonstrate nicely why Zig init-blocks can result in more code than the same thing via C99 designated init => ‘idiomatic Zig’ would probably be to replace the nested array with a slice, but then you have a piece of data ‘dangling off’ the main struct which requires dealing with ownership and lifetime details of the data referenced by the slice. I.e. also not a great solution (at least not a great general solution).
I’d have a peek at how vulkan-zig does this, in case you’re looking for suggestions / feedback? The bag-o-options struct is extremely common there, and the usage from Zig appears to be a little bit more ergonomic to me than this. (In particular, I can always just type &.{ .{ // yadda yadda } } for the stuff that is stored off of the main options struct, and I’ve never needed an init block.)
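A small sketch of the &.{ .{ ... } } style mentioned above, where the nested data is a slice instead of an inline array. The types VertexFormat, VertexAttr and LayoutDesc are hypothetical and are not taken from vulkan-zig or sokol-zig.

```zig
const std = @import("std");

const VertexFormat = enum { invalid, float2, float3 };
const VertexAttr = struct { format: VertexFormat = .invalid };

// Hypothetical options struct: the attributes live outside the struct, behind a slice.
const LayoutDesc = struct {
    attrs: []const VertexAttr = &.{},
};

test "nested options passed as a pointer to an anonymous array literal" {
    const desc: LayoutDesc = .{
        .attrs = &.{
            .{ .format = .float2 },
        },
    };
    try std.testing.expectEqual(@as(usize, 1), desc.attrs.len);
    try std.testing.expectEqual(VertexFormat.float2, desc.attrs[0].format);
}
```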
Something like this perhaps?
// In your struct
pub fn initFields(fields: anytype) YourStruct {
    var the_struct: YourStruct = .init;
    inline for (std.meta.fields(@TypeOf(fields))) |f| {
        // Could provide a nicer error here?
        @field(the_struct, f.name) = @field(fields, f.name);
    }
    return the_struct;
}
// then
var with_fields: YourStruct = .initFields(.{
    .override = 42,
});
You could write a generating function for these functions with the signature (T: type, decl_literal: @TypeOf(.enum_literal)) fn(anytype) T. Only have to do it once, then it’s just
pub const initFields = overrideFn(YourStruct, .init);
Just a thought. ¯\_(ツ)_/¯
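One possible shape for the generating function suggested above, as a sketch only. overrideFn follows the signature from the post; the inner function name apply and the fields of YourStruct are made up for illustration.

```zig
const std = @import("std");

// Returns a function that starts from the declaration named by `decl_literal`
// (e.g. `.init`) and overwrites every field present in `fields`.
fn overrideFn(comptime T: type, comptime decl_literal: @TypeOf(.enum_literal)) fn (anytype) T {
    return struct {
        fn apply(fields: anytype) T {
            var result: T = @field(T, @tagName(decl_literal));
            inline for (std.meta.fields(@TypeOf(fields))) |f| {
                @field(result, f.name) = @field(fields, f.name);
            }
            return result;
        }
    }.apply;
}

// Hypothetical struct using the helper.
const YourStruct = struct {
    override: u32,
    other: u8,

    pub const init: YourStruct = .{ .override = 0, .other = 7 };
    pub const initFields = overrideFn(YourStruct, .init);
};

test "override one field and keep the rest from .init" {
    const s = YourStruct.initFields(.{ .override = 42 });
    try std.testing.expectEqual(@as(u32, 42), s.override);
    try std.testing.expectEqual(@as(u8, 7), s.other);
}
```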
The Vulkan C API uses pointers to structs instead of nested structs. For the simple initialization case this isn’t much of a problem, but I prefer nested structs because then the whole thing can be trivially copied by value without having to deal with the referenced-by-pointer structs.
Granted for the specific case of passing an options-bag struct into a function this usually isn’t a problem, because the top-level input arg struct usually isn’t copied to a place that would outlive the ‘dangling structs’.
pub fn initFields(fields: anytype) YourStruct …
Meh
Maybe it’s just me, but I prefer syntax sugar over helper functions for such things (or let’s better call it ‘syntax salt’, because just a little bit is enough heh). With helper functions (no matter how cleverly they use comptime magic) you need things like naming conventions and ‘best practices’, while syntax is usually unambiguous.
Now I’ll sound like a crazyman, but I actually think that JS/TS has some nice features to steal when it comes to struct initialization that would fit into static-type-system languages like C or Zig (mainly the ... spread/rest operator both for ‘structs’ and arrays).
My main reservation about that particular pattern is that neither ZLS nor the signature itself indicates that anytype actually means Partial(@This()) here.
While the language shouldn’t necessarily have to bend over backwards for LSP compatibility, the lack of argument autocompletion compatibility and doc comment propagation here severely harms discoverability.
I believe this is particularly relevant for the bag o’ options pattern because such structs are frequently wide and/or deeply nested, so finding which option(s) you want to change from function-wide doc comment(s) can be difficult.
I’m not sure if I was trolling either, tbh.
It seems like a good place to spend some syntax, sure. What would it look like? .init{ .field = "blah" }; seems kludgy. Maybe not?
True, and nothing short of AI is going to be able to read code like that and ‘know what it means’.
But! In this case, what I would do is this:
var my_struct: SomeStruct = .{
    // add the fields I need, with full
    // ZLS support
};
Then go full mini surround and type saisf.importFields<Ret>. I have a text object s for struct, which detects .{ ... }, so this means “surround this struct with a function call .importFields”.
I also taught it to recognize that f for function can have a leading @.
That doesn’t solve the general “anytype tax” on ZLS, but it would work great in this specific case.
Doesn’t having syntax for .init{ .x = y } bring back all the same problems as default values though? I can’t see it being added, because all it means is that the language will have gone from the pattern foo: Foo = .{ .bar = "baz" } to foo: Foo = .init{ .bar = "baz" } as the cause of problems.
A function for initialization is better here, as it can actually enforce the invariants needed by the struct.
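As a sketch of that point, an init function can reject values that a plain struct literal would accept. The Range type and its invariant (start must not exceed end) are hypothetical examples.

```zig
const std = @import("std");

const Range = struct {
    start: u32,
    end: u32,

    // Enforces the invariant start <= end, which `.{ ... }` alone cannot do.
    pub fn init(start: u32, end: u32) error{InvalidRange}!Range {
        if (start > end) return error.InvalidRange;
        return .{ .start = start, .end = end };
    }

    pub fn len(self: Range) u32 {
        return self.end - self.start;
    }
};

test "init enforces the invariant" {
    const r = try Range.init(2, 5);
    try std.testing.expectEqual(@as(u32, 3), r.len());
    try std.testing.expectError(error.InvalidRange, Range.init(5, 2));
}
```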
Congrats @mlugg. That looks like a beast.