2024 Survey of System Languages

https://wiki.alopex.li/SurveyOfSystemLanguages2024

7 Likes

However, I'm not yet convinced that Zig is actually a smaller, simpler language than Rust. In the section on Odin I grump at it a little for handling lots of special cases with "just add one more little feature" instead of finding large, powerful features that specialize to handle a lot of different things. I think Zig does this better; its "big feature" is palpably compile-time evaluation, and it makes that feature do a startling amount of heavy lifting. But I still feel it has the "just add one more little feature" problem to some extent; lots of things with no better home just get added to the list of over 120 compiler built-in functions, from @addrSpaceCast() to @wasmMemorySize(). Besides that, looking at error-value handling and error sets exposes a whole new sub-language around them, then more about result types and locations behind pointers, and so on. These may get refactored into libraries or other language features as time goes on. In any complex system there tends to be a cycle of reincarnation that alternates between "add new things" and "refactor to generalize special cases", and in reality, 120 built-in functions is not exactly a bloated mess, especially when you actually use only about 10 of them regularly. But it still feels a little spooky to an outsider.

An… alternative idea of what makes a language complex. Especially since, if I read this repo correctly, Rust has far more compiler built-ins than Zig; it's just that they are indistinguishable from regular function calls or macro invocations. Honestly, and I hate to cast aside criticism without introspection, I don't get it. Is it that the code feels disorganized and is therefore reminiscent of the "kitchen sink"?

4 Likes

I don't get the criticism of defer either. People like to say "defer isn't a destructor, it doesn't have move semantics like RAII", but I don't really understand this line of thinking. To me, defer is just a manual destructor. But maybe I've just gotten past the stage where I think RAII is king and I'm more able to see instances where per-object deallocation isn't a good idea (e.g. with arena allocators)? But yeah, I felt the assessment was okay up until the criticism you quoted; that seemed to be a bit of a stretch, and I feel like Rust is holding too much weight, but maybe that's just me?
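To make the "manual destructor" reading concrete, here is a minimal sketch (the function names are my own illustration, not from the post): defer runs cleanup on every exit path much like a destructor would, while an arena deliberately skips per-object deallocation.

```zig
const std = @import("std");

// defer as a "manual destructor": runs on every exit path.
fn deferExample(allocator: std.mem.Allocator) !void {
    const buf = try allocator.alloc(u8, 64);
    defer allocator.free(buf); // cleanup paired with the allocation
    // ... use buf ...
}

// With an arena, individual deallocation is deliberately skipped:
fn arenaExample(backing: std.mem.Allocator) !void {
    var arena = std.heap.ArenaAllocator.init(backing);
    defer arena.deinit(); // one defer frees everything at once
    const a = arena.allocator();
    _ = try a.alloc(u8, 64); // no per-allocation free needed
    _ = try a.alloc(u8, 128);
}

test "defer and arena cleanup" {
    try deferExample(std.testing.allocator);
    try arenaExample(std.testing.allocator);
}
```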

2 Likes

Judging the "complexity" of a language by the number of compiler builtins it exposes, rather than evaluating its semantics and expressive power, is the kind of logic that makes you think Brainfuck is the simplest and most elegant language in the world.

8 Likes

Yeah, that line of thought is strange, but to some extent I agree that what is and is not a builtin feels a bit arbitrary. Some builtins are very general and some are very specialized. Not really an issue, but I was confused the first time I looked at the builtin list. Also, I'm sure it'll change a bit before 1.0.

Thinking a bit more about it, I think Zig can appear complex to a beginner used to other languages with more specialized syntax and constructs. For example, I was very confused my first couple of days when I wanted to write an anonymous function but apparently couldn't, because no specialized syntax exists. Instead I started thinking of increasingly convoluted ways to do it, which at least gave me the impression that Zig code would be complex. Later I saw how to do anonymous functions, and I started to understand that Zig has few complex features, but the intersection of those features gives rise to complex behaviour not apparent at first sight. This might be part of what the author feels about Zig's complexity.
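The pattern I eventually learned looks like this (a sketch; `double` is my own example name): wrap the function in an anonymous struct and pull it out, which composes existing features instead of adding dedicated lambda syntax.

```zig
const std = @import("std");

// An "anonymous function" built from existing features: a function
// declared inside an anonymous struct, referenced immediately.
const double = struct {
    fn call(x: i32) i32 {
        return x * 2;
    }
}.call;

test "anonymous function via struct" {
    try std.testing.expectEqual(@as(i32, 8), double(4));
}
```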

3 Likes

A Zig file is implicitly an anonymous struct constant with its definitions as values. You import the code in the file foo.zig by writing const whatever = @import("foo"); and it just assigns that struct to a constant in your program, the same as any other constant. So, all the programs in your modules are just treated exactly like any other value. This is incredibly based, to the level that the only other language I know that really embraces it is Lua. However, it has a cost: it means you have to do all the lookup of function names and such at runtime via indirect loads/jumps rather than more efficient direct ones…

Highlight mine.
What the hell is this guy talking about? Getting something so basic wrong calls into question everything he says about any of the languages.

5 Likes

@LucasSantos91 this seems like an uncharitable reading, given that

… unless you can optimize out all the lookups to values you know are constant. Oh look, Zig really likes optimizing away lookups of constants. So unless I'm missing something, you just get modules that can be manipulated exactly like any other data in the language,

is right after that :0)

He is saying that calling an imported function is an indirect call, and that the compiler may be able to devirtualize it. That's completely wrong. The calls are direct, and they have to be, no "maybe". In fact, adding dynamic calls requires extra work from the programmer.
In essence, he is confusing declarations with fields. He thinks that importing is this:

// a.zig
pub fn foo() void {}

// b.zig
const module = @import("a.zig");

// He thinks module is equivalent to:
const module = struct {
    foo: *const fn () void,
};

// But in reality module is equivalent to:
const module = struct {
    const foo = the original foo; // a declaration, not a field
};
3 Likes

This is one of the crazy cool things that Zig does. Zig generics are just functions that can take and return types the same way as any other value. They're just all evaluated at compile time, and RTTI fills in some gaps to let you do a bit of inspection of types at runtime as well. Zig can do a lot of compile-time evaluation, so its compile-time type functions end up acting a lot like C++'s or D's templates.

(emphasis mine).

This is more of a question than a critique. I thought reflection and type inspection in Zig were limited to comptime; am I wrong about that?

RTTI is not the proper name for this. I think he is referring to the fact that you can print the type name with @typeName. It's not a form of RTTI; it's just that the name gets stored somewhere in the binary and the print function is called with a comptime-known pointer. The same thing happens with field and enum names.
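As a sketch of both points (a comptime type function, and the comptime-baked nature of @typeName; `Pair` is my own illustrative name):

```zig
const std = @import("std");

// A Zig "generic" is an ordinary function returning a type at comptime.
fn Pair(comptime T: type) type {
    return struct { first: T, second: T };
}

test "comptime type function and @typeName" {
    const p = Pair(i32){ .first = 1, .second = 2 };
    try std.testing.expectEqual(@as(i32, 3), p.first + p.second);
    // The string @typeName returns is resolved at compile time and
    // merely stored in the binary; this is not general runtime RTTI.
    try std.testing.expect(std.mem.eql(u8, @typeName(i32), "i32"));
}
```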

6 Likes

This might be somewhat hard to glean through the light-touch humorous writing, but I am pretty sure that what is written rather unambiguously states the following:

  • Language A can semantically model its modules as record values with fields storing functions.
  • Then, calling a function in a module is semantically (module_value.f)(args), a field lookup & indirect call.
  • Which is pretty terrible for performance, and immediately puts you in the Lua/Python league.
  • But there's a trick: the value of a module is actually known at compile time, so if your compiler is smart enough to partially evaluate module_value.f at compile time, you can generate a direct call.
  • And Zig in particular is pedantic about evaluating at compile time everything that is possible to evaluate at compile time.

Additionally, while it is true that Zig models files as

struct {
   const foo = fn(); // struct with a declaration
}

and not as

.{
   .foo = @as(fn(), ...),  // struct with a function-body field
}

or

.{
   .foo = @as(*const fn(), ...), // struct with a function-pointer field
}

it doesn't actually matter for the issue in question. The question is not whether this is a declaration or a function pointer, but whether the thing is known at compile time.

Consider

const std = @import("std");

const std2: struct {
    cwd: *const fn () std.fs.Dir,
} = .{ .cwd = &std.fs.cwd };

pub fn main() void {
    const d = std2.cwd();
    std.debug.print("{}\n", .{d});
}

I am pretty sure that std2.cwd would be a direct call here.


Now that I think about this, it feels like Zig doesn't really need declarations?

struct {
    x: u32,
    fn f() void {}
}

could be a syntactic sugar for something like

.{
  struct { x: u32 },
 .{ .f = &fn f() void {} },
};

That is, a pair of a struct type and a value holding all declarations (which, ok, is foiled by the absence of function-expression syntax :stuck_out_tongue: ).

1 Like

And this really glosses over macros and the grief and complexity they bring.

comptime is a really, really big deal because it is the exact same code I write at runtime. This means that I can force the code to execute at runtime and debug it.

No one who does "macros" does this. Every "macro" system is a weird DSL that is opaque to debugging. Rust's macros are particularly opaque, with some fairly weird limitations by virtue of operating on token streams that may or may not have access to wider compilation information.
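A minimal sketch of that point (`fib` is my own example): the very same function body can be folded at compile time or stepped through in a debugger at runtime, which a token-stream macro cannot offer.

```zig
const std = @import("std");

// One ordinary function, usable at compile time and at runtime alike.
fn fib(n: u32) u32 {
    var a: u32 = 0;
    var b: u32 = 1;
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        const t = a + b;
        a = b;
        b = t;
    }
    return a;
}

const at_comptime = fib(10); // evaluated during compilation

test "same code at comptime and at runtime" {
    var n: u32 = 10;
    _ = &n; // keep n runtime-known, so fib executes (and is debuggable) at runtime
    try std.testing.expectEqual(at_comptime, fib(n));
}
```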

The one thing that I still don't have a good answer for in Zig is "reference counting". If you want to write a program with reference counting, Zig (and C) really aren't the languages you should be using. "defer" simply isn't enough. (I'm purposefully omitting the larger discussion of whether "reference counting" is a good idea on modern processors.)

1 Like

The one thing that I still don't have a good answer for in Zig is "reference counting". If you want to write a program with reference counting, Zig (and C) really aren't the languages you should be using. "defer" simply isn't enough. (I'm purposefully omitting the larger discussion of whether "reference counting" is a good idea on modern processors.)

I don't think it's really that bad. If you forget to increment the count, you get a segfault with a stack trace; if you forget to decrement the count, you get a stack trace from the leak check in the allocator. So if you have only a small number of places where you even touch the ref count, it's easy to find and fix these problems.
Sure, it is also super annoying to use in Zig, but I think that's good because it forces you to look for better solutions.
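A sketch of what this looks like in practice (all names are illustrative, not a library API): a hand-rolled ref count where std.testing.allocator's leak check catches a missing release, and a missing retain surfaces as a use-after-free.

```zig
const std = @import("std");

// Minimal manual reference counting (illustrative only).
const Shared = struct {
    ref_count: u32,
    data: u64,
    allocator: std.mem.Allocator,

    fn create(allocator: std.mem.Allocator, data: u64) !*Shared {
        const self = try allocator.create(Shared);
        self.* = .{ .ref_count = 1, .data = data, .allocator = allocator };
        return self;
    }

    fn retain(self: *Shared) *Shared {
        self.ref_count += 1;
        return self;
    }

    fn release(self: *Shared) void {
        self.ref_count -= 1;
        if (self.ref_count == 0) self.allocator.destroy(self);
    }
};

test "every retain needs a matching release" {
    const s = try Shared.create(std.testing.allocator, 42);
    defer s.release(); // forgetting this trips the testing allocator's leak check
    const alias = s.retain();
    defer alias.release();
    try std.testing.expectEqual(@as(u64, 42), alias.data);
}
```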

I suspect that the author's mix-up comes from reading about incremental linking plans. This is a feature hidden behind -fincremental because it's still too buggy to ship right now, but in the past it worked by making all function calls indirect, so that functions could be swapped out for new ones without touching the rest of the executable data.

It no longer works that way: now we have a trampoline (i.e. the moved function body is replaced with a direct jump to the new one), so the calls are indeed direct, even when they have been moved somewhere else in the executable file. But I can understand the confusion, especially because the communication about this and related features is strewn chaotically across many years and locations.

4 Likes

But I still feel it has the "just add one more little feature" problem to some extent; lots of things with no better home just get added to the list of over 120 compiler built-in functions, from @addrSpaceCast() to @wasmMemorySize()

I'm new to Zig, but I'd +100 this.

There is no good reason why @memset is a builtin rather than a library call, at least from my uneducated newcomer perspective. The fact alone that the same @ notation is used for imports, type operations (oh so many type conversions), magical types like @Vector, and then just regular functions is beyond me, too.

I get that things like @cmpxchgStrong maybe need some special notation, and in C stuff like this gets differentiated with __ prefixes, but Zig extends this to many other things.
SIMD stuff, in particular, is in a weird limbo, with some functions only available in std.simd, some via @splat-like calls (and most completely unavailable unless you go via the C API), while SIMD reduce operations use yet another syntax.
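For illustration, the three spellings side by side (a sketch; the exact std.simd helper names vary by Zig version):

```zig
const std = @import("std");

test "three SIMD spellings" {
    const V = @Vector(4, i32); // builtin "magical type"
    const ones: V = @splat(1); // builtin with inferred result type
    const v = V{ 1, 2, 3, 4 } + ones; // element-wise operator
    const sum = @reduce(.Add, v); // reductions use their own builtin syntax
    try std.testing.expectEqual(@as(i32, 14), sum);
    // Library-side helper living in std.simd rather than a builtin:
    _ = comptime std.simd.suggestVectorLength(i32);
}
```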

Note: I don't think the number of builtins is problematic, but the inconsistency feels weird.

3 Likes

I think it's reasonable to ask why some things are builtins and others not. And I also agree that it can be hard to discover which builtins are available when you need them. Certainly there could be some nicer organization. (FWIW, many builtins have standard-library equivalents that just call the builtin.)

However, I think that is different from what is being put forward. The "kitchen sink" imagery conjures up a language that is trying to add every feature in the world (indeed, C++ and Rust are both accused of such behavior), which I think can pretty well be demonstrated to be false. Just look at all the rejected feature requests and the fact that async was removed. Andrew and the Zig team have done a lot of work to make sure features work well and fit in at the language level.

I think having builtins as a clear line between what is part of the core language and what is part of the standard library/ecosystem is really helpful.

I think that is much better than:
"the builtins are just whatever we happen to use in the implementation of the standard library, which hides the builtins from you and is required to be used"

I don't know which languages in particular tend more to the latter (because I haven't analyzed it in detail), but explicit is better, and eventually it will make it easier for people to implement the language or even create alternative implementations, once the language becomes more specified.

But even without other implementations, it is just nice from the standpoint of making it easier to understand what is going on, how bootstrapping works, making sure the language can be used without the standard library, etc.

Hiding builtins in a standard library makes that library mandatory and makes the semantics of the language less clear; the only thing you get is a prettier interface that at worst restricts your access to the builtins and at best is a 1:1 copy of what the builtins can do (which, because it adds nothing, would be pretty pointless).

I think the worst case is when the language only gives you some abstracted way to write code that generates something completely independent of the code you wrote, where you have no way to influence what actually gets generated on the target hardware. Then you are always at the mercy of hoping the compiler does what you want; at that point the hardware is so abstracted away that you can't really optimize for it from a user program and instead have to edit the compiler. Languages like that can be useful, but I don't want that in a low-level language.

So personally, I think every language that doesn't make its builtins the primary way the language grounds itself in doing something useful (without having to encode every single thing in weird extra syntax; for example, I want access to @popCount, but I don't need it as a popCount operator) ends up in a worse place: it may present the user a neat interface, but it doesn't allow the user to access and use what the language/standard-library implementation itself uses to implement itself.
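For instance (a trivial sketch), the capability is exposed as a builtin call rather than as dedicated operator syntax:

```zig
const std = @import("std");

test "@popCount as a builtin, not an operator" {
    const x: u8 = 0b1011_0010;
    // @popCount returns an integer just wide enough to hold the count.
    try std.testing.expectEqual(@as(u4, 4), @popCount(x));
}
```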

I don't like languages that make the user a second-class citizen who needs to consume and accept specific libraries and has to fork the compiler or the standard library to do something a bit different that wasn't intended by the standard library.

For those languages I would ask:

  • where are your builtins?
  • why are you hiding them?
  • can the language be used without the standard library?

I think it is great when you can use builtins to directly create your own minimal alternative standard library containing only what you need for a specific project; I think people working on their own operating systems or embedded projects especially can benefit from that.

So I think I would claim the opposite:
There is no good reason why @memset should be a library call instead of a builtin.

The primary function of builtins isn't to be user-friendly; it is to make the compiler easy to implement, by having a set of operations the compiler can use in its own implementation. Additionally, they can be an easy way to expose specific capabilities that don't necessarily deserve their own syntax.

With Zig, the standard library or five other libraries can put arbitrary abstractions over those builtins to make them neater in whatever way the library authors see fit; the good thing here is that nobody is required to use those abstractions if they don't provide enough benefit.

Could the builtins become more consistent in some ways? Probably, but I think that is a lower-priority goal until the language gets closer to 1.0.

6 Likes

Yeah, no: in kitchen-sink terms it doesn't come close to the C builtins, that's for sure.

Despite that, I think there is a much clearer line around what C builtins are: mostly code that you cannot reasonably implement directly in C itself, most of them wrappers around raw assembly blocks. I'd guess this reasoning might actually have yielded @memset, which had a C implementation in the olden days; Zig may just have a pure-assembly one, but that pits "is not" against "cannot": you could write @memset in Zig, easily.

The notation/syntax feels a bit more explicit as well, since you mostly get the __builtin prefix, so there is no confusion.

I think the author just worded himself poorly and was trying to say that Zig can optimize a declaration lookup to a nop in a sense :^)


If we get function expressions would it not be logical to remove the special function syntax sugar instead?

Similarly to how you write const T = struct {} instead of struct T {}, in an ideal world it would be const foo = fn() void {} instead of fn foo() void {}.

Continuing that thought: if declarations were removed from the language and comptime members kept, comptime members could fill the role declarations fill right now.

For example, this is how I imagine it would look:

const Foo = struct {
  n: i32,

  comptime init = fn(n: i32) Foo { return .{ .n = n }; },
  comptime magic_number: N = 42,
  comptime N = i32,
};

Regardless of semantics, I think I'm inclined to agree: Zig does not need decls (if we get blessed function expressions). As a bonus, without "namespaced" declarations there wouldn't be any way to create global mutable state anymore, at least none I can think of.

It's not coming.

2 Likes