Zig; what I think after months of using it

Yeah, I think that, for the most part, a lot of things will become possible once the compiler infrastructure matures. Once #615 is mature and can output tons of internal information, it will be a lot easier to simply build a tool around the output of the server and add safety. One could easily imagine a future where you have the compiler, and multiple tools consuming its internal output to provide completion, static analysis, even leak detection, etc.

2 Likes

Hm, not sure here. There’s certainly a lot of historical baggage as to how strings used to be implemented, but I think these days design is more or less settled?

  • A string is a valid utf-8 sequence.
  • It is “transparently” utf-8 – byte slices and strings are interconvertible without data copy
  • String supports fallible indexing by utf-8 byte offsets (this in turn gives you all standard string processing methods like starts_with, split_lines, etc)
  • String comes with a bunch of ascii-based processing (split_ascii_whitespace, to_ascii_uppercase, etc), which has well-defined semantics for ascii and leaves everything else untouched
  • String supports iteration by unicode code points (it is also customary to call these char or rune and give a specific type, but I think that’s actually a mistake: the only correct way to use unicode code points is when implementing text encoding, which you shouldn’t be doing in user code)
  • If you are a systems programming language, you stop here.
  • Otherwise, you add dependency on the living Unicode standard, and add a bunch of extra iteration methods for grapheme clusters and such, tests of unicode properties (is_whitespace as opposed to is_ascii_whitespace), and manipulations (to_upper_case as opposed to to_ascii_upper_case).
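
To make the “systems programming” subset of the list concrete, here is a minimal sketch in Zig, where a string is just a `[]const u8`. It assumes a recent Zig release; `countCodepoints` is a hypothetical helper written for this illustration, not a standard library function.

```zig
const std = @import("std");

/// Hypothetical helper: returns the number of Unicode code points in `bytes`,
/// or error.InvalidUtf8 if the slice is not valid UTF-8.
fn countCodepoints(bytes: []const u8) !usize {
    const view = try std.unicode.Utf8View.init(bytes);
    var it = view.iterator();
    var n: usize = 0;
    while (it.nextCodepoint()) |_| n += 1;
    return n;
}

test "a string is a validated byte slice" {
    // "héllo wörld" written with explicit UTF-8 bytes for é and ö.
    const s: []const u8 = "h\xc3\xa9llo w\xc3\xb6rld";
    // Validation only inspects the bytes; no copy is made.
    try std.testing.expect(std.unicode.utf8ValidateSlice(s));
    // Byte-offset operations work directly on the slice.
    try std.testing.expect(std.mem.startsWith(u8, s, "h\xc3\xa9"));
    // 11 code points in 13 bytes (é and ö take two bytes each).
    try std.testing.expectEqual(@as(usize, 11), try countCodepoints(s));
    try std.testing.expectEqual(@as(usize, 13), s.len);
}
```

Nothing here needs a dedicated string type: validity is a property checked on demand, and everything else operates on bytes.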

It’s not that we don’t know how to do strings; it’s rather that they bring relatively little additional value if you stick to the systems-programming, unicode-properties-agnostic subset, and they require either a new builtin type (making the language more complex) or linguistic abilities to make user-defined types look exactly like builtins (making the language more complex).

1 Like

Right here is where I get off the train. A valid utf-8 sequence is a kind of string, yes. “Probably mostly valid utf-8 sequence, we hope”, is another kind of string.

My experience is that we can do basically anything with the latter we can do with the former. Assuming validity is a performance optimization, sometimes, but it’s one which is brittle because any validation is subject to TOCTOU problems. Unicode defines various ways to deal with invalid sequences, which can be used in situ.

I’d rather not use a type which obligates my program to crash or throw an error any time a bad byte sneaks into a string. Most of the time.

I definitely don’t want to validate megabytes of string if I’m going to search them for known-good sequences and don’t care if most of it is garbage. What if I’m looking for strings inside mostly-garbage (a binary for example)? It’s better if the tools work, which “is a valid UTF-8 sequence” precludes.
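
A minimal sketch of that use case in Zig, assuming a recent release: a plain byte search finds a known-good sequence inside data that would fail validation as a whole. `findInBytes` and the blob contents are made up for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: finds `needle` in `haystack` by plain byte comparison.
/// No UTF-8 validation is performed on either argument.
fn findInBytes(haystack: []const u8, needle: []const u8) ?usize {
    return std.mem.indexOf(u8, haystack, needle);
}

test "search mostly-garbage bytes without validating them" {
    // Made-up binary blob: invalid UTF-8 on both sides of a readable marker.
    const blob = "\xff\xfe\x00\x80junk\xc3ZIGMARKER\xf0\x9f";
    // The blob as a whole is not valid UTF-8 ...
    try std.testing.expect(!std.unicode.utf8ValidateSlice(blob));
    // ... but the marker is still found, at byte offset 9.
    try std.testing.expectEqual(@as(usize, 9), findInBytes(blob, "ZIGMARKER").?);
}
```

A type that required the whole blob to be valid UTF-8 would reject this input before the search could even start.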

Not every consensus is actually good.

4 Likes

The only point I’d push back on is the idea that simplicity isn’t a feature.

If Zig wants to fulfil its very lofty goal of eventually replacing C, not just in systems programming, but in the broader C ecosystem, language simplicity is one of the only features that ultimately matters.

You need something that can replace C89 and C99 as the “least common denominator supported on every platform”, and that means a language simple enough that it is reimplementable from the language standard.

There’s still a long way to go, but Zig is a far better candidate for that than Rust.

11 Likes

Agreed. I think once the dust settles and they achieve the compiler they want, they should spend some time simplifying the compiler as much as possible and drafting a standard definition of the language.

1 Like

Yeah, that would solve a lot of my comptime problems. It would make a lot of my thoughts about how to improve comptime go away completely, as my code’s comptime is parallel to my build script’s runtime.

2 Likes

Maybe Zig would benefit from a failing allocator so we can test memory allocation error paths? Apparently someone tried. Also, I remember reading here an idea about applying the “allocator as a function parameter” pattern to syscalls - then it would be possible to test syscall failures too.

Sorry for going off topic here.

I think that is planned for the upcoming io as parameter changes, but I am not totally sure.

At least in theory you could put all the things that use syscalls behind some kind of interface.

std.testing.FailingAllocator
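
A minimal sketch of testing an error path this way, assuming a recent Zig release. It uses `std.testing.failing_allocator`, the ready-made allocator in `std.testing` that refuses every request; `copyBytes` is a hypothetical function under test.

```zig
const std = @import("std");

/// Hypothetical function under test: copies `src` into freshly allocated memory.
fn copyBytes(allocator: std.mem.Allocator, src: []const u8) ![]u8 {
    return allocator.dupe(u8, src);
}

test "out-of-memory is reported, not swallowed" {
    // Every allocation through this allocator fails, so the error path runs.
    try std.testing.expectError(
        error.OutOfMemory,
        copyBytes(std.testing.failing_allocator, "abc"),
    );
}
```

This only works because the allocator is an ordinary function parameter, which is the pattern the question above refers to.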

8 Likes

std.testing.checkAllAllocationFailures (article about it)
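
A minimal usage sketch, assuming a recent Zig release. `joinTwice` is a hypothetical function under test; the helper reruns it, inducing a failure at each allocation in turn, and reports leaks or swallowed out-of-memory errors.

```zig
const std = @import("std");

/// Hypothetical function under test with two allocations.
/// It must take the allocator first and return `!void` to fit the helper.
fn joinTwice(allocator: std.mem.Allocator, word: []const u8) !void {
    const first = try allocator.dupe(u8, word);
    defer allocator.free(first);
    const both = try std.mem.concat(allocator, u8, &[_][]const u8{ first, word });
    defer allocator.free(both);
}

test "every allocation failure is handled without leaking" {
    try std.testing.checkAllAllocationFailures(
        std.testing.allocator,
        joinTwice,
        .{@as([]const u8, "zig")},
    );
}
```

Removing either `defer` should make the check fail with a leak report, which is the kind of mistake this helper is meant to catch.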

8 Likes

Worth noting that this disqualifies the theoretical string type for use with filesystem paths, environment variables, and process arguments.

10 Likes

Walked exactly the same path with Rust.

Actually, I find the idea of a borrow checker cool, and it was the least of my problems with Rust. My main beef was that I was not able to understand the documentation for the trait system and how it should be specified in syntax. It was very confusing, and I found the forum unfriendly when I asked questions.

2 Likes