Yeah, I think that for the most part a lot of things will become possible once the compiler infrastructure matures. Once #615 is mature and can output tons of internal information, it will be a lot easier to simply build a tool around the output of the server and add safety. One could easily imagine a future where you have the compiler and multiple tools consuming its internal output to provide completion, static analysis, even leak detection, etc.
Hm, not sure here. There's certainly a lot of historical baggage as to how strings used to be implemented, but I think these days the design is more or less settled?
- A string is a valid utf-8 sequence.
- It is "transparently" utf-8: byte slices and strings are interconvertible without data copy
- String supports fallible indexing by utf-8 byte offsets (this in turn gives you all standard string processing methods like starts_with, split_lines, etc)
- String comes with a bunch of ascii-based processing (split_ascii_whitespace, to_ascii_uppercase, etc), which has well-defined semantics for ascii and leaves everything else in place
- String supports iteration by unicode code points (it is also customary to call these `char` or `rune` and give a specific type, but I think that's actually a mistake: the only correct way to use unicode code points is when implementing text encoding, which you shouldn't be doing in user code)
- If you are a systems programming language, you stop here.
- Otherwise, you add a dependency on the living Unicode standard, and add a bunch of extra iteration methods for grapheme clusters and such, tests of unicode properties (`is_whitespace` as opposed to `is_ascii_whitespace`), and manipulations (`to_upper_case` as opposed to `to_ascii_upper_case`).
It's not that we don't know how to do strings; it's rather that they bring relatively little additional value if you stick to the systems-programming, unicode-properties-agnostic subset, and they require either a new builtin type (making the language more complex) or linguistic abilities to make user-defined types look exactly like builtins (making the language more complex).
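The "systems programming" subset in the list above closely matches what Rust's `str` offers, so here is a minimal sketch of it in Rust. The helper names (`as_text`, `prefix`, `shout`, `code_points`) are made up for illustration; only the standard-library calls inside them are real.

```rust
use std::str;

/// Validate once, then view the same bytes as a string without copying.
fn as_text(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

/// Fallible indexing by byte offset: `None` if the offset is out of bounds
/// or falls in the middle of a multi-byte code point.
fn prefix(s: &str, len: usize) -> Option<&str> {
    s.get(..len)
}

/// ASCII-only processing: non-ASCII characters are left untouched.
fn shout(s: &str) -> String {
    s.to_ascii_uppercase()
}

/// Iteration by code point. This counts Unicode scalar values,
/// not grapheme clusters.
fn code_points(s: &str) -> usize {
    s.chars().count()
}
```

Going the other way, `s.as_bytes()` gives back the byte slice, again without a copy. Anything involving grapheme clusters or full Unicode case mapping is exactly the part that would pull in the Unicode data tables.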
Right here is where I get off the train. A valid utf-8 sequence is a kind of string, yes. "Probably mostly valid utf-8 sequence, we hope" is another kind of string.
My experience is that we can do basically anything with the latter we can do with the former. Assuming validity is a performance optimization, sometimes, but it's one which is brittle because any validation is subject to TOCTOU problems. Unicode defines various ways to deal with invalid sequences, which can be used in situ.
I'd rather not use a type which obligates my program to crash or throw an error any time a bad byte sneaks into a string. Most of the time.
I definitely don't want to validate megabytes of string if I'm going to search them for known-good sequences and don't care if most of it is garbage. What if I'm looking for strings inside mostly-garbage (a binary for example)? It's better if the tools work, which "is a valid UTF-8 sequence" precludes.
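To make the "mostly garbage" case concrete, here is a small Rust sketch of both ideas: searching raw bytes for a known-good sequence without validating anything, and dealing with invalid sequences in place by substituting U+FFFD. `find_bytes` and `display_lossy` are hypothetical helper names; `String::from_utf8_lossy` is the standard-library call doing the replacement.

```rust
/// Find the first occurrence of `needle` in arbitrary bytes.
/// No UTF-8 validation is performed on the haystack.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic, so handle this up front
    }
    haystack
        .windows(needle.len())
        .position(|window| window == needle)
}

/// Decode for display, replacing each invalid sequence with U+FFFD
/// instead of returning an error.
fn display_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The naive window scan is only there to keep the sketch dependency-free; a real tool would likely use a faster substring search, but the point is that nothing here requires the input to be valid UTF-8.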
Not every consensus is actually good.
The only point I'd push back on is the idea that simplicity isn't a feature.
If Zig wants to fulfil its very lofty goal of eventually replacing C, not just in systems programming, but in the broader C ecosystem, language simplicity is one of the only features that ultimately matters.
You need something that can replace C89 and C99 as the "least common denominator supported on every platform", and that means a language simple enough that it is reimplementable from the language standard.
There's still a long way to go, but Zig is a far better candidate for that than Rust.
Agreed. I think once the dust settles and they have the compiler they want, they should spend some time simplifying the compiler as much as possible and drafting a standard definition of the language.
Yeah, that would solve a lot of my comptime problems. It would make a lot of my thoughts about how to improve comptime go away completely, as my code's comptime is parallel to my build script's runtime.
Maybe Zig would benefit from a failing allocator so we can test memory allocation error paths? Apparently someone tried. Also, I remember reading here an idea about applying the "allocator as a function parameter" pattern to syscalls; then it would be possible to test syscall failures too.
Sorry for going off topic here.
I think that is planned for the upcoming io as parameter changes, but I am not totally sure.
At least in theory you could put all the things that use syscalls behind some kind of interface.
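As a rough illustration of the pattern (written in Rust, and not a depiction of Zig's planned API), the sketch below hides one fallible operation behind an interface, passes it as a parameter, and swaps in a test double that starts failing after a set number of calls. The names `Io`, `FailAfter`, and `total_len` are all hypothetical.

```rust
use std::io;

/// Hypothetical interface standing in for the syscalls a function needs.
trait Io {
    fn read_file(&mut self, path: &str) -> io::Result<Vec<u8>>;
}

/// Test double: succeeds `remaining` times, then fails every call.
struct FailAfter {
    remaining: usize,
}

impl Io for FailAfter {
    fn read_file(&mut self, _path: &str) -> io::Result<Vec<u8>> {
        if self.remaining == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "injected failure"));
        }
        self.remaining -= 1;
        Ok(b"data".to_vec()) // canned 4-byte payload
    }
}

/// Code under test receives the interface as a parameter,
/// the same way it would receive an allocator.
fn total_len(sys: &mut dyn Io, paths: &[&str]) -> io::Result<usize> {
    let mut total = 0;
    for path in paths {
        total += sys.read_file(path)?.len();
    }
    Ok(total)
}
```

Sweeping `remaining` from zero upward exercises every failure point in turn, which is the same idea as a failing allocator applied to I/O.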
Worth noting that this disqualifies the theoretical `string` type for use with filesystem paths, environment variables, and process arguments.
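Rust runs into exactly this and keeps a separate OS-string type for such values. The sketch below assumes a Unix target (it relies on `std::os::unix::ffi::OsStrExt`), and `path_from_bytes` and `as_utf8` are hypothetical helper names.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: OS strings are arbitrary bytes there
use std::path::Path;

/// Build a path from raw bytes; this works even when the bytes are not UTF-8.
fn path_from_bytes(bytes: &[u8]) -> &Path {
    Path::new(OsStr::from_bytes(bytes))
}

/// Converting to a validated string is the fallible step:
/// `None` when the path is not valid UTF-8.
fn as_utf8(path: &Path) -> Option<&str> {
    path.to_str()
}
```

A path containing a stray `0xFF` byte is still a usable path for opening files; it just cannot be handed to anything that insists on a valid UTF-8 string.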
Walked exactly the same path with Rust.
Actually, I find the idea of a borrow checker cool, and it was the least of my problems with Rust. My main beef was that I was not able to understand the documentation for the trait system and how traits should be specified in syntax. It was very confusing, and I found the forum unfriendly when I asked questions.