Should the standard library be "batteries included"?

That’s probably true. The more layers are involved in such a transition (e.g. the language+standard lib, a framework, multiple libraries which depend on the framework, libraries which depend on those libraries, and so on), the longer the transition takes and the more likely it is to fracture the ecosystem for a long time.

One thing which at least the Discourse search functionality doesn’t find in this topic is that batteries included standard libraries lessen the risk of supply chain attacks because it’s harder to get in. (After all, where is it more likely that an attack succeeds: one big target where people are working full time on the software or multiple dependencies where some are doing it on the side as a hobby? And you just need to get in in one place.)

Thanks to all the attacks (and even worms) happening right now in other ecosystems, I’m even starting to see people (managers even more so) actively judge languages as riskier if the standard library is small, especially if there isn’t one (or a few) big dependencies which could take the place of a batteries included standard library (like e.g. Qt in C++).

4 Likes

mark as deprecated and move into an external package

Honestly hadn’t seen or heard of this approach but love the idea. I think one good example that is already part of Zig’s history is BoundedArray.

IMHO the compatibility breaking transition in Python alone wasn’t the problem, but instead that Python3’s new string stuff is simply a shitty design from the ground up.

Just one detail among many, but changing the return type (bytestream vs string) of the file read method depending on a runtime argument of the file open method is just completely bonkers. Not using the “UTF-8 everywhere” philosophy for the string encoding was the main problem though (that would have allowed defining the string type as a ‘view’ on a bytestream).

The unicode string handling in Python3 is basically a step backward even from Python2. That was the annoying thing about the Python2-to-3 transition: Python3 broke backward compatibility for no good reason except making everything related to string handling worse compared to Python2.

Also one big difference between Python and Zig is that one is duck typed, and the other is statically typed. Introducing breaking changes in a language that only tells you that something is wrong after you run the code and hit that specific breakage is much more problematic than in a statically typed language where compilation breaks in places that need updating.
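To make the duck-typing point concrete, here is a minimal sketch. The `Report` class and the `to_text` to `render` rename are hypothetical, invented only for illustration. The stale call sits in a branch that is rarely taken, so nothing complains until that branch actually runs.

```python
class Report:
    # Hypothetical breaking change: this method used to be called to_text().
    def render(self):
        return "full report"


def summarize(report, verbose=False):
    if verbose:
        # Stale call to the old name; nothing flags it until this line runs.
        return report.to_text()
    return "summary"


# The common path still works, so the breakage goes unnoticed.
common_path = summarize(Report())

# The rarely used path only fails when it is actually executed.
try:
    summarize(Report(), verbose=True)
    failure = None
except AttributeError as exc:
    failure = exc

print(common_path, "|", type(failure).__name__)
```

A statically typed compiler would reject the equivalent stale call at build time, whether or not any test exercises that branch.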

1 Like

I can kinda understand that design.

When you are reading a file in, you have no guarantee that the file actually has text. It can also have binary data. So using a bytestream instead of a string makes sense.

But when you want to read a file as text, not expecting binary data, you want a string (at least in languages where strings are not handled the way they are in e.g. Zig). And that normally means a conversion to whatever encoding is used by the string type from whatever encoding was used for the file (which may not be Unicode but one of the extended ASCIIs).
Now should this be done automatically for you? Probably not, at least imo. But the Python devs said yes to that and then it makes sense to also have the appropriate return type to show that guarantee.
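For reference, the behavior being debated looks roughly like this in Python 3. This is a minimal sketch; the temporary file and its contents are made up for illustration.

```python
import os
import tempfile

# Write a few UTF-8 bytes to a temporary file (illustrative content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("héllo\n".encode("utf-8"))
    path = tmp.name

# Binary mode: read() returns bytes, no decoding happens.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: read() returns str, decoded with the encoding passed to open().
# newline="" turns off newline translation so the comparison below is exact.
with open(path, "r", encoding="utf-8", newline="") as f:
    text = f.read()

os.remove(path)

print(type(raw).__name__, type(text).__name__)
```

The same `read()` call yields a different type depending on the mode string given to `open()`, which is the runtime-argument dependence criticized earlier in the thread.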

1 Like

Obviously we have different opinions about if Python 3 string/bytes is better or worse than Python 2 :smiley:.

Your point about static typing is very good; this significantly reduces the risk of breaking changes introducing new bugs at runtime.

2 Likes

The problem is basically that it is impossible to detect the correct text encoding from a stream of bytes, so you might just as well assume UTF-8 (which is backward compatible with ASCII).

All that Python3 needed was a manual function to “view this bytestream as a string” (AFAIK Python2 had that already anyway). It needs to accept an encoding as extra input, and the default should be UTF-8. The alternative of ASCII codepages has only survived on Windows, and even there you shouldn’t assume that a text file is encoded with the system codepage; you basically need to know the correct codepage.
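As a rough illustration of an explicit conversion with a UTF-8 default, Python 3’s `bytes.decode()` already behaves this way, though it creates a new `str` object rather than a true view over the bytes. The sample strings below are made up.

```python
original = "naïve café"

# bytes.decode() defaults to UTF-8 when no encoding is given.
utf8_bytes = original.encode("utf-8")
decoded_default = utf8_bytes.decode()

# Legacy codepage data has to be decoded with the codepage named explicitly.
cp1252_bytes = original.encode("cp1252")
decoded_legacy = cp1252_bytes.decode("cp1252")

# Treating codepage bytes as UTF-8 fails loudly instead of guessing.
try:
    cp1252_bytes.decode()
    wrong_guess_error = None
except UnicodeDecodeError as exc:
    wrong_guess_error = exc

print(decoded_default == decoded_legacy, type(wrong_guess_error).__name__)
```

The failure in the last step is the point about codepages: the bytes alone do not say which encoding they use, so the caller has to know it.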

There are details like BOM and UCS-2 (which then was shoehorned into UTF-16), but these are all Windows-isms. Whether it makes sense to support those deprecated hacks in Python via helper functions is arguable, but IMHO Python should never attempt to make magic guesses when it comes to text encoding.

Python doesn’t try to detect the encoding, except looking for BOM. Other than that, it uses the locale encoding, which I think is very sensible.

If they had released Python 2.8 with both b"" and u"", they could have avoided a lot of the pain. The u"" syntax had to be backported into Python 3 for the migration to be possible.

Anyway, one interesting part of the Python 3 migration was that people really started migrating once Python 3 had useful features that they could not get in Python 2, and that was mostly asyncio.
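For readers who haven’t seen the two literal prefixes side by side, a small sketch of how they behave on Python 3.3 or later (the sample text is arbitrary):

```python
# u"" is accepted again since Python 3.3 and is identical to a plain literal.
text_prefixed = u"caf\u00e9"
text_plain = "caf\u00e9"

# b"" is a bytes literal; here it holds the UTF-8 encoding of the same text.
data = b"caf\xc3\xa9"

print(text_prefixed == text_plain, data.decode("utf-8") == text_plain)
```

Since Python 2.7 also accepts both prefixes, code written this way can run unchanged on both versions.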

There are many people still on Zig 0.14 who have no plans to upgrade. It’s painful for little benefit. A batteries included stdlib could be the carrot that brings them over, but it has to be well executed and not disallow previous patterns.

2 Likes

As a beginner coder, I did choose Go over Python because Go promised to be backward compatible. It’s great if you start your application 2 years later and it still just works.

In fact, I use bash often instead of Python for the same reason.

I found this an interesting read in that regard, maybe slightly offtopic.

It’s a great read, but in my opinion Zig shouldn’t be this level of “batteries included”. I think the Zig stdlib should sit somewhere between the C stdlib and the C++ stdlib.

Basically trimming the fat and keeping only the building blocks. For example, a crc32 function doesn’t need 5 different implementations from 5 different library authors, yet it’s used in many, many places. That’s a fundamental block.

Io, Allocator, fs, net, path, and so on are in the same category. And then I truly believe an std-contrib package would be the place for the more high level, experimental, and potentially breaking APIs. For example, if the simd namespace was moved to std-contrib, we could get the same level of sharing as HuggingFace’s new Kernel tab, where everyone can share GPU kernels. Instead it could be filled with tons of SIMD-optimized functions, and alternatives for a lot of more niche things.

This is the only way to get batteries included without a huge maintenance burden and without really breaking anything for users, since by using contrib they know that some of the API may be removed, changed, etc.
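To illustrate why crc32 makes a good “one shared implementation” candidate, here is a sketch in Python (not Zig) comparing a hand-written bitwise CRC-32 against the one shipped in the standard library’s `zlib` module. `crc32_bitwise` is a helper name made up for this example; it assumes the common reflected polynomial 0xEDB88320.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the low bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


sample = b"123456789"
handwritten = crc32_bitwise(sample)
from_stdlib = zlib.crc32(sample)

print(hex(handwritten), hex(from_stdlib))
```

Both give the same value for the same input, so extra implementations scattered across libraries add nothing beyond more code to audit.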

2 Likes

I would argue that long term std.Io would be that, since the goal is to make it possible for multiple (fully independently developed) libraries to use that interface and integrate seamlessly into the execution.

Another carrot could be the separation of the build runner and the build graph creator (or however you would call that), which is currently happening, IF the resulting format also becomes stable/a public API/format since that would make it possible to create your own build runners way easier.

One project idea which I have (of way too many) would be to create a “distributed” runner. As in you say e.g. distributed-zig-build test (which would be equivalent to zig build test) and it would run the test step on different “devices” (which differ in e.g. architecture and OS) natively in parallel over the network and merge the output of all of them into one big tree on your end.

And this build separation would make this imo way more approachable to start.

2 Likes

It worked this way until recently: ziglang/std-lib-orphanage on GitHub (“No-longer-maintained MIT licensed code that needs a new home”). But I guess now you are expected to just copy-paste code into your project.

The problem is just that different people consider different things “fundamental building blocks” (with overlap) since the fundamental building blocks depend on the problem domain you mainly work in.

For example andrewrk seems to consider HTTP a basic building block. The main developer of Hare considers HTTP very high level (too high level for a stdlib even), but they consider regex a basic building block (which isn’t in the Zig stdlib at all currently, not even something for globbing).

But well, I would structure the stdlib WAY differently than it is right now anyway.

6 Likes

Python doesn’t try to detect the encoding, except looking for BOM. Other than that, it uses the locale encoding, which I think is very sensible.

Python 3.15 is going to default to UTF-8 though. IMO that’s a better default.
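If you want to check which default your own interpreter currently uses, something like this should work on Python 3.7+. The printed values depend on your platform and settings, so no expected output is shown.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= is passed.
default_encoding = locale.getpreferredencoding(False)

# True when UTF-8 mode is active (e.g. via -X utf8 or PYTHONUTF8=1).
utf8_mode = bool(sys.flags.utf8_mode)

print(default_encoding, utf8_mode)
```

Passing `encoding="utf-8"` explicitly to `open()` sidesteps the question on any version.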

1 Like

I won’t hesitate to ask:

If we say ‘batteries included’, do we mean posix-style APIs too?

I don’t think Andrew thinks HTTP is a basic building block (I mean, maybe he does). In its current state, most of what is in the stdlib is what the compiler needs; that’s why there is an HTTP module, or an ELF namespace. Those aren’t “fundamental”, but as they have said many times, the current focus is the language, and the std is mainly a byproduct that they share. Once the language is “done”, they’ve said they want to focus on shipping a curated and well designed std.

TLDR: what we have currently isn’t really an std, it’s just a byproduct, but they do intend to audit it (#1629) and decide where to take the std.

And I do think that they very much know what a fundamental building block is, and I do not believe that it is related to “the” problem domain. Rather, a fundamental building block is specifically something that remains exactly the same across domains.

If you are doing bio-computing, embedded firmware, or a high level web framework, a crc32 is always a crc32; it doesn’t change. A split function doesn’t change. This is what I have in mind when I think of basic building blocks. The exception is data structures, since to me data structures are also a kind of interface, like Io/Allocator/Random/Reader/Writer.

Which leads me to why I think it’s a great idea to split the std into core and contrib: core holds the stable, battle tested primitives, on top of which contrib, or any other library for that matter, can be built.

2 Likes

I think it’s a mistake to think the choice is strictly in std or not. The issue is not about the package, it’s about trust, stability, and support. “std” is a proxy for highly trusted, highly stable, and highly supported. These attributes can be achieved by packages other than std (and not necessarily by ZSF because it is only one possible highly trusted, stable, and supported source).

It’s probably better if different domains are in different packages so they can evolve independently.

There is the question of “what belongs in std_universal_core”. CRC32 yes, HTTP, no. Red-Black-tree, debatable.

3 Likes

True. With the way things are currently going when it comes to supply chain attacks, I think we are going to see a lot of ecosystems going back to a more Qt-like dependency structure, as in a few very large packages (each made up of multiple modules) instead of many small packages.

1 Like

To some extent I can agree, but the opencv example is really brilliant in my opinion. It’s a really good way to manage contributions too: the core is stable, battle tested, curated, minimal, and fundamental. There are cross-domain pieces like that; I’m thinking of stuff like asBytes() or asValue(). It’s not like the memory model is going to change tomorrow. Same for addWithOverflow(): it will be the same 10 years from now.

But HTTP evolves: some people just need 1.1, some need HTTP3 for QUIC, some need websocket. I think there is some value in electing one package in contrib to focus people’s work and attention, and to facilitate interoperation between Zig code, all without making the maintenance burden heavy. It’s kind of a second circle of trust, and it pairs well with Zig’s code reuse philosophy.

To me it’s like Linux: having so many distros does hurt adoption, because people don’t really know what to choose, and it dissipates energy that could be channeled better.