Should the standard library be "batteries included"?

Hi.

I think that a standard library (not only in Zig, but everywhere) is not for dealing with tar files;
it’s more for handling stuff in RAM (allocations, strings, queues and the like), because handling
files of any format on any storage device is quite a complicated thing.

Including features for any file format (tar, xz, jpeg, whatever) in Zig’s stdlib would create a mess.
All these and other formats are separate huge projects, and they should be placed into their own libraries in Zig, not into stdlib.

This is my personal opinion, nothing more.

Note that the Zig standard library already has support for tar extraction (and planned support for zip) as well as decompression support for DEFLATE, gzip, lzma, lzma2, xz, zlib, and zstandard. It also has compression support for DEFLATE and zlib.

For what gets included in the standard library, see:

I’m unsure if tar archiving will be needed internally or if a PR adding it would be merged, but it’s definitely not far-fetched in terms of what’s in the standard library now.

A standard HTTP client? In the standard library of a general-purpose language, which Zig is supposed to be??? For Python/Tcl/other high-level languages it is OK, but not for Zig.

There is not even a standard “simple” trim in libc, let alone a “standard HTTP client” as in Zig :triumph:

No reason to be discontent here. In many languages you have to use the standard library (bringing in a big thing), while in Zig you can write programs that don’t rely on the standard library at all, so I would say that what gets included in the short term isn’t as problematic.

And I think in the long term, there are many ziguanas who are interested in moving as much as possible into packages that are only used if really necessary.

There is also a whole issue about eventually reviewing and redesigning the whole standard library; I think that is one of the things that will be considered there.

It already exists: zig/lib/std/http.zig at master · ziglang/zig · GitHub

Everything in the standard library is technically up for potential removal as per “audit the standard library API and implementations” (Issue #1629 · ziglang/zig · GitHub), but things that the compiler depends on are very likely to survive (and, as mentioned, the package manager requires an HTTP client).

However, the HTTP client (and the server used for testing) does not need to be available in the standard library. It can be implemented in a separate anonymous module that is used only by the compiler.

If it has to be in the compiler, why not expose it? It has to be good-quality code to make it into the compiler, so just let people use it.

Because if you expose it you have to maintain compatibility, and that’s a big burden.

As an example, in the Go project the go command keeps all of its internal API private. The compiler uses a different implementation of the tokenizer and parser, compared to the public implementation in the std.

Zig doesn’t have to maintain it if it doesn’t need it anymore. I have an algorithm that allows Zig to change whatever it needs, while allowing users to have fearless dependency on the std lib. I call it the “infinite forwards compatibility algorithm”. When Zig deletes something you depended on:

  1. Go to Zig’s GitHub page.
  2. Find the commit that deleted the things you needed.
  3. Copy the red lines and paste them into your code.

Battle-tested this when Zig deleted user32.

Interesting that you mention Go. I’ve been thinking about replying to this topic precisely with Go in mind. As I see it, one of the most influential factors in Go’s success has been its batteries-included standard library. People constantly mention how they love just installing Go and being able to build web servers, frameworks, microservices, parsers, etc. without any additional dependencies. Even when they do have to go hunting for a dependency, they wish they could find one developed by the Go team, or at least curated or blessed by them, so they can be confident of its quality and reliability. I’m not saying this is the ideal mentality to have regarding what a standard library should include, but it seems to be the dominant mindset among many developers.

IMO, this is the real issue here. Should the standard library have the same compatibility guarantees as the language itself? If so, then it definitely should not be batteries included since the sheer breadth of areas it covers will undoubtedly see paradigm shifts in the future, and thus will inevitably be outdated.

Maybe a solution could be a four tier structure:

  1. language
  2. standard library
  3. curated extended library
  4. ecosystem

Each with their own compatibility guarantees.

Another solution is versioning the std, or adding editions containing snapshots of the std in a separate repository.

As the involuntary topic starter, I would like to ponder over the
content/API of a library (“standard” or not, no matter). Here I mean libraries designed
for communicating with various servers using some application-level
network protocol (HTTP, Postgres, RESP and so on).

Briefly: do not mix CPU bound stuff (request constructing, reply
parsing) and I/O bound stuff (connecting, request sending, reply
receiving) in a (networking) library.

Now I will try to explain as clearly as I can why I think this way. Here is
an (event-driven) state machine which I use in production DAQ systems:

state machine description
$init
$conn
$hitx
$hirx
$autx
$aurx
$idle
$fail
$wait
+init M0 conn

@conn M1
@conn M2
+conn M0 hitx
+conn M3 fail

@hitx M1
@hitx M2
+hitx M0 hirx
+hitx M3 fail

@hirx M1
@hirx M2
+hirx M0 autx
+hirx M3 fail

@autx M1
@autx M2
+autx M0 aurx
+autx M3 fail

@aurx M1
@aurx M2
+aurx M0 idle
+aurx M3 fail

@idle M2
+idle M3 fail
+fail M0 wait
+wait T0 conn

Do not be scared, the “syntax” is extremely simple:

  • $ designates a state
  • +<state1> Mx <state2> means state1 → state2 transition when Mx happens
  • @<state> Mx means “when Mx happens, do something, but do not leave the state”
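To make these semantics concrete, here is a minimal C sketch that holds the same kind of rules in a lookup table. The names `struct rule`, `rules` and `sm_step` are hypothetical, only a subset of the rules is listed, and ignoring an unknown (state, event) pair is an assumption rather than something the description specifies. The action performed by `@` rules is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule of the textual description.
 * `next` is NULL for '@' rules: react, but stay in the same state. */
struct rule {
    const char *state;
    const char *event;
    const char *next;
};

/* A subset of the rules from the listing above. */
static const struct rule rules[] = {
    { "init", "M0", "conn" },   /* +init M0 conn */
    { "conn", "M1", NULL   },   /* @conn M1      */
    { "conn", "M2", NULL   },   /* @conn M2      */
    { "conn", "M0", "hitx" },   /* +conn M0 hitx */
    { "conn", "M3", "fail" },   /* +conn M3 fail */
    { "fail", "M0", "wait" },   /* +fail M0 wait */
    { "wait", "T0", "conn" },   /* +wait T0 conn */
};

/* Returns the state the machine is in after `event` arrives in `state`.
 * Assumption: a pair with no matching rule leaves the state unchanged. */
const char *sm_step(const char *state, const char *event)
{
    assert(state != NULL && event != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(rules[i].state, state) == 0 &&
            strcmp(rules[i].event, event) == 0)
            return rules[i].next ? rules[i].next : state;
    }
    return state;
}
```

With this table, `sm_step("init", "M0")` yields `"conn"`, while `sm_step("conn", "M1")` stays in `"conn"`.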

If you are not too lazy, let’s convert this
textual description of the machine into graphical form:

Do the following:

  • take a sheet of paper and a pencil (or a diagram drawing tool, dia for instance)
  • for each line starting with $ (dollar sign) draw a circle, put the name inside; these are our states
  • for each state:
    • draw transition arrows (+conn M0 hitx => draw an arrow from conn to hitx and mark this arrow with M0)
    • draw action arrows (@autx M1 => draw a loop arrow for the state)
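The manual steps above can also be automated. A minimal sketch, assuming Graphviz DOT as the output format (that choice is an assumption, not part of the original recipe); `sm_line_to_dot` is a hypothetical helper that turns one line of the description into one DOT statement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Converts one line of the description into one Graphviz DOT statement.
 * Returns the number of characters stored in `out` (0 for blank or
 * unrecognized lines). Names longer than 31 characters are cut off. */
int sm_line_to_dot(const char *line, char *out, size_t outsz)
{
    char a[32], ev[32], b[32];

    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    switch (line[0]) {
    case '$':   /* state: draw a circle with the name inside */
        if (sscanf(line + 1, "%31s", a) == 1)
            snprintf(out, outsz, "  %s [shape=circle];\n", a);
        break;
    case '+':   /* transition: arrow from a to b, marked with the event */
        if (sscanf(line + 1, "%31s %31s %31s", a, ev, b) == 3)
            snprintf(out, outsz, "  %s -> %s [label=\"%s\"];\n", a, b, ev);
        break;
    case '@':   /* action: loop arrow on the state itself */
        if (sscanf(line + 1, "%31s %31s", a, ev) == 2)
            snprintf(out, outsz, "  %s -> %s [label=\"%s\"];\n", a, a, ev);
        break;
    default:    /* caption or blank line: nothing to draw */
        break;
    }
    return (int)strlen(out);
}
```

Wrapping the emitted statements in `digraph sm { ... }` and passing the result to `dot -Tpng` should produce the picture.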

Have you guessed what this SM is for? It’s a machine that performs a
connection to a PostgreSQL DBMS (with md5 authentication). And a bunch of
such machines is, in essence, a connection pool.

Some notes before I ask my main question. In every state where this machine is sending
something to a server (hitx, autx) it has these two reactions:

  • @hitx M1 // ok
  • @hitx M2 // failed

M1 and M2 are messages that come from another machine,
similar to this one.

Similarly, in every state where we are receiving, the connector machine has
reactions to the M1 (success) and M2 (failure) messages sent by an RX
machine when it’s done.

So, the question is: what do I expect to be in a Postgres library with such a construction?
The answer is quite obvious: I want CPU-bound stuff only (how to make requests,
how to interpret DBMS answers); I do not want any connect/read/write things there.
I’d rather do I/O in the way that is most suitable for the application.
In the case above (PG connector) I wanted to achieve a high level of concurrency.
After each action/transition we go back to the event loop, and the program is able
to do many, many other things while the connector is in the process of… connecting :slight_smile:

In my header for PG I have API like this:

int pg_mk_fhelo_msg(char *user, char *database, struct dbuf *b);
int pg_ck_bhelo_msg(struct dbuf *b, struct pg_error *e);
u32 pg_count_notifications(struct dbuf *b);
.....
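A minimal sketch of what such a CPU-bound-only function might look like, here for the StartupMessage of PostgreSQL protocol version 3.0. This is not the code behind `pg_mk_fhelo_msg`: `struct outbuf`, `put_be32`, `put_cstr` and `pg_build_startup` are hypothetical names, since `struct dbuf` is not shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer (a stand-in for the unseen `struct dbuf`). */
struct outbuf {
    unsigned char *data;
    size_t cap;   /* bytes available in data */
    size_t len;   /* bytes used so far */
};

/* Stores v in network byte order (big-endian). */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Appends a string together with its NUL terminator. */
static int put_cstr(struct outbuf *b, const char *s)
{
    size_t n = strlen(s) + 1;
    if (b->cap - b->len < n)
        return -1;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    return 0;
}

/* Builds a StartupMessage in `b`. No sockets are touched here: the caller
 * sends b->data[0..b->len) in whatever way suits its event loop.
 * Returns 0 on success, -1 if the buffer is too small. */
int pg_build_startup(struct outbuf *b, const char *user, const char *database)
{
    assert(b != NULL && user != NULL && database != NULL);
    b->len = 0;
    if (b->cap < 8)
        return -1;
    b->len = 8;                            /* room for length + version */
    if (put_cstr(b, "user") || put_cstr(b, user) ||
        put_cstr(b, "database") || put_cstr(b, database))
        return -1;
    if (b->cap - b->len < 1)
        return -1;
    b->data[b->len++] = 0;                 /* terminator after the last pair */
    put_be32(b->data, (uint32_t)b->len);   /* length field counts itself */
    put_be32(b->data + 4, 196608);         /* 3 << 16, i.e. protocol 3.0 */
    return 0;
}
```

Because the function only fills a buffer, the same code works unchanged with blocking sockets, an event loop, or a test that never opens a connection.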

About async/await/coroutines: Zig is not yet ready for
cooperative multitasking within a single execution thread, but… what
are we going to do about this? I happened to hear something about “monkey
patching” in Python with regard to gevent/greenlets or so.

OK, let’s look at http in Zig from a bird’s-eye view:

/opt/zig-0.12/lib/std/http$ ls -1
Client.zig
Headers.zig
protocol.zig
Server.zig

Aha! I see protocol.zig and Headers.zig. This is probably a good
sign. If it is possible to use these two (and they do not use any i/o)
without the other two, it’s just wonderful, this fits nicely into my
concepts of a network library design. However, I did not look deeper.

I think having a batteries-included library (to a reasonable extent) is quite nice. Having to constantly reach for others’ code for quick prototyping or exploration is a bit annoying. That doesn’t mean the std should have everything, but I think the way it is right now is quite nice. I really enjoy the variety of data structures; having a lot of them makes it easy and painless to explore the optimal solution for your needs. In C, the same process often requires you to build your own data structures, which makes development much longer than it needs to be. In Zig I can quickly put something together and swap the data structures around to see which one fits best. It’s really a big plus in my opinion. As for all the extra stuff, I don’t think it’s necessary per se, but if they use it for their internal needs, they might as well make it available.

Go criteria that define what is included in the standard library:

Aside question…
Is there a definition of the term “ecosystem of a programming language” somewhere? I couldn’t find anything that would satisfy me :frowning:

Yeah I guess it’s a term that’s being applied to programming languages more recently (I believe mainly in the Rust community.) To me, the ecosystem is all the tools, projects, and libraries written in the language or to service users of the language, beyond the language’s core team.

I’m not a language designer or maintainer, so I’m coming at this as a user. I think of two languages that kind of exhibit both ends of the spectrum: Python and Rust.

Pros for having batteries included:

  • being able to write code that just works when run, without reaching for dependencies. This is particularly useful for a language like Python where dependency management is quite poorly supported. In Rust this is less so, as dependencies are grabbed during build.

  • having some security and stability assurances is nice. Few third-party libraries ever reach the level of trust that a std lib gets.

  • having some “basic” functionality is nice. In Python it’s nice that I can use the standard JSON parser. In Rust it’s annoying to always have to reach for serde for serialization of any format. That being said, the scope of what is “basic” is different for different languages.

Cons of batteries included:

  • time that could go to making the language better is spent maintaining less important code.

  • what is considered “basic” may become outdated. Look at all the standard library packages that Python has deprecated recently. This may be fine for Python, but for a language with the aspirations of Zig, this is less doable.

  • inertia in adopting better implementations by third parties? I’m not sure this one is real, but it could be something that happens.

I think a good middle ground would be data structures and algorithms. In Zig, data structures and algorithms are painless to implement, but if code is meant to be robust and long-lived, then I think one goal of the std should be to provide a very solid common ground of data structures, because data structures are, in a sense, the most basic interface of any programming language.

If you use an event-driven design and you reach for a library that also uses an event-driven design, but the two of you have different interfaces for your priority queues, then you either have to add some glue or straight up build your own. This is a silly example, but the point is that if there were one very good standard queue, then everyone could use it as a means to ease distribution.

On top of that, data structures and algorithms are very stable; it’s not every day that we invent a new data structure so useful that it completely invalidates another one. So they can be implemented once and very rarely updated in practice, which lightens the burden on the maintainers.

Data structures and algorithms are also very error-prone to implement, especially when they involve more complex systems. Having only one implementation, used by many different codebases with very different usage patterns, would allow all users to collectively report bugs, which would enhance the robustness of everyone’s code.

The availability of data structures and algorithms also speeds up prototyping and reduces the cost of exploration, which really aligns with Zig’s striving for optimal solutions.

Agreed on commonly used data structures, but…

  1. The interface to an OS API is of primary importance, and Zig is the best here amidst all those
    marshalling oddities.

  2. data structures (lists, stacks, queues, hash tables, trees, tries etc)

Some (very long) time ago I had an itch in the domain of hash tables/maps,
read man 3 hash (or maybe some other page, I just do not remember), did not like it,
and, you guessed it, I implemented my own sooper-pooper hash table, yeah.

It’s really good that Zig did not incorporate data structures
into its syntax as bash/python did, but slices are really great;
they are one of the biggest improvements compared to C.

  3. Data formats / protocols / (not to mention event handling)…
    no, stdlib is not the place for them

If you want to have all these universes in the default Zig distro,
maybe it would be better to just move them up a level? I mean: do not place stuff
like http below /opt/zig-0.12/lib/std; instead, put it one level up. :alien:
