Zig community documentation effort

A couple of weeks ago there was a long thread about the poor state of Zig documentation. The OP suggested creating a wiki where the community could help out. I suggested that a public Git repo would be a better option, based on its stronger sense of permanence and greater convenience.

I have gone ahead and created such a repo on Codeberg. On the project’s wiki page, you’ll find a description of the idea behind the project and some basic guidelines.

So far I’ve created pages for the builtin functions. The repo is already sort of usable. You can clone it, stick it in your workspace, and see for yourselves whether it’s a convenient way of looking up information.

Let me know what you think and if you’re interested in helping out. The bootstrap phase of the project involves mainly copy-and-pasting. You won’t need to put in a huge amount of time to make a difference. A few minutes a week across multiple people will add up. The key thing is to get the ball rolling.


A few random comments based on what I see. Take them to be worth what you paid for them :wink:

One of the best things this project could do is add simple but useful working examples. For sustainability, it needs a way to pull these examples from buildable, working code via a CI process, so that breakage is instantly detectable. Such a process would have prevented the typo in the @popCount page’s code (“vector” is misspelled).
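As one possible sketch of such a CI step: if the examples lived as standalone files under an `examples/` directory (the paths and file names here are made up for illustration), a build.zig step could compile and run them all as tests, so that `zig build test-examples` fails the moment an example breaks. Note the `addTest` API shown here matches roughly Zig 0.13-era builds and changes between versions.

```zig
// build.zig sketch: run every doc example as a test so CI catches breakage.
// Assumption: examples/ contains standalone .zig files with test blocks.
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const test_step = b.step("test-examples", "Build and run all documentation examples");

    // Hypothetical example files; in practice this list could be discovered
    // by walking the examples/ directory.
    const example_files = [_][]const u8{
        "examples/popcount.zig",
        "examples/vectors.zig",
    };

    for (example_files) |path| {
        const t = b.addTest(.{
            .root_source_file = b.path(path),
            .target = target,
            .optimize = optimize,
        });
        test_step.dependOn(&b.addRunArtifact(t).step);
    }
}
```

With something like this in place, a misspelled identifier in an example is a red CI run instead of a silently broken doc page.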

There will need to be branching/tagging organization to capture Zig version specifics, and a guide for contributors for using that system correctly.

I looked at @popCount, and I’ve always been curious what the actual use cases are for this and similar primitives. The info so far didn’t help with the motivation for the builtin’s existence.

I looked at one that mentions a builtin working with “vectors”, but this term is ambiguous: I think it’s specific to @Vector types, not the general “vector-ish” things that people bring their previous expectations to. Docs like this need to be precise in terminology, and hence there is a place for a glossary of terms.


The existing examples aren’t very good. I just copied them over from the existing reference. They really aren’t examples at all. Just unit tests.

@popCount is something that you know you need when you need it. I think it’s used in data compression.

This looks interesting. You remind me that I need to get the ball rolling in my Zelf Book too.

Trying to make it readable without any preview is also a good idea, but I feel like a website (even a Codeberg Pages website) would help it too. Is something like that planned?

I am interested in creating documentation, though my own project (the “Zelf Book”) needs my attention too. So I think I’ll contribute when I have the time…

And a small note: All your issues about the language reference are titled with “LanGAUge” instead of “LanGUAge” :slight_smile:

I’m excited!

I actually use @popCount a lot; it’s great when you are working with a bitfield:

Example 1: counting the number of valid entries. Let’s say I have a bitfield where each bit is 1 if the corresponding entry is valid.

fn countValidEntries(T: type, bitfield: []const T) usize {
    var count: usize = 0;
    for (bitfield) |word| {
        count += @popCount(word);
    }
    return count;
}

Also works with vectors.
Another use case: iterating over the 1 bits of a bitmask. You can do that with a plain while loop, but if you use @popCount to fix the iteration count up front, you can get a more predictable branch!

fn method1(T: type, mask: T) void {
    var copy = mask;
    while (copy != 0) {
        const bit = @ctz(copy); // index of the lowest set bit
        copy &= copy - 1; // clear the lowest set bit
        foo(bit);
    }
}

fn method2(T: type, mask: T) void {
    var copy = mask;
    for (0..@popCount(mask)) |_| {
        const bit = @ctz(copy);
        copy &= copy - 1;
        foo(bit);
    }
}
// The previous code was bugged, I just noticed...
// It's fixed now
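As for the vector case mentioned above, @popCount applies element-wise to @Vector values, and the per-lane counts can then be summed with @reduce. A minimal sketch (the bit patterns here are made up for illustration):

```zig
const std = @import("std");

test "@popCount works element-wise on vectors" {
    const v: @Vector(4, u8) = .{ 0b1011, 0b0001, 0b0000, 0b1111 };
    // @popCount on a vector returns a vector of per-element counts: { 3, 1, 0, 4 }
    const counts = @popCount(v);
    // Sum the lanes to get the total number of set bits.
    const total: u8 = @reduce(.Add, counts);
    try std.testing.expectEqual(@as(u8, 8), total);
}
```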

Actually, here’s a snippet from an actual project:

    fn bitMaskFilterPass(bitField: []@Vector(4, usize), bitmasks: []const u32, firstBitFieldIndex: usize, needleMask: u32,
        atomicInterrupt: *const std.atomic.Value(bool),
        stateAddr: *CachedStateCompletion,
        comptime settings: struct {
            optimization: union(enum) {
                SIMDonly: enum {
                    IgnorePreviousZeroes,
                    RespectPreviousZeroes,
                },
                Dynamic: void,
                SparseCachedOnly: void,
            },
        }
        ) void {

        // std.debug.print("needleMask: {b}\nOther masks:\n", .{needleMask});
        // for (bitmasks, 0..) |bm, i| {
        //     std.debug.print("{}: {b}\n", .{i, bm});
        // }
        
        // TODO find the best way to do this constant and switching thingy. You could even take comptime information.
        const oneBitVsManyCutoff = 20;

        var currentOffset: usize = firstBitFieldIndex * bitsPerVector;
        defer stateAddr.* = .{ .bitMaskPass = @intCast(currentOffset / bitsPerVector) };

        for (bitField[firstBitFieldIndex..]) |*currentVectorPtr| {
            if (atomicInterrupt.load(.monotonic)) return; // Early return if we get interrupted

            var vec = currentVectorPtr.*;
            if (@reduce(.Or, vec) != 0 or (settings.optimization == .SIMDonly and settings.optimization.SIMDonly == .IgnorePreviousZeroes)) {
                

                for (0..4) |i| {
                    // TODO - could potentially use a pointer instead of totalOffset and save some CPU cycles per loop.
                    const totalOffset = currentOffset + i * @typeInfo(usize).int.bits;

                    vec[i] = switch (settings.optimization) {
                        .SIMDonly => |SIMDsettings| blk: {
                            const result = SIMDfullFilter(@ptrCast(@alignCast(&bitmasks[totalOffset])), needleMask);
                            break :blk switch (SIMDsettings) {
                                .IgnorePreviousZeroes => result, // This allows us to pass in undefined memory for vec[i]
                                .RespectPreviousZeroes => vec[i] & result,
                            };
                        },
                        .Dynamic => switch (@popCount(vec[i])) {
                            0 => 0, // Do nothing if everything is zero
                            1...oneBitVsManyCutoff => sparseCachedFilter(vec[i], @ptrCast(&bitmasks[totalOffset]), needleMask),
                            // & because it's possible current vector has discarded entries for different reasons than bitmask, so we want to keep them discarded
                            else => vec[i] & SIMDfullFilter(@ptrCast(@alignCast(&bitmasks[totalOffset])), needleMask), //If we are above that, SIMD math
                        },
                        .SparseCachedOnly => sparseCachedFilter(vec[i], @ptrCast(&bitmasks[totalOffset]), needleMask),
                    };
                }

                currentVectorPtr.* = vec;
            }

            currentOffset += bitsPerVector;
        }

    }

As mentioned in the project wiki, the plan is to make the materials available through a JavaScript API so that websites providing Zig lessons can make use of them. From a learner’s standpoint, it’s more convenient to have everything under one roof. Moreover, people who create Zig lessons are the best qualified to contribute to the project. My thinking is that they would be more inclined to help out if we help them gain more page views.


I think it’s an excellent idea to have a doc project be a repo of md-formatted text files. (This is how I keep my own notes. And — like many folks I’m sure — I even wrote my own script to stitch them together into html for easy reading and navigating).

A few general comments on docs and doc projects:

It’s hard to write good docs. And, for a number of reasons, it’s hard to get contributors to write them.

Reference docs are a specific type of document. For some projects, the source code doc comments constitute the reference docs. Zig’s std doc comments already appear to be rendered at https://ziglang.org/documentation/master/std/. I think it would be a mistake to just pull doc comments and put them into md files somewhere else.

In thinking about the different kinds of docs a project might have, I see a handful of different types:

  • tutorials / guides : teach the user how the thing works
  • autogenerated reference docs : pithy details, usually generated from the doc comments. See Zig’s std docs. Tools for this: doxygen, javadoc, rustdoc, … the zig compiler (?). You might be able to access these from the repl if your language has this (see Python, Julia, and Janet).
  • full reference doc: see the Zig Language Reference or the Python language reference
  • manual / man pages : like autogenerated reference docs but often longer and not generated from source code doc comments (example, try $ man 3 printf). Easy to access from the shell.
  • examples : includes some supporting comments
  • cookbook: like examples, but focused on one particular task, and includes more explanation (see the Haxe cookbook).
  • quick-ref : useful if you’ve already learned the thing but haven’t used it in a while and need a refresher.
  • cheatsheet : very brief and compact (example https://perldoc.perl.org/perlcheat).

Looking at that list, I think there’s room there for multiple separate community doc projects. :slight_smile:

I think that if you want to make a “man pages for Zig” project, but don’t want to step on the source code doc comments’ toes, a good solution might be a command-line program that pulls the doc comments and then appends your own prose to supplement them.

The Clojuredocs project does something like this: it looks like they produce the reference docs, and then allow users to submit examples, and also “notes” (prose). Alas, most users prefer to add examples (with explanatory comments) rather than writing separate prose. (In fact, many users may not even know the “notes” section is there — I didn’t for quite a while.) See, for example, their docs for if-let.


My 2¢ as a lurker to the documentation thread you mentioned, and someone who agrees that the state of Zig documentation should be improved. I appreciate your progress on this, so please don’t take my criticism personally :slight_smile:

TL;DR: I worry that this project will ultimately provide less value than desired, because it remains oriented around the merge-request model of knowledge curation.

The advantage of the merge-request model is that it is possible to realize a vision for the body of work as a whole. Forcing all new content through the merge-request bottleneck means that all new content can be reviewed for consistency and correctness. For official products such as the Language Reference, this is of course necessary. Glancing at your project wiki, I see that there is a particular technical vision that you would like to realize. If the goal of the project is to realize that particular technical vision, then the merge-request model probably makes sense.

But if the goal is instead to create a maximally-useful, up-to-date, community-centered repository of knowledge for the Zig programming language, then I have observed that the disadvantages of the merge-request model have proven to be a consistent obstacle to that goal.

Forcing all new content through the merge-request bottleneck means that all new content is dependent on the attention of the maintainers – or in practice, the one maintainer. Take for example https://zig.guide. This is an excellent resource that I have used multiple times myself. But unfortunately, for much of the last year it remained out-of-date because it depended on a maintainer to merge in new content. This is not in any way a dig at Sobeston – thank you for curating such a useful resource! I mention zig.guide to illustrate that with a subject matter that changes as quickly as Zig, a resource that relies on a single bottleneck for new content can quickly and inevitably become out-of-date.

So what do I recommend? I believe that a project oriented around the wiki model of knowledge curation is the best fit for Zig community docs at this time.

While the merge-request model is approval-first, depending on (usually one) maintainer to merge in new content, an actual wiki is moderation-first, where a scalable admin team can clean up spam and garbage and the mostly-trustworthy community as a whole can crack on with writing new articles, updating old ones, creating lists, tutorials, traveler’s diaries and whatever else might be useful.

A-thousand-and-one video game wikis have shown that this approach works, even for subject matter that changes rapidly and demands a depth of coverage. I think that an actual MediaWiki instance (or a similar modern offshoot) would be more resilient than any of the Git-centered resources that have sprung up over the last few years. And of course, Ziggit already has this as part of the site!

I wish that I could say that I had the free time right now to get this off the ground. But I write all this up in the hope that one of the many members of the community who do have the time and interest in improving documentation will take this path and see where it leads.

Thanks for reading.


From what I’ve seen over the years: if you have a wiki for content that people want, and where anyone can contribute, you end up with a lot of vandalizing/spam for maintainers to clean up. That’s not sustainable.

One way to combat this is to have a simple password set up for the wiki such that only trustworthy community members can edit it (because only they are given the password). (Sorry, I can’t remember where I saw this implemented…)

But then, if you’re going to do that, you may as well just use a codeberg repo of markdown files: after a person makes a few edits and has shown themselves to be trustworthy, you can add them as a contributor so they may edit the docs at will. (And this is, in fact, what @chung-leong is proposing as described in his wiki/readme).

If you have a doc site / wiki that mostly consists of small, self-contained articles (for example, a wiki of tips for growing different kinds of vegetables), then this style of doc site may work very well.

But if you have a site where the docs are longer, and require more care to write, then from what I’ve seen, you end up with a mess that drives good writers away. It goes like this: Author A takes time to write a quality article that has room for improvement. Author B comes along and makes their changes to it, but in the process degrades author A’s work (moves sections around, introduces inconsistencies, etc.). Author A doesn’t want to argue with author B, but also doesn’t want to fix up author B’s work, and so author A loses interest and leaves.

So a “gated” wiki, or a repo with a hands-off many-contributors approach, has its drawbacks too. Tanstaafl :person_shrugging: . IMO, a thoughtful maintainer with a vision of what they want the project to be, and who can say “no” when necessary, is extremely valuable. :slight_smile:


Hello! I find this idea (and the current state of the execution) not only very good but also very easy to make useful fast. Imo, we do not need perfect rn, but some way to get us moving fast when a version changes, or when you wanna do a specific thing but you dunno how to do it in Zig. For example, three things get very frustrating for me all the time: reading and writing JSON, opening a file (especially with Io being changed rn), and casting ints and floats (all the @as and @floatFromInt). None of those are especially difficult, but all of them require me to look it up every time, not find whatever I search for, and then open past projects to remember what I did xd

I would mainly do two things to kickstart this very fast:

  1. Topic priority list: prioritize what we want to cover and remove everything that is not finished (or list it in an index). I suggest opening a thread on Ziggit for ourselves to share what our pain points were, so we can produce good-quality finished posts quickly.
  2. Reorganize the current structure: I feel that organizing the documentation the way the language is organized makes it less useful for people who do not understand language internals. I think the people who do understand them don’t need the help haha.

So, a concrete example. Imagine we have a folder called input/output with files covering different use cases: a folder about files, another about print… Under print we could have a stdout/stderr page where we explain how to write to stdout, why we do it like that and why we need a buffer, with a link to Loris’s video about what print actually is, and the code examples.
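A stdout/stderr page of that kind could carry a snippet like the following. This is a sketch written against the pre-Io-rework std.io API (roughly Zig 0.13; the exact calls change across versions), and the buffered-writer dance is exactly the sort of thing worth a paragraph of explanation:

```zig
const std = @import("std");

pub fn main() !void {
    // The raw stdout writer can issue a syscall per write; wrapping it in a
    // buffered writer batches output, which is why the flush at the end matters.
    const stdout_file = std.io.getStdOut().writer();
    var bw = std.io.bufferedWriter(stdout_file);
    const stdout = bw.writer();

    try stdout.print("Hello, {s}!\n", .{"world"});

    try bw.flush(); // don't forget: unflushed output never reaches the terminal
}
```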

I think what I propose is maybe a different approach to what is proposed here? idk, just giving some ideas anyway :))

A wiki would be nice.
My general experience with all kinds of documentation is that there is often too much material, wildly spread all over the place, and you can’t find what you are looking for anymore.


I don’t agree. The Arch Linux wiki is an excellent counter-example to this: it’s very useful, full of content that people want, and anyone can contribute as long as they create an account. And yet, much like Wikipedia, it has a comprehensive code of conduct and contributing guide, plus motivated contributors who enforce those rules, plus talk pages for settling editing debates, and all of this means that it works and that >95% of all pages are of truly high quality.
