Writergate migration experience

Here’s my experience on migrating the JSON-RPC library ZigJR to Zig 0.15.1. The Git log has about 30 commits. It took about 2 weeks of on and off working on the migration whenever I had time (rarely more than two hours on a session). Most of the work were actually migrating the tests. I used the tests as a driver for the migration - enabling the tests one by one and seeing what broke in the library, then fixing them.

The std.Io.Reader/Writer are an improvement over the old IO API. Most of the functionality has replacement in the new API. Buffering is mostly built in so don’t need to worry about buffered IO.

There’re some subtlety in the API needs close attention. std.Io.Writer.Allocating can change its writer interface’s buffer address when growing capacity, so any saved references to the buffered data can become invalid. Always use writer.written() to get the current buffered data. I had to track the offset positions of the read in HTTP header & value in FrameData rather than saving their []const u8 slices.

One thing that needs improvement is the way to get stdin and stdout. May be I did it wrong, but there’re quite a bit of ceremony, stdin & stdout. It might be a turn-off for newcomers to the language.

Overall the new std.Io.Reader/Writer is an improvement over the old ones.

Edit: BTW. There’s one remaining issue. The lsp_client example is not working on
Windows. There’s some issue when reading from the stdin of a child process. I’m not sure it’s my code or a bug in the std lib. See question.

13 Likes

this is true of all implementations, the buffer is temporary storage.
the difference here is Writer.Allocating moving the location of the buffer might result in a segfault.

otherwise it has the same effect as any other arbitrary implementation, the data referenced externally becomes undefined.

you did it correctly, this ceremony will be the case any time you want a reader/writer.

I’m not sure what you want explained? Did you intend to post in a different category?

I figured I’d tack on my own experience to this thread if you don’t mind…

For me the writergate transition is pretty hard because I built a lot of custom readers/writers. Using what the std lib provides is not that hard, but if you need to make your own implementations, there isn’t much documentation right now. It was hard to figure out how to write a custom drain function for a writer…how do I handle the data and splat parameters, what is the consume function doing, etc…

My project was kind of a perfect storm of breakage because, in addition to writergate, I used many other things that changed or disappeared in 0.15 (the http client, zlib compression, DoublyLinkedList, BoundedArray, managed ArrayLists, etc). I eventually got it building and tests passing with a single 4k+ line commit but I’m still using the deprecated readers/writers everywhere so the work is far from over.

I agree that the new reader and writer interfaces are an improvement though, so I can’t complain. Building the buffer into the interface is a really inspiring case of pragmatic design. Some people criticized it because it feels less “clean” than having buffering as a separate composable abstraction, as if abstractions are an end in themselves. Sometimes there are benefits to de-abstracting. Zig has forced me to change my programming philosophy a lot and I’m grateful for that, but I digress…

2 Likes

the VTable structs document quite well what the passed function pointers are supposed to do/not do.
heres drain for example

    /// Sends bytes to the logical sink. A write will only be sent here if it
    /// could not fit into `buffer`, or during a `flush` operation.
    ///
    /// `buffer[0..end]` is consumed first, followed by each slice of `data` in
    /// order. Elements of `data` may alias each other but may not alias
    /// `buffer`.
    ///
    /// This function modifies `Writer.end` and `Writer.buffer` in an
    /// implementation-defined manner.
    ///
    /// `data.len` must be nonzero.
    ///
    /// The last element of `data` is repeated as necessary so that it is
    /// written `splat` number of times, which may be zero.
    ///
    /// This function may not be called if the data to be written could have
    /// been stored in `buffer` instead, including when the amount of data to
    /// be written is zero and the buffer capacity is zero.
    ///
    /// Number of bytes consumed from `data` is returned, excluding bytes from
    /// `buffer`.
    ///
    /// Number of bytes returned may be zero, which does not indicate stream
    /// end. A subsequent call may return nonzero, or signal end of stream via
    /// `error.WriteFailed`.

Trust me, I read that docstring many times; in fact I think it appeared in a dream at some point. I admit I’m pretty slow, but it’s quite a joke to think that is sufficient. Studying the implementation of File.Writer.drain is ultimately what helped me the most, but it’s not light reading…I could only get through it with a glass of bourbon.

12 Likes

If you think the docs are insufficient, you should make an issue explaining the shortcomings so they can be addressed.

Or even a PR if you can.

That’s pretty brutal. Xit looks like a big project. Much respect to pull it off.

Yes. I have found that I need to go through the reader/writer source and tests couple times to understand its usage and implication. There isn’t much examples to illustrate their usage. I have to write small test programs to try out and verify the API. That’s why it took so long to migrate.

Other reader/writer implementations use the buffer internally. Writer.Allocating exposes it as a return value to the caller. There’s more a need to put a warning on the API returning the buffer.

Really?!

2 Likes

It’s a legitimate question, not sarcasm, I’m sure. I agrees that it does come across as it being incorrectly tagged.

“Explain” is usually used for uncertainty about language features or the stdlib, usually with a somewhat concrete question attached. You seem to have a handle on this, and haven’t asked for anything in particular to be explained, just shared your experiences with porting and invited others to do the same.

1 Like

The Díataxis model identifies four distinct kinds of text which are all called ‘documentation’. I don’t consider it canonical, just useful.

Those being:

  • Tutorials
  • How-to Guides
  • Technical Reference
  • Explanation

They have relatively little to do with each other, and that can lead to the sort of misunderstanding you’re exhibiting here.

Having just a technical reference (which is all a doc comment should really be) for a complex function like drain, is all very well, but it does not, and should not, go into detail about why it works the way it does, how that fits with everything else, and how to correctly implement it, given novelties that by definition will appear, or there would be no reason to be writing a new implementation.

None of the other three quadrants magically appear on their own, and submitting a PR to the doc comment will not create them. That is, it does not follow that a dearth of documentation implies some sort of criticism of that which does exist. It may be that the doc comment for drain is perfect as-is, and it would still be quite beneficial if there were a detailed tutorial on implementing it, a thorough explanation of it, and some how-tos on using it: none of which fit above the VTable slot, and don’t belong there even if they did.

9 Likes

Here’s the description of the Explain category.

Looking to understand more about Zig? You’re in the right place!
The Explain category is dedicated to free-flowing discussions about the
Zig language and its internals, best practices, surrounding
ecosystem, and other Zig related topics.

If you are facing a specific problem that could be solved directly,
please consider using the Help category instead.

Perhaps you can help me to choose a correct category?

That is not relevant, being cautious of referencing the buffer while the interface is in use applies to all implementations. Even when you have the implementation, not just the interface (you’re the caller), it still applies because you can use it multiple times.

The buffer being the output/input instead of some other underlying thing is irrelevant.

Erm, to do that you would already have to understand the interface, no?

Given the minuscule number of projects in allyourcodebase that are Writergate compliant, that would suggest that either:

  1. The documentation is lacking and insufficient
  2. The changes required are subtle, difficult, and extensive.
  3. Both

Blasting the previous interfaces into /dev/null without providing shims for a version or two makes this a hard transition.

Don’t get me wrong. I actually like the new interfaces. However, the invasiveness of the change means that your entire dependency chain is likely broken.

Sadly, I’m probably going to have to revert back to 0.14. The required changes are simply too titanic for me right now.

3 Likes

We volunteered for sometimes hard transitions by using a pre 1.0 language.

As volunteers, we can help by working through the transition, and sharing our knowledge of what that entails. Even better, doing it and maintaining a positive attitude ;-). Someone always has to go first.

In this area, I expect 15.2 will have a lot of fixes (and possibly some breaking edge cases).

2 Likes

Other reader/writer implementations don’t expect its internal buffer being accessed. Allocating.written() explicitly returns it. What is your concern on giving a warning in the API documentation?

good point, they can talk with someone here who does understand it to figure out the issues with the docs + they would probably understand it as a result.

their are shims, the old reader/writer, both Generic and Any have adaptToNewApi

the new Reader has adaptToOldInterface, but not Writer for some reason.

but looking at the impl you can just:

AnyWriter {
  context: &interface,
  writeFn: derpWrite,
}
//....
fn derpWrite(context: *const anyopaque, bytes: []const u8) anyerror!void {
   const writer: *std.Io.Writer = @ptrCast(@alignCast(@constCast(context)));
   return writer.write(bytes);
}

that is a difference, but it doesnt change the logic of being cautious about referencing the buffer while its in use.

Warning people is fine, though I dont think it is necessary. A buffer being temporary memory and/or an output is common and obvious.

As @mnemnion said, doc comments should generally be a technical reference, not a tutorial. There are plenty of blogs, posts and videos already about the new interfaces that will give you all the warnings, including the more important one of not copying the interface out of the implementation. Which is clearly a common enough foot gun that it should be documented.

Documenting the behavior of an API is exactly the technical reference. Explaining when the data is safe to use, when the data is not safe to use, documenting any unusual behavior, and giving warning about how the reference becoming invalid are exactly what an API documentation should give. It has nothing to do with blogs, forum posts, or videos. Why are we so death set on gatekeeping basic information on the std library API?

2 Likes

Switching to the new I/O for my toy language interpreter, which only uses stdout and stderr, was the same amount of work as adding exception handling to the language…

What bit me was the need to flush both writers when both stdout and stderr write to the console, to get the same behaviour as before (the program uses stderr for debug output and errors and stdout for the normal output, and I want to see the output when I expect them logically).

I ended up using a global helper struct, just for something that should be trivial. I don’t claim that’s a good solution, it’s just one that works for me, good enough so that I can focus on the interpreter itself again.

I think the new I/O is a very high hurdle for newcomers.

If it was just for buffering, I’d say better revert this,
because the API is too complicated, and writing the data in big chunks can also be implemented in user code where it actually matters.

But the promise is that it’ll allow async etc.
So let’s wait until we see examples for that,
and then evaluate if the pros outweigh the cons of the (still) uncommon and a bit complex API.

Regarding documentation, useful documentation is generally hard to find for Zig. Google often points you to outdated sites like zig.guide or to discussions here on ziggit, where you have to read lots of text to find out what’s the solution, if at all.

There should be some “authoritative” best practice examples for typical tasks on ziglang.org, where developers looking for an example can trust that it will actually work.

1 Like

I agree completely with this.

My interpretation of warning in this conversation was a non-technical explanation of what not to do, which is where my saying it is unnecessary and out of place in documentation came from.

Please keep in mind miscommunications can happen before making accusations against people.

2 Likes