What happened to the proposed source/sink semantics in IO?

In this talk Andrew brought up a great reason why he thinks that reader and writer should be renamed to source/sink. Not only are reader and writer technically wrong (a writer reads from the buffer to send data, a reader writes to the buffer to give us data), but they are also extremely confusing. Reader and writer imply agency, but in actuality they are objects. In writer.write(bytes) you are writing to a writer (whatever that means), whereas in sink.pour(bytes) it is obvious that the sink is the destination that is being fed bytes. As a bonus, these semantics already work perfectly with flush.

What he proposed in the talk is very exciting to me, but I am confused why after writergate the language still uses std.io.Reader and std.io.Writer. The change was already breaking, so why not commit to it fully? I hope it is not because the Zig foundation wants to make it more compatible with the naming schemes used in old code. To me, one of the biggest problems with programming is bad semantics confusing newcomers (or anyone for that matter) because they are inaccurate in conveying their true behavior. This is what makes Zig appealing to me. It is set on being explicit, and is not afraid to learn from past mistakes to modernize programming (C) to be much more intuitive, rather than following the old ways that everyone is used to. So I hope this change will still happen, as Zig is still not 1.0.

8 Likes

Not really, the buffers are temporary, intermediate, storage that exists for efficiency and convenience.
It is neither the destination nor the source (excluding Reader/Writer.fixed).
And they do have agency, rather the interface implementations do, ofc they should give you control over their behaviour but that can only be given to the creator/owner of the reader/writer. As far as code that just has the interface is concerned, they certainly have agency.

I think the names are quite self-explanatory, they are objects that perform reads/writes agnostic to the underlying location*.

I am not opposed to naming them Source/Destination or Source/Sink or something else, as long as the names make sense.

The blog you linked is about the confusing and poorly documented buffer requirements and the lack of convention with the stream and tls_client readers/writers specifically, which was fixed on master a while ago,
as well as the fact that it was a big breaking change without a clear upgrade guide. It has nothing to do with the semantics.

The semantics of the interfaces are rather straightforward, to use at least, less so to implement. The main issue is the foot guns, which exist with all runtime interfaces but are easier to step on with this particular kind. There are plans to address this.

Perhaps I misunderstood what you are referring to.
Regardless, I encourage you to elaborate on precisely how their semantics are inaccurately conveyed.

3 Likes

Not really, the buffers are temporary, intermediate, storage that exists for efficiency and convenience.

I think you are fundamentally misunderstanding me. Andrew explained it better than me in the talk, at least in more detail. We are talking about two different kinds of 'agency'.

When I say 'Writer' implies agency, I mean grammatically/linguistically. Let's look at the larger picture: in the function call writer.write(bytes), the 'writer' is the passive recipient of the action (the object), yet we name it 'writer' (the actor, the subject).

If I name a variable 'writer', my mental model, my schema if you will, imagines an active agent that I am asking to perform a task. If I name it 'sink', my mental model imagines a passive receptacle that I am pouring data into, which is far more accurate to what is actually happening.

In addition to this newly gained semantic accuracy, this also conforms better to 0.15.1's flushing requirements.

You don't intuitively 'flush' a writer (why would an author need flushing? It's not called writer.flush_poop). BUT you do intuitively 'drain' or 'flush' a 'sink'.

If Zig had renamed 'std.io.Writer' to 'std.io.Sink' during the overhaul, manual flushing would be far more intuitive in the documentation. It would have been obvious from the name itself. That is what I mean by 'bad semantics confusing newcomers'. The documentation now has to also explain the gap between the name ('writer') and the required behavior ('buffering/flushing').
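
To make the flushing gap concrete, here is a minimal sketch, assuming Zig 0.15.1's `std.Io.Writer` API. The helper name `greet` is made up for illustration, and the variable is called `sink` only to match the naming argued for here; the actual type is still `std.Io.Writer`.

```zig
const std = @import("std");

// Hypothetical helper: it only sees the generic interface, never the file.
fn greet(sink: *std.Io.Writer) !void {
    // Copies the bytes into the interface's buffer; with enough free space,
    // nothing has to reach the underlying destination yet.
    try sink.writeAll("hello\n");
    // Asks the implementation to push whatever is buffered downstream.
    try sink.flush();
}

pub fn main() !void {
    var buffer: [64]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&buffer);
    try greet(&stdout_writer.interface);
}
```

If the `flush` call is dropped, the text can stay in `buffer` and never show up on stdout, which is the behavior the name 'writer' does not hint at.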

1 Like

The writergate update was already huge, and there's still more to do. The next update will also be breaking, with lots of async stuff.

I think source/sink might happen at some point; for example, the doc comment for Writer.VTable.drain already explains it in terms of sending bytes to a sink. It's especially nice when you have pipelines.

4 Likes

Don't give me too much hope or I might cry if it doesn't happen.

4 Likes

I agree with you and Andrew and I am aware that you and Andrew are talking about intuition and subjective interpretation. And I agree with both of you.

I am pointing out that the linked blog is irrelevant.
That reader/writer names are not 'technically wrong' and that they do have some agency.

The writer either puts the data into its buffer, or lowers to the implementation if there isn't enough space.
That is agency, and the semantics explicitly give the implementation agency for some details to allow for more efficient or simpler behaviour.

writer very much is an agent performing a task.

This is just a different perspective, both are accurate enough, neither more than the other.

Again, I agree with you and Andrew that Source/Sink are more intuitive with the behaviour, and are conceptually simpler due to our human experience.

I am not sure why Andrew didn't change to these names.

The whole source/sink thing was discussed in the core team in the past but I guess it never felt pressing enough to actually act on it.

Personally I think it's an exaggeration to say that those terms are flat out wrong. Of course when you "read" you have to put what you read somewhere, and likewise when you're "writing" you have to take what you intend to write from somewhere.

That said, you can definitely make a "cat looking into the box" meme out of this nomenclature, that's fair.

As for newcomers, I think there's an argument to be made that picking wildly different names for concepts that exist in more or less the same form in other languages can be an ineffective choice.
So unless you really are confident that the new names are significantly better, you risk paying this price (having to learn new nomenclature) and not getting enough in return.

In conclusion, I'm personally not there yet (i.e. convinced that the new names are good enough to be worth switching), and maybe the same was true for Andrew when he did the interface work.

I wouldn't say that the change has been rejected yet, but as far as I can tell at the moment we're not committing to what was mentioned in the talk.

17 Likes

You can rename things you import.

    const Sink = std.Io.Writer;
    const Source = std.Io.Reader;

4 Likes

Not wrong, but 100% ambiguous however you look at it. There's a breadth of interpretations that you can apply and they're contradictory. Writer, write! Write what? Write where? Reader, read! Read what? Read where? Reader, don't write! You can't say that, as it does write. Writer, don't read! Can't say that either. One moves data leftwards in the argument list, the other moves it rightwards. They're kind of symmetric, but opposite to what the natural English word order implies (where have you seen someone read outwards? Where have you seen someone write not outwards?).

For the kind of brain that I have, it consistently plays tricks. Even after having used them a bunch, the distinction just doesn't stick for me. Taking me as an example, it's an accessibility issue.

I think it is bad design if mnemonics are needed to differentiate something, especially something so basic to the ecosystem.

Source and Sink give something to hold on to. There's a definite direction: source goes outwards (towards the arguments), sink moves inwards (from the arguments). There's no such pivot with reader/writer.

You can rename things you import.

You can't rename .writer() or .reader()

2 Likes

The situation with Writer/Reader is totally messed up. It all stems from the insistence on using the same word to describe both the interface itself and the provider of the interface. That makes no sense. An interface is something that exists between two parties. It describes one side's expectations of the other.

Consider how it is in real life. You walk into a restaurant. Suddenly, you're a customer. Not because you've undergone some transformation. You're a customer because that's how employees of the restaurant see you. They expect you to order food, eat it, and pay. Conversely, your waiter isn't a waiter intrinsically. He's only a waiter in relation to you.

Getting back to Zig. The use of "writer" and "reader" as names for our generic interfaces is more or less correct. They properly convey the receivers' indifference towards the providers. What's messed up is the reuse of these terms (e.g. std.fs.File.Writer). Even if the semantics are correct, the pragmatics are wrong. We need to refer to these other things by some other name. "Stream", for instance, is used in other languages. So instead of this:

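    // Create the file; `path` must be absolute.
    var file = try std.fs.createFileAbsolute(path, .{});
    defer file.close();
    var buffer: [4096]u8 = undefined;
    // File.Writer is the concrete implementation that uses the buffer...
    var writer = file.writer(&buffer);
    // ...and `interface` is the generic std.Io.Writer handed to consumers.
    try foo(&writer.interface);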

We would have this:

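    var file = try std.fs.createFileAbsolute(path, .{});
    defer file.close();
    var buffer: [4096]u8 = undefined;
    // Proposed naming, not an existing API: the stream takes the buffer...
    var strm = file.createStream(&buffer, .write);
    // ...and exposes the generic writer interface.
    try foo(strm.writer());

9 Likes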

The naming issue here might stem from the inherited interface itself. The implementation of an inherited subclass has long been regarded as having an 'is-a' relationship with its parent class.

One problem with naming it 'Stream' is that File.Writer has two modes: one is positional and the other is streaming.

1 Like

That's the problem, isn't it? We're creating a programming language that eschews inheritance, yet its standard library is forcing people to do polymorphism through subclassing. Manually.

I want to emphasize that I'm not criticizing the implementation from a technical standpoint. That's perfectly fine. Nothing wrong with borrowing techniques from C++. We just need to be careful about how we name things so we don't introduce dubious ideas. Names like FileWriter clearly imply polymorphism through inheritance. And that's an awfully poor way to model how the world works.

Consider the real life scenario of you in a restaurant again. Are you a customer because you've inherited the customer traits from your ancestors? Of course not. If you're wearing a shirt and shoes, you're hungry and you can pay, then you're a customer. Your capability to meet the expectations associated with a customer determines whether the restaurant would interface with you as such.

4 Likes

The example you gave is a good reason why inheritance should not be abused as the core of a language, and we agree on this point.
Inheritance should not be used to express the classification of things in a bottom-up, 'inductive' manner. In other languages, most uses of inheritance are almost abuses; for example, having 'Dog' and 'Cat' inherit from 'Animal'. This clearly uses inheritance to express inductive classification; 'Animal' is just an induction of 'Dog' and 'Cat', not their essence. Using inheritance to express this relationship is a mistake.
However, inheritance is reasonable when designing classifications in a top-down manner. In this case, the programmer is the creator of concepts and has the right to decide what meaning the abstractions he creates have, as well as to enforce that subcategories must hold parts of the parent category.

In your example, I think File corresponds to the abstraction of a 'restaurant', while Writer corresponds to the abstraction of a 'customer'. File.Writer is the abstraction of the 'customer at a specific restaurant'.

Notice the difference? If we are talking about the abstraction from 'person' to 'customer', we are generalizing from a concrete concept to a defined abstract concept from the bottom up. Using inheritance to describe this kind of abstraction is nonsense.

On the other hand, when we talk about the abstraction from 'customer' to 'a specific restaurant customer', we are expressing a more specific, yet still definition-based, abstract concept from the top down, starting from an abstract concept that we defined ourselves.

We can think of a file as a reservoir of data. We create streams connected to it so data can flow to and from our application. It's not a perfect metaphor but it does have the benefit of being actively used. Programmers are already familiar with it. The notion that a stream has a buffer is completely normal. That an interface has a buffer? Completely wild.

6 Likes

I understand your point. Another criticism of inheritance is that it mixes in code reuse, which can lead to abuse. It's hard to say that making the buffer part of the parent class here is not intended for code reuse, and this way of using inheritance should be approached with caution. However, from another perspective, the design of this interface may have been carefully considered, concluding that 'the buffer must exist' and imposing that from the top down. If it is based on that idea, I think source/sink is indeed a more appropriate abstraction. This term is more persuasive in explaining why the buffer must exist.

2 Likes

Thank you for responding to my thread. I think topics like these need more attention in the programming sphere. There's been a lot of discussion here, with many people half-agreeing and half-disagreeing with me. I feel like I have to add some nuance and address some more points. I still remain set on my initial argument, but for reasons I may not have communicated clearly enough earlier, and now I will try to explain why.

With the 0.15.1 Northside buffering changes the writer interface effectively behaves like a sink. Earlier we used to call write(byte), which checked (read) the internal buffer and added a byte. It used to be an agent (even though it actually both read and wrote, it was focused more on writing). Now the buffer is north of the function call. The interface gives us a slice of available memory, we access the memory directly with no jump and fill the buffer, and only when it's full does it drain. Now the object std.io.Writer is no longer a doer of action, but a holding area with capacity. Writing to a writer (the buffer) invokes no implementation logic and is a pure memory operation.
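
A small sketch of that 'holding area' behavior, assuming Zig 0.15.1's `std.Io.Writer.fixed` and `buffered()`. The helper `pour` is hypothetical, named after the `sink.pour(bytes)` idea from the first post, and a fixed writer is the special case where the buffer is also the final destination, so this only illustrates the fast path.

```zig
const std = @import("std");

// Hypothetical helper named after the proposed `sink.pour(bytes)`.
fn pour(sink: *std.Io.Writer, bytes: []const u8) !void {
    // While there is room, this is just a copy into the interface's buffer.
    try sink.writeAll(bytes);
}

pub fn main() !void {
    var storage: [8]u8 = undefined;
    // `fixed` builds a Writer backed only by the given buffer.
    var sink: std.Io.Writer = .fixed(&storage);

    try pour(&sink, "abc");

    // The bytes are sitting in the buffer, visible through `buffered()`.
    std.debug.print("buffered so far: {s}\n", .{sink.buffered()});
}
```

With a file-backed writer the same call would also leave the bytes in the buffer until it fills up or someone flushes it.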

As an aside, I also believe that having human professions as names for interfaces is generally not a good idea, since humans are much more multifaceted than memory operations. This false anthropomorphism is a legacy from the inception of Computer Science and OOP back in the 70s, and I truly believe that our lack of mechanical precision when building our schemas is one of the primary reasons why the philosophy of Data Oriented Design came about decades later than it should have.

While I think that reader and writer are somewhat correct, that is actually even more of a problem than reader/writer being completely wrong. That little bit of correctness is actually a trap. Now our mental model is in this "uncanny valley" of correctness. For instance, if a concept is named "Fronp", the user knows nothing (they have no schema of a Fronp) and reads the manual (0% confusion). If a concept is named "Writer" but behaves like a "Sink", the user thinks they know it and now their false confidence introduces a bug.

As for newcomers, I think there's an argument to be made that picking wildly different names for concepts that exist in more or less the same form in other languages can be an ineffective choice.

Let's try really hard to put ourselves into their perspective. Newcomers have to learn a word and a concept at the same time. If someone learns the concept of reader and writer incorrectly, they will likely develop a bad schema, and have to (in their head) translate it to a sink/source schema. If someone learning Zig for the first time learns source/sink (the more correct schema), then the mental image is more correct, and the price of renaming source/sink to reader/writer in a different programming language is a quick translation cost. I believe that the constant subconscious mental image cost is greater than this instant conscious translation cost.

3 Likes

I concede on the blog post not being super relevant. It was a very minor addition to me, but I regret linking it now.

the writer either puts the data into its buffer, or lowers to the implementation if there isn't enough space. That is agency, and the semantics explicitly give the implementation agency for some details to allow for more efficient or simpler behaviour.

I'll disagree here. In the new Northside model the vast majority of interactions are raw memory copies into a slice where the object is completely passive. It only wakes up to drain, and that's triggered by the caller anyway. Since the primary user experience is filling a passive container rather than invoking an active decision-maker for every byte, I think "sink" describes that relationship much more accurately than "writer", even if the plumbing underneath is complex.

3 Likes

YES!!! Thank you! And good job describing the core of the issue.

I didn't have the words, nor the motivation to find them, so I poked holes until you or someone else could get here. I could have communicated that better.

2 Likes

There's nothing particularly odd about the interface struct having a pointer to the buffer. It's just an implementation detail. We end up with the odd notion of a buffer attached to an interface because we don't have something conceptually distinct on one side of the interface that would take ownership of the buffer. I mean, what's std.fs.File.Writer? It's a struct with the same name as std.Io.Writer but sits in a different namespace. What the heck does that mean?

If we call it std.fs.File.Stream instead, then everything becomes clear. Here's my back-of-the-envelope sketch:

We have a pool of data in the form of a file. To use this data we need a stream that would carry the data from OS space into our app. At the end of the stream is an interface object that our data-consuming code interacts with. Thanks to the interface's standardized nature, our code can work with different types of streams. There's a spill-over area that helps us manage data flow. When data shows up from upstream faster than it can be consumed, it'll end up there. For efficiency's sake we connect this buffer directly to the interface so that data doesn't need to flow back into the stream first.

This mental picture of the situation is pretty understandable. There's a purpose behind every component. The concept of a stream sits at the center. Without it the whole thing falls apart.

9 Likes

Your stream analogy works for reading (data flowing downstream to us), but I think it fails for writing. Stream implies continuous constant flow, but a Northside buffer is actually accumulating data until it hits its capacity.

Renaming to stream fixes the noun collision (stream vs interface) but it ignores the verb confusion (write vs flush). We still have the problem of stream.writer().write(). The user thinks they are writing (actively sending over data) rather than filling (passive). Because they think they are writing, they assume the action is complete when the function returns.

If we called stream.sink() or file.sink() it would be obvious that the data is sitting in this spill-over tank and therefore needs to be pushed. Source and sink are standard concepts in physics (in Andrew's talk he brought up audio engineers as an example) to describe the lifecycle of flow, so this would work perfectly for many people.

If we accept that the underlying object should be called a stream to fix the namespace collision, it must still be accessed via sink() to fix the mental model. From the writer's perspective the Northside buffer isn't just a "spill-over area", it's a tank that we are responsible for filling. So if the type were std.fs.File.Stream, then Stream.sink() should return the std.io.Sink interface.

You are trying to solve a naming collision (std.io.Writer vs std.fs.File.Writer) by renaming the struct to stream. But if we rename the interface to Sink, the ambiguity disappears automatically. We don't need a unique struct name like Stream to distinguish it, because Sink clearly describes a passive interface role, whereas File describes the resource.

If std.fs.File handles the OS resource (the file descriptor), then std.fs.File.source() returns the buffered view of the OS resource, std.io.Source handles the Northside buffering interface, and source.read() and source.next() fit perfectly as actions on the interface. It follows that if the output interface is a sink, then the input must be a source (for symmetry). Having std.fs.File.Stream as the object while the handle is std.io.Source would cause more confusion.

To be clear on my stance, I am suggesting that the interface, and the method to get it, should be named Sink, regardless of the underlying struct. You suggested changing writer to stream, but keeping the interface name as writer. I believe changing the interface to Sink solves both the namespace collision and the mental model mismatch in one move.

5 Likes