You want the new 0.15 IO Interface? We have IO at home

I have some projects that aren’t ready to break compatibility with 0.14, but I really want to start using the new Io interface from 0.15. I’m happy to report I was able to copy what I needed from the 0.15 implementation into a couple of projects, and now I can start using it even on Zig 0.14!

Here’s the commit where I backported it for zigx: first pass on Writergate changes · marler8997/zigx@21a6728 · GitHub

With this, zigx now mostly works with both 0.14 and 0.15. The new writer interface is very exciting because I finally have a solid mechanism to streamline the zigx API. The current interface has always been a bit clunky, but I’m planning on replacing all my serialization code with much simpler functions that simply take a writer. This also solves a long-standing problem I’d hesitated to tackle: how to implement X11 message buffering/flushing. It’s now all baked into the new Writer interface. This was a huge and much-needed improvement to Zig, great job Andrew!
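For anyone who hasn’t tried it yet, here’s a minimal sketch (stdlib only, nothing zigx-specific) of what the new interface looks like as of Zig 0.15, to the best of my reading of the std docs: the buffer is handed to the writer at creation time, so buffering is part of the interface rather than a separate wrapper you can forget.

```zig
const std = @import("std");

pub fn main() !void {
    // The buffer is supplied when the writer is created; buffered
    // writing is the default rather than an opt-in wrapper.
    var buf: [4096]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&buf);
    const out: *std.Io.Writer = &stdout_writer.interface;

    try out.print("hello from the new {s} interface\n", .{"Writer"});
    try out.flush(); // nothing reaches the fd until we flush
}
```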

11 Likes

Haha yes! I was actually rewatching your talk from SYCL 2023, so happy to see that you’re maintaining it.

2 Likes

Yeah I second this…the new design has turned out to be a big improvement. I was ambivalent at first because my project is 25k+ lines and used many custom readers/writers internally. Transitioning them to the new interfaces took a few weeks but it was worth it.

One side benefit is that you can’t forget to buffer now. Many newbies have asked why their Zig code reads a file slower than language X, and that’s always why. Even my own benchmark got ~30% faster after the transition because apparently even I forgot to buffer in one place! Oops.
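To make the “forgetting to buffer” point concrete, here’s a hedged sketch (the file name and contents are made up) of reading a file byte-by-byte under 0.15. Because the reader owns a buffer, per-byte takes don’t translate into per-byte syscalls:

```zig
const std = @import("std");

pub fn main() !void {
    // Create a small file, then read it back a byte at a time; the
    // 4 KiB buffer means only a couple of read() syscalls regardless.
    try std.fs.cwd().writeFile(.{ .sub_path = "demo.txt", .data = "one\ntwo\nthree\n" });
    var file = try std.fs.cwd().openFile("demo.txt", .{});
    defer file.close();

    var buf: [4096]u8 = undefined;
    var file_reader = file.reader(&buf);
    const r = &file_reader.interface;

    var lines: usize = 0;
    while (true) {
        const b = r.takeByte() catch |err| switch (err) {
            error.EndOfStream => break,
            else => |e| return e,
        };
        if (b == '\n') lines += 1;
    }
    std.debug.print("{} lines\n", .{lines});
}
```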

8 Likes

Moving to the new reader/writer with built-in buffering allowed me to remove a handful of allocators in my zigx examples, most notably when reading the initial data from the server after connecting. Here’s a diff when I grep for the word “allocator” in my examples before/after:

12 Likes

That’s been the biggest change for me. Lots of things are now much easier to implement without allocators because I can redesign them to take a *Reader instead.
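A tiny illustration of that pattern, assuming nothing beyond the 0.15 stdlib. The Header type and field layout are invented for the example, but the shape is the point: parse functions that take a *std.Io.Reader and no allocator.

```zig
const std = @import("std");

// Hypothetical fixed-size wire header, parsed straight off a reader.
const Header = struct { kind: u8, len: u16 };

fn readHeader(r: *std.Io.Reader) !Header {
    return .{
        .kind = try r.takeByte(),
        .len = try r.takeInt(u16, .little),
    };
}

pub fn main() !void {
    // A fixed reader stands in for a socket here.
    var reader = std.Io.Reader.fixed(&.{ 0x2a, 0x10, 0x00 });
    const h = try readHeader(&reader);
    std.debug.print("kind={} len={}\n", .{ h.kind, h.len });
}
```

Since the function only sees the generic reader, the same code works whether the bytes come from a socket, a file, or a fixed buffer in a test.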

Less memory allocated means less memory handling to screw up, I’m happy with this change.

6 Likes

Looking through this diff, isn’t most of the change because of swapping from an enum to an int (see all the lines which changed from try sym_key_map.put(allocator, @intFromEnum(...), Key...) to just try keycode_map.put(allocator, @intCast(keycode), key))?

1 Like

I just merged the new writer changes to master. I estimate I spent maybe 40 hours on these changes. Here’s the before/after of the simple “hello” example:

BEFORE: zigx/examples/hello.zig at 22ec7ed2f8e2de87811daed40bace040da4c0ac1 · marler8997/zigx · GitHub

AFTER: zigx/examples/hello.zig at 4b4035caba8811c1df20d8d8722135c2896564da · marler8997/zigx · GitHub

Note that the BEFORE/AFTER also includes some unrelated cleanup.

The gamble I made with this new API was seeing whether I could remove the need for an allocator when reading data from the server. Most X11 messages are small and fixed in size, but some messages can include dynamic data of arbitrary size. The old API required the caller to provide an allocator for those. With the new API, instead of reading the full message into an allocated buffer, the client reads each section of the reply as needed. The downside of this approach is that the code that reads/separates messages, once in a single place, is now dispersed throughout the application…a lot more surface area for mistakes. To address this, I wrote an “X11 Aware Reader” that tracks the state of the incoming data as it’s read. It’s more complex than a simple function that takes an allocator and reads the entire message into a buffer, but it also unlocks new abilities. Take this example: zigx/examples/getserverfontnames.zig at 4b4035caba8811c1df20d8d8722135c2896564da · marler8997/zigx · GitHub
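I don’t know zigx’s internals, but the idea can be sketched in a few lines: a wrapper that remembers how many bytes of the current reply remain and refuses to read past them. All names here are illustrative, not zigx’s actual API.

```zig
const std = @import("std");

// Toy "reply-aware" reader: tracks how much of the current reply is
// left, so callers can take fields piecemeal without ever holding the
// whole message in a buffer.
const ReplyReader = struct {
    inner: *std.Io.Reader,
    remaining: usize,

    fn takeByte(self: *ReplyReader) !u8 {
        if (self.remaining == 0) return error.EndOfReply;
        self.remaining -= 1;
        return self.inner.takeByte();
    }

    fn stream(self: *ReplyReader, out: *std.Io.Writer, n: usize) !void {
        if (n > self.remaining) return error.EndOfReply;
        // Forward n bytes to the destination writer.
        for (0..n) |_| try out.writeByte(try self.inner.takeByte());
        self.remaining -= n;
    }
};

pub fn main() !void {
    // A length-prefixed string, as many X11 reply fields are encoded.
    var inner = std.Io.Reader.fixed(&[_]u8{ 5, 'h', 'e', 'l', 'l', 'o' });
    var reply: ReplyReader = .{ .inner = &inner, .remaining = 6 };

    const len = try reply.takeByte();
    var out_buf: [16]u8 = undefined;
    var out = std.Io.Writer.fixed(&out_buf);
    try reply.stream(&out, len);
    std.debug.print("{s}\n", .{out.buffered()}); // prints "hello"
}
```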

This example requests the full list of fonts on the server and prints them to stdout. Here’s the old approach:

const msg_bytes = try x11.readOneMsgAlloc(allocator, reader);
defer allocator.free(msg_bytes);
const msg = try x11.asReply(x11.ServerMsg.ListFonts, msg_bytes);
var it = msg.iterator();
while (try it.next()) |path| {
    try stdout.print("{f}\n", .{path});
}

We allocate a buffer for the message (likely a few dozen kilobytes), read the entire reply into it, then iterate over it as we write each string back to stdout. Here’s the new code:

const fonts, _ = try source.readSynchronousReplyHeader(sink.sequence, .ListFonts);
std.log.info("font count {}", .{fonts.count});
for (0..fonts.count) |_| {
	const len = try source.takeReplyInt(u8);
	try source.streamReply(stdout, len);
	try stdout.writeByte('\n');
}

Here’s a comparison of each approach:

| Step | Old API | New API |
| --- | --- | --- |
| 1 | Allocate large buffer | No allocator/buffer needed |
| 2 | Read entire message into buffer (dozens of kilobytes) | Read message header (32 bytes) |
| 3 | Iterate/write each font to stdout | Stream each font to stdout |

Note that we actually stream each font directly from the X11 reader to stdout. This means it could potentially use sendfile and never even copy the data into the current process. That wouldn’t be possible with the old API.
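The streaming step itself is plain std.Io. Assuming 0.15’s Reader.streamExact (a sketch, not zigx code), bytes go from the reader’s buffer straight into the destination writer without the application ever assembling the whole payload:

```zig
const std = @import("std");

pub fn main() !void {
    // A fixed reader stands in for the X11 socket; a fixed writer
    // stands in for stdout.
    var reader = std.Io.Reader.fixed("fixed:9x15bold");
    var out_buf: [64]u8 = undefined;
    var out = std.Io.Writer.fixed(&out_buf);

    // Forward the first 5 bytes reader-to-writer; no intermediate
    // application-owned copy of the payload.
    try reader.streamExact(&out, 5);
    std.debug.print("{s}\n", .{out.buffered()});
}
```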

Caching/Flushing

The other big problem this change solved was caching/flushing messages. The old API simply provided a way to calculate message sizes and serialize them, leaving buffering up to the application. Until now, my applications have opted for the simple approach: no buffer, one message per syscall.

// render the "hello window" with the OLD API
//   1 message per syscall
{
	var msg: [x11.poly_fill_rectangle.getLen(1)]u8 = undefined;
	x11.poly_fill_rectangle.serialize(&msg, .{
		.drawable_id = window_id.drawable(),
		.gc_id = bg_gc_id,
	}, &[_]x11.Rectangle{
		.{ .x = 100, .y = 100, .width = 200, .height = 200 },
	});
	try x11.ext.sendOne(sock, sequence, &msg);
}
{
	var msg: [x11.clear_area.len]u8 = undefined;
	x11.clear_area.serialize(&msg, false, window_id, .{
		.x = 150,
		.y = 150,
		.width = 100,
		.height = 100,
	});
	try x11.ext.sendOne(sock, sequence, &msg);
}
{
	const text_literal: []const u8 = "Hello X!";
	const text = x11.Slice(u8, [*]const u8){ .ptr = text_literal.ptr, .len = text_literal.len };
	var msg: [x11.image_text8.getLen(text.len)]u8 = undefined;
	const text_width = font_dims.width * text_literal.len;
	x11.image_text8.serialize(&msg, text, .{
		.drawable_id = window_id.drawable(),
		.gc_id = fg_gc_id,
		.x = @divTrunc((window_width - @as(i16, @intCast(text_width))), 2) + font_dims.font_left,
		.y = @divTrunc((window_height - @as(i16, @intCast(font_dims.height))), 2) + font_dims.font_ascent,
	});
	try x11.ext.sendOne(sock, sequence, &msg);
}

The new API decouples these two concerns. The number of messages per syscall is now determined by the size of the write buffer; no code changes are needed to adjust this relationship, because the code is now agnostic of buffering.

// render the "hello window" with the NEW API
//   number of syscalls depends on writer, but, it'll probably
//   just be 1 syscall at the end to send all of them
try sink.PolyFillRectangle(
	window_id.drawable(),
	bg_gc_id,
	.initComptime(&[_]x11.Rectangle{
		.{ .x = 100, .y = 100, .width = 200, .height = 200 },
	}),
);
try sink.ClearArea(
	window_id,
	.{
		.x = 150,
		.y = 150,
		.width = 100,
		.height = 100,
	},
	.{ .exposures = false },
);
const text = "Hello X!";
const text_width = font_dims.width * text.len;
try sink.ImageText8(
	window_id.drawable(),
	fg_gc_id,
	.{
		.x = @divTrunc((window_width - @as(i16, @intCast(text_width))), 2) + font_dims.font_left,
		.y = @divTrunc((window_height - @as(i16, @intCast(font_dims.height))), 2) + font_dims.font_ascent,
	},
	.initComptime(text),
);

Conclusion

I think my gamble was a success! The approach of creating a “structured reader” that takes care of shepherding protocol-specific data over a generic reader seems to be the way forward. I’ve already started doing the same thing for DBUS here: GitHub - marler8997/dbus-zig. I’m excited to start using these libraries in some real software!

8 Likes

This is one of the reasons why I love Zig: it provides a really good balance between power and flexibility. You could, for example, choose to use the new Io strategies and still pass in an allocator. Even better, if you are going to use and discard the data right away (as in your font-listing example), you could pass in a scratch (bump) allocator and enjoy simple code with a quick and cheap memory release. To me, this (plus the new Io abstractions) is the sort of toolbox I want to have available at all times. :heart_eyes:

2 Likes