Zeppelin - cross-platform 2D graphics

marler8997 · April 10, 2025, 1:57pm

By “control” what I mean is giving the application the “control”. In this case Zeppelin is making an intentional decision to take control away from the app. Because you are implementing your own WndProc function, and not providing a mechanism for the app to execute anything inside WM_PAINT, you’ve taken that control away. I understand the reason for this is you believe you can achieve an acceptable result without requiring this mechanism and I’m interested to see what you come up with here, but just wanted to make it clear, you’re restricting what the app can do in the name of simplifying your library’s API.

I see the callback mechanism as

optional (app still has the control/power to use it or not)
a good default

I’ve learned that implementing your own main loop in general is a footgun. Each platform will have its own idiosyncracies, on Windows the big one is realizing that your loop isn’t guaranteed to be the one that is running. At any point the OS could take over and start running its own loop to service the message queue. (note: I see you have a comment in your own main loop about this // this might look like it would handle every single event but it does not) This bit me hard once as I was leveraging “thread-specific” messages (not tied to any window) and handling them in a custom message loop…this resulted in a race condition where sometimes I would lose some messages because sometimes it would be the message loop inside ShellExecute that would be running when the message was popped off the queue rather than my own.

That being said, there are times where an app will want to implement their own message loop, but, I’d consider this advanced usage. For a main loop specifically, I think it might be better not to abstract this away and just let the application start using the platform-specific API’s directly. For most apps though, I think the default main loop is both simple to use and probably results in the behavior they want, which is why I think it’s a good default, only to be overridden when necessary and hopefully after they’ve learned some things. I think this is the type of thing that deserves it’s own example/documentation that can then highlight all the footguns and educate the user on why they may or may not want to do this.

P.S. just to be clear though, I get why you’re wanting to try avoiding this mess, I applaud this effort and encourage you to keep going in this direction

ssmid · April 10, 2025, 2:33pm

The design idea of the event system is a data-oriented one. Check what happened on the OS side (+ process it if really necessary) and return it to the application as fast as possible. From this POV, the application has 100% control, just that I didn’t care to emit an event equivalent to WM_PAINT to the application, because the application already knows from the swap chain whether to render or not. On linux, where everything happens on a unix socket, this data-oriented approach is very natural. On windows, the issue is the last part: “return the data as fast as possible”.

Theoretically, the application is even able to access all of the platform-specific code, the only thing it can’t do is to hook into the data-gathering phase (e.g. during a WM_PAINT event) and ideally, this shouldn’t be necessary as this phase should be very short and lean (in the microsecond range). That’s why I’m not thinking of it like taking control from the application. But I can totally get the point when using zeppelin on windows, today, as moving/resizing does not return control to the application in finite time, but this will be fixed, one way or another. However, for reasons stated in the last message, I feel much more strongly about not letting the application control the main loop: You can have many libraries but only one framework, as there is only one main loop. Zeppelin strives to be a library.

marler8997 · April 10, 2025, 4:24pm

The design idea of the event system is a data-oriented one. Check what happened on the OS side (+ process it if really necessary) and return it to the application as fast as possible . From this POV, the application has 100% control, just that I didn’t care to emit an event equivalent to WM_PAINT to the application, because the application already knows from the swap chain whether to render or not.

Sadly this all comes back to Microsoft’s decision to use this callback-based mechanism. There’s just no way to fully satisfy their interface without a callback, it’s an “order of operations” problem rather than a “timing” one. You can do tricks/things to workaround this and make some things work some of the time, but if you don’t update the content within the WM_PAINT callback, and call Begin/End paint, then some things are just never going to work the same as they would otherwise. So I wouldn’t say it’s accurate that you’re giving the application 100% control, you’ve prevented them from satisfying this operation order that seems to be required to make the app behave a certain way.

I actually submitted a new commit to my raylib win32 backend that tries to mitigate this by instead of calling Begin/End paint, instead it stops processing messages immediately and returns control to the main app, then uses ValidateRect to signal that the window contents are updated once the app has finished drawing. You may want to consider using this same mechanism for you library, I’m curious if we could see any noticeable affects from giving the OS slightly more accurate information.

I feel much more strongly about not letting the application control the main loop

We both agree that a library should allow the application to control the main loop. I guess I didn’t word my last message very well because this was one of my main points I believe the library should allow the application to write their own main loop, but, at least on the “Microsoft Windows”, my view is that this is not what most applications want to do. In most cases if you’re diverging from the standard message loop, you’re probably unknowingly introducing subtle bugs/race conditions. It should be allowed but, my point was if you write an application that has a custom message loop, you should really know what you’re doing and at that point, it might be better to not try and make an abstraction but let the App call the OS’s APIs directly. I understand that sentiment of not wanting this to be true and again, I hope you prove me wrong with your simpler approach

ssmid · April 10, 2025, 4:42pm

In the end, it’s a trade off and I don’t yet know how far one can go with this approach. If shake-to-minimize turns out to not work it wouldn’t be great but so be it. I hope to enable application developers (including myself) to deliver a very decent user experience while still preventing them from having to deal too much with this “legacy” operating system called “Microsoft Windows”

marler8997 · April 10, 2025, 5:51pm

I never use shake-to-minimize and don’t know that a lot of people would miss that one. Losing out on “Window Snapping” is a much bigger deal (dragging window to the edge of the screen and having it snap to one side/etc). I dont’ know how you’d both override the default move window handler and implement that for both windows 10/11 for example. Seems a ridiculously hard task, I’m guessing you’d make the app choose to either give up main loop control while the window is being moved, or, lose out on window snapping but man that would be a bummer.

weskoerber · April 10, 2025, 7:29pm

Shake-to-minimize is one of those features that has a very niche use case so it almost never gets used – but when it does it feels so good.

I can probably count on both hands the number of times I’ve used shake-to-minimize, but each time I did, I thought, “ooh that’s very nice!”

marler8997 · April 10, 2025, 7:45pm

On another note…I see you’re using some C libraries for your X11 backend. You might be interested in my zigx project.

It’s also featured in this talk: https://www.youtube.com/watch?v=aPWFLkHRIAQ

I’ll probably be updating/improving zigx as I implement my own windowing library. If you’re interested in using it let me know.

andrewrk · April 10, 2025, 7:51pm

Sorry, I didn’t study the discussion thus far very carefully so this may be a naive comment, but, couldn’t this callback put an event on the main loop queue and then yield until the next frame or whatnot, and then trigger resumption? From Windows’s perspective then it would be like a normal function call of the callback.

marler8997 · April 10, 2025, 8:38pm

Apologies if this is over-explaining, but to make sure everyone is on the same page, here’s a very simplified version of how the Zeppelin/Raylib API works:

while (shouldKeeepRunning()) {
    serviceMessageQueue();
    // draw the stuff
}

The problem with Mr. Gates “Microsoft Windows ©” is that if you want builtin window snapping/gestures you must give up control of the main loop during the resize/move. More specifically, while the window is being moved/resized, our serviceMessageQueue cannot return.

While this resize/move is happening, when it’s time paint, we could add a new event that gets serviced later, but, without a callback function from the application to call, we’re just kicking the can down the road. Even with a new event, we still can’t return from serviceMessageQueue because we’re inside the special Microsoft “window snapping ” message loop.

So if we want to be able to update the window content during resize/move, we have two options. We can provide a way for the application to specify callbacks, or, we can intercept and stop Microsoft from taking over our message loop, but, then we lose out on those special builtin snapping/shaking features.

andrewrk · April 10, 2025, 8:57pm

Right, I see, and serviceMessageQueue is basically implemented in user32.dll or whatnot, and will crash/fail if not run from the main thread.

I suppose you would need to do your main loop in a separate thread, have the “main thread” be the one that interacts with Windows DLLs, registers callbacks, and then exfiltrates those events to the main loop. Again - it can yield during the callbacks until the next iteration of the (separate thread) main loop.

I think that still works, no?

marler8997 · April 10, 2025, 9:15pm

Yes using another thread is another way you could try to address this. While Bill holds our main loop hostage we could have another thread come in and try to update the content somehow. I’m a little fuzzy on the details here though as I’ve never actually tried this solution, it’s just rumors/stories I’ve heard Windows has enough gotchas I’ve never thought it worth the effort to try and deeply understand what does and doesn’t work when you add more threads into the mix. That’s not to say you can’t have multiple threads, just that you usually don’t want other threads directly interacting with your window objects other than passing messages to them. I’d wager it’d be much complicated than a callback, but, I assume this question is for deeper understanding of the system…in the words of Slughorn…“This is all hypothetical isn’t it…? All academic?”

As a side note, I have had multiple threads running their own message loop (this is needed for things like some low-level “hooks”). I’ve even had multiple threads, each with their own set of windows, but never the two shall mix, other than message passing. In fact I just did this in Tuple to implement our new “App Veil” feature which let’s you exclude windows from screen capture. It has a special thread that creates backdrop windows that follow other windows around to show they are being “veiled” from the screen capture. I chose to put all these backdrop windows in their own thread apart from the main app.

david_vanderson · April 10, 2025, 10:08pm

I’m pretty sure this is to give time for a double-click to happen. I wonder if the delay changes if you change the double-click speed?

This kind of complexity is why dvui doesn’t do double-clicks by default: DVUI Devlog

ssmid · April 11, 2025, 10:17am

X11 is not supported, it’s just a leftover from a renderdoc debug session that I still need to remove. (Renderdoc does not support wayland and headless capture did not work). And considering that many distros are enabling wayland by default nowadays, KDE splitting off X11 support, even Cinnamon and Xfce implementing wayland, I think it’s not worth the trouble, at least at the moment.

@marler8997

ssmid · April 11, 2025, 10:26am

This is also what Android does for native apps with its android_native_app_glue library.

marler8997 · April 11, 2025, 3:21pm

Ah ok. My approach is to support X11 before Wayland since as far as I know, all major linux distributions that support Wayland also support X11 (via XWayland). X11 also seems to support more use cases (like remote clients/servers) than Wayland at the moment so X11 still seems to be the choice for most compatibility at the moment.

I’m not sure why but it seems like Wayland has had such a hard time coming up to feature parity with X11, and everything takes a long time. It also seems like they have a hard time coordinating/landing on unified solutions. Seems like we keep ending up with many different solutions for the same thing and it’s hard to get the compositors to coordinate on one. This is just what it seems from a casual outside observer…anyone have actual experience knowledge of where Wayland is at now and how it’s comparing to X11 nowadays?

ssmid · April 11, 2025, 3:56pm

Wayland took a very long time because the wayland devs have a hard time agreeing on anything. Partially, this is due to the fact that the two major desktop environments have vastly different design philosphies. KDE implement all the features and use cases you could think of, even if some of them will be used only by a handful of users. Gnome, on the other hand, have a very opinionated vision for the linux desktop and leverage all the power they have through Gnome Desktop and GTK to shape the protocol in their favor.

For example: I don’t know how it came that in wayland decorations are implemented client-side per default but everyone supports both and defaults to server-side decorations, except Gnome, who only support client-side decorations.

And then there are the general problems of the linux desktop:

design by committee
big variety of implementors both on the server and the client-side
very different levels of available resources
Nvidia doing there own thing, e.g. with implicit sync
huge variety of use cases

Just have a look at linux-dmabuf: Linux DMA-BUF protocol | Wayland Explorer
2-3 pages just about texture format negotiation.

However, these days wayland is pretty feature-complete, at least as far as I can tell. Global hotkeys are still a problem afaik (mainly for security reasons, to avoid key loggers like on windows and probably X11) and some few electron apps not updating to a newer electron version and therefore not supporting screen capture of individual windows or something like that. Can’t tell though if that’s still the case, I haven’t had any problems in the last years with that.

neurocyte · April 11, 2025, 4:09pm

Wayland finally overtook X11 around the beginning of 2024. Application compatibility and hardware support is now fine for 99.9% of users and Wayland even has quite a bit of an edge in terms of features. X11 will likely be around forever as a legacy protocol, but Wayland is now very much here to stay. If fact distros are increasingly announcing plans to remove out-of-the-box support for X11 and let users only install it if they explicitly need it/want it.

I’d say “Wayland first” is the way to go for any new development and support for X11 is mostly optional.

TUSF · April 11, 2025, 6:14pm

Pretty sure it’s because the Wayland core has no concept of a shell or decorations at all. This is, as I understand it, because it tries to make as few assumptions as possible.

The ones who dropped the ball here (in my opinion) are the ones that designed the xdg-shell protocol, and for some reason decided to omit any acknowledgement of decorators.

gonzo · April 11, 2025, 7:21pm

As far as I understand, the Wayland developers are also the X developers, and they have firmly said that they don’t want to work on X anymore. Therefore, Wayland should be (and perhaps already is) the default.

permutationlock · April 12, 2025, 9:14am

It is suspicious that the pause is exactly the default 500ms default double click window, but when I adjust the double click speed the event pause does not change. I can make the double click window much shorter or longer (and double clicks register correctly), and the block remains 500ms.