Architecture of a complex data-driven app

debrisapron · February 28, 2024, 1:26am

I’m working on an embedded app which will have fairly complex UI logic and I’m considering using Zig. When I write this kind of app I like to separate my logic into an “input → state” pass followed by a “state → output” pass. If you’re familiar with React / Redux, think of the input logic as your store that handles events and updates the state, and the output logic as your components that take the state and render it to a canonical output format.

In JS (or even C++ with a bit of hacking) I have a pretty good idea of how to achieve this kind of pattern, but I’m struggling a bit to get Zig to work at the level of abstraction I’m used to. Here’s what I’ve got so far:

const std = @import("std");

// Definition of state atom

fn Field(comptime T: type) type {
    return struct {
        const Self = @This();
        
        value: T,
        prev_value: T,
        dirty: bool = false,
        
        fn init(v: T) Self {
            return Self { .value = v, .prev_value = v };
        }
        
        fn set(self: *Self, v: T) void {
            self.dirty = true;
            self.prev_value = self.value;
            self.value = v;
        }
        
        fn use(self: *Self) struct {T, T} {
            const pv = self.prev_value;
            self.prev_value = self.value;
            self.dirty = false;
            return .{ self.value, pv };
        }
    };
}

const State = struct {
    num: Field(u8) = Field(u8).init(0),
    str: Field([]const u8) = Field([]const u8).init("foobar"),
    // ...lots more
};
var state = State {};

// Event handler (input)

const EventCode = enum { A, B };

fn handle(event_code: EventCode) void {
    switch (event_code) {
        .A => {
            state.num.set(state.num.value + 1);
        },
        .B => {
            state.str.set("baz");
        }
    }
}

// Renderer (output)

fn render() void {
    if (state.num.dirty) {
        const v = state.num.use();
        std.debug.print("num changed from {d} to {d}\n", .{v[1], v[0]});
    }
    if (state.str.dirty) {
        const v = state.str.use();
        std.debug.print("str changed from {s} to {s}\n", .{v[1], v[0]});
    }
}

// Use them like this

pub fn main() void {
    handle(EventCode.A);
    render();
    handle(EventCode.B);
    render();
}

This is obviously a trivial example; please try to imagine a lot more complexity. Anyway, the general point here is that the state struct centralizes all the mutable state for the app and sets a flag when a given field changes. The input phase updates the state, and the render phase renders whatever has changed. The previous values are needed because graphics on embedded is very expensive, so you often use optimizations like “if the first digit of the number hasn’t changed, only render the second digit”, that sort of thing.
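To make the digit optimization concrete, here is a minimal sketch of a renderer that uses the previous value to redraw only the digits that changed. The names drawDigit, digits, renderNumber and the redrawn array are hypothetical, and drawDigit only records which positions were touched instead of drawing anything.

```zig
const std = @import("std");

/// Hypothetical stand-in for an expensive draw call: it only records
/// which digit positions were redrawn.
var redrawn = [_]bool{ false, false, false };

fn drawDigit(pos: usize, digit: u8) void {
    _ = digit; // a real implementation would push pixels here
    redrawn[pos] = true;
}

/// Splits a u8 into hundreds, tens and ones.
fn digits(v: u8) [3]u8 {
    return .{ v / 100, (v / 10) % 10, v % 10 };
}

/// Redraws only the digit positions that differ between prev and cur.
fn renderNumber(prev: u8, cur: u8) void {
    const p = digits(prev);
    const c = digits(cur);
    for (0..3) |pos| {
        if (p[pos] != c[pos]) drawDigit(pos, c[pos]);
    }
}
```

With the Field type above, the two arguments would come straight from the pair returned by use().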

Am I barking up the wrong tree here? Is there a more idiomatic way to achieve this sort of pattern in Zig? Am I overthinking things?

AndrewCodeDev · February 28, 2024, 2:03am

Hey @debrisapron, welcome to the forum!

One thing that would really help me is to understand how you would tackle this problem in something like C++, because that way I can understand your mental model more clearly and help you find some solutions in Zig that would solve a similar problem for you. Again, it doesn’t have to be terribly complicated, but some more background would be helpful to hash out ideas that you may want to consider.

For instance - a common pattern in C++ for this sort of event handler is to have a shallow base-class that is inherited from that has an identifying tag. In C, you could do something similar with a void* but that imagines a specific kind of event bus.

AndrewCodeDev · February 28, 2024, 5:29am

I’ll just give a couple thoughts here in the interim. I’m seeing a much more object oriented design being formed from your post than a data oriented one.

I’m saying this due to a few things I’m observing so far. First, the struct type that carries its previous values along with it. This makes it hard to compartmentalize the new data from the old and now you’re going to be accessing both regardless of what the use is. Same goes for tracking dirty fields. You could compress a lot of that with a much smaller array of something like u1 or bool, but you’re going to pay an offset tax by involving smaller types in the same struct.

A common approach here is to separate those two out, either as separate arrays or as arrays that are segmented. That way, if you want to access both, you can (in case you want to roll back), but you’re not always accessing both in the same read. This allows you to load more data of one state or another. An important thing to consider with data oriented design is “what do I do the most?” In other words, if you’re rolling back constantly, it may be handy to have the previous values alongside. If it’s more like an undo button and you’re more concerned about forward momentum, segmenting those out will allow you to access your data more effectively.
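As a rough sketch of that split, assuming every field fits in a u8 (a real app would likely need a union or several tables), the current values, previous values and dirty flags can each live in their own array. FieldId, Store and field_count are illustrative names, not anything from the thread.

```zig
const std = @import("std");

// Hypothetical field indices for a small state table.
const FieldId = enum(u8) { step, track, tempo };
const field_count = 3;

/// Current values, previous values and dirty flags each live in their own
/// array, so scanning the flags never touches the value data.
const Store = struct {
    current: [field_count]u8 = .{ 0, 0, 120 },
    previous: [field_count]u8 = .{ 0, 0, 120 },
    dirty: [field_count]bool = .{ false, false, false },

    fn set(self: *Store, id: FieldId, v: u8) void {
        const i = @intFromEnum(id);
        if (self.current[i] == v) return; // unchanged: nothing to redraw
        self.previous[i] = self.current[i];
        self.current[i] = v;
        self.dirty[i] = true;
    }
};
```

The renderer can then walk dirty on its own and only read previous for the fields that need it.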

To do the C++ trick I was talking about (which I’m assuming is a kind of observer pattern) you’d typically do it with some kind of type erasure with interface functions. The language supported approach in C++ is virtual inheritance (or virtual tables, really). In Zig, you can roll your own virtual tables very easily. A good example of this is in the allocator interface - you can see the formation of a v-table there that would suffice for this example: zig/lib/std/mem/Allocator.zig at master · ziglang/zig · GitHub

In that pattern, you have a v-table that contains a set of function pointers. The first argument is usually a *anyopaque and you cast back to the type you want… here’s an example from the arena allocator:

fn alloc(ctx: *anyopaque, n: usize, log2_ptr_align: u8, ra: usize) ?[*]u8 {
    // we cast the opaque pointer back to an arena allocator
    const self: *ArenaAllocator = @ptrCast(@alignCast(ctx));
    // more stuff below...

So this is an interesting take, but I can’t recommend it if you want good performance on your application. Making dozens of virtual calls in a row will make your application chug. Without analyzing your app more carefully, one approach here would be to have a single virtual call that knows which elements are which that then updates all of them as a batch.
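Here is a minimal sketch of that single batched virtual call, using the same opaque-pointer plus function-table shape as the allocator interface. Renderer, VTable, CountingRenderer and renderAll are hypothetical names, and the implementation only counts dirty entries instead of drawing.

```zig
const std = @import("std");

/// A hand-rolled interface: an opaque context pointer plus a table of
/// function pointers. All names here are illustrative, not from std.
const Renderer = struct {
    ctx: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        renderAll: *const fn (ctx: *anyopaque, dirty: []const bool) usize,
    };

    /// One indirect call per frame; the implementation walks the batch itself.
    fn renderAll(self: Renderer, dirty: []const bool) usize {
        return self.vtable.renderAll(self.ctx, dirty);
    }
};

const CountingRenderer = struct {
    drawn: usize = 0,

    const vtable = Renderer.VTable{ .renderAll = renderAllImpl };

    fn renderer(self: *CountingRenderer) Renderer {
        return .{ .ctx = self, .vtable = &vtable };
    }

    fn renderAllImpl(ctx: *anyopaque, dirty: []const bool) usize {
        // Cast the opaque pointer back to the concrete type.
        const self: *CountingRenderer = @ptrCast(@alignCast(ctx));
        var n: usize = 0;
        for (dirty) |d| {
            if (d) n += 1; // a real renderer would redraw the element here
        }
        self.drawn += n;
        return n;
    }
};
```

The cost of the indirect call is paid once per frame rather than once per element, which is the point of batching.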

I hope I’m giving you helpful information, but I really can’t say until I understand what type of design pattern you’re going for here.

dimdin · February 28, 2024, 7:21am

Welcome @debrisapron

You can declare render in such a way that it is not necessary to track changes at the field level.

fn render(s: *State, d: *Display) void {
    d.outLine1(s.num);
    d.outLine2(s.str);
    d.show();
}

Display can have two buffers, one that stores the currently displayed output and another that collects the output from render function.
When show is called it compares the two buffers and shows the changes in the actual display.
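A minimal sketch of such a Display, assuming a hypothetical one-line character display of 8 cells. The field names shown, next and writes are made up for illustration, and writes just counts how many cells would be sent to the device.

```zig
const std = @import("std");

const width = 8; // hypothetical one-line character display

const Display = struct {
    shown: [width]u8 = [_]u8{' '} ** width, // what the device currently shows
    next: [width]u8 = [_]u8{' '} ** width, // what render() wants to show
    writes: usize = 0, // counts device writes, for illustration only

    /// Copies text into the pending buffer, padding the rest with spaces.
    fn outLine(self: *Display, text: []const u8) void {
        for (0..width) |i| {
            self.next[i] = if (i < text.len) text[i] else ' ';
        }
    }

    /// Pushes only the cells that differ to the device, then remembers them.
    fn show(self: *Display) void {
        for (0..width) |i| {
            if (self.shown[i] != self.next[i]) {
                self.shown[i] = self.next[i]; // a real driver would write to hardware here
                self.writes += 1;
            }
        }
    }
};
```

Going from "num 41" to "num 42" then costs a single device write, without the state having to track anything.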

debrisapron · February 28, 2024, 9:09pm

Hi Andrew, thanks for the reply! I’m fairly allergic to classical OO patterns, so when I want to do something like this in C++ (or C) I generally reach for macros.

In my last C++ project the core looked a bit like this:

namespace state {

#define FIELD(FNAME, T, DEF_VAL)                                               \
    T __##FNAME##_prev_value = DEF_VAL;                                        \
    T __##FNAME##_value = DEF_VAL;                                             \
    bool __##FNAME##_changed = true;                                           \
    void set__##FNAME(T FNAME) {                                               \
        if (__##FNAME##_value != FNAME) {                                      \
            __##FNAME##_prev_value = __##FNAME##_value;                        \
            __##FNAME##_value = FNAME;                                         \
            __##FNAME##_changed = true;                                        \
        }                                                                      \
    }                                                                          \
    T get__##FNAME() { return __##FNAME##_value; }

#define HANDLE_STATE_CHANGE(FNAME)                                             \
    if (__##FNAME##_changed) {                                                 \
        __##FNAME##_changed = false;                                           \
        _on__##FNAME##_changed(__##FNAME##_value, __##FNAME##_prev_value);     \
    }

// define state model here

FIELD(edit_step_idx, U8, 0)
FIELD(edit_track_idx, U8, 0)
FIELD(edit_patt_idx, U8, 0)
FIELD(edit_param_id, U8, 0)
FIELD(playing, bool, false)
FIELD(play_pos, U8, 255)
FIELD(clock_phase, bool, false)
FIELD(is_loading, bool, false)
FIELD(transport_btn_held, bool, false)
FIELD(int_tempo, U8, 120)
// more...

}

namespace input {

using namespace state; // the setters are declared in namespace state

void on_some_event() {
    set__edit_step_idx(1);
    set__edit_track_idx(2);
}
// more...

}

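namespace output {

using namespace state; // the values and change flags live in namespace state

// Handler names must match what HANDLE_STATE_CHANGE expands to.
void _on__edit_step_idx_changed(U8 value, U8 prev_value) {
    // do something
}

void _on__edit_track_idx_changed(U8 value, U8 prev_value) {
    // do something
}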

void render() {
    HANDLE_STATE_CHANGE(edit_step_idx);
    HANDLE_STATE_CHANGE(edit_track_idx);
    // more...
}

}

Looking at it now I guess you could call it a crude MVC framework, with the input namespace being the Controller, the state namespace being the Model, and the output namespace being the View. But to emphasize what I said earlier, the reason I can’t simply have the state be plain values is that the renderer must know what has changed, and what it changed from - re-rendering everything from scratch is not an option, and diffing is too expensive. Does this clarify things at all?

debrisapron · February 28, 2024, 9:20pm

OK this is very interesting but going a little over my head. Are you suggesting instead of structs to have something like:

const Field = enum {
    age = 0,
    name = 1,
};
// A tuple with an explicit type, so its fields can be reassigned at runtime
var values: struct { u8, []const u8 } = .{ 51, "Matthew" };
var dirty = [_]bool{ false, false };

fn onChangeAge(new_age: u8) void {
    const i = @intFromEnum(Field.age);
    values[i] = new_age;
    dirty[i] = true;
}

Or am I totally missing the point?

debrisapron · February 28, 2024, 9:26pm

Right this is a nice simple approach but the problem is you are effectively doing a diff of state ↔ prev_state on every render, which means crawling the whole state tree to find the values that have changed. And if we’re talking about strings/arrays that means comparing every value in those arrays. Maybe what I need is just some cunning way to memoize that comparison?

debrisapron · February 28, 2024, 9:30pm

Thanks everyone for helping btw, I am a relative n00b to hardcore low-level programming so apologies if I’m asking stupid questions.

dimdin · February 28, 2024, 9:31pm

No, you don’t have to compare the state. You compare the rendered output with the desired output that render generates. It’s double buffering, and it is much easier to display only the differences.

dimdin · February 28, 2024, 9:31pm

There are no stupid questions.
And you are welcome.

debrisapron · February 28, 2024, 9:34pm

When you say “compare the rendered output with the desired output” what do you mean? Are you talking about a pixel by pixel comparison?

dimdin · February 28, 2024, 9:35pm

Pixel by pixel, or LED by LED, or character by character, whatever is more convenient for your display device.

debrisapron · February 28, 2024, 9:38pm

Ok now I think I’m getting what you’re saying. So basically on render we:

1. Render state to screen buffer 1
2. Compare screen buffer 1 to screen buffer 2
3. For every pixel that’s changed, write the new value to screen buffer 2 AND actual screen

Am I getting that right?

dimdin · February 28, 2024, 9:42pm

Yes, that’s it.

This method is used everywhere. For example, React does not update the browser directly: it generates a virtual DOM, compares that with the previous virtual DOM, and applies only the differences to the browser DOM. So you have a React render function that generates a big list, but only the one changed line in the list is updated in the browser output.

debrisapron · February 28, 2024, 9:48pm

Right I know about React rendering, but what I was trying to avoid was the step where you have to render the entire state tree to your shadow buffer on every frame. But it’s possible that I’m way off with my intuitions about the runtime cost and I’m just making my life complicated for the sake of a few cycles!

dimdin · February 28, 2024, 9:58pm

Additionally you can validate that it actually changed:

const Field = enum {
    age = 0,
    name = 1,
};
// A tuple with an explicit type, so its fields can be reassigned at runtime
var values: struct { u8, []const u8 } = .{ 51, "Matthew" };
var dirty = [_]bool{ false, false };

fn onChangeAge(new_age: u8) void {
    const i = @intFromEnum(Field.age);
    if (values[i] != new_age) {
        values[i] = new_age;
        dirty[i] = true;
    }
}

Another way is to use a named struct for the state and have the dirty flag names and the array size generated from the struct’s metadata at compile time.
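One possible reading of that suggestion, as a sketch only: std.meta.FieldEnum builds an enum with one tag per struct field, and std.meta.fields gives the field count for the flag array. State, Tracked, set and isDirty are hypothetical names, and the equality check assumes field types that support == (integers and bools here, not slices).

```zig
const std = @import("std");

const State = struct {
    age: u8 = 51,
    level: u8 = 0,
    playing: bool = false,
};

/// An enum with one tag per field of State, generated by the standard library.
const Field = std.meta.FieldEnum(State);
const field_count = std.meta.fields(State).len;

const Tracked = struct {
    state: State = .{},
    /// One flag per field; the array length follows the struct definition.
    dirty: [field_count]bool = [_]bool{false} ** field_count,

    fn set(self: *Tracked, comptime field: Field, value: anytype) void {
        const slot = &@field(self.state, @tagName(field));
        if (slot.* == value) return; // no change, so no flag
        slot.* = value;
        self.dirty[@intFromEnum(field)] = true;
    }

    fn isDirty(self: *const Tracked, field: Field) bool {
        return self.dirty[@intFromEnum(field)];
    }
};
```

Adding a field to State then grows the enum and the flag array automatically, so the two cannot drift apart.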

AndrewCodeDev · February 29, 2024, 5:19am

Hey, no prob. Happy to help (or at least make commentary lol).

The main point I was going for is about breaking apart the objects. Now to be clear, the most important aspect of all of this is your actual use case. There are also many ways to approach this. If something gets dirty, it could place its index in a buffer to be addressed later, for instance. Let’s talk about the scan properties though…

A benefit of breaking apart certain aspects of structs is how many elements you can scan at once and whether or not you can use something like SIMD optimization for that. Let’s get to SIMD in a moment and just look at the memory layout. One example has:

union - current
union - previous
bool - state

So for every bool, I need to step over two unions. That’s quite a bit of offsetting to get to a single bool. Now if we do an array of bools, we can find things quite quickly. With SIMD, we can now do much faster searching because it’s supported on types like bool. You can use std.mem.indexOfScalarPos to search chunks of the buffer at once - in other words (with AVX-512), 64 bools at a time instead of one per loop iteration (assuming 8-bit byte width)… you can see the implementation here: zig/lib/std/mem.zig at master · ziglang/zig · GitHub
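A small sketch of scanning a separate flag array with std.mem.indexOfScalarPos. Whether the vectorized path is actually taken depends on the element type, the target and the Zig version, so this sketch stores the flags as u8 (0 or 1), which is an assumption on my part. countDirty is a hypothetical helper.

```zig
const std = @import("std");

/// Visits every set flag using std.mem.indexOfScalarPos and returns how many
/// were found. Flags are stored as u8 (0 or 1) in this sketch.
fn countDirty(flags: []const u8) usize {
    var count: usize = 0;
    var pos: usize = 0;
    while (std.mem.indexOfScalarPos(u8, flags, pos, 1)) |i| {
        count += 1; // a real renderer would redraw field i here
        pos = i + 1; // resume the search just past the hit
    }
    return count;
}
```

The loop body runs once per dirty field rather than once per field, so the cost of skipping clean entries is left to the library search.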

That’s one benefit of breaking apart your data-types.

As for the virtual functions, that’s probably not what you were going for, so I don’t think it’s quite worth exploring at this juncture.

Here’s a good blog post to motivate some of those ideas: Ghostty Devlog 006 – Mitchell Hashimoto

debrisapron · February 29, 2024, 8:51pm

OK this is very helpful, I still haven’t fully internalized the mindset of thinking seriously about performance like this, where it actually determines the upfront architectural decisions. Thanks very much!

debrisapron · February 29, 2024, 8:53pm

This is very helpful, thank you!