Concepts needed to implement simple template engine

ragezor · July 31, 2024, 9:26pm

I’m trying to implement a simple templating engine. Here is some pseudocode that I hope showcases my intentions:

const NodeType = enum {
    text,
    variable,
};

const Node = struct {
    type: NodeType,
    content: []const u8,
};


const TEMPLATE = "<p>Welcome, {{ user_name }}";

// parsing template gives us:
// Node 1 is type = text and content = "<p>Welcome, "
// Node 2 is type = variable and content = "user_name"

fn render(node_list: *ArrayList(Node), context: anytype) !void {
    for (node_list.items) |node| {
        switch (node.type) {
            NodeType.text => std.debug.print("{s}", .{node.content}),
            NodeType.variable => std.debug.print("{s}", .{context[node.content]}), // context[node.content] does not work
        }
    }
    std.debug.print("\n", .{});
}

// desired usage
// struct should be type checked so that it contains .user_name
render(node_list, .{ .user_name = "Martin" });

I have a working version with Map instead of struct for context but I’m not satisfied with this solution because Map keys are not type checked and usage is cumbersome.

So my question is: what approach should I use for this to work, and what Zig concepts do I need to learn? I’m thinking parsing needs to happen at comptime somehow (what data structure would replace ArrayList at comptime?), and then I need to generate a specialized render function for each template that will somehow access .{context.user_name} instead of .{context[node.content]}, but I’m not sure if you can even do that in Zig.

AndrewCodeDev · July 31, 2024, 11:02pm

It’s also cumbersome because you end up doing lookups for things instead of directly having them.

TBH, it sounds like you’re trying to build a system with RTTI (runtime type information). There are a few ways that I think about those systems, but here are just two thoughts:

1. There is some mechanism in the system that remembers implementation details. This is often in the form of closures that capture surrounding information (such as the type) or virtual tables that point to the correct implementation.
2. The system makes a lot of assumptions and pays for a runtime check. Effectively, when you access a variable in Python, you’re also always checking what type is stored (things aren’t statically typed like “this is an int”). Those systems can work but they require a tight setup with good invariants and the ability to bail out if assumptions aren’t met.

You’ve got a bunch of options but you’re basically always remembering/checking or assuming something. Intrusive data structures can come in handy here because you can generically store things and cast back to what they were (you can even do buffer optimizations to keep small things off the heap).

You can also make good use of unions here too. Zig makes it easy to check what types are if you need something to be correct before proceeding:

if (x == .foo) // proceed using x.foo if true

They also play nicely into switch statements if an instance could be one of several things at some juncture. This can get nasty though if you are using multiple unions and you have to switch over them in the same expression - that gets deeply nested very quickly.
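
To make the union idea concrete, here is a minimal sketch, assuming Zig 0.13 or similar. The Value union and the formatValue function are hypothetical names for illustration, not something from this thread. The switch captures each payload, so every prong works with a concrete type:

```zig
const std = @import("std");

// Hypothetical: the kinds of value a template variable might hold.
const Value = union(enum) {
    text: []const u8,
    number: i64,
    flag: bool,
};

// Formats `value` into `buf`. Each prong captures its payload, so the
// payload has a concrete type inside that prong.
fn formatValue(buf: []u8, value: Value) ![]u8 {
    return switch (value) {
        .text => |s| std.fmt.bufPrint(buf, "{s}", .{s}),
        .number => |n| std.fmt.bufPrint(buf, "{d}", .{n}),
        .flag => |b| std.fmt.bufPrint(buf, "{}", .{b}),
    };
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    std.debug.print("{s}\n", .{try formatValue(&buf, .{ .number = 42 })});
}
```

The switch must be exhaustive, so adding a new field to Value forces every such switch to be updated.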

If you know that you want all of your types resolved before runtime, comptime programming can help but it’s easy to rely on that too much and build a system that requires ever more comptime deduction to work properly.
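
For the original question, one possible comptime approach is sketched below, assuming Zig 0.13 or similar. It skips parsing and hand-writes the node list that a comptime parser would produce. The point is that a comptime-known slice can stand in for the ArrayList, and @field(context, name) with a comptime-known name can stand in for context[node.content]. The parsed_nodes name and the buffer-based render signature are assumptions made for illustration:

```zig
const std = @import("std");

const NodeType = enum { text, variable };

const Node = struct {
    type: NodeType,
    content: []const u8,
};

// Hand-written stand-in for the result of parsing
// "<p>Welcome, {{ user_name }}" at comptime.
const parsed_nodes = [_]Node{
    .{ .type = .text, .content = "<p>Welcome, " },
    .{ .type = .variable, .content = "user_name" },
};

// Renders into `buf` and returns the written part. `inline for` unrolls the
// loop at compile time, so `node.content` is comptime-known and can be used
// as a field name with @field.
fn render(comptime node_list: []const Node, buf: []u8, context: anytype) ![]u8 {
    var len: usize = 0;
    inline for (node_list) |node| {
        const piece: []const u8 = switch (node.type) {
            .text => node.content,
            .variable => @field(context, node.content),
        };
        const written = try std.fmt.bufPrint(buf[len..], "{s}", .{piece});
        len += written.len;
    }
    return buf[0..len];
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const html = try render(&parsed_nodes, &buf, .{ .user_name = "Martin" });
    std.debug.print("{s}\n", .{html}); // prints "<p>Welcome, Martin"
}
```

With this shape, a context that lacks .user_name fails to compile, which is the type checking the original post asks for. The cost is that render is instantiated once per template and context type.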

If you are really interested in going deeper, you can look at how objects are implemented in languages like JavaScript or even NoSQL databases. In those cases, they often store things as raw bytes and also keep a table of offsets to the member variables so you don’t have to parse things each time.

Just some things to think about - there’s not a right answer here, it just depends on what kind of system you want to build.

AndrewCodeDev · July 31, 2024, 11:16pm

One other trick I will mention is handle types. Here’s a really simple example… I have some union of types or some blob of memory that needs to be interpreted. Let’s say I have some data structure in the backend that they get stored in. I can have a create function that loads that data into a generic container and returns a handle that holds onto something with type information.

// storage holds onto the memory, but doesn't know what it is
const x = create(T, storage);

// x remembers what T is
const y = x.get();

Now you can store whatever you want in storage as raw bytes, but x remembers what is in storage.
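
A minimal sketch of the handle idea, assuming Zig 0.13 or similar and several simplifications. Storage, Handle, and this three-argument create (which also takes the value to store) are hypothetical names, and the storage is a fixed byte buffer that only suits plain values such as integers and floats:

```zig
const std = @import("std");

// Hypothetical type-erased storage: raw bytes plus a fill counter.
// It knows nothing about the types stored in it.
const Storage = struct {
    bytes: [256]u8 = undefined,
    used: usize = 0,
};

// The handle remembers T at compile time and the byte offset at runtime.
fn Handle(comptime T: type) type {
    return struct {
        storage: *Storage,
        offset: usize,

        pub fn get(self: @This()) T {
            const raw = self.storage.bytes[self.offset..][0..@sizeOf(T)];
            return std.mem.bytesToValue(T, raw);
        }
    };
}

// Copies `value` into storage as raw bytes and returns a typed handle.
// Running out of space is only caught by Zig's slice safety checks here.
fn create(comptime T: type, storage: *Storage, value: T) Handle(T) {
    const offset = storage.used;
    @memcpy(storage.bytes[offset..][0..@sizeOf(T)], std.mem.asBytes(&value));
    storage.used += @sizeOf(T);
    return .{ .storage = storage, .offset = offset };
}

pub fn main() void {
    var storage = Storage{};
    const x = create(u32, &storage, 42);
    const y = create(f64, &storage, 1.5);
    std.debug.print("{d} {d}\n", .{ x.get(), y.get() });
}
```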

There’s a big drawback to this system though - creating APIs on top of this gets ugly. Since functions that can work on x also need to know the type signature of x, it’s easy to lean heavily into features like anytype. It eventually can get to the point where anytype or deducing T becomes a part of everything you do. IMO, this should really be more of a frontend thing.

If it’s just to generically store things though in some common place, you don’t have to create functions that use handles. You can use functions that only use direct types themselves and avoid all the deductions. Again, I’m not saying that this is the correct angle (I’ve had to rewrite systems where I used this too heavily), but it’s just another way of “remembering” what is held on the backend.

anticrisis · August 1, 2024, 9:39pm

AndrewCodeDev:

You can also make good use of unions here too. Zig makes it easy to check what types are if you need something to be correct before proceeding:
if (x == .foo) // proceed using x.foo if true

This could possibly go in the langref – I’ve read that section several times looking for something other than the switch statement to check if a specific tag is active.

AndrewCodeDev · August 1, 2024, 10:00pm

They mention it but they could be more explicit with an example:

Tagged unions coerce to their tag type
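
For example, a small sketch, assuming Zig 0.13 or similar, with hypothetical Kind and Value types. It shows both the coercion and the == check:

```zig
const std = @import("std");

const Kind = enum { text, number };

// A union tagged with an explicit enum type.
const Value = union(Kind) {
    text: []const u8,
    number: i64,
};

// The tagged union coerces to its tag type, so this returns the active tag.
fn kindOf(value: Value) Kind {
    return value;
}

// Comparing against an enum literal checks the active tag without a switch.
fn textOrNull(value: Value) ?[]const u8 {
    if (value == .text) return value.text;
    return null;
}

pub fn main() void {
    const v = Value{ .text = "Martin" };
    std.debug.print("{s}\n", .{@tagName(kindOf(v))}); // prints "text"
}
```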