How should I implement a payload abstraction for a server

Hello!

I’m learning Zig by building a server that implements a binary protocol, and I would like some thoughts on how I should design the payload abstraction.

To summarize, I need:

  1. structs that represent data that will be sent or received.
  2. functions to serialize and unserialize these structs, as well as some auxiliary ones.
  3. an association of struct type to an opcode.

So, my first attempt at implementing this (which I’m quite uncomfortable with) consists of a tagged union where each field is a payload type.

Here is a skeleton of how I made it:

pub const Payload = union(enum(u16)) {
    PayloadType1: struct {
        f1: u16,
        f2: u32,
        // ...
    } = 1,

    PayloadType2: struct {
        f1: f32,
        f2: [8]u8,
        // ...
    } = 5,
    // ...

    pub fn serialize(self: Payload, allocator: std.mem.Allocator) ![]u8 {
        switch (self) {
            inline else => |payload| {
                // The actual implementation here
            },
        }
    }
};

and it’s used like this:

const payload = Payload{.PayloadType1 = .{
    .f1 = something,
    .f2 = something_else,
}};
const serialized_payload = try payload.serialize(allocator);
defer allocator.free(serialized_payload);
send(serialized_payload); // Just a simplification here

But I’m not really convinced that this is the best way to do it, as there are some things bothering me:

  1. I have to do a static dispatch for everything I need to do with the payload, i.e. every method I put behind Payload will need that same pattern, which costs two indentation levels for the function logic (or making these public methods just wrappers, which I don’t like that much).
  2. I’m not really using any union feature other than the opcode to type association thing.

Given that, I would love to hear suggestions about how I can improve the abstraction of this code.

Thanks!


Welcome to the forum. Good question. I’ve found that using a generic parameter in a function, together with common fields and common methods in the struct types, can go pretty far.

The following is a sample pattern using a generic type as a function parameter for the payload types. You can easily add a new payload type, as long as the common field and the two common methods are present on the payload.

const std = @import("std");
const Allocator = std.mem.Allocator;

const OpCode = enum { p1, p2 };

const PayloadType1 = struct {
    op: OpCode = .p1,
    f1: u16,
    f2: u32,

    pub fn serialize(self: *const @This(), writer: *std.Io.Writer) !void {
        try writer.print("{} (f1={}, f2={})", .{self.op, self.f1, self.f2});
    }

    pub fn deserialize(self: *@This(), alloc: Allocator, reader: *std.Io.Reader) !void {
        _ = self; _ = alloc; _ = reader;
    }

    // ... other payload specific methods.
};

const PayloadType2 = struct {
    op: OpCode = .p2,
    f1: f32,
    f2: []const u8,

    pub fn serialize(self: *const @This(), writer: *std.Io.Writer) !void {
        try writer.print("{} (f1={}, f2={s})", .{self.op, self.f1, self.f2});
    }

    pub fn deserialize(self: *@This(), alloc: Allocator, reader: *std.Io.Reader) !void {
        _ = self; _ = alloc; _ = reader;
    }
};


// Access to the common fields or functions of the payloads.
fn getOpCode(payload: anytype) OpCode {
    return payload.op;
}

fn serialize(payload: anytype, writer: *std.Io.Writer) !void {
    try payload.serialize(writer);
}

fn deserialize(payload: anytype, alloc: Allocator, reader: *std.Io.Reader) !void {
    try payload.deserialize(alloc, reader);
}


test {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    {
        const p1 = PayloadType1 { .f1 = 1, .f2 = 2 };
        std.debug.print("opcode: {}\n", .{getOpCode(p1)});

        var out_buf = std.Io.Writer.Allocating.init(alloc);
        defer out_buf.deinit();
        try serialize(p1, &out_buf.writer);
        std.debug.print("{s}\n", .{out_buf.written()});

        var in_buf = std.Io.Reader.fixed(out_buf.written());
        var p1a: PayloadType1 = undefined;
        try deserialize(&p1a, alloc, &in_buf);
    }

    {
        const p2 = PayloadType2 { .f1 = 1, .f2 = "abc" };
        std.debug.print("opcode: {}\n", .{getOpCode(p2)});

        var out_buf = std.Io.Writer.Allocating.init(alloc);
        defer out_buf.deinit();
        try serialize(p2, &out_buf.writer);
        std.debug.print("{s}\n", .{out_buf.written()});

        var in_buf = std.Io.Reader.fixed(out_buf.written());
        var p2a: PayloadType2 = undefined;
        try deserialize(&p2a, alloc, &in_buf);
    }
}

what specifically don’t you like? indentation blocks are not a useful metric at all here.

that is literally the only feature of a tagged union.

For deserialisation a tagged union is great here, it is a type safe way to be agnostic of which particular message is actually received.
And for serialisation, reusing that tagged union is good to reduce type duplication and mental overhead. You can definitely separate the inner types if you prefer.

To provide anything more, I would need to know what this protocol is and what you are using it for.


Back in my MMO days we used code generation for this (not protobuf, but a simpler self-rolled system).

Basically, some JSON, XML or ZON file which describes the message protocol. The important part here is that these files also contain meta-information that can’t be expressed in the type system, like upper or lower bounds for numeric values, a list of valid values (ok, that would simply be an enum), or the max length of a dynamically sized string or array => with the basic goal of enabling as much validation as possible already in the encode/decode layer.

The code generation would then generate the actual payload structs, and the encode/decode function for each type including data validation.
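As a rough sketch, such a protocol description (here in ZON; the layout and field names like `max_len` are made up for illustration, not any real tool’s schema) might look like:

```zig
// protocol.zon — hypothetical message-protocol description with
// validation metadata that the type system alone cannot express.
.{
    .messages = .{
        .{
            .name = "PayloadType1",
            .opcode = 1,
            .fields = .{
                .{ .name = "f1", .type = "u16", .min = 0, .max = 1000 },
                .{ .name = "f2", .type = "string", .max_len = 32 },
            },
        },
    },
}
```

A generator would read this at build time and emit the payload structs plus encode/decode functions that enforce the bounds.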

You don’t need such a data file format to describe the message protocol if the language allows attaching custom attributes to struct items (like, for instance, C# has: Writing Custom Attributes - .NET | Microsoft Learn) - Zig’s comptime would be perfect for that, and maybe there’s a way to ‘emulate’ custom attributes somehow.

Down in the message-passing-system I would define a payload as just a bag of bytes. ‘Encode’ turns typed data into a bag of bytes, and ‘decode’ turns a bag of bytes back into typed data.

Also, one thing to consider is that a tagged union is always as big as its biggest member (not necessarily on the wire, depending on what your encode/decode functions look like, but in memory).
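The in-memory size point is easy to verify with `@sizeOf` (a minimal sketch; the `Small`/`Big` payload types are made up for illustration):

```zig
const std = @import("std");

const Small = struct { a: u8 };
const Big = struct { buf: [256]u8 };

// A tagged union reserves space for its largest member plus the tag,
// regardless of which member is currently active.
const Msg = union(enum) {
    small: Small,
    big: Big,
};

test "tagged union is as big as its biggest member" {
    try std.testing.expect(@sizeOf(Msg) >= @sizeOf(Big));
}
```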

Thanks!!

I took your suggestion as inspiration and tried to do something more generic, where I don’t need one serialize/deserialize function for each payload type:

const std = @import("std");

const Opcode = enum(u16) {
    PayloadType1 = 1,
    PayloadType2 = 5,
};

const PayloadType1 = Payload(Opcode.PayloadType1, struct {
    f1: u16,
    f2: u32,
    // ...
});

const PayloadType2 = Payload(Opcode.PayloadType2, struct {
    f1: f32,
    f2: [8]u8,
    // ...
});

pub fn Payload(comptime opcode: Opcode, comptime T: type) type {
    return struct {
        opcode: Opcode = opcode,
        size: u32,
        content: T,

        pub fn serialize(self: @This(), writer: *std.Io.Writer) !void {
            try writer.print("{}\n", .{self});
        }

        pub fn deserialize(src: []u8) @This() {
            var i: usize = 0;
            var result: T = undefined;

            inline for (std.meta.fields(T)) |field| {
                switch (@typeInfo(field.type)) {
                    .int => |v| {
                        const size = v.bits / 8;
                        @field(result, field.name) =  std.mem.readInt(field.type, src[i..][0..size], .big);
                        i += size;
                    },
                    else => unreachable,
                }
            }

            return @This().init(result);
        }

        pub fn init(content: T) @This() {
            return @This(){
                .size = @bitSizeOf(@TypeOf(content)) / 8, // Oversimplification here
                .content = content,
            };
        }
    };
}

test "serialize" {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var out_buf = std.Io.Writer.Allocating.init(allocator);
    defer out_buf.deinit();

    const p = PayloadType2.init(.{
        .f1 = 1.0,
        .f2 = .{1}**8,
    });

    try p.serialize(&out_buf.writer);
    std.debug.print("{s}\n", .{out_buf.written()});
}

test "deserialize" {
    var runtime_packet = [_]u8{
        0x00, 0x01,             // Opcode
        0x00, 0x01,             // f1
        0x00, 0x00, 0x00, 0x02  // f2
    };
    _ = &runtime_packet; // Make it runtime

    const opcode: Opcode = @enumFromInt(std.mem.readInt(u16, runtime_packet[0..2], .big));
    switch (opcode) {
        // It should be an inline else block, but idk how to get the payload type
        // from the tag name.
        .PayloadType1 => {
            const result = PayloadType1.deserialize(runtime_packet[2..]);
            std.debug.print("{}\n", .{result});
        },
        .PayloadType2 => {
            const result = PayloadType2.deserialize(runtime_packet[2..]);
            std.debug.print("{}\n", .{result});
        },
    }
}

It got a little bigger because I decided to at least draft some kind of deserialization function, since I noticed that in this case it would be a bit clunky to make a generic one.

The best I could do is leave it to the deserialize() caller to read the opcode and switch over each possible result to actually infer the payload type. I think it doesn’t hurt too much, since I’ll probably have to implement this switch somewhere anyway to choose how to handle each payload type, but there is some boilerplate that bothers me.
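For what it’s worth, one possible way to get the payload type from the tag name (a sketch, not from the thread: it assumes the payload structs live as declarations in a namespace struct, here called `payloads`, whose declaration names match the opcode names) is to combine `inline else` with `@field` on the namespace:

```zig
const std = @import("std");

const Opcode = enum(u16) {
    PayloadType1 = 1,
    PayloadType2 = 5,
};

// Hypothetical namespace: one payload struct per opcode, with matching names.
const payloads = struct {
    pub const PayloadType1 = struct { f1: u16, f2: u32 };
    pub const PayloadType2 = struct { f1: f32, f2: [8]u8 };
};

// Maps a comptime-known opcode to its payload type by name.
fn PayloadFor(comptime op: Opcode) type {
    return @field(payloads, @tagName(op));
}

test "inline else recovers the payload type from the tag" {
    var opcode: Opcode = .PayloadType1;
    _ = &opcode; // make it runtime-known
    switch (opcode) {
        inline else => |tag| {
            // `tag` is comptime-known inside each generated prong,
            // so the payload type can be computed from it.
            const T = PayloadFor(tag);
            try std.testing.expect(@hasField(T, "f1"));
        },
    }
}
```

This keeps a single generic prong instead of one prong per opcode, at the cost of a naming convention between the enum and the namespace.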

By writing this example, I noticed that deserialization may work out better in the tagged union case, since it can just return the union type instead of specialized types.

Btw, I’m still not really convinced whether this is worse or better than using the union, but it surely gave me information that will make my decisions more conscious.

Thx!


Sure. I think what most annoys me is having such boilerplate in front of every function implementation.

And then, to be more specific, I meant to say that, given that the only switch I do on my union is a static dispatch with inline else, it’s an indicator that I don’t really have different flows for the possible tags of the union. For example, if I had some function that switched over the union with separate prongs for each payload type, I think that would be a more justified use of a union.

To summarize, if every union method ends up doing a switch with inline else, it could just accept an anytype to do its work, so the payload types could be plain structs. But the problem with that is that I lose the opcode → payload type association.

What exactly would these custom attributes be? The constraints you previously mentioned, for example? Would you mind giving a simple example? I’m not sure I understand exactly how these things could be done in code.

Sure. I think this is what I’m trying to do, but it can be done in multiple ways: with plain structs and generic functions, as well as with the tagged union, as in my first example.

This is actually the core feature of implementing polymorphism with a tagged union. One of its key characteristics is that it allows the payload type to be unknown at comptime, whereas generics require the payload type to be known at comptime.

Tagged unions also allow some interesting features, such as:

    pub fn serialize(self: Payload, w: *std.Io.Writer) !void {
        const op_code = @intFromEnum(self);
        switch (self) {
            inline else => |payload, tag| {
                const op_name = @tagName(tag);
                try w.print("{s}({d})", .{op_name, op_code});
                try w.writeByte('{');
                try payload.serialize(w);
                try w.writeByte('}');
            }
        }
    }

In this way, the serialization of the structure itself can be decoupled from its operation code, and the structure does not need to know which operation code it has.


That’s pretty cool. The generic deserialization is an advanced use of the language.

One word of caution: Zig explicitly makes no guarantee about the ordering of struct fields, at least in memory layout. I’m not sure about the guarantee on field ordering in the comptime `Struct.fields`. I’m not sure if a packed struct can force a strict order of the fields at comptime. You can have your own way of defining the field order for serialize/deserialize.

One approach in a current project I’m working on is to use some kind of “annotation” in a struct to attach extra information about the struct. The generic processor (serializer/deserializer) can read the “annotation” and process accordingly.

const std = @import("std");

const PayloadType1 = struct {
    f1: u16,
    f2: u32,

    pub const @"$FieldInfo" = .{
        .order = &[_][]const u8{ "f2", "f1" },
    };
};

fn fieldInfo_order(comptime struct_t: type) ?[]const []const u8 {
    return if (@hasDecl(struct_t, "$FieldInfo"))
        @field(struct_t.@"$FieldInfo", "order")
    else
        null;
}

test {
    const order = fieldInfo_order(PayloadType1);
    if (order) |field_names| {
        for (field_names) |name| {
            std.debug.print("{s} ", .{name});
        }
    }
}

For me personally, I love how Zig makes the right thing to do very obvious for these cases. For your requirements:

  1. structs that represent data … => extern structs
  2. functions … => functions
  3. an association … => union(enum)
const Payload = union(enum(u16)) {
    type1: PayloadType1 = 1,
    type2: PayloadType2 = 5,

    const Opcode = std.meta.Tag(Payload);

    // opcode in data is always non-exhaustive,
    // who knows what life throws at you
    const RawOpcode = blk: {
        var tag_type = @typeInfo(Opcode);
        tag_type.@"enum".is_exhaustive = false;
        break :blk @Type(tag_type);
    };
};

const PayloadType1 = extern struct {
    f1: u16,
    f2: u32 align(2),
};

const PayloadType2 = extern struct {
    f1: f32,
    f2: [8]u8,
};

pub fn deserialize(r: *std.Io.Reader) !Payload {
    @panic("TODO");
}

pub fn serialize(w: *std.Io.Writer, payload: @panic("something related to payload")) !void {
    @panic("TODO");
}

Fill in the rest of the owl

// TODO: check if buffer is big enough to fit all take/peek calls
pub fn deserializeBuffer(buf: []const u8) ?Payload {
    var fixed = std.Io.Reader.fixed(buf);
    // if not, then maybe should add way to return remaining len
    defer std.debug.assert(fixed.end == buf.len);
    return deserialize(&fixed) catch return null;
}

pub fn deserialize(r: *std.Io.Reader) !Payload {
    const opcode: Payload.RawOpcode = @enumFromInt(try r.takeInt(u16, .big));
    switch (opcode) {
        .type1 => {
            const payload = try r.takeStruct(PayloadType1, .big);
            return .{ .type1 = payload };
        },
        .type2 => {
            const payload = try r.takeStruct(PayloadType2, .big);
            return .{ .type2 = payload };
        },
        _ => @panic("unknown opcode?"),
    }
}

// you could go further and make payload anytype, then figure out the op, but i think this is clearer
// since most of the case of serialization you know what you are dealing with (and want to make it clear thereso)
pub fn serialize(w: *std.Io.Writer, comptime op: Payload.Opcode, payload: @FieldType(Payload, @tagName(op))) !void {
    // You can do anything with op and payload here, but let's say just printing is fine
    const anypayload = @unionInit(Payload, @tagName(op), payload);
    try w.print("{}", .{anypayload});
}

test "serialize" {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var out_buf = std.Io.Writer.Allocating.init(allocator);
    defer out_buf.deinit();

    try serialize(&out_buf.writer, .type2, .{
        .f1 = 1.0,
        .f2 = @splat(1),
    });

    std.debug.print("{s}\n", .{out_buf.written()});
}

test "deserialize" {
    var runtime_packet = [_]u8{
        0x00, 0x01, // Opcode
        0x00, 0x01, // f1
        0x00, 0x00,
        0x00, 0x02, // f2
    };
    _ = &runtime_packet; // Make it runtime

    const result: Payload = deserializeBuffer(&runtime_packet).?;
    std.debug.print("{}\n", .{result});
}

Meta-programming / reflection field ordering follows the order of appearance within the source:
@typeInfo

Type information of structs, unions, enums, and error sets has fields which are guaranteed to be in the same order as appearance in the source file.

Type information of structs, unions, enums, and opaques has declarations, which are also guaranteed to be in the same order as appearance in the source file.

So if you keep your source ordering consistent with your over-the-wire / de-/serialization order, then you at least don’t need something like @"$FieldInfo" for the ordering.
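That guarantee is easy to sanity-check with `std.meta.fields` (hypothetical `Wire` struct for illustration):

```zig
const std = @import("std");

const Wire = struct {
    first: u8,
    second: u16,
    third: u32,
};

test "reflection sees fields in source order" {
    // std.meta.fields reports fields in declaration order,
    // regardless of how they end up laid out in memory.
    const fields = std.meta.fields(Wire);
    try std.testing.expectEqualStrings("first", fields[0].name);
    try std.testing.expectEqualStrings("second", fields[1].name);
    try std.testing.expectEqualStrings("third", fields[2].name);
}
```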

But having some kind of meta data like this in the type can be helpful if you need to signal different ordering or other ways to switch between different representations.


Oh, I know that kind of thing inside out.

Just two notes:

  • have a function that can determine whether there is more data to read (and also check whether there is more data in your buffer than expected)
  • use event-driven state machines; don’t get dragged along by the `async/await` buzz
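A minimal sketch of the event-driven idea (hypothetical, not from the thread: a two-state framer for a 2-byte big-endian length prefix, with a made-up fixed buffer size and no bounds checking): feed bytes as they arrive, and a complete frame pops out once enough have accumulated.

```zig
const std = @import("std");

// Hypothetical framing state machine: a 2-byte big-endian length header,
// followed by that many payload bytes. No async involved; the caller
// drives it with whatever bytes the socket happens to deliver.
const Framer = struct {
    state: enum { header, body } = .header,
    need: usize = 2,
    buf: [1024]u8 = undefined,
    len: usize = 0,

    /// Feed one byte; returns the completed frame (valid until the
    /// next call to feed) once one is ready, null otherwise.
    fn feed(self: *Framer, byte: u8) ?[]const u8 {
        self.buf[self.len] = byte;
        self.len += 1;
        if (self.len < self.need) return null;
        switch (self.state) {
            .header => {
                const body_len = std.mem.readInt(u16, self.buf[0..2], .big);
                self.state = .body;
                self.need = 2 + @as(usize, body_len);
                return if (body_len == 0) self.finish() else null;
            },
            .body => return self.finish(),
        }
    }

    fn finish(self: *Framer) []const u8 {
        const frame = self.buf[2..self.len];
        self.state = .header;
        self.need = 2;
        self.len = 0;
        return frame;
    }
};

test "framer yields a complete frame" {
    var f = Framer{};
    const input = [_]u8{ 0x00, 0x03, 'a', 'b', 'c' };
    var frame: ?[]const u8 = null;
    for (input) |b| frame = f.feed(b);
    try std.testing.expectEqualSlices(u8, "abc", frame.?);
}
```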

:)