Tagged pointer implementation

I implemented a tagged pointers and tried to make it work with different word sizes. My only question is that if I wanted to make this properly cross platform, do I ever have to worry about the field ordering not working if pointers were implemented on a platform such that the least significant bit is first? (e.g. alignment starts from the left but the tag is positioned at the right)

What is a tagged pointer?

When a pointer is aligned to N, the log2(N) least significant bits are always zero so you can store information there.

pub fn TaggedPointer(T: type) type {
    const P = switch (@typeInfo(T)) {
        .pointer => |P| P,
        else => @compileError("expected pointer"),
    };
    if (2 <= (P.alignment orelse @alignOf(P.child))) {
        @compileError("alignment must be aligned to 2 or greater");
    }

    return packed struct(usize) {
        const I = @Int(.unsigned, @bitSizeOf(usize) - 1);
        data: I,
        tag: enum(u1) {
            pointer = 0,
            int,
        },
        const Self = @This();

        pub fn asPointer(self: Self) T {
            std.debug.assert(self.tag == .pointer);
            return @ptrFromInt(@as(usize, @bitCast(self)));
        }
        pub fn asInt(self: Self) I {
            std.debug.assert(self.tag == .int);
            return self.data;
        }

        pub fn fromPointer(p: T) Self {
            const v: Self = @bitCast(@intFromPtr(p));
            std.debug.assert(v.tag == .pointer);
            return v;
        }
        pub fn fromInt(i: I) Self {
            const v: Self = .{ .data = i, .tag = .int };
            return v;
        }
    };
}

I was also considering if it was worth it to allow larger custom tag types if the alignment was large enough, I would just have to assert it fits and find the zero value.

AI was not used.

1 Like

here’s an implementation with custom enums:

pub fn TaggedPointer(T: type, Tag: type) type {
    const P = switch (@typeInfo(T)) {
        .pointer => |P| P,
        else => @compileError("expected pointer"),
    };

    const alignment = P.alignment orelse @alignOf(P.child);
    const tagSize = @bitSizeOf(Tag);

    if (@log2(@as(f64, @floatFromInt(alignment))) < tagSize) {
        @compileError(std.fmt.comptimePrint(
            "alignment of {s} must be large enough to fit in {s}",
            .{ @typeName(T), @typeName(Tag) },
        ));
    }

    const Enum = switch (@typeInfo(Tag)) {
        .@"enum" => |Enum| Enum,
        else => @compileError("expected enum"),
    };

    const zeroVal: Tag = zero: {
        for (Enum.fields) |field| {
            if (field.value == 0) {
                break :zero @enumFromInt(field.value);
            }
        }
        @compileError("no zero value in enum");
    };

    return packed struct(usize) {
        const I = @Int(.unsigned, @bitSizeOf(usize) - @bitSizeOf(Tag));
        const Self = @This();

        data: I,
        tag: Tag,

        pub fn asPointer(self: Self) T {
            std.debug.assert(self.tag == zeroVal);
            return @ptrFromInt(@as(usize, @bitCast(self)));
        }
        pub fn asData(self: Self) I {
            std.debug.assert(self.tag != zeroVal);
            return self.data;
        }

        pub fn fromPointer(p: T) Self {
            const v: Self = @bitCast(@intFromPtr(p));
            std.debug.assert(v.tag == zeroVal);
            return v;
        }
        pub fn fromData(i: I, tag: Tag) Self {
            const v: Self = .{ .data = i, .tag = tag };
            return v;
        }
    };
}

It would be used like this:

const T = TaggedPointer(*u64, enum(u2) { pointer = 0, int, float});
var val: u64 = 0;
const tp: T = .fromPointer(&val);
tp.asPointer().* = 10

std.debug.assert(val == 10)
const tp2: T = .fromData(14,.int);

The only thing I’m not happy with is that there isn’t a way to detect if the enum values were generated by zig or the user so a better way would be to take another parameter that is the enum zero value.

I can see some use cases, but if the tag is inside the upsize would the pointer be broken.
What is I am not understanding?
Or how do you know for sure that the pointer will have the correct tag, from pointer you have an assert there.

I should assert the alignment is an exponential of 2, I just assumed that was always the case in zig.

If the alignment is a multiple of two the least significant bit is always zero, you can expand that logic to multiples of 4, 8, and any exponential of 2.

My implementation assumes that you are only interested in the pointer value if the tag is zero and as such asserts that when converting to a pointer, this because I was coming from the perspective of word sized unboxed types or a pointer to a another one.

If I wanted to treat the tag as just metadata where the data is always a pointer I would zero the tag before using it as a pointer.
Something like

I am a beginner so I may be wrong

You can compile your program for different targets (including both little and big endian architectures) and run tests for multiple architectures to make sure your logic works.
(For example I vaguely remember some default pointer alignments being different on arm? I think for function pointers…)

Then you could also make use of Zig’s qemu support to run the tests. (I haven’t used that much yet, but it seemed to work quite well in the past when I was comparing behavior of different endian targets and running tests for them)

On most systems not every bit in a pointer will be used, but it depends on the target.
Here is a comment that has more details:

Basically pointer tagging is a bit of a trick/hack to store extra information in pointers, but it requires understanding how exactly pointers work on the target architecture, these “unused bits” are also increasingly used for other purposes, so basically you have to make sure you understand the target hardware and the software running on it, to make sure you aren’t doing something that doesn’t work. Using alignment bits is the simplest and most portable.

https://ziglang.org/documentation/master/#packed-struct:

Each field of a packed struct is interpreted as a logical sequence of bits, arranged from least to most significant.


So if you intend to use the low bit (alignment bits) as the tag then I think the tag needs to be the first field.

Here is my suggestion for how to test something like this:

mkdir testexample
cd testexample
zig init

src/main.zig

const std = @import("std");

// NOTE minimal main.zig so that you don't get confused by other tests running
pub fn main() void {  
    std.debug.print("All your {s} are belong to us.\n", .{"codebase"});
}

src/root.zig

const std = @import("std");

pub fn TaggedPointer(T: type) type {
    const P = switch (@typeInfo(T)) {
        .pointer => |P| P,
        else => @compileError("expected pointer"),
    };
    const alignment = P.alignment orelse @alignOf(P.child);
    if (alignment < 2) {
        @compileError("alignment must be aligned to 2 or greater");
    }

    return packed struct(usize) {
        const I = @Int(.unsigned, @bitSizeOf(usize) - 1);
        tag: enum(u1) {
            pointer = 0,
            int,
        },
        data: I,
        const Self = @This();

        pub fn asPointer(self: Self) T {
            std.debug.assert(self.tag == .pointer);
            return @ptrFromInt(@as(usize, @bitCast(self)));
        }
        pub fn asInt(self: Self) I {
            std.debug.assert(self.tag == .int);
            return self.data;
        }

        pub fn fromPointer(p: T) Self {
            const v: Self = @bitCast(@intFromPtr(p));
            std.debug.assert(v.tag == .pointer);
            return v;
        }
        pub fn fromInt(i: I) Self {
            const v: Self = .{ .data = i, .tag = .int };
            return v;
        }
    };
}

fn testPrintIntCase(expected: []const u8, value: anytype, base: u8, case: std.fmt.Case, options: std.fmt.Options) !void {
    var buffer: [100]u8 = undefined;
    var w: std.Io.Writer = .fixed(&buffer);
    try w.printInt(value, base, case, options);
    try std.testing.expectEqualStrings(expected, w.buffered());
}

fn testUSize64Case(expected: []const u8, value: usize) !void {
    try testPrintIntCase(
        expected,
        value,
        2,
        .lower,
        .{},
    );
}

test TaggedPointer {
    const native_endian = @import("builtin").target.cpu.arch.endian();
    switch (native_endian) {
        .big => std.log.warn("big endian test!", .{}),
        .little => std.log.warn("little endian test!", .{}),
    }

    const FooBar = struct { foo: u16, bar: u16 };
    const TPtr = TaggedPointer(*FooBar);

    const f: *FooBar = @ptrFromInt(0b110101011000110);
    const p: TPtr = .fromPointer(f);
    try testUSize64Case("110101011000110", @bitCast(p));
    try std.testing.expectEqual(f, p.asPointer());

    const i: TPtr = .fromInt(85);
    try testUSize64Case("10101011", @bitCast(i));
    try std.testing.expectEqual(85, i.asInt());
}

Test commands:

zig build test --summary all
test
└─ run test w
[default] (warn): little endian test!
failed command: ./.zig-cache/o/cb64f37095022cad5e4f63f7e766c276/test --cache-dir=./.zig-cache --seed=0xe6eb1ade --listen=-

Build Summary: 5/5 steps succeeded; 1/1 tests passed
test success
├─ run test 1 pass (1 total) 6ms MaxRSS:10M
│  └─ compile test Debug native success 595ms MaxRSS:148M
└─ run test success 6ms MaxRSS:10M
   └─ compile test Debug native success 580ms MaxRSS:147M
zig build -fqemu -Dtarget=powerpc64-linux test --summary all
test
└─ run test w
[default] (warn): big endian test!
failed command: ./.zig-cache/o/2a47de6e76932646b349be15e36e6692/test --cache-dir=./.zig-cache --seed=0x826e0521 --listen=-

Build Summary: 5/5 steps succeeded; 1/1 tests passed
test success
├─ run test 1 pass (1 total) 31ms MaxRSS:18M
│  └─ compile test Debug powerpc64-linux success 3s MaxRSS:276M
└─ run test success 23ms MaxRSS:18M
   └─ compile test Debug powerpc64-linux success 3s MaxRSS:281M

For more infos:

1 Like

It seems I got lucky with my tests using the high bit on an aarch64 device (macbook neo).