Comptime Linting Using @typeInfo

A previous discussion on here got me thinking about Pass-By-Value errors, and how insidious they can be; it seemed like an easy mistake to make, but potentially a very difficult one to debug. I came up with an idea to use zig’s @typeInfo inbuilt to recursively search through namespaces for functions that pass structs by value, and then check to see whether those structs contain a particular tag declaring them as “pass-by-reference only”.

const std = @import("std");

/// To subscribe a struct to reference-only linting, include
/// `const PASS_TYPE : PassType = .referenceOnly;`
/// in it. Note that this only works on functions marked `pub`
/// in namespaces subscribed in the comptime block.
const PassType = enum {
    referenceOnly,
    valueAllowed
};

comptime {
    var typelist : []const type = &[_]type{@This()};
    var i = 0;
    while(i < typelist.len) {
        defer i += 1;
        const t = typelist[i];
        const info = @typeInfo(t);
        // We only care about structs here, although this could be expanded to enums
        switch (info) { .@"struct" => |structinfo| {
        for(structinfo.decls) |d| {
            const decl = @field(t, d.name);
            const decinfo = @typeInfo(@TypeOf(decl));
            switch (decinfo) {
            // Lint function. This only works on functions marked `pub`, unfortunately
            .@"fn" => |f| {
            for(f.params) |p| {
                const paramtype = if(p.type) |ty| @typeInfo(ty) else continue;
                const pt = p.type.?;
                // We only care about structs here
                switch (paramtype) { .@"struct" => {
                if(@hasDecl(pt, "PASS_TYPE")
                    and @TypeOf(@field(pt, "PASS_TYPE")) == PassType
                    and @as(PassType, @field(pt, "PASS_TYPE")) == PassType.referenceOnly) {
                    const errstr = "In function '" ++ @typeName(t) ++ "." ++ d.name ++ "', took parameter of type '"
                                    ++ @typeName(pt) ++ "' by value, where field 'PASS_TYPE' specifies 'referenceOnly'.";
                    @compileLog(errstr);
                }}, else => {}}
            }},
            // Add declared type to type list, allowing recursive linting
            // Also only works on structs marked `pub`
            // Additionally, if you have fields marked `pub`, e.g.
            // ```pub const std = @import("std");```
            // Then these will also be searched
            .type => {
            typelist = addType(typelist, decl);
            }, else => {}}
        }}, else => {}}
    }
}

fn addType(comptime list : []const type, comptime Element : type) []const type {
    for(list) |T| {
        if(T == Element)
            return list;
    }
    return list ++ [1]type{Element};
}

pub const A = struct {
    const PASS_TYPE : PassType = .referenceOnly;
    bigdata : [10000]u8,

    // Note that this example optimises away the @memcpy, but we shouldn't assume it will do that
    pub fn cool(a : A, n : usize) void {
        std.debug.print("{}", .{a.bigdata[n]});
    }
};

pub fn main() !void {
    var ind : usize = undefined;
    if(std.os.argv.len > 1) {
        const endind = std.mem.indexOfSentinel(u8, '\x00', std.os.argv[1]);
        ind = try std.fmt.parseInt(usize, std.os.argv[1][0..endind], 0);
    } else {return;}
    const a : A = A{.bigdata = undefined};
    a.cool(ind);
}

(Please forgive the ugly switches, I wanted to avoid having ludicrous amounts of indentation)

The result was only moderately successful. In this demo, it’s able to detect the pass-by-value in the function “A.cool”, but in many other cases, I wouldn’t be able to do the same. @typeInfo only provides you access to declarations marked pub, so if pub const A was instead written const A or pub fn cool was fn cool, then the problem would never be picked up.
On the other hand, supposing @typeInfo could access declarations not marked pub, then the comptime block would comb through everything in std, which is undesirable - in fact, it causes a compilation error trying to access namespaces which it shouldn’t on my machine. The issue here is that there’s no way for me to distinguish between std, an imported struct, and A, which is defined within this file.
I would be keen to do things like this in my own projects to write my own safety rules, but it’s not really usable in its present form. Please share any ideas on how it could be improved or made viable.
It would certainly be helpful to me if we could perform more robust comptime analysis like this, but perhaps that’s outside the scope of what comptime is supposed to achieve. What do others think?

1 Like

Rather than calling it an “error,” I prefer to call it the “pain” of PRO’s impending removal.

In the past, PRO encouraged the direct passing of many large structures by value rather than by reference. Some structures might be very large, but the information they contained was stateless. Passing them by value could express this stateless semantics. PRO would then optimize the performance overhead caused by the structure’s large size. This is actually different from the semantics expressed using *const T. In the past, designing a function parameter as *const T meant that the function was read-only for T, even though T itself might be a stateful structure.

The removal of PRO does come with a semantic sacrifice: for performance reasons, large, stateless structures will have to be passed as *const T parameters, losing the ability to express the structure’s statelessness.

Because PRO and RLS have correctness issues when encountering alias problems, removing PRO is really a necessity. I’m not sure what direction zig will take.

1 Like

Maybe there’s a way to use the build system to circumvent the issue with private decls, while also providing the control you need to avoid linting the standard library?
Another issue though is that this can’t see through anytype. I’m not sure what you could do about that. It would be neat if there was a builtin akin to @This() that would be called in function scope and return a comptime mapping (modeled e.g. as a struct type) from parameter name to fully monomorphized type. If you had that, you could build a generic boilerplate function insertable in any function to check something about the argument types, and at build time you could just insert it textually into all your functions. This would also solve the anytype issue.

1 Like

Passing a struct by value (whether a shallow copy is made or a *const is passed implicitly by PRO) doesn’t prevent state from being modified when the struct contains a (non-const) pointer. The Allocator struct is one example. Passing by *const provides the same semantics. Or am I misunderstanding?

In my opinion, Allocator is stateless, like a handle that can be safely copied. Its contents are a pointer and vtb that never change. The content behind it points to has state, but it itself does not.

To be more specific, I think this structure can be called:
A stateless view over a stateful resource.

Perhaps it can be understood this way: the reason why we need to use pointers for stateful resources is because pointers convert stateful resources into stateless views.

1 Like

That’s a valid way of looking at it. But all that is also true when passing an Allocator as *const.

const T* has looser semantics than passing by value by copy; its semantics convey “this function does not modify this resource.” Therefore, not only can stateless Allocator objects use const T*, but even stateful structures can use const T* when using read-only APIs. However, the reverse is not true. In some scenarios, you can use const T* to pass parameters, but you cannot pass them by copy. This isn’t due to performance concerns, but rather to the state of the struct, even though the API is read-only.

I realize now you’re probably talking about modern conventions in C, with which I’m not familiar because I last used C in '95. In Zig I only pay attention to the hard capabilities of each parameter passing option (value, * and *const) rather than trying to follow any particular conventions (which seem in flux for Zig right now anyway). Sorry for the miscommunication. If there are common conventions specifically for Zig, especially taking into account the removal of PRO, I would love to know of course.

The above discussion is indeed a convention that was in effect before the removal of PRO. If PRO were in effect, copying a T could be optimized to passing a *const T parameter. Therefore, copying could be used to represent the stateless semantics of the structure itself, without worrying about performance issues.

After PRO was removed, the decision to use *const T or copying parameters no longer depends on the semantics, but only on performance: copying is preferred for small objects, while references are preferred for large objects.

This convention is not a C convention, as C lacks PRO.

Regarding the fact that stateful structures cannot be modified to pass by copy when using *const T, the standard library’s std.atomic.Value is a good example. Its load method looks like this:

pub fn Value(comptime T: type) type {
    return extern struct {
        /// Care must be taken to avoid data races when interacting with this field directly.
        raw: T,

        const Self = @This();
        ...
        pub inline fn load(self: *const Self, comptime order: AtomicOrder) T {
            return @atomicLoad(T, &self.raw, order);
        }
        ...
    };
}

In this example, no matter how small Self is, it is incorrect to change its parameter from *const Self to copy parameter. This is the case where copy parameter cannot be used for stateful structures.

I realize I once expressed the idea of ​​optimizing const T* to copy parameters based on their size. I apologize for this. This idea was wrong and this optimization is not possible.

Thank you. I’m aware of all that, but still not seeing any clear conventions, so perhaps that’s due to my own limited viewpoint.

This semantics is indeed undocumented(or perhaps I simply haven’t found it), but rather stems from my own reasoning from past experience reading and writing code, though these conclusions seem like castles in the air after PRO was removed. Ignoring this semantics, never using copy-by-value, and always using *const T , will ensure correctness.

From a performance perspective, sometimes passing by reference is more performant, and sometimes copy-by-value is more performant. When PRO can optimize copy-by-value to pass-by-reference, this means always using copy-by-value for optimal performance, regardless of whether the value’s size is suitable for reference or copy.

For stateful structures, copying is not permitted, regardless of whether they are read-only. Instead, the stateful semantics of the structure itself can be expressed through API parameter passing by copy or reference.

However, since PRO was removed, these semantics have become obsolete.

I can live with that, because for a language that does not enforce ownership or have built-in move and copy semantics, I’m pretty sure documentation (or reading the source when doc is missing) is the only way to know the specific intended semantics.

1 Like

That’s a great idea, and should be doable. First, I’ll just need to figure out how to the build system to modify code before passing it to the compiler.

I think neither comptime nor codegen are great approaches to implement this kind of check, because implementations seem to require too much awkward complexity.

I am not sure if it would work, but if it did, it may make sense to implement using techniques like clr:


I think ideally Zig would gain something like Lint/Validation-time, which can be configured via the build system and sort of runs like a compiler plugin. I think it would already be possible to do as a run step that re-does some of the things the compiler has already done, or looks at some of the intermediary outputs of the compiler, but having this as a validation-time feature, might make these sorts of checks easier to write.

Another possibility might be that something like this could be implemented via the compiler-protocol, once it is implemented.

related:

2 Likes