Allowing .packed, .extern, .struct, .fn, etc. (no @"...")

The enum std.builtin.Type.ContainerLayout previously had fields in upper-case CamelCase. It’s currently defined as:

enum { 
    auto,
    @"extern",
    @"packed",
}

Because extern and packed are keywords, the use of @"..." is necessary. To set the layout of a struct for @Type() to packed, you have to escape the enum name:

    return @Type({
        .layout = .@"packed",
        // ...
    });
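
For reference, here is a minimal sketch of the status quo, assuming a Zig version in which ContainerLayout has the lower-case fields shown above. It suggests the escape is purely a source-level matter: the tag itself is simply named "packed".

```zig
const std = @import("std");

// Assumption: this Zig version defines ContainerLayout with the
// lower-case fields auto, @"extern" and @"packed".
const Layout = std.builtin.Type.ContainerLayout;

pub fn main() void {
    // The keyword clash forces the @"..." escape in source code.
    const layout: Layout = .@"packed";
    // A tag that is not a keyword needs no escape.
    const other: Layout = .auto;
    // @tagName returns the plain field names, without any escape.
    std.debug.print("{s} {s}\n", .{ @tagName(layout), @tagName(other) });
}
```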

I was thinking, what if we allow keywords to be interpreted as identifiers whenever they come immediately after a .? That would allow a less ugly syntax:

    return @Type({
        .layout = .packed,
        // ...
    });

This would also permit snake_case to be used for enum fields throughout the standard library. Currently, the naming convention allows upper-case CamelCase when a name clashes with a keyword. That sounds like a reasonable compromise, but the exception ends up being pervasive because we switch on @typeInfo() all the time.

Allowing .struct, .packed, etc. should require only trivial changes to the lexer. Basically, immediately after a period we don’t perform keyword look-up. The fact that this hasn’t happened yet makes me wonder if there’s some reason not to do so that I’m overlooking.

I don’t understand completely what you mean with the topic title.
I think the topic title makes more sense as “allow unescaped keywords as field accessors” or something like that?

I guess one good reason to avoid special allowances for field access could be to keep field access and field declaration consistent.

I think with field declarations the explicit syntax is wanted so that you can tell immediately whether something is an extern keyword and not an extern: field.
And I guess needing @ in one place but not the other might be bad in another way, for example unexpected for beginners / more to keep in mind.
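
To make the consistency point concrete, here is a small sketch of how declaration and access look today. The struct and its field names are made up for illustration. Both sides use the same escaped spelling, and that symmetry is what would be lost if only the access side dropped the @.

```zig
const std = @import("std");

// Hypothetical struct for illustration: both field names are keywords,
// so the declarations must be escaped.
const Flags = struct {
    @"extern": bool,
    @"packed": bool,
};

pub fn main() void {
    // Today, initialization and access use the same escaped spelling
    // as the declaration.
    const f: Flags = .{ .@"extern" = true, .@"packed" = false };
    std.debug.print("{} {}\n", .{ f.@"extern", f.@"packed" });
}
```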

Personally I still think it might make sense to allow it without @ if there is a dot in front.

JavaScript allows this. You can use keywords as property names but var var = 5; or function const() {} would fail. It won’t be an alien concept to most programmers coming to Zig that you can’t use certain names in certain contexts. Beginners are more likely to be confused by .layout = .auto vs .layout = .@"extern". They definitely won’t get the impression that Zig is a clean language.

If I understand you correctly, you aren’t raising an Objection towards “allowing unescaped keywords as field accessors”, because that would mean that you are against being able to use keywords directly in that situation.

You object to the status quo of not being allowed to simply type .extern.
Or said another way you suggest that it should be allowed without having to use @.

I am not super decided on one way or the other, but I lean towards agreeing with you.
“Suggestion to allow unescaped keywords as field accessors”
or maybe the title could be:
“Objection towards status quo: allow unescaped keywords as field accessors”

I think your use of the word “Objection” is the inverse of the point you are trying to make.

I was wondering what objections people might have. I’ve updated the title of this post to reflect that.

The thing is, this is an easy change to make. I imagine a discussion must have occurred at some point and a decision was made not to allow this. I don’t want to sound like an idiot who keeps bringing up ideas that have been struck down already :stuck_out_tongue:

I always consider the dot part of the name so things like ‘a.b’ are a single entity to me. I know the lookup happens in stages (lookup a then b in namespace a), but in my mind that is just breaking apart the single name a.b and then doing the binding lookups.

I didn’t know this wasn’t possible actually, so I totally agree. .packed is not the same token as packed in my head already.

I finally have time to look more into this. So yeah, this is a simple change to make. We just need to add a bool to Tokenizer (lib/std/zig/tokenizer.zig):

pub const Tokenizer = struct {
    buffer: [:0]const u8,
    index: usize,
    pending_invalid_token: ?Token,
    is_last_token_period: bool = false,

At the bottom of Tokenizer.next(), set the field to true if the token is a period:

        self.is_last_token_period = result.tag == .period;

Then put an if() { ... } around the keyword lookup:

                .identifier => switch (c) {
                    'a'...'z', 'A'...'Z', '_', '0'...'9' => {},
                    else => {
                        if (!self.is_last_token_period) {
                            if (Token.getKeyword(self.buffer[result.loc.start..self.index])) |tag| {
                                result.tag = tag;
                            }
                        }
                        break;
                    },
                },

As a result of the change, tokenization should actually be slightly faster, since the keyword lookup is skipped for every identifier that follows a period.
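
The same rule can be tried out without patching the compiler. Below is a rough sketch, assuming the public std.zig.Tokenizer and std.zig.Token.getKeyword from the standard library. It runs the unmodified tokenizer and reclassifies a keyword token as an identifier whenever the previous token was a period. The sample source string is made up for illustration, and this is an emulation of the idea, not the patch itself.

```zig
const std = @import("std");
const Token = std.zig.Token;

pub fn main() void {
    // Made-up input: one keyword right after a period, one bare keyword.
    const source: [:0]const u8 = ".layout = .packed, packed struct";

    var tokenizer = std.zig.Tokenizer.init(source);
    // Tag of the previous token; .eof is just a neutral starting value.
    var prev: Token.Tag = .eof;

    while (true) {
        const tok = tokenizer.next();
        if (tok.tag == .eof) break;

        const text = source[tok.loc.start..tok.loc.end];
        // Proposed rule: a keyword directly after a period is an identifier.
        const tag: Token.Tag = if (prev == .period and Token.getKeyword(text) != null)
            .identifier
        else
            tok.tag;

        std.debug.print("{s} -> {s}\n", .{ text, @tagName(tag) });
        prev = tok.tag;
    }
}
```

With this input, the packed that follows the period should be reported as an identifier, while the bare packed and struct keep their keyword tags.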

Here’s a binary (Linux-GNU) of the stage 2 compiler with the change:

It compiles the following code with no complaint:

const std = @import("std");

const Union = union {
    @"struct": u32,
    @"fn": u32,
    @"union": u32,    
};

pub fn main() void {
    const a: Union = .{ .struct = 1 };    
    const b: Union = .{ .fn = 2 };
    const c = .{ .layout = .packed };
    std.debug.print("{d} {d}\n", .{a.struct, b.fn});
    std.debug.print("{any}\n", .{c});
}

Output:

1 2
struct{comptime layout: @TypeOf(.enum_literal) = .packed}{ .layout = .packed }

Note: I’m changing the title of this thread again to clarify.
