One of the things I find slightly awkward about Zig is the use of the .* operator to dereference pointers. My question is, why is this so? Are there any advantages of this vs. C’s prefix * operator?
I think the main benefit is explicitness. Coming from a C background I found it weird too at first, but now I actually prefer it this way. I think it makes it visually clearer that you are using a pointer, and especially with multiple levels of indirection it’s easier to parse with the eyes.
int *a;                    // C
var b: *i32 = undefined;   // Zig
*a = 5;                    // C
b.* = 5;                   // Zig
I think in general the added ‘.’ adds some spacing, which makes the .* itself easier to see. Also I’m sure it’s easier to parse, because when you have only *identifier, the meaning of * is overloaded: instead of an identifier it could be followed by a number literal, or maybe you are multiplying something. Whereas identifier.* is probably easier to parse, because you know that when you have a ‘.’ the rest is probably a member of your identifier. But that’s just a guess, I don’t know if that’s the case at all.
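To make the multiple-levels-of-indirection point concrete, here is a minimal sketch. The helper name storeThrough is made up for illustration; in C the body would read `**pp = v;`, while in Zig the dereferences are read left to right.

```zig
const std = @import("std");

// Hypothetical helper: writes through two levels of indirection.
// The C equivalent of the body would be `**pp = v;`.
fn storeThrough(pp: *const *i32, v: i32) void {
    pp.*.* = v; // first .* yields the inner *i32, second .* yields the i32
}

pub fn main() void {
    var value: i32 = 1;
    const p: *i32 = &value;
    storeThrough(&p, 5);
    std.debug.print("value = {}\n", .{value}); // value = 5
}
```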
I made my peace with ptr.* vs. *ptr, when talking about the pointer itself. ptr.*.x is also more regular than C’s -> (aka *(..).)
The thing that confuses me the most is that .* is not necessary to access members through a pointer to the outer type. I.e.:
const A = struct {
    a: i32,
};
// needs to be a var or &a will be *const A
var a: A = .{.a = -42 };
// 'var' works just as well ... which I find confusing
// given 0.12, I would've expected either to work, not both.
// var ap: *A = &a;
const ap: *A = &a;
// but why not ap.*.a ? Nothing here reminds me that ap is a pointer.
// don't get me wrong, I prefer ap.a here, .*. is just annoying visual junk
// I'm just confused by varying level of strictness in the language.
ap.a = 17;
std.log.debug("{any}", .{ap.*});
I also think that identifier-first, deref-op-next is more “logical”. Same thing with declaring variables (var a: u32; vs u32 a;): most modern languages have chosen the name-first, type-next way.
BTW, there were times when ^ (Pascal fashion) was used (or planned to be used) for single-item pointers.
What is a bit confusing for me regarding dereference is this:
const std = @import("std");
const log = std.debug.print;

const SomeThing = struct {
    a: u32 = 1,
    b: u32 = 2,
};

pub fn main() void {
    const a_thing = SomeThing{};
    const thing_p = &a_thing;
    log("t.a = {}, t.b = {}\n", .{thing_p.*.a, thing_p.*.b}); // t.a = 1, t.b = 2
    log("t.a = {}, t.b = {}\n", .{thing_p.a, thing_p.b}); // t.a = 1, t.b = 2
}
In C we have t.a or t->a depending on whether t is a struct or a pointer to struct. In Zig . works for both, but .*. is kinda weird, isn’t it?
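For what it’s worth, here is a small sketch of where the implicit dereference stops, as far as I understand it: field access looks through one pointer level only, so with a pointer to a pointer the first .* has to be written out. The helper name firstField is made up for illustration.

```zig
const std = @import("std");

const SomeThing = struct {
    a: u32 = 1,
    b: u32 = 2,
};

// Hypothetical helper: reads a field through two pointer levels.
fn firstField(pp: *const *const SomeThing) u32 {
    // pp.a is expected to be rejected by the compiler (assumption:
    // only a single pointer level is dereferenced implicitly).
    return pp.*.a;
}

pub fn main() void {
    const a_thing = SomeThing{};
    const thing_p = &a_thing; // *const SomeThing
    std.debug.print("via p: {}\n", .{thing_p.a}); // one level: no .* needed
    std.debug.print("via pp: {}\n", .{firstField(&thing_p)});
}
```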
I get it, it is kind of weird, but at the same time I don’t think syntax is that big of a deal as long as it doesn’t impair readability. In the case you’ve shown it is weird, and I don’t really like that there are multiple ways of expressing the same thing. That being said, I think syntax can always be fixed, especially considering that the language maintainers don’t mind breaking backward compatibility if that makes sense for the language. So maybe one day we’ll have a different way of accessing things through pointers.
The pointer syntax is part of the reason I can build three versions of the same function and leave the function body the same:
// by value
pub fn foo(self: Self) usize {
    return self.count;
}
// by mutable pointer
pub fn foo(self: *Self) usize {
    return self.count;
}
// by const pointer
pub fn foo(self: *const Self) usize {
    return self.count;
}
That said, I’m not sure how to answer the original question here on a broader scope. Why have pointer syntax different than C? At some level, why have anything different than C at all?
I think the consistent dereferencing syntax (especially for member variables) is a huge bonus here and reusability goes way up having consistent accessing syntax.
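As a runnable sketch of that point (the type Counter and the three method names are made up for illustration, since three functions cannot share one name in the same scope):

```zig
const std = @import("std");

// Hypothetical type used only for this illustration.
const Counter = struct {
    count: usize,

    // The three bodies are identical; only the receiver type differs.
    fn byValue(self: Counter) usize {
        return self.count;
    }
    fn byPtr(self: *Counter) usize {
        return self.count;
    }
    fn byConstPtr(self: *const Counter) usize {
        return self.count;
    }
};

pub fn main() void {
    var c = Counter{ .count = 3 };
    std.debug.print("{} {} {}\n", .{ c.byValue(), c.byPtr(), c.byConstPtr() }); // 3 3 3
}
```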
I like not having to add that extra .* to access a member. But if you take the argument that’s being made to support that and apply it the other direction, ptr = should mean ptr.* = … If there’s documentation/discussion of that member (de)referencing rationale, I’d be happy to read that… Interestingly the language doc does not mention that you can drop .* on member access, except in the fine print of the test (linked list, a comment near the end) under Documentation - The Zig Programming Language – and I’m not certain that a pointer to a union may drop the .*. The documentation of the ‘c pointer’ implies that it needs the extra .* compared to struct. The irregularity makes me wonder.
Sorry for the off-topic turn; I take the same stance as Andrew up above – the original question seems impossible to answer.
I do not think so. One possible explanation is that it is easier to parse (both for eye and for compiler) when primary things go first, secondary things go next, for example:
- ‘var <name>: <type>’, but not ‘<type> <name>’
- ‘a = ptr^’, but not ‘a = ^ptr’
- ‘fn func() u32’, but not ‘u32 fn()’
Postfix notation is extremely easy to parse (remember Forth).
But most ordinary programming languages are sort of a postfix/infix/prefix notation mix.
I’m not saying it’s “bad”… I think it’s sometimes hard to find a good balance between these forms.
But ptr = and ptr.* = both make sense and have different meanings. With field access, ptr.field is nonsense unless the programmer intended ptr.*.field.
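A minimal sketch of the difference, with a made-up Point type and a made-up helper named demo:

```zig
const std = @import("std");

// Hypothetical type used only for this illustration.
const Point = struct { x: i32, y: i32 };

// Hypothetical helper showing the three forms side by side.
fn demo(a: *Point, b: *Point) void {
    var ptr: *Point = a;
    ptr.* = .{ .x = 3, .y = 4 }; // overwrites the pointee: a is now (3, 4)
    ptr = b; // rebinds the pointer: a is untouched, ptr now refers to b
    ptr.x = 30; // field access goes through the pointer: b.x is now 30
}

pub fn main() void {
    var a = Point{ .x = 1, .y = 2 };
    var b = Point{ .x = 10, .y = 20 };
    demo(&a, &b);
    // a = (3, 4), b = (30, 20)
    std.debug.print("a = ({}, {}), b = ({}, {})\n", .{ a.x, a.y, b.x, b.y });
}
```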
You call it nonsense, I parse it as a quick way to get the address of the field, given we’ve not dereferenced the pointer to get to the value it’s pointing to. That would be regular.
If dereferencing is optional, how can you know what ptr = and ptr.* = do? If you start your logical voyage from ptr.*.field and ptr.field, both ptr = and ptr.* = make sense and … uh … they’re the same.
- prefix notation is also extremely easy to parse (remember Lisp).
- (your three examples) but Struct { contents } not { contents } Struct: in your examples the “name” of your var is (more) “primary” than its type even though the type has way more impact than the tag you put it on. If the primary thing comes first, it surely would be the type, not the name, wouldn’t it? In contrast to the type, the name doesn’t even survive compilation… But if the name is more important than the type, a Struct initialization would surely value the contents more than its type as well?
To argue this (pointer dereference syntax) in the face of irregularity is void IMO, hence I’d side with AndrewCodeDev – why do anything different from C in the first place?
I’d be interested in seeing if there’s a thought-dump style “rationale” for these to make re-tracing the debate of pros/cons and its decision palpable
I think operator overloading is probably the biggest reason for it not being done “normally”. Andrew has been quite adamant about there being no operator overloading. If you see x * y, you know there is multiplication happening.
Now you could argue that dereferencing is a unary operation and multiplication is a binary operation, so the two can’t really be ambiguous, but I think the principle is a good one, so I’m willing to work with the ptr.* syntax.
EDIT:
Though now that I think about it, I realize that &value is technically an overload of the & operator, which either takes a reference or is a bitwise and.
Me too yes, at the end of the day, what a language allows you to do, is more important than the syntax of said language. Zig syntax is fine the way it is. Sure some quirks here and there, but nothing that’s really extraordinary.
For the ‘&’ I think it’s fine; you usually use it in an assignment, so in context it’s not really overloaded.
const foo : u32 = 5;
const bar = &foo;
const baz = 1 & foo;
In that context I think it’s clear enough to most people what’s going on.
Yes, and I repeat - most programming languages are somewhere in between these two poles (Forth and Lisp). Some things are infix, some are postfix, some are prefix. In essence, this is the reason for “holy war”-like discussions.
I mean importance/primacy for reading (parsing), not for code being generated.
When a compiler sees, say, var x: u32 = 0, it kinda “thinks” like this:
- var - ok, next token is a name
- x - here it is, a name, there should be : next
- : - here it is
- and so on
In other words, this form helps a compiler to predict what should follow next and thus generate good error messages. Now let’s look at int x; instead:
int … and WHAT?.. what should come next?
It may be int func(), it may be int x, it may be (int)x (explicit type casting).
What is more important for pointer dereference? For me it is the pointer, so its name should precede the deref operation, so ptr^ / ptr.* = val is better than *ptr = val;
*ptr vs ptr.* – which one looks easier / more obvious?
&var vs var.& – same question.
One artificial way of answering might be stretching to limits:
ptr.*.&.*.& == ptr
var.&.*.&.* == var
&(*(&(*ptr))) == ptr
*(&(*(&var))) == var
The postfix versions do away with the parentheses – in my mind, they are cleaner / clearer.
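Since .& is not something Zig accepts today, here is a sketch of the same round-trip identities in current syntax, with postfix .* and prefix &. The helper names sameAddress and sameValue are made up for illustration.

```zig
const std = @import("std");

// Hypothetical helper: address-of a dereference gives back the same pointer.
fn sameAddress(p: *u32) bool {
    return (&p.*) == p; // .* binds tighter than the prefix &
}

// Hypothetical helper: dereferencing an address-of gives back the same value.
fn sameValue(v: u32) bool {
    const copy = v;
    return (&copy).* == v; // here the prefix & needs parentheses
}

pub fn main() void {
    var x: u32 = 7;
    std.debug.print("{} {}\n", .{ sameAddress(&x), sameValue(x) }); // true true
}
```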
Damn!
- C
  - *p, prefix, “deref a pointer”
  - &v, prefix, “address of a variable”
- Zig
  - p.*, postfix, “take a pointer and deref it”
  - &v, prefix, “address of a variable”
In this sense C is more consistent, both operations are prefix, while in Zig (and Pascal) taking an address is prefix, de-referencing is postfix…
I’ve just thought about a fourth form, kinda “functional”, with which you could write assign(a, add(b,c)) or =(a, +(b,c)) instead of infix a = b + c.
I stumbled upon this very question on Reddit, which refers to a GitHub issue about issues with C-style pointer operators, which in turn developed into a pointer operator reform (already mentioned by @dee0xeed but I had missed it). Those who are very patient will read all of it and understand perfectly the reasoning behind .* and .&; those lesser beings like me will be satisfied in knowing that deep conversations, very likely beyond me, were had, before the decision was made.
In summary, I can say that C-style pointer operators were not an option for Zig because Zig does things that C can’t, whence conflicts with C-style pointer operators arose.
IMO, given that Zig is (even if implicitly) meant as an improvement upon C, sticking to C-like ways of doing things makes sense insofar as they’re not an impairment to Zig.
Look. Let’s take some simple Zig snippet:
var x: u32 = 7;
x += 1;
const p: *const u32 = &x;
log("x = {}\n", .{p.*});
log("p = {*}\n", .{p});
and rewrite it in functional form, something like this:
@var(x, u32, 7);
inc(x);
@const(p, ^const u32, @addrof(x));
log("x = {}\n", .{@deref(p)});
log("p = {^}\n", .{p});
It is super-self-consistent, 'cause everything looks like a function call, including variable declarations. And no need to debate on prefix/infix/postfix.
they are somewhat consistent
a: *u8 => a.*
a: []u8 => a[0]
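A small sketch of that consistency, with a pointer to an array added for comparison (the helper name bumpAll is made up for illustration):

```zig
const std = @import("std");

// Hypothetical helper: bumps one value through each pointer flavour.
fn bumpAll(single: *u8, slice: []u8, arr_ptr: *[3]u8) void {
    single.* += 1; // single-item pointer: dereference with .*
    slice[0] += 1; // slice: index directly
    arr_ptr[1] += 1; // pointer to array: indexing also works without .*
}

pub fn main() void {
    var byte: u8 = 1;
    var bytes = [_]u8{ 10, 20, 30 };
    bumpAll(&byte, bytes[0..], &bytes);
    // byte = 2, bytes[0] = 11, bytes[1] = 21
    std.debug.print("byte = {}, bytes[0] = {}, bytes[1] = {}\n", .{ byte, bytes[0], bytes[1] });
}
```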
Zig is closer to C than any other language (I mean static typing and especially manual memory management). As to improvements upon C, syntax improvements are also improvements, and I personally consider var x: i32 = 0; and fn func() i32 to be better syntax than C’s int x = 0; and int func(void)