Can we simulate Rust-like traits in Zig (to add methods to a struct)?

Hello,

I find std.ArrayList unergonomic to use, because you have to go through .items to reach the elements of the array.

Since Zig doesn’t allow [] operator overloading, I wrapped std.ArrayList inside my own custom struct and added at() and in() methods to access the data in a simpler/safer way.

The issue is that, to keep the same API as std.ArrayList, I also need to wrap the other methods I use, such as append(), getLast(), etc.

In Rust, this is simpler: you add a trait providing the at() and in() methods and use them without modifying the rest of the code.

Is something like that possible in Zig?

Maybe something like, std.ArrayList(u32).withTraits(.{MyTrait, OtherTraits})?

Thanks.

const std = @import("std");

pub fn ArrayList(comptime T: type) type {
    return struct {
        array: std.ArrayList(T),

        const Self = @This();

        pub fn init(allocator: std.mem.Allocator) Self {
            return Self{ .array = std.ArrayList(T).init(allocator) };
        }

        pub fn initCapacity(allocator: std.mem.Allocator, n: usize) !Self {
            return Self{ .array = try std.ArrayList(T).initCapacity(allocator, n) };
        }

        pub fn deinit(self: Self) void {
            self.array.deinit();
        }

        pub fn len(self: Self) usize {
            return self.array.items.len;
        }

        pub fn capacity(self: Self) usize {
            return self.array.capacity;
        }

        pub fn append(self: *Self, x: T) !void {
            try self.array.append(x);
        }

        pub fn getLast(self: Self) T {
            return self.array.getLast();
        }

        pub fn at(self: Self, i: usize) T {
            return self.array.items[i];
        }

        pub fn in(self: *Self, i: usize) *T {
            return &self.array.items[i];
        }
    };
}

1 Like

There was usingnamespace, but it got removed.

You can just not wrap other methods. Then you can do your_list.array.append() (or maybe rename array to something shorter).
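For example (a sketch, assuming the ArrayList(u32) wrapper type from the original post is in scope and keeps its array field public):

```zig
const std = @import("std");

// Sketch: only at()/in() stay wrapped; everything else is called
// through the inner field directly.
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var list = ArrayList(u32).init(gpa.allocator());
    defer list.deinit();

    try list.array.append(1); // unwrapped method: reach through the field
    _ = list.array.getLast(); // same here
    _ = list.at(0); // wrapped convenience method still works
}
```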

2 Likes

In Zig you’re expected to reach into struct fields fairly often. An ArrayList is not considered an abstract data type that represents a list of elements, but rather a wrapper around a slice that manages its reallocation and nothing much more than that.

I would encourage you to push past your discomfort and to see if you can gain anything from this new perspective. If after that it clicks for you, great, and if it doesn’t, at least you gave it a good attempt and it might be that Zig is not the language for you.

22 Likes

Make of it what you will, but in gdzig we use codegen to generate Zig bindings for Godot and have support for mixins.

There is some utility for us to be able to add methods/properties to some of the generated structs. Given a file like StringName.mixin.zig:

And the corresponding writeMixin function in our codegen.zig:

The resulting file string_name.zig will have the above written into its StringName struct. It’s crude, but it works.

2 Likes

Most likely I will follow this advice, since it’s simpler, and requires less work.

For the most important/frequently used methods I can write a simple wrapper (like I did before), but for the less used ones it’s probably better to just access the array directly.

Thanks.

2 Likes

Tbh, even though I’m full in on C which also doesn’t have private fields and generally is much less ‘safety-aware’ than Zig, I still feel ‘discomfort’ when having to reach directly into the ‘guts’ of Zig stdlib structs which clearly mix ‘public access’ and ‘private implementation detail’ members in the same struct. E.g. accessing the embedded slice in an ArrayList is fine, but then ‘capacity’ is also there for everyone to overwrite and corrupt.

IMHO the underlying problem is that Zig’s stdlib is still too infected by C++-style OOP; ArrayList is essentially a C++-like container class, but without the protection that C++ provides against misuse.

I don’t have a good solution for this. In C I would build an API where the ‘object’ is some sort of opaque handle, and all accesses need to happen through function calls.

Maybe it would at least help to put all struct items which are not meant for public access into a nested struct called _private or something, e.g.

const Bla = struct {
    items: Slice, // public API
    _private: struct {
        capacity: usize, // implementation detail
    },
};

That way it at least becomes clear which struct items are safe to access, and which struct items are implementation details which are not part of the ‘public API’ (ok, for this an ArrayList with just two members is a too simplistic example).

E.g. the problem isn’t so much that Zig doesn’t have private struct members, but that the stdlib freely mixes public and private members in the same struct, which makes it hard to decide whether a field is even meant to be accessed (i.e. it makes the intended usage harder to understand).

7 Likes

This is how Zig thinks: nothing should be inaccessible. When you need to know the capacity, just go and read it.
In fact, for me, I’ve run into more trouble because the Zig standard library did not mark some declarations as pub, which forced me to copy the standard library code and add pub to a key declaration so I could extend it for my own needs.

6 Likes

Yeah I often had the same problem in C++ libraries.

Ok, different proposal: readonly struct members, e.g. capacity can be read from everywhere, but only written from within the ArrayList implementation functions.

It always goes back to the idea that’s rolling around in the back of my head for like a decade now that programming languages should have fine-grained filesystem-like access rules instead of the overly simplistic public/private/const/var model.

7 Likes

Yes, I am actually adopting such a convention in practice at present: a struct must not be modified directly, outside of its API, except during initialization.
For me, though, that self-imposed agreement is almost enough. I am not particularly interested in such restrictions at the language level, because they would make me worry about more cases of “wanting to access something but being blocked because no pub was provided”.

2 Likes

…yeah it’s a delicate balance, and also not really relevant for small-team-projects (since such teams can entirely live by coding-conventions).

It’s also not relevant for the interior of a module, or generally a module-sized piece of code (e.g. anything that’s maintained by one person).

I’d still like to have more control over visibility at the module boundary though, just to clearly communicate which parts are ‘public API’ and which parts are ‘internal implementation details’ - but without going full PIMPL.

I would probably try my best to make all modules have a flat C-style function API without leaking any internal structs, but that’s really tricky once generics come into play (I guess the lesson is to not make generic types part of a public API heh).

1 Like

The problem with this and your explanation is that items is exactly as unsafe as capacity. You are free to read it, but should not just overwrite its value willy-nilly.
In fact, those two values should only be set together because capacity has to match the allocated items.

Edit: I’m not saying you don’t have a point. Only that both fields in this example would have to be private.

5 Likes

There was concrete opaque types a while ago, which covers a lot of ground here. It’s possible right now, it’s not even difficult, but Zig is almost certainly not going to make it any easier than it is.

I think the aversion to the lack of private or read-only fields is 90% or more just discomfort at the thought that it allows Bad Things to occur, and only a sliver of actual problems that really arise from meddling with a struct’s fields. I can’t say I’ve ever had to fix a bug arising from code I wrote that messes with the internal logic of a stdlib container; it’s possible to do, but not likely to happen by accident.

3 Likes

I am not sure what willy-nilly means (programs need to be correct anyway), but the documentation clearly states that .items is meant to be directly accessed and I think it is good the way it is.

Also, using ensureUnusedCapacity, then passing unusedCapacitySlice() to a function that needs a buffer and returns how many bytes it has written, and finally bumping list.items.len += written_bytes directly, is a valid use case. Thanks to ensureUnusedCapacity the later operations can’t fail, and it avoids creating an unnecessary intermediate buffer.
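A minimal sketch of that pattern, using the managed ArrayList API from the original post (fillBuffer is a hypothetical stand-in for any API that fills a buffer and reports how many bytes it wrote):

```zig
const std = @import("std");

// Hypothetical producer: fills a buffer, returns the byte count.
fn fillBuffer(buf: []u8) usize {
    const msg = "hello";
    @memcpy(buf[0..msg.len], msg);
    return msg.len;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var list = std.ArrayList(u8).init(gpa.allocator());
    defer list.deinit();

    // Reserve room up front; after this, the writes below cannot fail.
    try list.ensureUnusedCapacity(16);

    // Let the producer write directly into the spare capacity...
    const written = fillBuffer(list.unusedCapacitySlice());

    // ...then commit the bytes by bumping the length directly.
    list.items.len += written;
}
```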

I think Zig is fine the way it is. If Zig wanted to become a more complex language (which it doesn’t seem to), then I think a superset of Zig could make sense that adds a kind of constraint/contract system, letting you describe invariants with code to precisely track that certain properties aren’t invalidated (or even some optional extension like clr). But I don’t like adding some kind of half-solution that just restricts the programmer’s freedom to express the program they wanted to write.

There are already enough programming languages that restrict the programs you are allowed to write in them, towards those the language designer is most comfortable writing a compiler for.

3 Likes

I agree. The std library doc is pretty sparse (due to other priorities so far I assume), but if it were completed then (I assume) it would clearly specify which fields are part of the stable interface. Then there would be much less of a need/desire for a naming convention or pub keyword. Do you agree?

1 Like

Sorry if I wasn’t very clear here.

First, I agree 100% that it is fine as it is. I’m not arguing for or against anything here currently.

All I was trying to say is that I don’t see why items and capacity should be treated differently from one another. In my opinion, they share the same access pattern. It is completely fine and useful to read them (items for getting to those items and capacity because sometimes you want to know if some amount of items can still be added without allocation). It is not useful to write them directly. Sure, the docs say you should access items, but I don’t believe that anyone is talking about overwriting that pointer.

Edit: By pointer, I mean slice. Like this: array_list.items = my_own_slice.

I think that is the only part I disagree with: writing to the len field of items is already write access to the slice, and it is clearly useful in some situations. There may also be cases I haven’t thought of where somebody actually wants to write or replace the whole slice. (Who knows, maybe somebody implements a moving garbage collector, some kind of runtime introspection / editor thing, or something like that.)

I would go further than that:
The benefit of having these fields be publicly readable and writable is that you can always figure out exactly how they are used and then opt in to using them in whatever way is useful to you. The benefit is precisely in being able to ignore perceived abstraction boundaries and instead just deal with the reality of the underlying implementation and how it is used in a program.

It also makes it relatively easy to create freestanding functions that receive a pointer to the data structure, letting you implement some obscure piece of functionality that is missing from the API surface and only useful to you.

Without this, you would have to copy the entire data structure and replace the type throughout your program, just to add one small function that is already compatible with the existing data layout but wasn’t thought of by the data structure’s designer.

For low-level code I don’t see much value in access semantics, and for higher-level code I would want something like capability-based security. But I think in Zig that would be implemented within whatever program or API you create, and then you would make use of OS security mechanisms like page access rights, process isolation and sandboxing to keep people from hacking fields they don’t have the right capability for.

To get back to the topic question:
I think one answer in Zig is that instead of writing a method you can always just write a freestanding function and call that instead; you don’t need method call syntax for everything. Sometimes function(&instance, 5) is way better than forcing your types to be “class-like”.
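For instance, the at()/in() helpers from the original post can be plain freestanding functions over the stock std.ArrayList, so the rest of its API stays untouched (a sketch; the names at/in just mirror the wrapper methods from the question):

```zig
const std = @import("std");

// Freestanding helpers instead of wrapper methods: they operate on
// the stock ArrayList, so nothing else needs to be forwarded.
fn at(list: std.ArrayList(u32), i: usize) u32 {
    return list.items[i];
}

fn in(list: *std.ArrayList(u32), i: usize) *u32 {
    return &list.items[i];
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var list = std.ArrayList(u32).init(gpa.allocator());
    defer list.deinit();

    try list.append(41); // the full stock API is still available
    in(&list, 0).* += 1; // mutate element 0 through the helper
    std.debug.print("{}\n", .{at(list, 0)});
}
```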

4 Likes

That is the opposite of what we’re taught to do in other languages, so hopefully I’m not giving a biased reaction, but isn’t there a desire to evolve/improve the std library over time, and to have some flexibility in changing the fields in the process?

In my code, which will eventually be published as a library, I’m planning to document which fields in the API are subject to change and should not be relied on. To me that’s better than using an opaque type of some sort – trying that would be swimming upstream since the language isn’t designed for it. The idea in the thread linked above, of bit casting a byte array at every entry point, is overkill to me.

2 Likes

I am not arguing for using an opaque type (people can do that if they want to; I rarely care about that, and rarely want to make things opaque at a low level), so I am a bit confused. I mostly expressed that I don’t like access restrictions at a low level and would rather have people be free to do what they want with fields.

The part you quoted was specifically about being able to reach into data-structures and do what you want with them, instead of only accessing them via their predefined methods.

I think it is good when the designer of a data structure documents its intended use; what I was trying to express is that sometimes it is useful to understand exactly how a data structure works and use it beyond what the author intended (or maybe just didn’t explicitly advocate for).

I understand, sorry that I wasn’t clear. I was wondering how using the internals can be practical when the fields can change over time. Would it be ok if your code breaks when you upgrade to a new version of the library? Or is there some implied rule (that I’m missing) that fields in an API cannot be changed after 1.0, or something similar? I’m probably misunderstanding.

2 Likes

I can’t say how that works for others, but for me personally upgrading code and dependencies is a manual process that potentially involves vetting the new code and taking ownership of it, if necessary.

If I write code that somehow breaks some invariant or relies on fields that disappear in an updated version of a library, then it is on me to fix my code.
(Or if the library really goes in a direction where it no longer supports my use case, then I may have to take ownership of some part of its code, maybe by copying some type, or even forking it.)

more context and nuance

Ideally the library makes it clear what parts are intended vs not intended, so that the user will know whether something they are doing is future compatible, if they care about that.

I think the confusion between us comes from the fact that I was mostly talking about writing applications and it seems you are more talking from the perspective of writing libraries.

I would agree that if you write libraries that depend on other libraries, then it might not be a good idea to reach too far down into the implementation details of those other libraries. (I still wouldn’t rule it out completely, but if you do it I would suggest having a bunch of tests, asserts, etc. to make sure that whatever assumptions you have added beyond what those dependencies promise are easily discovered as broken assumptions when you upgrade.)

But it also depends a lot on what your library actually documents as the sanctioned api.

Basically, whenever you reach into fields that aren’t meant to be touched according to the API, you are opting into more responsibility. It is no longer on the library author, because you have already violated the rules they laid out; now it is the user’s responsibility to make sure that what they are doing works out.

That said, it would be good to have validation tools that can tell whether an API was used within its supported surface or beyond it.

I guess to say it another way: there is an api-safe mode and an unofficial unsafe mode where you don’t have to care about what the library promises, but the latter comes with the risk of shooting yourself in the foot.

97% of the time you can just get away with the api-safe version, but sometimes you want to do something that isn’t part of the api. And not every case warrants patching or forking the library just for that.

2 Likes