Index Types

Yesterday I stumbled upon a small video about Ada.

One of the features of the language are Index Types. As in binding an array to a specific integer type for indexing.

This look like this:

type Index is range 0 .. 10;
type Array is array (Index) of integer;

These ranges don’t even need to start with 0 but can be e.g. -90 .. 90 (the ranges are inclusive in Ada) and also define the size of the array.

At runtime it’s then checked if the index can be converted to that type (or it’s unchecked if the type of the index will always fit because e.g. the range is smaller or the same).

Additionally it can be used to document to which array(s) an index belongs to.

Since ranged integers as a proposal are already accepted for Zig and since Zig encourages to use DOD (which encourages indexes over pointers), I wonder if something like this may also be a decent fit for Zig.

What do the others here think about it?

6 Likes

To me, this sounds like a functionality subset of allow integer types to be any range · Issue #3806 · ziglang/zig · GitHub. I regularly use usize to index my arrays/slices, as it will scale with the address space of the target machine, so I don’t have to be paranoid about overflowing my index integers. If I do need to practice DOD (space savings, storing some number of of these integer indexes), I think Zig’s current @Int semantics are plenty fine (and provide enough friction) to help with implementing them for when you really need them, while avoiding over-using them. That’s just me though. ¯\_(ツ)_/¯

3 Likes

What are the actual benefits that zig does not already have in a different form?

Associating an index with a collection(s)? We already have that by creating a wrapping api!

Preventing out of bounds accesses? We already have that, mostly at runtime, but in the right circumstances at compile time as well! And it can be improved with a wrapping api.

The only benefit I can see is reducing the friction, and marginally stronger bounds built-in.

The usage experience of the native index type is consistent with [*]T. Many item pointers can add or subtract a signed offset, and directly subtracting them can yield a signed offset. The native index type allows reverse for loops.

2 Likes

I don’t understand what you are trying to contribute to the discussion??

It may not quite match the description of the OP (it only considers indices starting from 0 and does not account for negative indices). In my ideal native indexing, the following behavior is supported:
(Edit: Added wrap-around operation)

const std = @import("std");
pub fn Index(comptime Backing_or_len: anytype) type {
    const IndexBacking: type = switch (@typeInfo(@TypeOf(Backing_or_len))) {
        .type => @as(?type, switch (@typeInfo(Backing_or_len)) {
            .int => |int_info| if (int_info.signedness == .signed) null else Backing_or_len,
            else => null,
        }) orelse @compileError("Backing must be unsinged int."),
        .comptime_int, .int => if (Backing_or_len == 0) u0 else std.math.IntFittingRange(0, Backing_or_len - 1),
        else => comptime unreachable,
    };
    return packed struct(IndexBacking) {
        child: Backing,
        pub const Backing = IndexBacking;
        pub const len: comptime_int = switch (@typeInfo(@TypeOf(Backing_or_len))) {
            .type => std.math.maxInt(Backing) + 1,
            .comptime_int, .int => Backing_or_len,
            else => unreachable,
        };
        pub const SignedBacking = if (len == 0 or len == 1) i0 else std.math.IntFittingRange(-len + 1, len - 1);
        pub fn fromUnverified(x: Backing) @This() {
            std.debug.assert(x < len);
            return .{ .child = x };
        }
        pub fn add(self: @This(), offset: anytype) @This() {
            const offset_int = switch (@typeInfo(@TypeOf(offset))) {
                .int => offset,
                .comptime_int => if (offset < 0) @as(SignedBacking, offset) else @as(Backing, offset),
                else => @compileError("offset must be int"),
            };
            const result: Backing = switch (@typeInfo(@TypeOf(offset_int)).int.signedness) {
                .unsigned => self.child + @as(Backing, offset),
                .signed => @intCast(@as(SignedBacking, self.child) + @as(SignedBacking, offset)),
            };
            return .fromUnverified(result);
        }
        pub fn sub(self: @This(), offset: anytype) @This() {
            const offset_int = switch (@typeInfo(@TypeOf(offset))) {
                .int => offset,
                .comptime_int => if (offset < 0) @as(SignedBacking, offset) else @as(Backing, offset),
                else => @compileError("offset must be int"),
            };
            const result: Backing = switch (@typeInfo(@TypeOf(offset_int)).int.signedness) {
                .unsigned => self.child - @as(Backing, offset),
                .signed => @intCast(@as(SignedBacking, @intCast(self.child)) - @as(SignedBacking, offset)),
            };
            return .fromUnverified(result);
        }
        pub fn addWrap(self: @This(), offset: anytype) @This() {
            if (len == 0) @compileError("addWrap on zero length index.");
            if (len == 1) return self;
            if (len == std.math.maxInt(Backing) + 1) {
                const gap: Backing = @truncate(offset);
                return .{ .child = self.child +% gap };
            }
            const gap: Backing = switch (@typeInfo(@TypeOf(offset))) {
                .comptime_int => @intCast(@mod(offset, len)),
                .int => |offset_int_info| blk: {
                    // Since `len == maxInt(Backing) + 1` is specialized, `len` now
                    // fits into `Backing` for all subsequent operations.
                    const Extend = switch (offset_int_info.signedness) {
                        .unsigned => if (offset_int_info.bits > @bitSizeOf(Backing)) @TypeOf(offset) else Backing,
                        .signed => if (offset_int_info.bits > @bitSizeOf(SignedBacking)) @TypeOf(offset) else SignedBacking,
                    };
                    break :blk @intCast(@mod(@as(Extend, offset), @as(Extend, len)));
                },
                else => @compileError("offset must be int"),
            };
            const threshold: Backing = len - gap;
            return .{
                .child = if (self.child >= threshold) self.child - threshold else self.child + gap,
            };
        }
        pub fn subWrap(self: @This(), offset: anytype) @This() {
            if (len == 0) @compileError("subWrap on zero length index.");
            if (len == 1) return self;
            if (len == std.math.maxInt(Backing) + 1) {
                const gap: Backing = @truncate(offset);
                return .{ .child = self.child -% gap };
            }
            const gap: Backing = switch (@typeInfo(@TypeOf(offset))) {
                .comptime_int => @intCast(@mod(offset, len)),
                .int => |info| blk: {
                    // Since `len == maxInt(Backing) + 1` is specialized, `len` now
                    // fits into `Backing` for all subsequent operations.
                    const Extend = switch (info.signedness) {
                        .unsigned => if (info.bits > @bitSizeOf(Backing)) @TypeOf(offset) else Backing,
                        .signed => if (info.bits > @bitSizeOf(SignedBacking)) @TypeOf(offset) else SignedBacking,
                    };
                    break :blk @intCast(@mod(@as(Extend, offset), @as(Extend, len)));
                },
                else => @compileError("offset must be int"),
            };
            return .{
                .child = if (self.child >= gap) self.child - gap else self.child + (len - gap),
            };
        }
        pub fn offsetSigned(a: @This(), b: @This()) SignedBacking {
            return @as(SignedBacking, a.child) - @as(SignedBacking, b.child);
        }
        pub fn offsetUnsigned(higher: @This(), lower: @This()) Backing {
            return higher.child - lower.child;
        }
    };
}

test "Index basic" {
    const MyIdx = Index(100);
    var idx = MyIdx{ .child = 10 };
    idx = idx.add(@as(MyIdx.Backing, 5));
    try std.testing.expect(idx.child == 15);
    idx = idx.add(@as(MyIdx.SignedBacking, -5));
    try std.testing.expect(idx.child == 10);
    const a = MyIdx{ .child = 20 };
    const b = MyIdx{ .child = 50 };
    try std.testing.expect(a.offsetSigned(b) == -30);
    try std.testing.expect(b.offsetSigned(a) == 30);
    try std.testing.expect(b.offsetUnsigned(a) == 30);
    const Idx16 = Index(16);
    try std.testing.expect(Idx16.Backing == u4);
    try std.testing.expect(Idx16.SignedBacking == i5);
    try std.testing.expect(Idx16.len == 16);
    const Idx17 = Index(17);
    try std.testing.expect(Idx17.Backing == u5);
    try std.testing.expect(Idx17.SignedBacking == i6);
    const IdxU16 = Index(u16);
    try std.testing.expect(IdxU16.Backing == u16);
    try std.testing.expect(IdxU16.SignedBacking == i17);
    try std.testing.expect(IdxU16.len == 65536);
}

test "Index wrap" {
    const MyIdx = Index(10);
    const start = MyIdx.fromUnverified(5);
    try std.testing.expectEqual(@as(u4, 2), start.addWrap(@as(u8, 7)).child);
    try std.testing.expectEqual(@as(u4, 8), start.subWrap(@as(u8, 7)).child);
    try std.testing.expectEqual(@as(u4, 0), start.addWrap(@as(u32, 25)).child);
    try std.testing.expectEqual(@as(u4, 7), start.subWrap(@as(i32, -2)).child);
    const U8Idx = Index(u8);
    const max_val = U8Idx.fromUnverified(255);
    try std.testing.expectEqual(@as(u8, 0), max_val.addWrap(@as(u32, 1)).child);
    const zero_val = U8Idx.fromUnverified(0);
    try std.testing.expectEqual(@as(u8, 255), zero_val.subWrap(@as(u32, 1)).child);
    const Tiny = Index(1);
    const idx = Tiny.fromUnverified(0);
    try std.testing.expectEqual(@as(u0, 0), idx.addWrap(99).child);
    try std.testing.expectEqual(@as(u0, 0), idx.subWrap(99).child);
}

test "Index panic 1" {
    const MyIdx = Index(50);
    const idx = MyIdx{ .child = 40 };
    _ = idx.add(20);
}

test "Index panic 2" {
    const MyIdx = Index(u8);
    const idx = MyIdx{ .child = 0 };
    _ = idx.sub(@as(u8, 1));
}

test "Index panic 3" {
    const MyIdx = Index(u8);
    const idx = MyIdx{ .child = 0 };
    _ = idx.sub(@as(i8, 1));
}

test "Index panic 4" {
    const MyIdx = Index(u8);
    const a = MyIdx{ .child = 20 };
    const b = MyIdx{ .child = 50 };
    _ = a.offsetUnsigned(b);
}

2 Likes

The problem with this is that there is no issue with intermediate values going outside the range, so long as the end result is in it.

Your code doesn’t allow that. zigs existing checks do. IDK about ADA, where op got this idea.

I don’t know the form it should take, but ranged integers, would bring a lot of benefit in my opinion, they reduce friction where it should be reduced, because encoding invariants should always be the “easy path”, it makes the code explicit, it would probably allow to catch a lot of issues at comptime, it could potentially be used in combination with optional, such that the optional part is stored in unused bits of the underlying backing integer type, It could reduce the cast noise, allow for loop iteration with negative ranges, it could potentially lead to code that’s safer and more optimized. It’s not a fundamental necessity of course, and the added complexity needs to be weight in, but I do believe it would bring a lot to the table.

2 Likes

I agree, and I am in favour of arrays using ranged ints for indexing. As well as wherever else they better encode invariants.

But IDK if the reasons expressed so far are good enough to make it happen.

I am arguing to help develop our understanding on the matter.

1 Like

Sure and I agree with you, it’s probably quite a bit of complexity on the compiler side, which is why it needs to be considered and discussed.

Currently it’s already possible to achieve a similar effect using enum as an alternative for protecting the indexing with a dedicated type, so one might argue that the language is already capable of doing that.

Which I would argue is true, but also against the Zen of the language, as what makes Zig so great is really it’s dedication to make the right way also the easy way. And nobody would argue against the fact that having compiler/language support is always more ergonomic than userspace solution.

So the real question is not whether ranged integers or alias types would be better, I think everyone would agree they are.

But whether the complexity is worth the outcome, from a user/developer standpoint.

To that I would probably say that the balance is a net/positive. because of

  • readbility / explicitness
  • typesafety / comptime checks
  • ergonomics
  • optimization opportunity (speed/code size)
  • potential reduction in noise (from casting)

with some potential downside that would be worth investigating.

  • compiler complexity
  • potential bugs on something quite fundamental (indexing in general) which could be challenging.
  • slower compilation
  • added complexity downstream for tooling too
  • added complexity in the language specification ?
  • will it make type resolution/inference harder ?

so yeah it’s not an easy win, but I still think it’s a path worth investigating seriously, and the quicker the better 0.17 would be really nice with this feature added. This way they can still walk back on it, or iterate to find the right flavor for the language.

4 Likes

Compared to allowing temporary index values that exceed the range, it might make more sense to calculate the final offset before obtaining the index.

However, wraparound operations on index are indeed a valuable operation, which I had overlooked, so I have supplemented the original implementation.

1 Like

This is already possible in Zig. Non-exhaustive enums as the index type work as a custom index.

const std = @import("std");

const Range = struct {
    min: comptime_int,
    max: comptime_int,
};

fn RangeArray(comptime range: Range, comptime V: type) type {
    const capacity = range.max - range.min + 1; // Be inclusive
    const offset = -range.min;
    return struct {
        arr: [capacity]V = undefined,
        const Index = enum(i32) {
            invalid = std.math.minInt(i32),
            _,

            pub fn from(i: i32) Index {
                if (i < range.min or i > range.max) {
                    return .invalid;
                }
                return @enumFromInt(i);
            }

            pub fn isValid(index: Index) bool {
                if (index == .invalid) return false;
                const i: i32 = @intFromEnum(index);
                if (i < range.min or i > range.max) return false;
                return true;
            }
        };
        pub fn get(self: @This(), index: Index) ?V {
            if (index == .invalid) return null;
            return self.arr[@intCast(@intFromEnum(index) + offset)];
        }
        pub fn set(self: *@This(), index: Index, v: V) void {
            std.debug.assert(index.isValid());
            self.arr[@intCast(@intFromEnum(index) + offset)] = v;
        }
        // If you are missing operator overloading or want to be creative...
        pub fn @"[]"(self: @This(), index: Index) ?V {
            return self.get(index);
        }
        pub fn @"[] = "(self: *@This(), index: Index, v: V) void {
            self.set(index, v);
        }
    };
}

pub fn main() void {
    const Arr = RangeArray(.{ .min = -90, .max = 90 }, isize);
    var arr = Arr{};
    arr.set(.from(12), -12);
    arr.set(.from(13), 13);
    arr.set(.from(-90), 3);
    arr.@"[] = "(.from(90), 900000);

    std.debug.assert(arr.get(.from(-100)) == null);
    std.debug.assert(arr.get(.from(12)) == -12);
    std.debug.assert(arr.get(.from(-90)) == 3);
    std.debug.assert(arr.@"[]"(.from(90)) == 900000);
}

3 Likes

The usual advantage of moving something into the type system: the compiler gracefully takes away the option of doing it wrong.

Index types are a great idea, and it took a lot of self-discipline for me to not fill another couple screens of text in #3806 about how great that part will be. It does kinda come along with the territory so, why lead the puck.

7 Likes

I’m coming late to this discussion, sorry if this has been covered. I think one of the best things in Pascal (yes, Pascal!) was being able to define a set of constants, such as Season being winter, spring, summer and fall (just like an enum) and then declare an array[Season] which could be indexed as: season[spring] += 1. It might be considered just syntactic sugar, but I think it made the language so much more expressive. I would love to have something like this in zig.

7 Likes

Ada also has “modular types”, which are a great fit for indices into arrays when the array represents a ring. Since Zig has +% and such, we shouldn’t literally need that, it just kind of falls out of having integer ranges which can be sized to an array length, and then using modular operators on that.

Looking at it the other way, because we have modular add, we shouldn’t have a type where + means +%. Since it exists, it’s clearer to use it.

3 Likes

Another aspect which I think hasn’t been mentioned: If you have distinct index types, this can prevent programming errors, e.g. when k: FooIndex = 1, it cannot be used for indexing into a Bar array.

Since Zig has for loops to iterate over elements without the need for an index variable, this is not as relevant as in other languages. OTOH, it is more relevant when arrays are used in “programming without pointers”.

5 Likes

As is the case so many times, @matklad has already written a nice article about this here:

3 Likes

Newtype index is a bit “we have index types at home” though.

An index type is a contract between an array and a complementary type: the array only allows indexing with that type, and the type only allows values which are valid indices for that array.

The newtype pattern doesn’t[1]. It does make it substantially easier to get things right, and, we do have it at home, so, yeah use it, by all means.

But if we do get ranged integer types, we can have the real thing, they’re a prerequisite. I think that would be great.


  1. Special case of a power-of-two array, which is a substantive special case to have! But a special case, and only coving “only valid slots”, not the agreement between types. ↩︎

Idk, but in >40 years of programming all kind of programs in quite a lot of languages (including Pascal way back then), I never missed this range restriction property of index types.

I see the point, but I honestly think introducing it into the language isn’t worth the effort, in particular for a language like Zig.

1 Like

I think this is the wrong framing, if you think about it no feature is really needed beyond what C provides, the point of a feature is to enable, enforce, or steer you in doing something a certain way, ranged integers are not needed per say that I agree, but they offer strong ergonomics, are expressive and readable, the whole point of Zig in my opinion is that you can read it and get it in one pass, ranged integers just adds more information, which is great, and again can help with optimization

2 Likes