Can we get rid of usize?

So I’ve got a generics-based serializer for MessagePack, with tons of abuses of generics all over the place :).

One of them is this:

pub fn EncodeError(comptime T: type) type {
    if (@sizeOf(usize) > 4 and containsSlice(T)) {
        return error{
            NoSpaceLeft,
            // MessagePack only supports up to 32 bit lengths of arrays.
            SliceLenTooLarge,
        };
    } else {
        return error{NoSpaceLeft};
    }
}

/// Encode the value to MessagePack bytes.
pub fn encode(value: anytype, out: []u8) EncodeError(@TypeOf(value))![]u8 {
    return try encodeCustom(value, out, .{});
}

The reason for this code is that MessagePack does not support arrays longer than the maximum unsigned 32-bit integer (2^32 - 1 elements).

So the length type of a slice matters to me, and it’s usize. I’m a little miffed about this, because every time I use a slice I need to think “what architecture am I on?”. This kind of goes against the zen:

  • Edge cases matter. (there is some friction here in handling usize edge cases).

but usize also supports the vague unwritten zen I have in my head:

  • Zig is not a language that attempts to abstract away the properties of the hardware. (The hardware has a pointer size).

related:

Also related:

Another primary need for usize is de-bloating generics. Imagine if the ArrayList and HashMap data structures were also generic across the index type (currently hard-coded to usize). That would be a lot of unnecessary bloat. usize lets the same machine code handle multiple different max capacities.

For instance, your arrays are limited to u32 capacity; however, someone might use your package in an application that otherwise uses arrays with u64 capacity. In this case usize means the same machine code can serve both use cases, although it will require use of @intCast in some places. That’s the tradeoff.
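A minimal sketch of what that @intCast tradeoff can look like at a boundary where a usize length has to become a u32. The function name `count32` is hypothetical, and the single-argument `@intCast` form assumes Zig 0.11 or newer (older versions take the destination type as a first argument).

```zig
const std = @import("std");

// Hypothetical helper: the data structure stores its length as usize,
// but this caller wants a u32 count.
fn count32(items: []const u8) u32 {
    // @intCast asserts that the value fits in the destination type.
    // In safe build modes an out-of-range value trips a safety check,
    // so the caller must guarantee items.len <= maxInt(u32).
    return @intCast(items.len);
}
```

The container code stays the same for every user; only the places that need a narrower integer pay for the cast.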

Going back to that issue that I linked, I mentioned this:

The address spaces would be user overridable in the root source file. This would be especially useful for a freestanding target.

This would effectively be a way for an application to choose the size of usize, and consequently the max capacity of all data structures that map their storage to virtual memory addresses.


I think the fact that the native pointer size is 64 bits and the fact that msgpack supports only 32-bit lengths should not be mixed together. What you should do is use std.math.cast to convert the usize to u32 and return an error if the value does not fit.
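A minimal sketch of that suggestion, assuming a hypothetical helper named `arrayLen32` (not taken from the serializer in this thread). `std.math.cast` returns null when the value does not fit in the target type, which maps naturally onto an error:

```zig
const std = @import("std");

// Hypothetical helper: convert a slice length to the u32 that
// MessagePack can represent, or fail with an error.
fn arrayLen32(len: usize) error{SliceLenTooLarge}!u32 {
    // std.math.cast yields ?u32; null means `len` is out of range for u32.
    return std.math.cast(u32, len) orelse return error.SliceLenTooLarge;
}
```

On a target where usize is 32 bits or smaller, the cast always succeeds, so the same code compiles everywhere and the error path simply never triggers there.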

Use the uNN types when you need exact sizes, use usize when you need the native pointer size.

How the application keeps its lengths within the u32 limit for msgpack de/serialization is a problem for application-level code.

I would not special-case the error set here with a generic function, as this would let a programmer on a 32-bit platform ignore the 64-bit platform error scenario. If they only target 32-bit platforms, they can explicitly mark that error scenario as unreachable.
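A rough sketch of what marking the error as unreachable could look like on the caller's side. Both `checkedLen` and `lenKnownToFit` are hypothetical names used only for illustration:

```zig
const std = @import("std");

// Hypothetical stand-in for a library function that always reports
// SliceLenTooLarge in its error set, regardless of the target.
fn checkedLen(len: usize) error{SliceLenTooLarge}!u32 {
    return std.math.cast(u32, len) orelse return error.SliceLenTooLarge;
}

// Application code that knows its lengths always fit in u32
// (for example, because it only targets 32-bit platforms).
fn lenKnownToFit(len: usize) u32 {
    // The caller states the assumption explicitly. In safe build modes,
    // reaching `unreachable` is caught as a safety-checked panic.
    return checkedLen(len) catch unreachable;
}
```

The library's error set stays the same on every platform, and the decision to rule the error out is visible at the call site rather than hidden in a comptime branch.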


I realized this a while later and removed the usize check. I think you are right that it would just make it more difficult to write code for multiple platforms.

pub fn EncodeError(comptime T: type) type {
    if (containsSlice(T)) {
        return error{
            NoSpaceLeft,
            /// MessagePack only supports up to 32 bit lengths of arrays.
            /// If usize is 32 bits or smaller, this is unreachable.
            SliceLenTooLarge,
        };
    } else {
        return error{NoSpaceLeft};
    }
}