Serializing a pointer as bytes

dude_the_builder · March 9, 2022, 8:04pm

If I want to store a pointer to a data structure like ArrayList as bytes, is it wise to do this:

std.mem.asBytes(list_ptr);

versus this:

std.mem.asBytes(&@ptrToInt(list_ptr));

What makes me think the second option is better is that the size of the resulting byte slice is always @sizeOf(usize), whereas with the first option the size could change if the data structure changes in the future.

natecraddock · March 10, 2022, 2:50am

Note: I’m assuming you are serializing the bytes into something like a file, and then reading back at some later time. More context would help me (or others) give a better answer.

“the first option could change if the data structure changes in the future”

This is true. If the type pointed to changes, then the byte representation may be larger or smaller.

“What makes me think the second option is better is that the size of the resulting byte slice is always @sizeOf(usize)”

Your second example only produces the bytes of the pointer address itself, not of the value at that address. If a pointer stores the address 0x205fac, then the output bytes in your example would be { 0xac, 0x5f, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00 } on a 64-bit little-endian architecture. So the resulting slice does have a length of @sizeOf(usize), but I don’t think this is what you want. I could be wrong, though.
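As a minimal sketch of the difference between the two options: the first yields the bytes of the pointed-to value, the second the bytes of the address. This assumes Zig 0.11 or newer, where @ptrToInt is spelled @intFromPtr, and Node is a hypothetical stand-in for a structure like ArrayList.

```zig
const std = @import("std");

// Hypothetical stand-in for a larger structure such as an ArrayList.
const Node = struct {
    items: [4]u64 = .{ 1, 2, 3, 4 },
    len: usize = 4,
};

// Option 1: bytes of the value the pointer refers to.
// The length follows @sizeOf(Node), so it changes if Node changes.
fn pointeeByteCount(ptr: *const Node) usize {
    return std.mem.asBytes(ptr).len;
}

// Option 2: bytes of the address held by the pointer.
// The length is always @sizeOf(usize), whatever Node looks like.
fn addressByteCount(ptr: *const Node) usize {
    const addr: usize = @intFromPtr(ptr); // @ptrToInt before Zig 0.11
    return std.mem.asBytes(&addr).len;
}

pub fn main() void {
    const node = Node{};
    std.debug.print("pointee: {d} bytes, address: {d} bytes\n", .{
        pointeeByteCount(&node),
        addressByteCount(&node),
    });
}
```

The address bytes are only meaningful while the program is running, since the value they point at is not included.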

I think what you are implying by your question is the problem of writing a struct to a file, and supporting reading that file in the future if that struct layout ever changes. This is a hard problem, and I’m not sure of a “best” solution.

Here are some thoughts:

Try to change the struct as little as possible. Obviously requirements change and you cannot see the future, but put some reasonable effort into what the type requires to reduce future changes.
Don’t forget to use a packed struct, which has a defined memory layout.
Version your structs. If the first byte has some ID or version information you can read the first byte to determine how to read the remaining bytes. This would likely require keeping a “legacy” struct in your code if it ever changes, or just reading the required data a piece at a time.
If you expect to be doing a lot of serialization, it might be smart to design a system to handle the hard work for you.
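The versioning idea above could look roughly like the sketch below. All names (RecordV1, RecordV2, decode) are hypothetical, and it assumes a reasonably recent Zig release. It uses extern struct rather than packed struct, since extern struct also has a defined layout and guarantees that the first field sits at byte offset 0. The raw bytes are host-endian, so this is not a portable file format as written.

```zig
const std = @import("std");

// Hypothetical record layouts. The first byte is always the version.
const RecordV1 = extern struct {
    version: u8 = 1,
    flags: u8 = 0,
    count: u16,
};

const RecordV2 = extern struct {
    version: u8 = 2,
    flags: u8 = 0,
    count: u16,
    capacity: u32, // field added in version 2
};

const DecodeError = error{ Truncated, UnknownVersion };

// Reads the version byte first, then decodes the rest accordingly.
// Old records are upgraded to the current layout.
fn decode(bytes: []const u8) DecodeError!RecordV2 {
    if (bytes.len == 0) return error.Truncated;
    switch (bytes[0]) {
        1 => {
            if (bytes.len < @sizeOf(RecordV1)) return error.Truncated;
            const old = std.mem.bytesToValue(RecordV1, bytes[0..@sizeOf(RecordV1)]);
            // Assumption: a missing capacity defaults to 0.
            return RecordV2{ .flags = old.flags, .count = old.count, .capacity = 0 };
        },
        2 => {
            if (bytes.len < @sizeOf(RecordV2)) return error.Truncated;
            return std.mem.bytesToValue(RecordV2, bytes[0..@sizeOf(RecordV2)]);
        },
        else => return error.UnknownVersion,
    }
}

pub fn main() !void {
    const old = RecordV1{ .count = 7 };
    const rec = try decode(std.mem.asBytes(&old));
    std.debug.print("version {d}, count {d}, capacity {d}\n", .{ rec.version, rec.count, rec.capacity });
}
```

Keeping the legacy struct around is what makes the upgrade path a single switch prong per old version.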

I could share a lot more information, but before I do I would rather know if I am on the right track in answering.

dude_the_builder · March 10, 2022, 10:45am

Those are really good thoughts on serializing. I found the idea of versioning especially interesting. But in my case, I’m just trying to minimize the number of bytes stored in an array during program execution. Think of a string interning scenario, where storing just the bytes of an address takes less space than storing the bytes for the pointer plus the bytes for the length. This does require dealing with *[]const u8 instead of just []const u8 all the time, and it is only useful within the running program because the address will change from one execution to the next. I’m wondering if I could somehow use smaller addresses, knowing I’m not going to need all that address space (i.e. going from 64 bits to 32 or even 16)?

kristoff · March 10, 2022, 2:19pm

To use smaller “pointers” you will need to do something like what Andrew does in the compiler: put stuff inside arrays and use indexes as pointers. If you can ensure that your array will never grow past a certain size, then you will be able to use a smaller int type to index into it. In Zig’s self-hosted compiler a zig file is required to be smaller than 4 GiB precisely because we’re using u32s to index into the token list.
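A rough sketch of the index-as-pointer idea applied to string interning follows. This is not how the compiler does it; Pool, Handle, and the capacities are hypothetical, and it assumes Zig 0.11 or newer for the two-argument @memcpy and single-argument @intCast.

```zig
const std = @import("std");

// A handle is a small index into the pool, not a machine address.
const Handle = u16;
const max_strings = 64;

// Hypothetical fixed-capacity pool, kept allocator-free for brevity.
const Pool = struct {
    bytes: [1024]u8 = undefined,
    // String i occupies bytes[starts[i]..starts[i + 1]].
    starts: [max_strings + 1]u16 = [_]u16{0} ** (max_strings + 1),
    count: Handle = 0,

    fn get(self: *const Pool, h: Handle) []const u8 {
        return self.bytes[self.starts[h]..self.starts[h + 1]];
    }

    fn intern(self: *Pool, s: []const u8) error{OutOfSpace}!Handle {
        // Linear search keeps the sketch short; a real pool would hash.
        var i: Handle = 0;
        while (i < self.count) : (i += 1) {
            if (std.mem.eql(u8, self.get(i), s)) return i;
        }
        const begin: usize = self.starts[self.count];
        if (self.count == max_strings or s.len > self.bytes.len - begin) return error.OutOfSpace;
        @memcpy(self.bytes[begin..][0..s.len], s);
        self.starts[self.count + 1] = @intCast(begin + s.len);
        self.count += 1;
        return self.count - 1;
    }
};

pub fn main() !void {
    var pool = Pool{};
    const a = try pool.intern("hello");
    const b = try pool.intern("hello");
    std.debug.print("handles {d} and {d}, {d} bytes each\n", .{ a, b, @sizeOf(Handle) });
}
```

Because the pool can never hold more than 64 strings or 1024 bytes, a u16 is enough for every index and offset, and each stored handle costs 2 bytes instead of the 16 bytes of a []const u8 on a 64-bit target.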

natecraddock · March 10, 2022, 7:05pm

I thought I might have been going off in the wrong direction. Oh well! What @kristoff shared is definitely a good idea.