Thank you guys for all the input. It feels like I have acquired a footgun with a sensitive trigger here.
@IntegratedQuantum I’ve seen that post, and it felt very related, but I couldn’t quite make the connection…
@dee0xeed that looks like a plan. In my case, the strings in question aren’t particularly long, and the maximum length can be determined upfront.
@dude_the_builder good catch, that one came in only recently; before that, I was experiencing the issue described in the question only for the tz abbreviation. I think it’s the same problem. However, the allocator I’m using in the function you looked at is only used to store the content of the TZif file (essentially the tz rules), not the names.
@LucasSantos91 could you elaborate a bit on this? I’m having trouble seeing where this self-reference is an issue. Assuming I use a buffer to store the bytes of the name of the struct’s instance, that name describes just this instance, so there shouldn’t be a conflict, no?
I am not @LucasSantos91, so maybe they meant other issues. One problem I see with this self-referential struct is copying. If the object is copied, one needs to be careful to set the str slice in the copy to point to the new buffer; otherwise str will still point to the old buf in the source object. Storing an index or length would not have this problem. Rust, for example, refuses to compile self-referential structs and requires unsafe to deal with them. Self-referential structs are useful but, like anything containing long-lived pointers, should be treated carefully.
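To make the copying hazard concrete, here is a minimal sketch assuming Zig 0.11-era syntax; SelfRef and WithLen are hypothetical types, not code from this thread. Copying a SelfRef duplicates the buffer, but the copy’s str still points into the source object’s buffer. WithLen stores only the length and derives the slice from its own buffer on demand, so a copy stays self-consistent.

```zig
const std = @import("std");

// Hypothetical self-referential struct: `str` points into its own `buf`.
const SelfRef = struct {
    buf: [16]u8 = undefined,
    str: []const u8 = "",

    fn set(self: *SelfRef, s: []const u8) void {
        @memcpy(self.buf[0..s.len], s); // assumes s.len <= 16
        self.str = self.buf[0..s.len];
    }
};

// Hypothetical alternative: store only the length, build the slice when asked.
const WithLen = struct {
    buf: [16]u8 = undefined,
    len: usize = 0,

    fn set(self: *WithLen, s: []const u8) void {
        @memcpy(self.buf[0..s.len], s); // assumes s.len <= 16
        self.len = s.len;
    }

    fn str(self: *const WithLen) []const u8 {
        return self.buf[0..self.len];
    }
};

pub fn main() void {
    var a = SelfRef{};
    a.set("hello");
    const b = a; // copies buf, but b.str still points into a.buf
    a.set("world");

    var c = WithLen{};
    c.set("hello");
    const d = c; // copies buf and len; d.str() slices d's own buffer
    c.set("world");

    // b.str now reads "world" (from a.buf); d.str() still reads "hello".
    std.debug.print("{s} / {s}\n", .{ b.str, d.str() });
}
```

The length-based version is the “storing index or length” option: there is no pointer to fix up after a copy.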
Sorry @dee0xeed, my brain misfired and I thought we were talking about in-place resize.
realloc only works sometimes, so it can’t be your only strategy. You could use realloc with a dedicated FixedBufferAllocator, but at that point you could just use a BoundedArray of u8 with a fixed capacity instead, for example via its fromSlice method.
Also, if your realloc succeeds, you leak memory.
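A minimal sketch of the BoundedArray route, assuming a Zig release whose standard library still includes std.BoundedArray (the 0.11 to 0.14 era; it may be absent in newer versions). The Zone type and the capacity of 32 are hypothetical. fromSlice copies the bytes into the inline buffer and returns error.Overflow if they do not fit; overwriting the name from the beginning is just assigning a new value.

```zig
const std = @import("std");

// Assumes a std that still ships std.BoundedArray (Zig 0.11 to 0.14 era).
// Zone and the capacity of 32 are hypothetical.
const Name = std.BoundedArray(u8, 32);

const Zone = struct {
    name: Name,

    fn init(name: []const u8) error{Overflow}!Zone {
        // fromSlice copies the bytes into the inline buffer; no allocator involved.
        return .{ .name = try Name.fromSlice(name) };
    }

    fn setName(self: *Zone, name: []const u8) error{Overflow}!void {
        // Overwrite from the beginning by replacing the whole bounded array.
        self.name = try Name.fromSlice(name);
    }

    fn getName(self: *const Zone) []const u8 {
        return self.name.constSlice();
    }
};

pub fn main() !void {
    var zone = try Zone.init("Europe/Berlin");
    try zone.setName("UTC");
    std.debug.print("{s}\n", .{zone.getName()});
}
```

Because the bytes live inline and the length is stored next to them, copying a Zone copies the name correctly, and no allocation or free is involved.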
You are missing `defer _ = gpa.deinit()`, which is why the GPA never gets deinitialized and therefore never complains about the memory leaks (in Debug mode).
If you don’t specify options, the GPA picks defaults based on your -Doptimize=... optimization mode, so there are no checks for ReleaseSmall and ReleaseFast.
If you actually specify:

```zig
var gpa = std.heap.GeneralPurposeAllocator(.{ .safety = true }){};
```

you always get the safety checks, but we are getting off topic here.
My point was that I don’t like using the GPA without calling its deinit method in an example, because it could lead people to think that your code doesn’t have any leaks when you are just not checking for them. So I think you should always call deinit, and if you intend to leak, use an arena to make that obvious to the reader.
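A small sketch of that pattern, assuming Zig 0.11 or later, where gpa.deinit() returns std.heap.Check (.ok or .leak). The run function is hypothetical: it frees its own allocation explicitly and uses an arena for scratch memory that is deliberately not freed piece by piece.

```zig
const std = @import("std");

// Hypothetical workload: frees its own allocation and uses an arena
// for scratch data that is released all at once.
fn run(allocator: std.mem.Allocator) !usize {
    const buf = try allocator.alloc(u8, 16);
    defer allocator.free(buf); // without this, gpa.deinit() reports a leak

    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit(); // frees everything allocated through the arena
    const msg = try arena.allocator().dupe(u8, "no individual free needed");

    return buf.len + msg.len;
}

pub fn main() !void {
    // .safety = true keeps leak checks even in ReleaseFast/ReleaseSmall.
    var gpa = std.heap.GeneralPurposeAllocator(.{ .safety = true }){};
    defer {
        if (gpa.deinit() == .leak) std.debug.print("leak detected\n", .{});
    }

    const total = try run(gpa.allocator());
    std.debug.print("{d} bytes used\n", .{total});
}
```

In tests, std.testing.allocator performs the same leak check automatically and fails the test if anything is left unfreed.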
ok, so what about this option: storing the string’s data in a buffer within the struct, then having a method that returns a []const u8 slice as the “string representation”?
Example from the TZif parser:
```zig
pub const Timetype = struct {
    // some more fields...
    name_data: [6:0]u8,

    pub fn abbreviation(self: Timetype) []const u8 {
        return std.mem.sliceTo(self.name_data[0..], 0);
    }

    // some more methods...
};
```
I’ll have to do some tests again, but I remember getting invalid output from this as well.
As far as I can understand, this is somewhat similar to what BoundedArray (which has been mentioned twice already) is doing. But I didn’t understand how to overwrite the buffer from the beginning using its API. Should one use the Writer interface?
Good question, actually. Perhaps something like this: std.debug.print("{s}", .{s.buf[0..s.strlen]}); assuming that the struct s contains an array buf and the actual string length strlen.
No, it was a really stupid question (a moment of mental fog on my side); one just has to take a slice, no problem. See the get function in my last example.
Your get function is a very useful pattern for these use-cases. Short-lived temporary slices attached to arrays are idiomatic. But it could become problematic if a slice outlives its parent or points to the wrong side of the copy.
I may be missing some larger context here, but why not simply store the bytes in an ArrayList and, on update, clear it while retaining capacity? If the new string is shorter than the original, no new allocations are performed. If it’s longer, the list grows.
If, on the other hand, you want to store multiple strings, you could use a single ArrayList of bytes and return the start index and length of each newly added string to avoid use-after-move issues.
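A sketch of the clear-and-refill approach, assuming the managed std.ArrayList API of Zig 0.11 to 0.14 (init(allocator), clearRetainingCapacity, appendSlice, items); later releases reworked ArrayList. The Label wrapper is hypothetical.

```zig
const std = @import("std");

// Hypothetical wrapper; assumes the managed std.ArrayList of Zig 0.11 to 0.14.
const Label = struct {
    bytes: std.ArrayList(u8),

    fn init(allocator: std.mem.Allocator) Label {
        return .{ .bytes = std.ArrayList(u8).init(allocator) };
    }

    fn deinit(self: *Label) void {
        self.bytes.deinit();
    }

    fn set(self: *Label, s: []const u8) !void {
        // Reset the length but keep the allocation; append only grows
        // the buffer when s is longer than the current capacity.
        self.bytes.clearRetainingCapacity();
        try self.bytes.appendSlice(s);
    }

    fn get(self: *const Label) []const u8 {
        return self.bytes.items;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var label = Label.init(gpa.allocator());
    defer label.deinit();

    try label.set("Europe/Berlin");
    try label.set("UTC"); // shorter: reuses the existing allocation
    std.debug.print("{s}\n", .{label.get()});
}
```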
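A sketch of the shared-buffer variant, under the same managed std.ArrayList assumption; StringStore and Span are hypothetical names. Callers keep a start index and length instead of a slice, so a handle stays valid after the buffer grows and reallocates, and only the short-lived slice returned by get needs care.

```zig
const std = @import("std");

// Hypothetical handle: where a string lives inside the shared buffer.
const Span = struct { start: usize, len: usize };

// Hypothetical store; assumes the managed std.ArrayList of Zig 0.11 to 0.14.
const StringStore = struct {
    bytes: std.ArrayList(u8),

    fn init(allocator: std.mem.Allocator) StringStore {
        return .{ .bytes = std.ArrayList(u8).init(allocator) };
    }

    fn deinit(self: *StringStore) void {
        self.bytes.deinit();
    }

    fn add(self: *StringStore, s: []const u8) !Span {
        const start = self.bytes.items.len;
        try self.bytes.appendSlice(s);
        return .{ .start = start, .len = s.len };
    }

    // The returned slice is only valid until the next add() may reallocate.
    fn get(self: *const StringStore, span: Span) []const u8 {
        return self.bytes.items[span.start..][0..span.len];
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var store = StringStore.init(gpa.allocator());
    defer store.deinit();

    const winter = try store.add("CET");
    const summer = try store.add("CEST");
    std.debug.print("{s} {s}\n", .{ store.get(winter), store.get(summer) });
}
```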
I checked it with 0.12.0-dev.1769+bf5ab5451 (which is what I currently have on my machine, a bit outdated), and it works the same as with 0.11.0. So something must have changed since then.
I don’t think it’s a good idea.
Yes, it stops working correctly; I got this output:

```
$ ./toy-str
(3 bytes)
bbbb (8 bytes)
```
Lengths are correct, but not the buffer content.
But I can’t quickly figure out why it happens like this.
Inside the function, everything is correct.
You put a copy on the stack, made a slice of that copy, and returned that slice, but by then the stack has already changed and the slice has become invalid. Something like this.
So pass the instance by reference, not by value.
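Applied to the Timetype struct from the TZif parser, passing by reference means taking self as *const Timetype, so the returned slice points into the caller’s instance instead of a parameter copy. This is a trimmed sketch (the other fields and methods are omitted), assuming the same dangling-copy problem is behind the invalid output there. A slice into a by-value parameter is not guaranteed to stay valid after the call, and whether it happens to work can vary between compiler versions.

```zig
const std = @import("std");

// Trimmed-down Timetype; the real struct has more fields and methods.
pub const Timetype = struct {
    name_data: [6:0]u8,

    // Taking `self` by pointer: the slice points into the caller's
    // instance rather than into a copy that may be gone after the call.
    pub fn abbreviation(self: *const Timetype) []const u8 {
        return std.mem.sliceTo(self.name_data[0..], 0);
    }
};

pub fn main() void {
    const zones = [_]Timetype{
        .{ .name_data = "CET\x00\x00\x00".* },
        .{ .name_data = "EEST\x00\x00".* },
    };
    for (&zones) |*tt| {
        std.debug.print("{s}\n", .{tt.abbreviation()});
    }
}
```

The returned slice is still only valid while that particular Timetype instance stays where it is, which is the “slice outlives its parent” caveat mentioned earlier in the thread.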