Best practice: pre-allocate and reuse memory

Hey,

as a small private project I’m thinking about building a TUI interface for managing my files (although there are already plenty out there). It should also be a good chance for learning some more Zig.

So far, I have already built some TUIs, but all with Rust and the ratatui framework. And while it’s a great framework, it’s often heavy on CPU usage, which might be due to things like allocating (just an unqualified guess)…

Now I’m wondering what’s the best way to deal with that in a language like Zig with manual memory management. Since I have no experience with other such languages, it would be great to hear some hints/concepts/ideas etc.

I have not written any code yet, so there is nothing to show. But almost every TUI has a similar concept:

  • a relatively fixed terminal window which represents a kind of grid (columns and rows). (I leave aside resizing the window for now)
  • this grid renders chars/symbols and some escape codes (for coloring/cursor stuff etc.)
  • this rendering is repeated constantly, e.g. on input (which changes the rendered interface) or on tick (to automatically render changes picked up by polling etc.)
  • rerendering is only fully stopped when the program ends (and all resources are released)

In the end this means the total number of bytes “allocated” by the grid has more or less an upper bound, which can be calculated, and the memory is released at the end of the program. Every rerendering should reuse/overwrite the former grid (or only parts of it, if only changed cells are written).

Normally, such a TUI rendering runs in a loop which tracks inputs and ticks. Thus, in my imagination it should be possible to pre-allocate some memory for the grid and reuse it on every loop iteration instead of always allocating just in time and releasing before the next iteration.

So far, I haven’t decided if I’ll use a TUI lib or if I should write the rendering code myself. This also depends on the thoughts regarding memory architecture. I’ve read through some forum posts (e.g. this and this) and also looked through the Zig docs.

But my experience is just too small to make a final decision. E.g. is StackFallbackAllocator or FixedBufferAllocator a valid option for such a case? Or should it be more flexible? And while I’m always willing to learn from mistakes, it would be hard to rewrite big portions only because the initial decision was wrong. Furthermore, I don’t want to use LLMs for that since I’m no big fan. I prefer to figure it out on my own with inter-human support by the community :smiley:

I know this post is kind of unspecific (sorry for that). Nevertheless, I’m thankful for every hint. And be it only a link to a good explanation I’ve missed so far.


The right approach depends on your architecture, but I often use this practice: each thread holds a circular buffer of StackFallbackArena instances (my own adaptation of StackFallbackAllocator). A task obtains an arena from the buffer when it begins and returns it when it ends, so a single thread can support multiple concurrent tasks.
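A much-simplified sketch of the "take an arena when the task begins, give it back when it ends" idea. This is not the StackFallbackArena described above (that is the poster's own type, which isn't shown); it assumes a single std.heap.ArenaAllocator that is reset between tasks, and runTask is a made-up placeholder for real work:

```zig
const std = @import("std");

/// Hypothetical per-task work: fills a temporary slice and sums it.
fn runTask(scratch: std.mem.Allocator, n: usize) !usize {
    const tmp = try scratch.alloc(usize, n);
    for (tmp, 0..) |*slot, i| slot.* = i;
    var sum: usize = 0;
    for (tmp) |v| sum += v;
    return sum; // tmp is not freed here; resetting the arena reclaims it
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    var task: usize = 1;
    while (task <= 3) : (task += 1) {
        const sum = try runTask(arena.allocator(), task * 10);
        std.debug.print("task {d}: sum = {d}\n", .{ task, sum });
        // Keep the memory the arena already obtained so the next task reuses it.
        _ = arena.reset(.retain_capacity);
    }
}
```

The point is that individual allocations inside a task never need a matching free; the whole arena is recycled in one step when the task ends.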


Yes, pre-reserving a buffer that you write into for rendering is basically always a good idea.

With the very 80s concept of rendering fixed-size, tiled “characters”, you may want to go with an 80s-style approach, where you actually reserve two different buffers for rendering:

  • A buffer of size “(screen width / character width) * (screen height / character height)” which is used for character IDs
  • A buffer of size “number of character types * character width * character height” which is used for pixel values for each character

This is basically how old consoles like the SNES worked, with the caveat that the characters used index colour and stored the palettes in yet another buffer (known as CGRAM).
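As a rough sketch of the two-buffer layout described above: one buffer of character IDs per cell and one buffer of pixel values per character type. All sizes and names (screen_w, char_types, pixelAt, ...) are assumptions picked for illustration, and palettes are left out:

```zig
const std = @import("std");

// Illustrative sizes only; none of these come from a real system.
const screen_w = 64; // pixels
const screen_h = 32;
const char_w = 8; // pixels per character cell
const char_h = 8;
const char_types = 4; // number of distinct characters

const cols = screen_w / char_w;
const rows = screen_h / char_h;
const tile_size = char_w * char_h;

/// Buffer 1: one character ID per cell.
var char_ids = [_]u8{0} ** (cols * rows);
/// Buffer 2: pixel values for every character type.
var char_pixels = [_]u8{0} ** (char_types * tile_size);

/// Looks up the pixel at screen position (x, y) by going through both buffers.
fn pixelAt(x: usize, y: usize) u8 {
    const id: usize = char_ids[(y / char_h) * cols + (x / char_w)];
    return char_pixels[id * tile_size + (y % char_h) * char_w + (x % char_w)];
}

pub fn main() void {
    // Define character 1 as a solid block and place it in column 2, row 1.
    @memset(char_pixels[tile_size .. 2 * tile_size], 255);
    char_ids[1 * cols + 2] = 1;
    std.debug.print("{d} {d}\n", .{ pixelAt(2 * char_w + 3, 1 * char_h + 5), pixelAt(0, 0) });
}
```

For a terminal UI the second buffer is effectively provided by the terminal's font, so only the ID buffer (plus styling) is usually needed.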

It’s not really necessary to use StackFallbackAllocator or FixedBufferAllocator to reserve a big buffer - since you only ever need to allocate one buffer, you can easily use a normal DebugAllocator for that and it’ll be just as fast.

Faster still is to just declare it as a var and let the operating system reserve it for you.
But a good reason not to do that is if you want to implement window resizing - it’s not too bothersome to detect a window resize, allocate a new buffer with the number of characters required for the new width and height, row-by-row memcpy the old values into it and free the old buffer. And, replacing the old buffer with the new buffer will mean that it’s still valid to free it on program exit.
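For comparison, a minimal sketch of the "just declare it as a var" variant. It assumes a hard upper bound on the terminal size (max_cols and max_rows are made-up numbers), and gridFor is a hypothetical helper that hands out the part of the static storage needed for the current size:

```zig
const std = @import("std");

// Assumed upper bound on the terminal size; pick whatever fits your use case.
const max_cols = 512;
const max_rows = 256;

// A global `var` lives in the program's static data, so no allocator is involved.
var cell_storage: [max_cols * max_rows]u32 = undefined;

/// Returns the part of the static storage that covers a cols x rows grid,
/// or null if the requested grid exceeds the assumed maximum.
fn gridFor(cols: usize, rows: usize) ?[]u32 {
    if (cols > max_cols or rows > max_rows) return null;
    const cells = cell_storage[0 .. cols * rows];
    @memset(cells, 0);
    return cells;
}

pub fn main() void {
    const grid = gridFor(80, 24) orelse return;
    grid[0] = 'A';
    std.debug.print("{d} cells, first = {d}\n", .{ grid.len, grid[0] });
}
```

The trade-off is that the bound is fixed at compile time: a terminal larger than the assumed maximum has to be rejected or clipped, which is exactly the case where an allocated buffer is more flexible.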

A super simple struct for resizable, allocated 2D buffers might look like this:

const std = @import("std");

const CharacterBuffer = struct {
  ids_ptr: [*]u32,
  width: u16,
  height: u16,
  
  /// Caller owns the returned memory.
  pub fn init(alc: std.mem.Allocator, width: u16, height: u16) error{OutOfMemory}!CharacterBuffer {
    const ids = try alc.alloc(u32, @as(usize, width) * @as(usize, height));
    @memset(ids, 0); // Initialise to an empty buffer
    
    return .{
      .ids_ptr = ids.ptr,
      .width = width,
      .height = height,
    };
  }

  /// Release all allocated memory.
  pub fn deinit(self: CharacterBuffer, alc: std.mem.Allocator) void {
    alc.free(self.ids_ptr[0..@as(usize, self.width) * @as(usize, self.height)]);
  }

  /// Resize and reallocate the buffer, freeing the former allocation.
  /// The placement of the old characters is left unchanged.
  /// This is not the most performant solution - the most performant solution would be to utilise
  /// std.mem.Allocator functions like remap() and realloc().
  pub fn resize(self: *CharacterBuffer, alc: std.mem.Allocator, width: u16, height: u16) error{OutOfMemory}!void {
    const old_memory = self.ids_ptr[0..@as(usize, self.width) * @as(usize, self.height)];

    // Allocate first: if this fails, the old buffer is left untouched and still valid.
    const new_memory = try alc.alloc(u32, @as(usize, width) * @as(usize, height));
    // Only now is it safe to schedule the old memory for release.
    defer alc.free(old_memory);

    // Initialise new memory as empty, then row-by-row memcpy our old memory into it.
    @memset(new_memory, 0);
    const copy_width: usize = @min(width, self.width);
    for (0..@min(height, self.height)) |y| {
      @memcpy(
        new_memory[y * width..y * width + copy_width],
        old_memory[y * self.width..y * self.width + copy_width],
      );
    }

    self.ids_ptr = new_memory.ptr;
    self.width = width;
    self.height = height;
  }
};

For convenience, you could even have your character type be something like an enum, and have the buffer be a buffer of this enum instead of raw numbers, so instead of memsetting it to 0 on init, you could memset it to something like .BLANK.
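A small sketch of that enum idea, assuming a made-up Character enum (the variant names are placeholders, not part of the struct above):

```zig
const std = @import("std");

/// Hypothetical character set for a file manager; the names are illustrative.
const Character = enum(u32) {
    BLANK,
    FOLDER,
    FILE,
};

pub fn main() !void {
    const alc = std.heap.page_allocator;
    const cells = try alc.alloc(Character, 80 * 24);
    defer alc.free(cells);

    // Instead of memsetting to 0, clear the buffer to a named value.
    @memset(cells, .BLANK);
    cells[0] = .FOLDER;
    std.debug.print("{s} {s}\n", .{ @tagName(cells[0]), @tagName(cells[1]) });
}
```

Because the enum is backed by u32, the buffer has the same size as the raw-number version.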


by the way, shout out to libvaxis for great TUI ergonomics!


Just my 2 cents.

Start with a fixed Cell grid and preallocate current/previous frame buffers, then mutate them in place so the render loop does no heap work? I think? I’d just use the normal allocator for app-lifetime state. Where I’m only 90% sure is whether to add a resettable scratch allocator on top of that.
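A minimal sketch of the current/previous frame idea, under the assumption of a fixed 80x24 grid and a made-up Cell type that only stores one byte. changedCells is a hypothetical stand-in for the step that would write changed cells to the terminal:

```zig
const std = @import("std");

const cols = 80; // assumed fixed grid size
const rows = 24;

const Cell = struct { char: u8 = ' ' };

// Both frame buffers exist for the whole program lifetime.
var buffer_a = [_]Cell{Cell{}} ** (cols * rows);
var buffer_b = [_]Cell{Cell{}} ** (cols * rows);

/// Counts the cells that differ between two frames.
/// A real renderer would emit each differing cell to the terminal here.
fn changedCells(current: []const Cell, previous: []const Cell) usize {
    var changed: usize = 0;
    for (current, previous) |cur, prev| {
        if (cur.char != prev.char) changed += 1;
    }
    return changed;
}

pub fn main() void {
    var current: []Cell = &buffer_a;
    var previous: []Cell = &buffer_b;

    var frame: usize = 0;
    while (frame < 3) : (frame += 1) {
        // Draw the whole frame into `current`, overwriting whatever was there.
        @memset(current, Cell{});
        const text: []const u8 = if (frame == 0) "hello" else "help!";
        for (text, 0..) |c, i| current[i].char = c;

        std.debug.print("frame {d}: {d} changed\n", .{ frame, changedCells(current, previous) });

        // Swap roles instead of allocating: this frame becomes the next `previous`.
        std.mem.swap([]Cell, &current, &previous);
    }
}
```

Nothing is allocated inside the loop; the two buffers just trade places every frame.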

FixedBufferAllocator can be good for bounded scratch memory, but StackFallbackAllocator is not the main design decision here. But I could be off base, allocators make me trip.
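For what bounded scratch memory could look like, here is a small sketch using std.heap.FixedBufferAllocator that is reset once per frame. The 4 KiB size is an arbitrary assumption and visibleRows is a made-up example of per-frame temporary data:

```zig
const std = @import("std");

/// Hypothetical per-frame work: collects the indices of the rows to draw.
fn visibleRows(scratch: std.mem.Allocator, first: usize, count: usize) ![]usize {
    const out = try scratch.alloc(usize, count);
    for (out, 0..) |*slot, i| slot.* = first + i;
    return out;
}

pub fn main() !void {
    // Fixed backing storage on the stack; the size is an assumed bound.
    var backing: [4096]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&backing);

    var frame: usize = 0;
    while (frame < 3) : (frame += 1) {
        // Forget everything from the previous frame; the backing array is reused.
        fba.reset();
        const rows = try visibleRows(fba.allocator(), frame * 10, 24);
        std.debug.print("frame {d}: rows {d}..{d}\n", .{ frame, rows[0], rows[rows.len - 1] });
    }
}
```

If a frame asks for more than the backing array holds, the allocation fails with error.OutOfMemory, so the bound needs to be chosen with the worst case in mind.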

I might not even know what I’m talking about. I’m doing something with allocators too and I’m all goofed up.

@tholmes Thank you for the detailed answer. That gives me a very good idea how this might work. If I find the time, I’ll try it out with a super simple implementation like your code example. And be it only for the purpose of learning.

Of course, I was looking out for some libs. And libvaxis seems to be the most polished and best maintained option. However, it still only supports 0.15.1. That might be fine for most cases, but since I want to write a little program which is heavy on interaction with files/dirs, I’m going to use the new std.Io interface of 0.16-dev. Otherwise, I would have to rewrite everything in the near future and since I’m still learning Zig, it doesn’t make much sense in my eyes to learn a deprecated interface for file interaction etc. Hopefully, libvaxis is soon to support 0.16, but I’m not sure if I want to wait for that to happen (which is, of course, totally my personal problem :grin: ).

That’s how I feel too, most of the time. However, thanks for your ideas.
