Code review for two (small) structs related to vulkan buffers (no vulkan experience is required)

Hey friends, I have written two structs: one is a generic struct for Vulkan buffers, and another that tracks the buffer and is used as a function return type.

In total, the two structs are 300 lines (mostly due to switch statements; they're really not that complex).

These two structs are used in my game project. The asset manager deals with the generic buffers via the tracked buffer struct, and the tracked buffer struct is handed out to other structs like meshes, so creating a mesh is as simple as init(data) and deletion is deinit(self, allocator).

No knowledge of Vulkan is needed (although appreciated). The Vulkan setup uses bindless descriptors.

The tracker struct exists to abstract away the raw Vulkan information and provide an easier struct to deal with.

(By the way, the structs may show support for textures, but that's just a placeholder; there's no texture support yet.)

What I'm asking for in this review: is this a good use of generic structs? I feel like the generics, the tagged unions, and the slowly growing switches are getting out of hand.

I also heap-allocate the buffers, which isn't great, but it's the only way I've found, since I need a long-term pointer to them.

The link to the code is here:

https://codeberg.org/DuneTruffle/code-review/src/branch/main/two_structs.zig

2 Likes

If you encapsulate the pointer and the BufferType enum into a tagged union, you can get rid of the pointer casts and, more importantly, easily reduce that big switch with an inline prong.

Some of these may be personal opinion, but here goes:
As a concept, I don’t think this is a bad idea. The code can be improved though.

Your tracked buffer should probably use a union(BufferType).
Your init would stay similar, but in most other functions you can simply write:

switch (self.buffer) {
   inline else => |buffer| buffer.method(...),
}
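A minimal sketch of what that union might look like (the type names here are hypothetical, not the ones in the repo):

```zig
// Hypothetical variant types; substitute the repo's actual buffer structs.
const TrackedBuffer = union(BufferType) {
    vertex: VertexBuf,
    uniform: UniformBuf,
    storage: StorageBuf,

    pub fn deinit(self: *TrackedBuffer, gc: *const GraphicsContext) void {
        // One inline prong covers every variant; inside the prong,
        // `buffer` has the concrete payload type, so no ptr casts are needed.
        switch (self.*) {
            inline else => |*buffer| buffer.deinit(gc),
        }
    }
};
```

The tag carries the type information the BufferType enum used to, so the enum field and the raw pointer field collapse into one.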

So to the personal nitpick:
init should never allocate the return value. It should either return by value, or you should use an in-place init instead.

An in-place init gives the caller more flexibility over the storage; it also gives the init function a stable address, since Zig has no pinned-struct concept:

var foo: Bar = undefined;
try foo.init(...);

pub fn init(self: *Bar) !void {
    self.* = .{ ... };
}

I would initialize these to undefined and drop the optionals:

       .storage_index = null,
       .buf = null,
1 Like

That's a good point, thanks! I managed to merge those two fields together.

Thanks for the nitpick; it gave me an idea and I managed to remove the heap allocation altogether.
I can't remove the optional from storage_index, as it gets a bit complicated and the best solution is to just keep it as an optional.

I’m not sure that making it generic is buying you much. For instance, if two buffers differ only by binding, they will be different types, so you can’t put them in an array, and it may result in code bloat.
Vulkan took great care to do operations in batches, which is a big part of what makes it better than OpenGL, but your API is not leveraging that. updateSets, for instance, should receive an array of buffers to update. Of course, in order for this to happen, they’ll have to be the same type, and you can’t have two buffers with the same binding, so we’re back to the first point.
Remember that crossing API boundaries is an important bottleneck, so whenever you do, make sure you drag as much data as possible, so you’ll make fewer trips.

/// copies memory from cpu_data to the staging buffer's gpu memory
pub fn copyMemory(self: *Buf, gc: *const GraphicsContext, cpu_data: []const options.T) !void {
    defer gc.dev.unmapMemory(self.memory);

    const gpu_data: [*]options.T = @ptrCast(@alignCast(self.data));
    @memcpy(gpu_data, cpu_data);
    try gc.dev.flushMappedMemoryRanges(&.{vk.MappedMemoryRange{
        .memory = self.memory,
        .offset = 0,
        .size = vk.WHOLE_SIZE,
    }});
}

In this function, why copy the data? Just tell the user to write the data at the correct address in the first place.
Also, same as before, you can flush a whole bunch of data at once, so why one slice at a time? With a single call, you could update multiple regions, of multiple buffers.
I don’t get why you’re unmapping the memory here. I don’t see where it was mapped, but I presume it was during initialization. If you don’t map it again before using it, that’s undefined behavior. But you shouldn’t be mapping and unmapping memory in the middle of a program. Map it once, during initialization, and unmap it once, at deinitialization. In between, just reuse the same address.
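In Zig, that map-once approach could look roughly like this (a sketch against the vulkan-zig-style wrapper the snippet uses; names like `mapped` and `hostSlice` are made up):

```zig
// Map once during init and keep a typed slice into the mapping around.
pub fn init(self: *Buf, gc: *const GraphicsContext, count: usize) !void {
    // ... create the buffer, allocate and bind memory ...
    const raw = try gc.dev.mapMemory(self.memory, 0, count * @sizeOf(options.T), .{}) orelse
        return error.MapFailed;
    self.mapped = @as([*]options.T, @ptrCast(@alignCast(raw)))[0..count];
}

/// Callers write their data directly into this slice:
/// no intermediate copy, no map/unmap per call.
pub fn hostSlice(self: *Buf) []options.T {
    return self.mapped;
}

pub fn deinit(self: *Buf, gc: *const GraphicsContext) void {
    gc.dev.unmapMemory(self.memory); // unmap exactly once, at teardown
    // ... free memory, destroy the buffer ...
}
```

With this shape, copyMemory disappears entirely: the "copy" is the caller writing into `hostSlice()` in the first place, and flushing can be batched separately.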

2 Likes

I'm not sure I'm following; it's a C pointer to a slice. I tried a bunch of ways to set cpu_data to it directly, but the only way that works is memcpy. I'd appreciate further elaboration.
I also need to copy it by value, as cpu_data is freed soon after this function.

As for batching, I'm working on it. I'm thinking of a system where, after sending draw calls, I init an array of GPU commands; anything that requires the GPU gets its commands appended to the array, and just before the next draw call, I submit all the commands at once and wait for them. Thoughts?

^ I would be doing the same thing for updateSets.
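That deferred-submission idea could be sketched like this (all names are hypothetical; only one command kind shown):

```zig
// Commands are recorded as plain data during the frame...
const GpuCommand = union(enum) {
    copy_buffer: struct {
        src: vk.Buffer,
        dst: vk.Buffer,
        region: vk.BufferCopy,
    },
};

pending: std.ArrayList(GpuCommand),

/// Anything that needs the GPU just appends; nothing is submitted yet.
pub fn enqueue(self: *@This(), cmd: GpuCommand) !void {
    try self.pending.append(cmd);
}

/// Just before the next draw: record everything into one command buffer,
/// submit once, wait once.
pub fn flush(self: *@This(), gc: *const GraphicsContext, cmd_buf: vk.CommandBuffer) !void {
    for (self.pending.items) |cmd| switch (cmd) {
        .copy_buffer => |c| gc.dev.cmdCopyBuffer(cmd_buf, c.src, c.dst, 1, @ptrCast(&c.region)),
    };
    // ... endCommandBuffer, queueSubmit, and a single fence wait go here ...
    self.pending.clearRetainingCapacity();
}
```

Because the commands are plain data until flush, adjacent copies to the same buffer can also be merged before recording, which is the batching the review is asking for.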

As for the unmap-memory thing, good catch; I actually didn't notice it being there.
As for the generic comment, yeah, I agree. I don't want it to be generic, as the buffer has the same fields regardless; it's just that the functions take different parameters, and the best way I found to resolve that was a generic type.

  1. Staging buffers are a poor idea on modern hardware. RBAR basically obsoletes them and makes streaming of assets dramatically easier. In addition, you can do a lot of buffer processing on the CPU before shoveling it over to the GPU. I don’t know about mobile, but I’m assuming you’re not on mobile if you’re using Zig.

  2. Vulkan has a very limited number of allocations that it will support. Vulkan wants you to map a small number of very large memory chunks and then parcel them up yourself. Zig and Vulkan are quite compatible on this front. However, it does mean that you want a small number of VkBuffers and to lean into vkGetBufferDeviceAddress much, much more strongly.

  3. Don’t write Vulkan like the C++ guys do with terminal object-itis (which this looks very much like). This is Zig: write like the C guys. Think in terms of functions, arenas, and data flow rather than objects, ownership, and allocation. For example, you should be able to move a single pointer/index and wipe out all the allocations for a single frame. If you have to individually deallocate them one by one, that’s working against both Zig and Vulkan.

  4. Descriptor sets have multiple ways of doing things and baking that handling in here is likely a bad idea for the future.

  5. vk.WHOLE_SIZE is NOT your friend; track the actual number. You don’t know what Vulkan thinks vk.WHOLE_SIZE is unless you shove your own number in and Vulkan barks because it disagrees.

  6. This is purely my opinion, but, in general, don’t abstract Vulkan like this; it’s not worth the effort and grief. Your abstraction needs to operate at a level higher than this to be useful.

4 Likes

Instead of asking the user to pass you the data after it is ready, the user should ask for the pointer before it’s ready. Instead of writing the data on their end and then passing it to you, they’ll write directly at the data’s final address, so there’s no copying.

Could you explain how to use RBAR with Vulkan? I’ve heard a lot about it, but not much about actually using it. Also, my understanding is that RBAR support changes according to GPU manufacturer, with different sizes and performance characteristics. Is that true? If so, how do you handle those differences?

The standard Vulkan memory way? You hunt for a memory type with MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT that also points to a memory heap with >256MB size. The memory you get back is RBAR (Resizeable BAR). To the end programmer, it’s just memory.

As for differences, this is no harder than managing the differences between staging buffers. You couldn’t “guarantee” that a staging buffer was 256 MiB, so you had to deal with the card giving you a weird memory size anyway; only this time it might be bigger rather than smaller. If anything, this is better, because staging buffers generally had their own submit queues, which had all manner of weird performance quirks (load the staging area via PCI and then use some internal queue to copy that memory somewhere else), while RBAR simply runs at the PCI bus speed of your graphics card (which is the limit anyway) and puts the data directly where it needs to be.
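The memory-type hunt described above might look like this with a vulkan-zig-style wrapper (a sketch; check the flag and field names against your own bindings):

```zig
/// Find a memory type that is device-local *and* host-visible/coherent
/// (i.e. RBAR memory) backed by a reasonably large heap. Returns the
/// memory type index, or null so the caller can fall back to staging.
fn findRebarMemoryType(props: vk.PhysicalDeviceMemoryProperties) ?u32 {
    const wanted = vk.MemoryPropertyFlags{
        .device_local_bit = true,
        .host_visible_bit = true,
        .host_coherent_bit = true,
    };
    var i: u32 = 0;
    while (i < props.memory_type_count) : (i += 1) {
        const mt = props.memory_types[i];
        if (!mt.property_flags.contains(wanted)) continue;
        // Also require a large heap (>256 MiB) behind this type.
        if (props.memory_heaps[mt.heap_index].size > 256 * 1024 * 1024)
            return i;
    }
    return null;
}
```

Returning null rather than erroring keeps the fallback decision (staging buffers on pre-RBAR hardware) in the caller's hands.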

Things that you can do on the CPU are all covered by VK_EXT_host_image_copy which is orthogonal to (but may take advantage of) RBAR.

1 Like
  1. I did some brief research on ReBAR. For my use case, I think it could work for per-frame allocations and maybe some others, but I think I can get rid of this implementation entirely if point 2 is implemented.
  2. I am looking into buffer device address; thank you for the advice. I think with it, I can get rid of updating descriptor sets, and a single flush of memory would be enough, if I understand it correctly.
  3. That is true, but hard to work on due to how broad it is. I think you’re referring to per-frame allocations, and yeah, I can see that working greatly. I think with buffer device addresses it could also work better for resources that live for more than a frame; I would have to see.
  4. That’s fine; it’s not for an engine or for someone else to interact with.
  5. That is a good point, thank you.
  6. I think with buffer device addresses, this abstraction would be entirely removed.
1 Like

The main problem is that Vulkan documentation is garbage. Annoyingly, AI has helped a lot on this front. Modern Vulkan has sort of converged to SPIR-V and Slang with the extensions VK_KHR_buffer_device_address, VK_KHR_dynamic_rendering, VK_EXT_host_image_copy, and VK_EXT_extended_dynamic_state.

The main thrust of buffer device addresses is that you don’t have so stupidly much ceremony to get to the thing you want and pass it around. Things look much more like an IPC (inter-process communication) call instead of a graphics card transaction. And, to be honest, having buffer device addresses makes the obvious limitations of VkImage super annoying. This is especially so now that Windows has converged to SPIR-V support and Slang seems to have enough momentum.

I really ought to sit down and write a “Vulkan For Those Who Hate C++ and 3D” tutorial (the problem is I think the worldwide audience would be maybe 2 people :thinking: ). There is an awful lot of stuff you can do with a modern graphics card with compute shaders and buffer device addresses and never touch a triangle and the associated 3D pipeline units.

The vk.WHOLE_SIZE and abstraction points are just painful advice from someone who has been there before. Not even the Vulkan driver writers always agree on what vk.WHOLE_SIZE means. Similarly with the abstraction: I have written your abstraction (several different times), and I always regret it and wind up tearing it out for something higher level. It’s like writing a “Lock” or “synchronized” in Java; you will rip that out and use something from java.util.concurrent, so it’s better to think harder and start with the higher abstraction.

4 Likes

A bit off-topic, but good point about “modern Vulkan”.
This is a very good guide on how Vulkan looks with “modern” extensions.

“Modern” in quotes, because all of this works on very old GPUs; the extensions themselves just simplify Vulkan usage quite a bit.

2 Likes

The big problem I have with Vulkan articles like that one is their usage of the VMA (VulkanMemoryAllocator).

Needing the VMA was also more of an issue back before RBAR (resizeable bar) and buffer device addresses. If your allocations are limited in size, you have to twist yourself into a pretzel to create pools and index frames to those pools and hit the right transfer queues and … If you have RBAR, you allocate a gigabyte, point into that with buffer device addresses, and bump allocate. Zig people especially resonate with that (funny, no?).
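The bump-allocate idea really is only a few lines (a sketch; real code would also handle per-frame reset points and the device's alignment requirements):

```zig
// One big mapped allocation, carved up by bumping an offset.
// `base_addr` is what vkGetBufferDeviceAddress returned for the big buffer.
const GpuBump = struct {
    base_addr: u64,
    capacity: u64,
    offset: u64 = 0,

    /// Returns a device address shaders can chase, or null when full.
    fn alloc(self: *GpuBump, size: u64, alignment: u64) ?u64 {
        const aligned = std.mem.alignForward(u64, self.offset, alignment);
        if (aligned + size > self.capacity) return null;
        self.offset = aligned + size;
        return self.base_addr + aligned;
    }

    /// Wiping out a whole frame's allocations is a single assignment.
    fn reset(self: *GpuBump) void {
        self.offset = 0;
    }
};
```

This is the same shape as Zig's FixedBufferAllocator, which is presumably why it resonates with Zig people.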

In addition, knowing your memory access patterns is vital for writing GPU stuff. If you are abstracting that away, why are you subjecting yourself to Vulkan? There are so many easier libraries and toolkits to use that if you don’t need the performance (aka DirectX 11 levels), you should use those.

My analogy is that using Vulkan is like using a user-space networking stack. Yes, you can get maximum performance, but the pain is significant.

1 Like

Yeah, I don’t like how often people are directed at VMA. But it might be a C++ culture thing.

Update: I'm at a basic BDA implementation. I'm really surprised at how easy it was to write, and not to mention, pointers with Slang are easy as pie.
Being able to replace almost all shader resource declarations with just one pointer from a push constant feels great.

The only drawback, it seems, is that you have to write an allocator implementation that also defrags memory without invalidating pointers to used memory.
Not terribly difficult, but that is one more thing to write for this big project.

Oh, also, with this new BDA implementation, the buffer implementation in the repo is basically removed, which is great.

I unfortunately have to use the old way for textures and images, but besides that, it's working great.

3 Likes

I’d read it.

3 Likes

Audience of 4 people now.

3 Likes

I would for sure read such a tutorial!!

1 Like

Update: I also just realized something. Memory is 1-to-1: no std430 or scalar data layout, you get the exact same data you sent from the CPU. That's awesome!

I had so many bugs related to memory alignment over my many months of graphics programming, and while I've gotten good at it, it's good to have the drivers assume less on my behalf.