Is there something I can do to improve the performance of my program in debug builds?

I’m making a NES emulator and after implementing this code (I basically added an interface with a VTable) the debug builds became unusable due to the performance hit. I’m using the LLVM backend for the debug builds.

Is there something I can do to improve the performance?

1 Like

If you’re asking which code changes might have caused the slowness, I suggest profiling.

However, this new logging is a little suspicious:

    std.log.debug("Mapper 0: Attempted write to PRG ROM at ${X:04} = ${X:02}", .{ addr, value });

I don’t think this logging is the culprit; I didn’t see it when executing the emulator.

I’ve been meaning to familiarise myself with some performance profiling tools, and I thought this would be a good chance to try out flame graphs.

It looks like a large amount of time is being spent on memcopies in the PPU. If I had to guess why, it’s because of this function declaration in ppu.zig:

    fn mirror_vram_addr(self: Self, addr: u16) u16 {

Which should be

    fn mirror_vram_addr(self: *Self, addr: u16) u16 {

Essentially every time you read a mirrored address, you’re copying all of VRAM and everything else in the ppu struct.
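To illustrate with a hypothetical, cut-down `Ppu` (the real struct and mirroring logic in the project are more involved than this sketch):

```zig
const Ppu = struct {
    vram: [2048]u8,

    // Pass-by-value: every call copies the whole struct (2 KiB of VRAM
    // plus everything else) just to compute an address.
    fn mirror_by_value(self: Ppu, addr: u16) u16 {
        _ = self;
        return addr & 0x07FF;
    }

    // Pass-by-pointer: only a pointer crosses the call boundary, and
    // `*const` still documents that the function mutates nothing.
    fn mirror_by_ptr(self: *const Ppu, addr: u16) u16 {
        _ = self;
        return addr & 0x07FF;
    }
};
```

Since the function only reads, `*const Ppu` communicates the same read-only intent as pass-by-value without the copy.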

I’m not sure what there is in the way of safety systems to prevent these kinds of mistakes; perhaps a more experienced Zig user could point out what options there are for linting or otherwise preventing them?

10 Likes

There are a few issues that could be relevant (and probably more that I’m missing):

More importantly, though, it’s worth noting that parameter reference optimization is (largely) going away, so making an intentional choice between T/*T/*const T in code now will future-proof it:
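A sketch of what that intentional choice can look like (hypothetical types and functions, not from the project):

```zig
const Vec2 = struct { x: f32, y: f32 };
const State = struct { buf: [4096]u8 };

// Small value: copying two floats is cheap, so take it by value.
fn lengthSq(v: Vec2) f32 {
    return v.x * v.x + v.y * v.y;
}

// Large and read-only: take *const State so no copy can happen.
fn peek(s: *const State, i: usize) u8 {
    return s.buf[i];
}

// Large and mutated: take *State.
fn poke(s: *State, i: usize, b: u8) void {
    s.buf[i] = b;
}
```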

4 Likes

This probably isn’t a “safety system” in the way you were hoping, but what I’d do if forced to make an emulator would be to just represent the PPU as a namespace containing namespace-level variables instead of a struct instance with members; that way, access to it (and its members) is always by reference no matter what.
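A minimal sketch of that approach (hypothetical names; in Zig, a file is itself a namespace):

```zig
// ppu.zig: all state lives at namespace level, so there is no struct
// instance to copy and no `self` parameter to get wrong.
const Mirroring = enum { horizontal, vertical };

var vram: [2048]u8 = [_]u8{0} ** 2048;
var mirroring: Mirroring = .horizontal;

pub fn mirror_vram_addr(addr: u16) u16 {
    // Access to `vram`/`mirroring` always goes through the globals,
    // by reference.
    _ = mirroring;
    return addr & 0x07FF;
}
```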

That’s probably good enough for most projects, but there might be a good reason for having multiple instances of the emulator running at once. At the very least, it gives you flexibility and extensibility for the future.
It seems to me like there are plenty of times where you could run into something like this. It would be nice to be able to mark a struct as pass-by-reference only, for circumstances like these, where you would prefer to explicitly memcpy when necessary. Perhaps there are existing linting tools for this, but I’m not familiar with any.

1 Like

Another thing that emulators might stumble over with the new x86 backend is that large switch statements (for instance in CPU decoders) might not be compiled into a jump table, instead doing a linear search. (It’s a bit more nuanced than that: there must be some gaps or no-op prongs. The LLVM backend can deal with those, while the self-hosted x86 backend has different heuristics for when to switch to a jump table.)

In any case, this totally killed the debug performance in my emulator, to the point where it can no longer run in real time. I last checked a couple of weeks ago though, so I’m not sure what the current state is.
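For reference, the shape in question is something like this (hypothetical decoder; whether it becomes a jump table depends on the backend and on how the gaps and the `else` prong are handled):

```zig
fn step(opcode: u8) void {
    switch (opcode) {
        0x00 => brk(),
        0xA9 => ldaImmediate(),
        0xAD => ldaAbsolute(),
        // ... the remaining official opcodes ...
        else => illegalOpcode(), // the gaps the heuristics have to cope with
    }
}

fn brk() void {}
fn ldaImmediate() void {}
fn ldaAbsolute() void {}
fn illegalOpcode() void {}
```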

3 Likes

When programmers use syntax like fn mirror_vram_addr(self: Self, addr: u16) u16, I assume this emphasizes the read-only nature of self. Therefore, I think an appropriate change would be fn mirror_vram_addr(self: *const Self, addr: u16) u16.
I’m wondering, compared to parameter reference optimization, isn’t it safer to have the inverse optimization semantics? For example, if programmers are encouraged to always write self: *const Self function parameters instead of self: Self, the program might optimize to the latter when Self is small, rather than the other way around, optimizing to the former when Self is large. On the other hand, if programmers use self: Self, it indicates that they are emphasizing the potential for aliasing and therefore shouldn’t be optimized.

1 Like

Related:

And:

4 Likes

Thanks for taking the time to profile the code.

1 Like

No worries. I had a lot of fun poking around in the project, it’s very cool so far.

3 Likes

eliminate hidden pass-by-reference footguns

I have a question about this and I don’t want to pollute the github issue so I’m asking here. My question is about this part:

The Zig compiler’s optimizer will gain the ability to notice that a function is pure, and promote parameters to references accordingly. This will solve the use cases which PRO set out to solve: generic implementations of functions like hash or eql, which are typically pure, can pass their generic parameters by reference.

I’m having trouble understanding why this is necessary, since with the removal of PRO we must be careful at all times to pass a *const instead of a value for large structs. Does anyone have an example showing why the generic case is special?

I’m not sure, but I suspect that the problem is that those functions are declared with T as their parameter type, but T frequently is such that PRO would be a useful optimization.

Yes, thank you, and I understand that in an abstract sense. My question is under what conditions T would not be a *const X (or *X) when X is a large struct, if we’re all following the rule of passing large structs by reference.

Edit: Maybe the answer is “you can do almost anything with comptime, so it’s possible for T to be a large struct”. If so then I can accept that, I was just wondering if someone had run into a case where this can happen, since it was a motivation for PRO in the first place.

Let me expand my answer, since I thought it was clear but it might not be: std.HashMap(T) stores values of type T and uses user-provided functions (or library-provided functions in the case of AutoHashMap) with T in their signature. The std.HashMap structure is useful even when T is large enough to benefit from PRO, but it does not currently special-case the expected signature of these functions.

In other words, you need to write a function fn hash(item: T) u64 (or whatever the real signature is), even when T is large enough that the copy is onerous.
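A concrete sketch of that situation (hypothetical `Big` type; this follows the `hash`/`eql` context shape that `std.HashMap` expects):

```zig
const std = @import("std");

// A value large enough that copying it on every probe is onerous.
const Big = struct { data: [4096]u8 };

const BigContext = struct {
    // The expected signatures take the key by value, so each call to
    // hash() or eql() copies the full 4 KiB; exactly the case PRO
    // (or a future purity-based optimization) was meant to help with.
    pub fn hash(_: @This(), key: Big) u64 {
        return std.hash.Wyhash.hash(0, &key.data);
    }
    pub fn eql(_: @This(), a: Big, b: Big) bool {
        return std.mem.eql(u8, &a.data, &b.data);
    }
};
```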

1 Like

Thank you, that’s a good example! I don’t know why it wasn’t obvious to me but I guess I needed something concrete to make it clear.