Is there something I can do to improve the performance of my program in debug builds?

I’m making a NES emulator and after implementing this code (I basically added an interface with a VTable) the debug builds became unusable due to the performance hit. I’m using the LLVM backend for the debug builds.

Is there something I can do to improve the performance?

1 Like

If you’re asking which code changes might have caused the slowness, I suggest profiling.

However, this new logging is a little suspicious:

    std.log.debug("Mapper 0: Attempted write to PRG ROM at ${X:04} = ${X:02}", .{ addr, value });

I don’t think this logging is the culprit; I didn’t see it when executing the emulator.

I’ve been meaning to familiarise myself with some performance profiling tools, and I thought this would be a good chance to try out flame graphs.

It looks like a large amount of time is being spent on memcopies in the PPU. If I had to guess why, it’s because of this function declaration in ppu.zig:

    fn mirror_vram_addr(self: Self, addr: u16) u16 {

Which should be

    fn mirror_vram_addr(self: *Self, addr: u16) u16 {

Essentially every time you read a mirrored address, you’re copying all of VRAM and everything else in the ppu struct.
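To illustrate with a hypothetical, cut-down `Ppu` (the real struct and mirroring logic in the project are more involved than this sketch):

```zig
const Ppu = struct {
    vram: [2048]u8,

    // Pass-by-value: every call copies the whole struct (2 KiB of VRAM
    // plus everything else) just to compute an address.
    fn mirror_by_value(self: Ppu, addr: u16) u16 {
        _ = self;
        return addr & 0x07FF;
    }

    // Pass-by-pointer: only a pointer crosses the call boundary, and
    // `*const` still documents that the function mutates nothing.
    fn mirror_by_ptr(self: *const Ppu, addr: u16) u16 {
        _ = self;
        return addr & 0x07FF;
    }
};
```

Since the function only reads, `*const Ppu` communicates the same read-only intent as pass-by-value without the copy.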

I’m not sure what there is in the way of safety systems to prevent these kinds of mistakes; perhaps a more experienced Zig user could point out what options there are for linting or otherwise preventing them?

10 Likes

There are a few issues that could be relevant (and probably more that I’m missing):

More importantly, though, it’s worth noting that parameter reference optimization is (largely) going away, so making an intentional choice between T/*T/*const T in code now will future-proof it:
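A sketch of what that intentional choice can look like (hypothetical types and functions, not from the project):

```zig
const Vec2 = struct { x: f32, y: f32 };
const State = struct { buf: [4096]u8 };

// Small value: copying two floats is cheap, so take it by value.
fn lengthSq(v: Vec2) f32 {
    return v.x * v.x + v.y * v.y;
}

// Large and read-only: take *const State so no copy can happen.
fn peek(s: *const State, i: usize) u8 {
    return s.buf[i];
}

// Large and mutated: take *State.
fn poke(s: *State, i: usize, b: u8) void {
    s.buf[i] = b;
}
```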

4 Likes

This probably isn’t a “safety system” in the way you were hoping, but what I’d do if forced to make an emulator would be to just represent the PPU as a namespace containing namespace-level variables instead of a struct instance with members; that way, access to it (and its members) is always by reference no matter what.
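A minimal sketch of that approach (hypothetical names; in Zig, a file is itself a namespace):

```zig
// ppu.zig: all state lives at namespace level, so there is no struct
// instance to copy and no `self` parameter to get wrong.
const Mirroring = enum { horizontal, vertical };

var vram: [2048]u8 = [_]u8{0} ** 2048;
var mirroring: Mirroring = .horizontal;

pub fn mirror_vram_addr(addr: u16) u16 {
    // Access to `vram`/`mirroring` always goes through the globals,
    // by reference.
    _ = mirroring;
    return addr & 0x07FF;
}
```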

That’s probably good enough for most projects, but there might be a good reason for having multiple instances of the emulator running at once. At the very least, it gives you flexibility and extensibility for the future.
It seems to me like there are plenty of times where you could run into something like this. It would be nice to be able to mark a struct as pass-by-reference only, for circumstances like these, where you would prefer to explicitly memcpy when necessary. Perhaps there are existing linting tools for this, but I’m not familiar with any.

1 Like

Another thing that emulators might stumble over with the new x86 backend is that large switch statements (for instance in CPU decoders) might not be compiled into a jump table, instead doing a linear search. (It’s a bit more nuanced than that: there must be some gaps or no-op prongs. The LLVM backend can deal with those, while the self-hosted x86 backend has different heuristics for when to switch to a jump table.)

In any case, this totally killed the debug performance in my emulator, to the point where it can no longer run in real time. I last checked a couple of weeks ago though, so I’m not sure what the current state is.
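For reference, the shape in question is something like this (hypothetical decoder; whether it becomes a jump table depends on the backend and on how the gaps and the `else` prong are handled):

```zig
fn step(opcode: u8) void {
    switch (opcode) {
        0x00 => brk(),
        0xA9 => ldaImmediate(),
        0xAD => ldaAbsolute(),
        // ... the remaining official opcodes ...
        else => illegalOpcode(), // the gaps the heuristics have to cope with
    }
}

fn brk() void {}
fn ldaImmediate() void {}
fn ldaAbsolute() void {}
fn illegalOpcode() void {}
```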

3 Likes

When programmers use syntax like fn mirror_vram_addr(self: Self, addr: u16) u16, I assume this emphasizes the read-only nature of self. Therefore, I think an appropriate change would be fn mirror_vram_addr(self: *const Self, addr: u16) u16.
I’m wondering, compared to parameter reference optimization, isn’t it safer to have the inverse optimization semantics? For example, if programmers are encouraged to always write self: *const Self function parameters instead of self: Self, the program might optimize to the latter when Self is small, rather than the other way around, optimizing to the former when Self is large. On the other hand, if programmers use self: Self, it indicates that they are emphasizing the potential for aliasing and therefore shouldn’t be optimized.

1 Like

Related:

And:

4 Likes

Thanks for taking the time to profile the code.

1 Like

No worries. I had a lot of fun poking around in the project, it’s very cool so far.

3 Likes

eliminate hidden pass-by-reference footguns

I have a question about this and I don’t want to pollute the github issue so I’m asking here. My question is about this part:

The Zig compiler’s optimizer will gain the ability to notice that a function is pure, and promote parameters to references accordingly. This will solve the use cases which PRO set out to solve: generic implementations of functions like hash or eql, which are typically pure, can pass their generic parameters by reference.

I’m having trouble understanding why this is necessary, since with the removal of PRO we must be careful at all times to pass a *const instead of a value for large structs. Does anyone have an example showing why the generic case is special?

I’m not sure, but I suspect that the problem is that those functions are declared with T as their parameter type, but T frequently is such that PRO would be a useful optimization.

Yes, thank you, and I understand that in an abstract sense. My question is under what conditions T would not be a *const X (or *X) when X is a large struct, if we’re all following the rule of passing large structs by reference.

Edit: Maybe the answer is “you can do almost anything with comptime, so it’s possible for T to be a large struct”. If so then I can accept that, I was just wondering if someone had run into a case where this can happen, since it was a motivation for PRO in the first place.

Let me expand my answer, since I thought it was clear but it might not be: std.HashMap(T) stores values of type T and uses user-provided functions (or library-provided functions in the case of AutoHashMap) with T in their signature. The std.HashMap structure is useful even when T is large enough to benefit from PRO, but it does not currently special-case the expected signature of these functions.

In other words, you need to write a function fn hash(item: T) u64 (or whatever the real signature is), even when T is large enough that the copy is onerous.
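A concrete sketch of that situation (hypothetical `Big` type; this follows the `hash`/`eql` context shape that `std.HashMap` expects):

```zig
const std = @import("std");

// A value large enough that copying it on every probe is onerous.
const Big = struct { data: [4096]u8 };

const BigContext = struct {
    // The expected signatures take the key by value, so each call to
    // hash() or eql() copies the full 4 KiB; exactly the case PRO
    // (or a future purity-based optimization) was meant to help with.
    pub fn hash(_: @This(), key: Big) u64 {
        return std.hash.Wyhash.hash(0, &key.data);
    }
    pub fn eql(_: @This(), a: Big, b: Big) bool {
        return std.mem.eql(u8, &a.data, &b.data);
    }
};
```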

1 Like

Thank you, that’s a good example! I don’t know why it wasn’t obvious to me but I guess I needed something concrete to make it clear.