Key semantics of std.debug.assert

I am utterly confused… and will study this thread.
Wondering if I should use
if (comptime lib.is_paranoid) assert(bitpos > 0);
or just leave the asserts as is.

As I mentioned a couple of posts up (emphasis added here):

Broadly speaking, my advice is to assume that the optimizer will do the right thing. If you have a performance problem, and discover that the optimizer has failed to make optimal choices, then go ahead and add an explicit check (with a comment documenting the reason for its existence).

I can essentially guarantee that LLVM will elide bitpos > 0 in 100% of cases, and even if it didn’t, you would not be able to measure the performance delta.

Ok. understood.
When writing a chess engine, every cycle counts.
The confusing thing, for me, is that an assert can actually give a hint to the compiler (or am I wrong?).
Just as an experiment, I put the is_paranoid check above in front of each and every assert, and the engine slowed down by 10 million nodes per second.

In any case: I can throw away the assumption that assert does not exist in ReleaseFast mode.

You are right from a technical perspective.
But programming languages are made for mere humans.
And the fact that assert works so differently in Zig compared to other languages (even though the difference is just that it's no different from any other function in Zig) confuses even experienced developers who like the language and try their best.

So would it hurt to add a sentence or two to the documentation for newcomers?

I don’t think this is ridiculous.

It’s reasonable to document the detailed implications of assert in different release / optimization modes, and provide the guidance on how to achieve mode-dependent behavior similar to many common languages.
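
As a rough sketch of what such guidance might look like: one way to get the "compiled out in release" behavior familiar from other languages is to guard the call site with a comptime-known condition. The example below uses std.debug.runtime_safety, which is true in Debug and ReleaseSafe and false in ReleaseFast and ReleaseSmall. The names boardIsConsistent and checkBoard are made up for illustration and do not come from this thread.

```zig
const std = @import("std");

// Hypothetical stand-in for an expensive invariant check.
fn boardIsConsistent(squares: []const u8) bool {
    var occupied: usize = 0;
    for (squares) |sq| {
        if (sq != 0) occupied += 1;
    }
    return occupied <= 32;
}

// Hypothetical caller showing the guard pattern.
pub fn checkBoard(squares: []const u8) void {
    // The condition is known at compile time, so in ReleaseFast and
    // ReleaseSmall the branch body (including the call that computes
    // the assert argument) is not compiled in at all.
    if (std.debug.runtime_safety) {
        std.debug.assert(boardIsConsistent(squares));
    }
}
```

The trade-off is that a guarded assert no longer acts as an optimizer hint in the fast modes, so this only seems worthwhile when computing the condition is expensive or has side effects.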

It doesn’t need to go into a comparison with other languages that have their own variations on assertions, whether that’s C, Python, Eiffel, etc.

The same goes for packed: the documentation should explain the different types of packing (bit-packing, padding removal, …) and how to achieve them. Even just a note on where to find information about the other kind of packing would add clarity and take the docs from minimal reference to useful application info.

There is no need to be dismissive or rude, and conflating this one specific topic with every function in the library is very disingenuous, and a strawman of what is being suggested here.

I personally understand the difference now, so it wasn’t for my sake I suggested a minor edit to a comment of a single function. Even if maximum-level pedantry has deemed that it is technically unnecessary, it seemed, even if only anecdotally from the thread, fairly common that some users were coming to the same incorrect conclusion for this one specific function, which could be easily prevented with essentially no effort or negative side-effects.

I will concede the issue, my intention was to keep the discussion constructive, and I feel like it is starting to devolve into something else.

Let's tone down the passions here. Ultimately we are just talking about the semantics of one function in the standard library. Try to cast things charitably before responding.

If anything good does come from this discussion, it is that we have some clarity and challenging of mental models. We often forget what assumptions we bring to things.

In a major codebase I worked on many years ago, we built a variety of asserts to achieve various ends. Some were explicitly debug-only. Others were live in release modes. Some were terminating, others not. Some would trigger a breakpoint but not termination, or in debug builds, pop a dialog and offer the choice of breakpoint, terminate, or continue. In all cases these are assertions.

The docs for assert are a great place to go into some detail on what unreachable means, because it’s key to understanding what assert does and doesn’t do.

The current documentation about unreachable actually does call out that it is how assert is implemented.
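
For reference, here is a minimal sketch of that relationship. As far as I know it mirrors the shape of the definition in std.debug at the time of writing, but treat it as an illustration and not a verbatim copy of the standard library.

```zig
// Local re-creation of the idea behind std.debug.assert.
// In Debug and ReleaseSafe, reaching `unreachable` panics.
// In ReleaseFast and ReleaseSmall, reaching it is undefined behavior,
// so the optimizer is allowed to assume that `ok` is always true.
fn assert(ok: bool) void {
    if (!ok) unreachable; // assertion failure
}
```

Because this is an ordinary function, the argument expression is an ordinary expression too: whether it still runs in a release build depends on what the optimizer can prove about it, not on any special treatment of assert.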

I think this is a good motivation for fleshing out the langref section about unreachable a bit. This discussion shouldn’t be limited to assert() IMO, something like this is a common pattern:

switch (foo()) {
    .bar, .baz => {},
    else => unreachable,
}

After experimenting for the last hour, it seems that in this kind of situation:

if (condition) {
    unreachable;
}

condition is always evaluated, as in the evaluation of condition is never elided, whatever the current optimization mode. So when you call std.debug.assert, it seems the evaluation of the argument is never elided, side effects or not. If you assure me that I’m wrong, then I will take your word for it and never ever bother you with that again.

If it’s true, then AFAIU it isn’t missed optimization. unreachable in fact forces the evaluation.

I understand this (again, if it’s true) has nothing to do with Zig and is due to LLVM, and nothing in the documentation suggests that condition would be elided for sure. But in that case wouldn’t explicitly checking the optimization mode in std.debug.assert be a net win when using the LLVM backend? This is the question that makes me think I’m missing something. Maybe the semantics of unreachable differ between Zig and LLVM?

Just to be clear, I’m not suggesting that anything should be added / modified anywhere, nor that the docs are unclear. I’ve never had any doubt that assert is a normal function. I just want to make sure I’m not misunderstanding something about unreachable. Sorry if this isn’t the right time or place to ask.

I’m not much of a blogger, but felt inspired to attempt a summary of sorts: https://cryptocode.github.io/blog/docs/assert/

Are you sure that you are testing this properly? It should be very easy to find scenarios where the compiler optimizes out code used to compute an assert condition. Here’s one such example: https://zig.godbolt.org/z/nbjr9Mx66.

counter example

A mutex qualifies as a side effect

Note that a mutex is not a fundamental side effect. It’s implemented as a syscall, which is implemented either as an extern function call, or as volatile inline asm.

Notably, if you compile with -fsingle-threaded, the Mutex implementation is simply checking a boolean, and therefore is no longer a side effect. example

Good point, I’ll clarify that section tomorrow

As demonstrated by your example, it seems obvious I’m not… Sorry for wasting your time.

What’s funny though:

const std = @import("std");

export fn demo(input: u32) u32 {
    std.debug.assert(junkCalculation(input) != 123456);
    return input * 123;
}

fn junkCalculation(input: u32) u32 {
    var prng: std.Random.DefaultPrng = .init(input);
    for (0..59) |_| {
        _ = prng.random().int(u32);
    }
    return prng.random().int(u32);
}

junkCalculation is elided, but:

const std = @import("std");

export fn demo(input: u32) u32 {
    std.debug.assert(junkCalculation(input) != 123456);
    return input * 123;
}

fn junkCalculation(input: u32) u32 {
    var prng: std.Random.DefaultPrng = .init(input);
    for (0..60) |_| {
        _ = prng.random().int(u32);
    }
    return prng.random().int(u32);
}

And junkCalculation is not elided.

I also noticed that your example doesn’t optimize out the call in ReleaseSmall.
Clearly this is down to LLVM, even if none of it makes sense to me.

Yes. I should’ve made it clear that I meant “the condition is always evaluated if the compiler can’t prove its value.” But even that is not true. Sorry for the noise.

To reiterate, LLVM is essentially guaranteed to eliminate this redundant comparison.

This next point is purely academic, but: if the comparison did stay, the amount of extra time taken to execute this code path would probably be literally less than a cycle. This is because the simple mental model of CPUs doing (e.g.) one instruction per cycle is completely incorrect; CPUs do all sorts of crazy things like batching and pipelining and reordering. I’m much less well-versed in this than some people, but I would predict that a simple integer comparison whose result is never used would just get thrown onto an execution unit at a point where it would otherwise be free and add precisely zero cycles to the execution time.

Exactly. This is the primary benefit to Zig’s definition of unreachable and of using it to implement assert. If there is an invariant which you expect to always hold, then safe build modes ought to check it to help you catch bugs, but for the fastest possible output, you can let LLVM assume you are right and improve the optimization of your final binary for you.

One example that springs to mind is a micro-optimization in the Zig compiler, where something like (from memory) assert(header.capacity > 0) avoided an instruction or two in an extremely hot code path which notably sped up ReleaseFast builds – while also checking our work for us in Debug mode. Win-win!
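
To make the idea concrete, here is a small sketch in the spirit of the chess-engine question earlier in the thread. The function lowestSetBit is hypothetical and is not the compiler code mentioned above. Whether LLVM actually generates better code from a given assert depends on the surrounding code and the target, so measure before relying on it.

```zig
const std = @import("std");

// Hypothetical bitboard helper: index of the lowest set bit in `mask`.
fn lowestSetBit(mask: u64) u6 {
    // Debug and ReleaseSafe: panics if the invariant is violated.
    // ReleaseFast: the optimizer may assume mask != 0, which can let it
    // skip handling for the all-zero input.
    std.debug.assert(mask != 0);
    // @ctz on a u64 yields a u7 (the result could be 64 for a zero input),
    // so narrow it to u6, which is valid for any nonzero mask.
    return @intCast(@ctz(mask));
}
```

The same line serves as a bug check in safe builds and as an optimizer hint in fast builds, which is the win-win described above.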

Ok, thank you. I now understand a bit better how to think about unreachable, and thus how to use assert.
