An Even Further Clarification on volatile

I think I found a use case for volatile that hasn’t been discussed yet, but I’m also curious whether there’s a better way to approach this. The use case concerns an embedded utility by Segger called RTT. An explanation of the general way it works (a rough struct sketch in Zig follows the list):

  • On target (an ARM MCU, let’s say), there is a simple ring buffer in RAM consisting of:
    • Static, not-important stuff (name, mode, other)
    • write_offset
    • read_offset
    • Pointer to array
  • This buffer is uni-directional
  • For an “up” buffer, which is designed for target → host (debug probe) communication, the procedure is as follows:
    • Executing firmware code writes bytes to the buffer, then increments write_offset by the appropriate number of bytes (handling wrap-around, not important for this convo)
    • The debug probe (a J-Link, for instance) can inspect RAM while a program is executing; it sees that write_offset has been incremented, reads write_offset - read_offset bytes, then increments read_offset to match write_offset
      • Note here that the debug probe has actually modified RAM while firmware is executing!
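
Here is that rough sketch of the per-buffer control block, in Zig. The field names and types are simplified guesses based on the list above; the real SEGGER_RTT_BUFFER_UP struct in SEGGER_RTT.h differs in its details.

// Rough sketch only: illustrative field names/types, not SEGGER's exact layout.
const UpBuffer = extern struct {
    name: [*:0]const u8, // static, "not important" metadata
    buffer: [*]u8, // pointer to the backing byte array in RAM
    size: u32, // capacity of the backing array
    write_offset: u32, // advanced by firmware after it writes bytes
    read_offset: u32, // advanced by the debug probe after it reads bytes
    flags: u32, // mode and other static configuration
};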

Given this explanation, making use of volatile in the following way seems appropriate:

const std = @import("std");

// Debug probe doesn't touch this location
var write_offset: usize = 0;

// Remember, this memory location can be modified by the probe at any time!
var read_offset: usize = 0;

var buffer: [64]u8 = undefined;

fn writeToBuffer(data: []const u8) void {
    // This must be a volatile access, as optimizing it away is not an option.
    // Furthermore, it also must happen before any of the other modifications, given that
    // the debug probe can be changing this at any time.
    const local_read_offset: usize = @as(*volatile usize, @ptrCast(&read_offset)).*;

    // In a real ring buffer this value would feed a free-space/wrap-around calculation;
    // discard it here so the example compiles (Zig rejects unused locals).
    _ = local_read_offset;

    // Just to simplify this example, we aren't caring about the actual buffer size/wraparound,
    // just assume this buffer is of infinite length :)
    std.mem.copyForwards(u8, buffer[write_offset..], data);
    write_offset += data.len;
}

Three questions:

  • Is this an appropriate use of volatile, given that the value of read_offset can be changed at any time by a debug probe?
  • Does accessing read_offset here via a volatile pointer ensure that this access happens before the other two operations in the function?
  • Could this use case instead be solved with memory barrier instructions rather than volatile?

The Segger Real Time Transfer source code is on GitHub.
The C code uses volatile pointers to read and write the buffer, but it also locks before and unlocks after each access.
The locking code is in Config/SEGGER_RTT_Conf.h, in the SEGGER_RTT_LOCK and SEGGER_RTT_UNLOCK macros. Depending on the CPU, the locking code varies from disabling interrupts to calling the __schedule_barrier intrinsic as a memory barrier, etc.
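
As a point of reference for a Zig port, here is a hedged sketch of one common flavor of that locking pattern on a Cortex-M class core: save PRIMASK, mask interrupts, and restore the saved state on unlock. This is an assumed translation, not SEGGER's code; the rttLock/rttUnlock names are made up here, and other CPUs/toolchains substitute BASEPRI masking, scheduler barriers, or OS locks instead.

// Hedged sketch, not SEGGER's code: save PRIMASK, disable interrupts, restore on unlock.
inline fn rttLock() usize {
    const saved_primask = asm volatile ("mrs %[ret], primask"
        : [ret] "=r" (-> usize),
    );
    asm volatile ("cpsid i" ::: "memory");
    return saved_primask;
}

inline fn rttUnlock(saved_primask: usize) void {
    asm volatile ("msr primask, %[val]"
        :
        : [val] "r" (saved_primask),
        : "memory"
    );
}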

Obviously volatile pointers are not enough.

Either use this library, or use volatile pointers and decipher the locking and unlocking C or assembly code for your cpu.

1 Like

Apologies, some useful context: I’m porting the existing C library to pure Zig code (as it’s just some in-memory buffers). The locks are for multi-threaded/multi-core applications plus interrupt context; for this example, let’s simplify things and assume it is single threaded, with no interrupts accessing the same variables, making the locks unnecessary. The memory barriers, however, as seen in the C library code, are definitely necessary.

A more direct question is:

  • In Zig semantics, does volatile guarantee execution order with respect to surrounding non-volatile operations? That is, does a volatile operation that precedes non-volatile ones in the source always come before them?

In both C and Zig semantics, volatile does not guarantee execution order.

No, the only guarantee volatile gives is that the access (memory read or write) is not erased/optimized away by the compiler. There is no atomicity or ordering guarantee; the CPU is free to do anything.


Warning: The code in SEGGER_RTT_LOCK that runs before every access is critical. The same for the code in SEGGER_RTT_UNLOCK that runs after every access. This code guarantees the ordering.

4 Likes

This code guarantees the ordering.

Am I missing something in the code for SEGGER_RTT_LOCK/UNLOCK()? Looking at the example for GCC on the ARMv7E-M architecture (a chip I use) here, it appears that it’s just masking/un-masking interrupts to guard against data races.

However, the memory barrier macro RTT__DMB() they define here is what appears to handle memory access ordering, if I understand it correctly… In the source it looks like they use this macro to manage memory ordering, whereas, as I understand it, locking/unlocking is there to keep multiple threads/interrupt contexts from accessing the same RTT memory block.
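
For what it’s worth, a hedged Zig sketch of what an equivalent of that RTT__DMB() macro could look like in a port (ARM-specific, and the function name here is made up): the barrier instruction itself, plus a "memory" clobber so the compiler also treats it as a compiler barrier.

// Hedged sketch of a Zig counterpart to RTT__DMB(): emit a full Data Memory Barrier
// and use the "memory" clobber so the compiler cannot reorder accesses across it.
inline fn dataMemoryBarrier() void {
    asm volatile ("dmb sy" ::: "memory");
}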

No, the only guarantee volatile gives is that the access (memory read or write) is not erased/optimized away by the compiler. There is no atomicity or ordering guarantee; the CPU is free to do anything.

Got it! Okay, this clears up this part of my question :)

Volatile, in C, forces the compiler to assume an operation has side effects. That means it can’t be eliminated even if the rest of the block doesn’t make use of the operation involving the volatile element.
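
A minimal illustration of that point in Zig terms (hypothetical pointers, assuming an optimized build):

// The dead load through the plain pointer may be deleted by the optimizer; the load
// through the volatile pointer must be emitted, since the compiler has to assume it
// has side effects.
fn poll(plain: *u32, status_reg: *volatile u32) void {
    _ = plain.*; // result unused: the compiler is free to drop this read
    _ = status_reg.*; // result unused, but the read itself must still happen
}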

I strongly recommend this blog post to anyone translating or writing code involving a volatile pointer, because to echo @dimdin, ordering operations is not one of the things it does.

There are some open questions about the role of the volatile keyword in Zig, as distinct from C. In terms of how it actually works right now, today, it functions in the same way.

But if you need to order operations, that’s what atomics do. Sometimes you’ll also need volatile, sometimes you won’t.
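
To make that concrete for the ring buffer above, here is a hedged sketch of the same write path using Zig’s atomic builtins for the ordering, reusing the declarations from the earlier snippet. The spelling of the ordering enum (.Acquire/.Release vs. .acquire/.release) depends on the Zig version; the older spelling is used here to match the era of that snippet.

// Hedged sketch: atomic accesses supply the ordering that volatile alone does not.
fn writeToBufferOrdered(data: []const u8) void {
    // Acquire-ordered load: the buffer writes below cannot be reordered above it.
    const local_read_offset = @atomicLoad(usize, &read_offset, .Acquire);
    _ = local_read_offset; // would feed a free-space calculation in a real ring buffer

    std.mem.copyForwards(u8, buffer[write_offset..], data);

    // Release-ordered store: the buffer writes above become visible no later than
    // the updated write_offset.
    @atomicStore(usize, &write_offset, write_offset + data.len, .Release);
}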

2 Likes

Got it! In fact, it’s looking like a volatile read with a fence is what I’m after (along with making this API thread safe to match the original, either via Zig builtins or raw assembly as they do). Presumably the compiler will end up translating a @fence to a dmb or related instruction on the ARM platform, although I’ll play around with some Godbolt experiments!
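
For completeness, a sketch of that “volatile read plus fence” idea, again reusing the earlier read_offset declaration. A caveat: this is era-specific, since newer Zig releases have removed @fence in favor of expressing ordering through atomic accesses, and the ordering-enum spelling also depends on the version.

// Era-specific sketch: a volatile read of read_offset followed by @fence, which on
// ARM typically lowers to a dmb plus a compiler barrier. @fence is gone in newer Zig.
fn loadReadOffsetWithFence() usize {
    const local_read_offset = @as(*volatile usize, @ptrCast(&read_offset)).*;
    @fence(.SeqCst);
    return local_read_offset;
}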

2 Likes

You are correct on both points.

  • SEGGER_RTT_LOCK/UNLOCK() masks and unmasks interrupts (but consider that interrupts might be off while accessing the buffer, to prevent the J-Link from interrupting the execution)
  • RTT__DMB() emits the Data Memory Barrier instruction to guarantee memory ordering.

3 Likes

This is a good point. If I remember correctly, certain debugger actions can have handlers, but I would need to refresh my knowledge on this… Either way, it’s almost irrelevant, as having the locks in place makes this library a lot more generally useful anyway.