Condition variable deadlock

slonik-az · January 6, 2024, 6:44pm

For the life of me I cannot understand the comment in the std/Thread/Condition.zig that the following scenario can deadlock. It looks fine to me. What am I missing? Will adding
thread-1: mutex.unlock() and uncommenting lines 40-41 resolve the deadlock?

github.com

ziglang/zig/blob/42389cb9c58cfe71db055142eb7b8cc15193ddf9/lib/std/Thread/Condition.zig#L34-L43


      
          //! Note that condition variables can only reliably unblock threads that are sequenced before them using the same Mutex.
          //! This means that the following is allowed to deadlock:
          //! ```
          //! thread-1: mutex.lock()
          //! thread-1: condition.wait(&mutex)
          //!
          //! thread-2: // mutex.lock() (without this, the following signal may not see the waiting thread-1)
          //! thread-2: // mutex.unlock() (this is optional for correctness once locked above, as signal can be called while holding the mutex)
          //! thread-2: condition.signal()
          //! ```

IntegratedQuantum · January 6, 2024, 7:47pm

Alright so there are 2 deadlocks I can see here:

thread2 calls signal
thread1 locks the mutex
thread1 waits (forever)

and

thread1 locks the mutex
thread2 calls signal
thread1 waits (forever)

The proposed solution to lock the mutex only guards against the second deadlock, so I think the example is incomplete. I modified it a bit to use a predicate. Here is a full example that you can try on your own that reliably produces a deadlock:

const std = @import("std");
const Mutex = std.Thread.Mutex;
const Condition = std.Thread.Condition;

var condition = Condition{};
var mutex = Mutex{};
var predicate = false;

fn thread1() void {
	mutex.lock();
	defer mutex.unlock();
	while(!predicate) {
		std.log.err("thread1 read predicate", .{});
		std.time.sleep(2000000);
		std.log.err("thread1 waits", .{});
		condition.wait(&mutex);
	}
	std.log.err("thread1 awoke", .{});
}

fn thread2() void {
	predicate = true;
	std.log.err("thread2 wrote predicate", .{});
	//mutex.lock();
	//mutex.unlock();
	condition.signal();
	std.log.err("thread2 signaled", .{});
}

pub fn main() !void {
	const t1 = try std.Thread.spawn(.{}, thread1, .{});
	std.time.sleep(1000000);
	const t2 = try std.Thread.spawn(.{}, thread2, .{});
	defer t1.join();
	defer t2.join();
}

The output of the program is this:

error: thread1 read predicate
error: thread2 wrote predicate
error: thread2 signaled
error: thread1 waits

As you can see thread2 signals before thread1 started waiting, so thread1 starts forever.
And if thread2 locks the mutex (uncomment the code), then the signaling of thread 2 cannot happen between thread1 reading the predicate and successfully waiting on the condition. This solves the deadlock.

AndrewCodeDev · January 6, 2024, 7:53pm

A great thing to remember is that threads can come along at any point in time. I can understand this example being very confusing if you imagine them both starting at the same time - but if you think about them being totally independent of each other and starting at different times, then deadlocks are easier to imagine.

I think the mental picture comes from the fact that we’re used to things being in lock-step from single threaded material. So when I read that the first time, that’s how my brain read it - one statement after another and very orderly. Took me a minute to think of a scenario, too.

slonik-az · January 6, 2024, 8:04pm

This deadlock should never happen if thread-2 sets condition=true before calling signal. In this case while(!condition) c.wait(&m) in thread-1 never calls wait.

The same std/Thread/Condion.zig has a working example that is not supposed to deadlock:

github.com

ziglang/zig/blob/42389cb9c58cfe71db055142eb7b8cc15193ddf9/lib/std/Thread/Condition.zig#L5-L32


      
          //! Example:
          //! ```
          //! var m = Mutex{};
          //! var c = Condition{};
          //! var predicate = false;
          //!
          //! fn consumer() void {
          //!     m.lock();
          //!     defer m.unlock();
          //!
          //!     while (!predicate) {
          //!         c.wait(&m);
          //!     }
          //! }
          //!
          //! fn producer() void {
          //!     {
          //!         m.lock();
          //!         defer m.unlock();
          //!         predicate = true;

This file has been truncated. show original

IntegratedQuantum · January 6, 2024, 8:16pm

But the code from doc comment didn’t show any such condition.
And even if it did, there is no guarantee that there is a global order between memory reads and writes. We could still have the following case:

thread2 sets condition=true, and sends it to main memory
thread2 calls signal
thread1 locks the mutex
thread1 reads condition=false from main memory (or from one of its caches)
condition=true reaches main memory
thread1 waits (forever)

slonik-az · January 6, 2024, 8:18pm

IntegratedQuantum:

fn thread2() void {
	predicate = true;
	std.log.err("thread2 wrote predicate", .{});
	//mutex.lock();
	//mutex.unlock();
	condition.signal();
	std.log.err("thread2 signaled", .{});
}

I think the mutex.lock(); should come before predicate=true; so that thread1 while-loop sees consistent predicate value.

IntegratedQuantum · January 6, 2024, 8:27pm

Yeah, either that or it could use Unordered atomic reads/writes for the predicate.
Though I think for bools it isn’t actually necessary.