Condition variable deadlock

For the life of me I cannot understand the comment in the std/Thread/Condition.zig that the following scenario can deadlock. It looks fine to me. What am I missing? Will adding
thread-1: mutex.unlock() and uncommenting lines 40-41 resolve the deadlock?

1 Like

Alright so there are 2 deadlocks I can see here:

  • thread2 calls signal
  • thread1 locks the mutex
  • thread1 waits (forever)

and

  • thread1 locks the mutex
  • thread2 calls signal
  • thread1 waits (forever)

The proposed solution to lock the mutex only guards against the second deadlock, so I think the example is incomplete. I modified it a bit to use a predicate. Here is a full example that you can try on your own that reliably produces a deadlock:

const std = @import("std");
const Mutex = std.Thread.Mutex;
const Condition = std.Thread.Condition;

var condition = Condition{};
var mutex = Mutex{};
var predicate = false;

fn thread1() void {
	mutex.lock();
	defer mutex.unlock();
	while(!predicate) {
		std.log.err("thread1 read predicate", .{});
		std.time.sleep(2000000);
		std.log.err("thread1 waits", .{});
		condition.wait(&mutex);
	}
	std.log.err("thread1 awoke", .{});
}

fn thread2() void {
	predicate = true;
	std.log.err("thread2 wrote predicate", .{});
	//mutex.lock();
	//mutex.unlock();
	condition.signal();
	std.log.err("thread2 signaled", .{});
}

pub fn main() !void {
	const t1 = try std.Thread.spawn(.{}, thread1, .{});
	std.time.sleep(1000000);
	const t2 = try std.Thread.spawn(.{}, thread2, .{});
	defer t1.join();
	defer t2.join();
}

The output of the program is this:

error: thread1 read predicate
error: thread2 wrote predicate
error: thread2 signaled
error: thread1 waits

As you can see thread2 signals before thread1 started waiting, so thread1 starts forever.
And if thread2 locks the mutex (uncomment the code), then the signaling of thread 2 cannot happen between thread1 reading the predicate and successfully waiting on the condition. This solves the deadlock.

4 Likes

A great thing to remember is that threads can come along at any point in time. I can understand this example being very confusing if you imagine them both starting at the same time - but if you think about them being totally independent of each other and starting at different times, then deadlocks are easier to imagine.

I think the mental picture comes from the fact that we’re used to things being in lock-step from single threaded material. So when I read that the first time, that’s how my brain read it - one statement after another and very orderly. Took me a minute to think of a scenario, too.

1 Like

This deadlock should never happen if thread-2 sets condition=true before calling signal. In this case while(!condition) c.wait(&m) in thread-1 never calls wait.

The same std/Thread/Condion.zig has a working example that is not supposed to deadlock:

But the code from doc comment didn’t show any such condition.
And even if it did, there is no guarantee that there is a global order between memory reads and writes. We could still have the following case:

  • thread2 sets condition=true, and sends it to main memory
  • thread2 calls signal
  • thread1 locks the mutex
  • thread1 reads condition=false from main memory (or from one of its caches)
  • condition=true reaches main memory
  • thread1 waits (forever)

I think the mutex.lock(); should come before predicate=true; so that thread1 while-loop sees consistent predicate value.

Yeah, either that or it could use Unordered atomic reads/writes for the predicate.
Though I think for bools it isn’t actually necessary.