Pointers to optimize darkness propagation in a minecraft-style lighting grid

this is a BFS algorithm for propagating darkness in a minecraft-style lighting system, the lighting system consists of 6 channels, RGB for torch, RGB for sunlight

darkness propagation is as follows:
it utilizes a single queue, and loops until queue is empty: during which
it loops over the 6 cardinal directions, gets the lighting for each node position + direction, then loops over each channel, updating them according to conditionals.

updating light depends on channel conditions, hence looping over each channel.

i’m not sure how to optimize it any further, i am already using bit packing with BlockLight, packed struct, each channel consuming 5 bits
is there another algorithm that achieves the same effect at a better speed?


    var prev_chunk_pos: int3 = darkness_seed.world_pos.worldToChunk();
    var prev_chunk_neighbour: *LocalChunk = ChunkManager.getChunk(io, prev_chunk_pos).?;
    while (queue.count > 0) {
        const node: DarknessNode = queue.get().?;
        std.debug.assert(node.light.equals(BlockLight.min()) == false);

        // 6 cardinal directions
        for (Util.directions) |d| {
            const neighbour_world = node.world_pos.add(d);
            const neighbour_chunk_pos = neighbour_world.worldToChunk();

            // ChunkManager.getChunk() uses an expensive lock
            // cache the previously used pointer and reuse it
            // if neighbour_world_chunk_pos == previous_chunk_pos
            const n_c: *LocalChunk = if (neighbour_chunk_pos.equals(prev_chunk_pos))
                prev_chunk_neighbour
            else
                ChunkManager.getChunk(io, neighbour_chunk_pos) orelse continue;
            prev_chunk_neighbour = n_c;
            prev_chunk_pos = neighbour_chunk_pos;

            const n_index: usize = int3.worldToIndex(neighbour_world);
            const is_down: bool = d.y() == -1;

            const n_light_original: BlockLight = n_c.getLight(n_index);
            var updated_n_light: BlockLight = n_light_original;

            var neighbor_propagation_light: BlockLight = BlockLight.min();
            var did_remove_light: bool = false;

            // 3 channels for torchlight
            // 3 channels for sunlight
            //loop over all the light channels and set them one by one
            for (comptime std.enums.values(Channel)) |t| {
                const node_channel_light: u5 = node.light.get(t);

                if (node_channel_light == 0) continue;

                const neighbour_channel_light: u5 = n_light_original.get(t);
                const sunlight = t == .sun_red or t == .sun_green or t == .sun_blue;

                // propagate darkness if neighbour:
                //  - isnt 0
                //  - either of those two conditions
                //      - channel is sunlight and is down and current node channel == max_channel_Size
                //      - neigbour is less than current node channel
                if (neighbour_channel_light != 0 and
                    (sunlight and is_down and node_channel_light == BlockLight.max_channel_size) or
                    (neighbour_channel_light < node_channel_light))
                {
                    updated_n_light = updated_n_light.set(t, n_c.getBlock(n_index).blockData().light.default.get(t));
                    neighbor_propagation_light = neighbor_propagation_light.set(t, neighbour_channel_light);

                    did_remove_light = true;
                } else if (neighbour_channel_light >= node_channel_light) {
                    n_c.light_repair.setBit3D(n_index, true);
                    for (Util.directions) |nd| {
                        const nnw: int3 = neighbour_world.add(nd);
                        const nnnc: *LocalChunk = ChunkManager.getChunk(io, nnw.worldToChunk()) orelse continue;

                        try visited_blocks.put(allocator, nnw, {});
                        nnnc.light_repair.setBit3D(nnw.worldToIndex(), true);
                    }
                }
            }

            if (did_remove_light) {
                n_c.setLight(n_index, updated_n_light);
                try visited_blocks.put(allocator, neighbour_world, {});

                // light = 0, so skipped next iteration, so might as well not added.
                if (neighbor_propagation_light.equals(BlockLight.min()) == false) {
                    try queue.put(allocator, DarknessNode{
                        .world_pos = neighbour_world,
                        .light = neighbor_propagation_light,
                    });
                }
            }
        }
    }

The thing that sticks out to me is that you’ve folded the routines for populating a chunk with torchlight and populating it with sunlight together, when really they should be separate.
The torchlight routine needs to iterate in all directions and potentially consider neighbouring chunks; the sunlight routine only needs to iterate in one direction, and never needs to consider neighbouring chunks.
So, the natural and most optimised way to calculate sunlight is to simply do it entirely separately from torchlight.

Here’s an example sunlight routine:

/// Method to apply sunlight to a chunk's blocks for the current frame.
pub fn sunlight(self: *LocalChunk, sunlight_r: u5, sunlight_g: u5, sunlight_b: u5) void {
	const sunlight_colour: BlockLight = .{
		.sun_red = sunlight_r,
		.sun_green = sunlight_g,
		.sun_blue = sunlight_b,
		.torch_red = 0,
		.torch_green = 0,
		.torch_blue = 0,
	};
	
	// We bitwise AND this with the sampled block light from the block we want to write to, 
	// then bitwise OR with the sunlight block light and write that value to the block.
	// This overwrites the sunlight colours of the block without touching the torchlight.
	const mask: BlockLight = .{
		.sun_red = 0,
		.sun_green = 0,
		.sun_blue = 0,
		.torch_red = std.math.maxInt(u5),
		.torch_green = std.math.maxInt(u5),
		.torch_blue = std.math.maxInt(u5),
	};
	
	// I'm assuming a +Z=up coordinate system even though that's heresy in Minecraft :O
	for(0..CHUNK_SIZE_X) |x| {
		for(0..CHUNK_SIZE_Y) |y| {
			// Loop over the Z-values and go downwards until we find a solid block.
			const i: usize = blk: {
				for(0..CHUNK_SIZE_Z) |z| {
					if(self.getBlock(.{
						.x = x,
						.y = y,
						.z = CHUNK_SIZE_Z - (z + 1),
					}).blockData().blocks_sunlight){
						break :blk CHUNK_SIZE_Z - (z + 1);
					}
				}
				// If we never found a solid block, optimise by continuing the outer loop.
				continue;
			};
			
			// Regardless of how the loop terminated, we want to write our light values into the chunk.
			for(0..CHUNK_SIZE_Z) |z| {
				const old_colour = self.getLight(.{.x = x, .y = y, .z = z});
				if(z > i){ // Write sunlight - bitwise OR
					self.setLight(.{.x = x, .y = y, .z = z}, (old_colour & mask) | sunlight_colour);
				} else { // Clear sunlight - bitwise AND
					self.setLight(.{.x = x, .y = y, .z = z}, old_colour & mask);
				}
			}
		}
	}
}

The reason behind doing all these bit operation gymnastics is so that we can design the two routines to be run together - since the sunlight routine doesn’t overwrite the torchlight of the blocks, and the torchlight routine doesn’t overwrite the sunlight of the blocks, you can run them in any order you want - and potentially even concurrently!

1 Like

This gets kinda hard without any profiling data but still there are a few things.


You could split this into two phases to avoid “repairing” light while still in the process of “destroying” it.

In the first phase you’d remove the lighting source and iterate over the neighbors. If a node has exactly the light level you’d expect(current - 1 or so) you set it to 0 and add it to a removal_queue. If it had a higher light level, it was lighted from something else. So instead of zeroing it you add it to a relight_queue. Do this until removal_queue is empty.

Then in the second phase you take the relight_queue and propagate the light to the stuff you just zeroed out. Since relight_queue has only valid light levels you don’t need to check those again.


You could do something like SWAR for the colors to remove the for (comptime std.enums.values(Channel)) |t| loop. It should likely be faster.

sorry for lack of profile data, i’m trying to find a program to locate the most hot line of code in a function, can’t find any (i use linux)

tried hotspot…but throws ?? on symbols and errors out on opening symbols in editor

First up I’d recommend to split sunlight and block light into separate functions, they require separate logic, and you are wasting energy doing runtime comparisons on whether it’s sunlight or not.
Next up you can try to calculate all color color channels at once if you store them interleaved, as described in this article: Voxel lighting – 0 FPS
Finally I’d suggest to reduce the amount of locking you have going on, in your current code any update outside the main chunk will lock the ChunkManager mutex 6 times (that’s 120 cycles wasted), instead you can put neighbor updates into a separate queue and then only get the neighbor chunk once. Same if you have any other mutexes in e.g. setLight or getBlock.

Beyond that of course always measure before you make a change.

1 Like

I just use perf with record -g and report -g to get some call-graph data for instance and it works fine if you stick to Debug or ReleaseSafe. On ReleaseFast the accuracy can be spotty.
If you look at the assembly cross referencing with godbolt can help

Maybe you need to use the llvm backend via -fllvm or .use_llvm = true in compile step?
Not sure whether hotspot works correctly with the non-llvm backend.

using llvm makes hotspot work better it seems. Why’s that? hm

So basically when you use llvm backend the resulting debug info is an older version that is understood by more tools.

1 Like