Hello everyone, here’s a probably too-long post about lemu, a LEGv8 emulator.
Background
LEGv8 is an ARMv8-like assembly language used for teaching, described in Computer Organization and Design ARM Edition by Patterson and Hennessy.
The book does not come with a LEGv8 emulator, so my professor wrote one in C named legv8emul. However, this emulator is closed-source and largely a black box, with printf debugging the only way to see what’s actually going on.
What does it do?
lemu is an open-source LEGv8 emulator that provides more features for developing and inspecting your programs:
- LEGv8 Emulator: Assemble and execute LEGv8 code (lemu <file>).
- Language Server (LSP): View syntax and compiler errors, goto definition, and hover information in your editor (using lsp-kit).
- Command-Line Debugger: Set breakpoints, step through instructions, and inspect registers (lemu -d <file>). (limited functionality right now)
- VS Code Extension: Use instruction snippets and access the language server features.
Oracle testing
One of my favorite parts of this project was building the oracle fuzzer. This generates random programs that print the contents of all used registers. After that, it runs the program under lemu and legv8emul (my professor’s emulator) to ensure they both have the same output. I used this to find many bugs in both my interpreter and my professor’s.
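To make the idea concrete, here is a rough Python sketch of that kind of differential-testing loop. It is not lemu's actual fuzzer: the function names, the instruction mix, and the PRNT print mnemonic are all assumptions made for illustration, and the two runners are plain callables so the loop can be tried without either emulator installed.

```python
import os
import random
import subprocess
import tempfile

PRINT_MNEMONIC = "PRNT"  # assumption: the mnemonic that prints one register


def generate_program(rng, length=8, num_regs=4):
    """Build a random straight-line program that ends by printing its registers."""
    regs = [f"X{i}" for i in range(num_regs)]
    # Give every register a known starting value so both emulators see the same inputs.
    lines = [f"ADDI {r}, XZR, #{rng.randrange(4096)}" for r in regs]
    for _ in range(length):
        op = rng.choice(["ADD", "SUB", "AND", "ORR", "EOR"])
        dst, a, b = (rng.choice(regs) for _ in range(3))
        lines.append(f"{op} {dst}, {a}, {b}")
    lines += [f"{PRINT_MNEMONIC} {r}" for r in regs]
    return "\n".join(lines) + "\n"


def run_command(argv, source):
    """Write `source` to a temp file, run argv with "{file}" replaced by its path, return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prog.lv8")
        with open(path, "w") as f:
            f.write(source)
        cmd = [path if arg == "{file}" else arg for arg in argv]
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    return done.stdout


def differential_test(run_a, run_b, iterations=100, seed=0):
    """Run the same random programs through both runners and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(iterations):
        program = generate_program(rng)
        out_a, out_b = run_a(program), run_b(program)
        if out_a != out_b:
            mismatches.append((program, out_a, out_b))
    return mismatches
```

With both emulators installed, the runners could be built from run_command, for example functools.partial(run_command, ["lemu", "{file}"]) and functools.partial(run_command, ["./legv8emul", "{file}", "-s", "2000"]), mirroring the invocations used in the benchmark section. Each mismatch keeps the offending program, so it can be saved as a regression test.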
Pointless benchmarking
When I first started doing benchmarks, I was disappointed to learn that lemu was consistently 40% or more slower than legv8emul. However, after profiling with poop and callgrind, two diffs (both under 30 lines changed) helped lemu meet and sometimes exceed the performance of legv8emul:
- Using a lookup table for opcode decoding led to a 40% gain on one benchmark.
- Using Zig’s labeled switch loop for the instruction loop significantly reduced cache misses and improved performance on all benchmarks.
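For readers unfamiliar with the first trick, here is a minimal Python sketch of table-driven opcode decoding. It is an illustration rather than lemu's Zig code: the names are made up, and only a handful of opcodes (bit patterns as listed on the LEGv8 reference card) are included. LEGv8 opcodes are 6 to 11 bits wide, so each one is expanded into every 11-bit prefix it can produce, and decoding becomes a single array index.

```python
# Opcode bit patterns for a small subset of LEGv8 instructions.
OPCODES = {
    "B":    "000101",
    "CBZ":  "10110100",
    "CBNZ": "10110101",
    "ADDI": "1001000100",
    "SUBI": "1101000100",
    "ADD":  "10001011000",
    "SUB":  "11001011000",
    "LDUR": "11111000010",
    "STUR": "11111000000",
}


def build_table():
    """Expand each variable-width opcode into every 11-bit prefix it covers."""
    table = [None] * (1 << 11)
    for name, bits in OPCODES.items():
        free = 11 - len(bits)  # low bits of the prefix that belong to operands
        base = int(bits, 2) << free
        for low in range(1 << free):
            assert table[base | low] is None, "overlapping opcodes"
            table[base | low] = name
    return table


DECODE_TABLE = build_table()  # built once, reused for every instruction


def decode(word):
    """Return the mnemonic for a 32-bit instruction word, or None if unknown."""
    return DECODE_TABLE[(word >> 21) & 0x7FF]
```

For example, decode(0x8B030041) (the encoding of ADD X1, X2, X3) returns "ADD". The table costs 2048 entries, but it replaces a chain of range comparisons with one shift, one mask, and one load per instruction.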
Here’s a benchmark, measured with poop, that recursively calculates fib(x) for x from 1 through 30.
$ uname -a
Linux archlinux 6.17.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 12 Oct 2025 12:45:18 +0000 x86_64 GNU/Linux
$ zig build -Doptimize=ReleaseFast -Dstrip
$ poop "lemu test/behavior/fib.lv8" "./legv8emul test/behavior/fib.lv8 -s 2000"
Benchmark 1 (32 runs): lemu test/behavior/fib.lv8
  measurement         mean ± σ           min … max          outliers    delta
  wall_time           157ms ± 9.50ms     148ms … 192ms      2 ( 6%)     0%
  peak_rss            1.73MB ± 39.3KB    1.55MB … 1.74MB    3 ( 9%)     0%
  cpu_cycles          654M ± 28.3M       552M … 680M        2 ( 6%)     0%
  instructions        3.27G ± 152M       2.67G … 3.39G      2 ( 6%)     0%
  cache_references    2.00K ± 1.10K      1.00K … 5.92K      2 ( 6%)     0%
  cache_misses        1.40K ± 821        709 … 4.50K        2 ( 6%)     0%
  branch_misses       1.02M ± 47.2K      832K … 1.07M       2 ( 6%)     0%
Benchmark 2 (20 runs): ./legv8emul test/behavior/fib.lv8 -s 2000
  measurement         mean ± σ           min … max          outliers    delta
  wall_time           262ms ± 6.99ms     253ms … 279ms      1 ( 5%)     💩+ 67.3% ± 3.2%
  peak_rss            1.55MB ± 0         1.55MB … 1.55MB    0 ( 0%)     ⚡- 10.3% ± 1.0%
  cpu_cycles          1.17G ± 31.2M      1.08G … 1.19G      3 (15%)     💩+ 78.3% ± 2.6%
  instructions        4.50G ± 113M       4.17G … 4.55G      4 (20%)     💩+ 37.7% ± 2.4%
  cache_references    2.95K ± 896        1.71K … 4.63K      0 ( 0%)     💩+ 47.7% ± 29.5%
  cache_misses        2.14K ± 748        1.33K … 3.80K      0 ( 0%)     💩+ 52.7% ± 32.5%
  branch_misses       1.35M ± 34.2K      1.25M … 1.37M      4 (20%)     💩+ 33.0% ± 2.4%
Final notes
The Zig build system is awesome and extremely useful for building tests, generating files, and managing dependencies. I used lsp-kit for the LSP types and server, and zigline for the debugger REPL. All of the emulator’s features are also exposed via a Zig module with these optional dependencies.
If you have any suggestions for improving lemu, feel free to leave a comment here or on the repository.
Thanks.


